Model Storage
To deploy your model, you must first store it in a storage type supported by Alauda AI. Supported storage types include:
- S3 Object Storage: The most commonly used mode. It downloads data before the main container starts via a Storage Initializer (InitContainer).
- Persistent Volume Claim (PVC): Mounts data stored on a persistent volume before the main container starts via a Storage Initializer.
- Open Container Initiative (OCI) containers: Also known as modelcars in KServe. This approach loads models in seconds by exploiting the container runtime's layered image caching via a sidecar container.
TOC
- Using S3 Object Storage for model storage
  - Authentication Configuration
  - S3 Key Configuration Parameters
  - Deploy Inference Service
- Using OCI containers for model storage
- Using PVC for model storage
  - Uploading model files to a PVC
    - Prerequisites
    - Procedure
    - Verification
    - Next Steps
  - Deploy Inference Service

Using S3 Object Storage for model storage
This is the most commonly used mode. It implements credential management through a Secret annotated with specific S3 configuration parameters.
Authentication Configuration
It is recommended to create a separate ServiceAccount and Secret for each project.
S3 Key Configuration Parameters
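The original manifests are not reproduced here; the following is a minimal sketch based on the standard KServe S3 credential annotations. The Secret name `s3-credentials` is illustrative, and `sa-models` matches the service account name referenced in the deployment step below:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials  # illustrative name
  annotations:
    serving.kserve.io/s3-endpoint: "your_s3_service_ip:your_s3_port"
    serving.kserve.io/s3-usehttps: "0"  # "1" if your S3 service uses HTTPS
type: Opaque
data:
  AWS_ACCESS_KEY_ID: YOUR_BASE64_ENCODED_ACCESS_KEY
  AWS_SECRET_ACCESS_KEY: YOUR_BASE64_ENCODED_SECRET_KEY
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-models
secrets:
  - name: s3-credentials
```

Apply both resources in the namespace where the inference service will run, for example with `kubectl apply -f s3-credentials.yaml -n <namespace>`.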
- Replace `YOUR_BASE64_ENCODED_ACCESS_KEY` with your actual Base64-encoded AWS access key ID.
- Replace `YOUR_BASE64_ENCODED_SECRET_KEY` with your actual Base64-encoded AWS secret access key.
- Replace `your_s3_service_ip:your_s3_port` with the actual IP address and port of your S3 service.
- Set `serving.kserve.io/s3-usehttps` to `"1"` if your S3 service uses HTTPS, or `"0"` if it uses HTTP.
Deploy Inference Service
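A minimal InferenceService sketch matching the parameters this section describes; the metadata name and the `modelFormat` value are assumptions and must match what your ClusterServingRuntime declares in `supportedModelFormats`:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: qwen2-5-0-5b-instruct  # illustrative name
  annotations:
    aml.cpaas.io/runtime-type: vllm
spec:
  predictor:
    serviceAccountName: sa-models
    model:
      modelFormat:
        name: vLLM  # assumed; must match the runtime's supportedModelFormats
      runtime: aml-vllm-0.11.2-cpu
      storageUri: s3://models/Qwen2.5-0.5B-Instruct
```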
- Replace `Qwen2.5-0.5B-Instruct` with your actual model name.
- `aml.cpaas.io/runtime-type: vllm` specifies the inference runtime type. For more information about custom inference runtimes, see Extend Inference Runtimes.
- Replace `aml-vllm-0.11.2-cpu` with a runtime name that is already installed on your platform (corresponding to a ClusterServingRuntime CRD instance).
- `storageUri: s3://models/Qwen2.5-0.5B-Instruct` specifies the S3 bucket URI where the model is stored.
- `serviceAccountName: sa-models` specifies the service account with permission to access the S3 credentials Secret.
Using OCI containers for model storage
As an alternative to storing a model in an S3 bucket or PVC, you can store models in Open Container Initiative (OCI) containers. Deploying models from OCI containers is also known as modelcars in KServe. This approach is ideal for offline environments and enterprise internal registries such as Quay or Harbor.
For detailed instructions on packaging and deploying models using OCI containers, see Using KServe Modelcar for Model Storage.
Using PVC for model storage
Uploading model files to a PVC
When deploying a model, you can serve it from a preexisting Persistent Volume Claim (PVC) where your model files are stored. You can upload your local model files to a PVC in the IDE that you access from a running workbench.
Prerequisites
- You have access to the Alauda AI dashboard.
- You have access to a project that has a running workbench.
- You have created a persistent volume claim (PVC).
- The workbench is attached to the persistent volume claim (PVC). For instructions on creating a workbench and attaching a PVC, see Create Workbench.
- You have the model files saved on your local machine.
Procedure
Follow these steps to upload your model files to the PVC within your workbench:
1. From the Alauda AI dashboard, click Workbench to enter the workbench list page.
2. Find your running workbench instance and click the Connect button to enter the workbench.
3. In your workbench IDE, navigate to the file browser:
   - In JupyterLab, this is the Files tab in the left sidebar.
   - In code-server, this is the Explorer view in the left sidebar.
4. In the file browser, navigate to the home directory. This directory represents the root of your attached PVC.
   Note: Any files or folders that you create or upload to this folder persist in the PVC.
5. Optional: Create a new folder to organize your models:
   - In the file browser, right-click within the home directory and select New Folder.
   - Name the folder (for example, models).
   - Double-click the new models folder to enter it.
6. Upload your model files to the current folder:
   - In JupyterLab: click the Upload button in the file browser toolbar, select the model files from your local computer in the file selection dialog, click Open, and wait for the upload to complete.
   - In code-server: drag the model files from your local file explorer and drop them into the target folder in the file browser pane, then wait for the upload to complete.
Verification
Confirm that your files appear in the file browser at the path where you uploaded them.
Next Steps
When deploying a model from a PVC, set the storageUri in the format `pvc://<pvc-name>/<optional-path>`. For example:
- `pvc://model-pvc` loads from the root of the PVC.
- `pvc://model-pvc/models/Qwen2.5-0.5B-Instruct` loads from a specific subdirectory.
Deploy Inference Service
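A minimal sketch of the PVC-backed InferenceService this section's parameters refer to; as before, the metadata name and the `modelFormat` value are assumptions that must match your ClusterServingRuntime:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: qwen2-5-0-5b-instruct  # illustrative name
  annotations:
    aml.cpaas.io/runtime-type: vllm
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM  # assumed; must match the runtime's supportedModelFormats
      runtime: aml-vllm-0.11.2-cpu
      storageUri: pvc://model-pvc
```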
- Replace `Qwen2.5-0.5B-Instruct` with your actual model name.
- `aml.cpaas.io/runtime-type: vllm` specifies the inference runtime type. For more information about custom inference runtimes, see Extend Inference Runtimes.
- Replace `aml-vllm-0.11.2-cpu` with a runtime name that is already installed on your platform (corresponding to a ClusterServingRuntime CRD instance).
- `storageUri: pvc://model-pvc` specifies the PVC where the model is stored.