Use Kubeflow Pipelines
Kubeflow Pipelines (KFP) is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. The KFP SDK allows you to define and manipulate pipelines and components using Python.
TOC
- Prerequisites
  - Install KFP SDK
  - Configure KFP to Run with your Object Storage
- Quick Start Example
- Manage Pipelines in the UI
  - Access the Pipelines Dashboard
  - Upload a Pipeline
  - Create a Run
  - Inspect Run Details
  - Recurring Runs

Prerequisites
Install KFP SDK
Start a Jupyter Notebook (or Workbench) in your namespace and install the KFP SDK:
Configure KFP to Run with your Object Storage
If you installed Kubeflow with an external S3/MinIO storage service, you need to add a "kfp-launcher" ConfigMap to configure the storage used by the current namespace or user. See the Kubeflow documentation for details: https://www.kubeflow.org/docs/components/pipelines/operator-guides/configure-object-store/#s3-and-s3-compatible-provider. If no configuration is set, pipeline runs may still try to access the default service address (e.g., "minio-service.kubeflow:9000"), which may not be correct for your deployment.
At a minimum, set the following values in this ConfigMap to point to your own S3/MinIO storage:

- defaultPipelineRoot: where pipeline intermediate data (artifacts) is stored
- endpoint: the S3/MinIO service endpoint. Note: it should NOT start with "http" or "https"
- disableSSL: whether to disable "https" access to the endpoint
- region: the S3 region. If using MinIO, any value is fine
- credentials: the access key/secret key (AK/SK) stored in a Kubernetes Secret
After you add this ConfigMap, newly started Kubeflow Pipeline runs will automatically read this configuration and store pipeline artifacts in the configured object storage.
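As a sketch, a `kfp-launcher` ConfigMap for an S3-compatible MinIO service might look like the following. The namespace, bucket, endpoint, and Secret names are placeholders for your own values; see the linked Kubeflow page for the authoritative schema.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kfp-launcher
  namespace: my-namespace        # the profile namespace that runs pipelines
data:
  defaultPipelineRoot: "s3://my-bucket/pipeline-root"
  providers: |
    s3:
      default:
        endpoint: minio.example.com:9000   # no http:// or https:// prefix
        disableSSL: true
        region: us-east-1                  # any value works for MinIO
        credentials:
          fromEnv: false
          secretRef:
            secretName: my-s3-secret
            accessKeyKey: accesskey
            secretKeyKey: secretkey
```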
Quick Start Example
A pipeline is a description of an ML workflow, including all of the components in the workflow and how they combine in the form of a graph.
Below is a simple example of defining a pipeline that prints "Hello, World!" using the KFP SDK.
For more details about how to define and run pipelines, please refer to the official KFP documentation: https://www.kubeflow.org/docs/components/pipelines/user-guides/
Manage Pipelines in the UI
You can also manage pipelines, experiments, and runs directly from the Kubeflow Dashboard.
Access the Pipelines Dashboard
- Log in to the Kubeflow central dashboard.
- Click Pipelines in the sidebar menu.
Upload a Pipeline
If you have compiled your pipeline to a YAML file (e.g., pipeline.yaml from the example above), you can upload it:
- Click Pipelines -> Upload Pipeline.
- Upload a file: Select your pipeline.yaml.
- Pipeline Name: Give it a name (e.g., Hello World Pipeline).
- Click Create.
Create a Run
To execute the pipeline you just uploaded:
- Click on the pipeline name to open its details.
- Click Create Run.
- Run Name: Enter a descriptive name.
- Experiment: Select an existing experiment or create a new one. Experiments help group related runs.
- Run Parameters: Enter values for any pipeline arguments (e.g., recipient: World).
- Click Start.
Inspect Run Details
Once the run starts, you will be redirected to the Run Details page.
- Graph: Visualize the steps (components) of your pipeline and their status (Running, Succeeded, Failed).
- Logs: Click on a specific step in the graph to view its container logs in the side panel. This is crucial for debugging.
- Inputs/Outputs: View the artifacts passed between steps or produced as final outputs.
- Visualizations: If your pipeline generates metrics or plots, they will appear in the Run Output or Visualizations tab.
Recurring Runs
You can schedule pipelines to run automatically at specific intervals:
- In the Pipelines list, identify your pipeline.
- Click Create Run but choose Recurring Run as the run type (or navigate to Experiments (KFP) -> Create Recurring Run).
- Trigger: Set the schedule (e.g., Periodic, Cron).
- Parameters: Configure the inputs that will be used for every scheduled execution.
- Click Start.