MLflow with Cudo Compute
MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow can be used with many popular ML frameworks including:
- PyTorch
MLflow can track your experimental runs to create a repeatable, auditable registry of models.
Quick start guide
- MLflow UI server
- MLflow runner for training ML models
In this deployment of MLflow we will set up one Cudo Compute VM to serve the MLflow UI/web app and store models and metrics from runs. We will then use a second Cudo Compute VM to perform training. You can run as many of these training VMs as you like concurrently, and they only need to run for the duration of training.
Optionally you can use your local machine to run the web app if you are able to configure your network so that you have a port publicly accessible.
MLflow UI server
Start a VM on Cudo Compute. A CPU-only machine is enough; no GPU is needed. Use the
Ubuntu Minimal 20.04 image and pick a configuration with 8 GB of RAM or more.
This VM should remain running for the duration of your work.
Get the IP address of the VM. Replace the address in
tracking_ip below with the IP address of the VM, and then
run the commands below.
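tracking_ip=xx.xx.xx.xx
tracking_port=5000
# Logging in as root is assumed here; change the user if your image differs.
ssh -o "StrictHostKeyChecking no" root@$tracking_ip << EOF
apt-get update
apt-get install lsof -y
DEBIAN_FRONTEND=noninteractive apt-get install python3.10 python3-pip -y
which python
pip install click==8.0 'urllib3<=1.25'
pip install mlflow
# Stop any server already listening on the port (escaped so that it runs on the VM)
kill \$(lsof -t -i:$tracking_port)
# nohup keeps the server running after the SSH session ends
nohup mlflow server --host $tracking_ip --port $tracking_port --backend-store-uri sqlite:///mlruns.db --default-artifact-root ./mlruns > mlflow.log 2>&1 &
EOF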
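Before starting any training, it can help to confirm that the server is answering. The sketch below assumes curl is installed on the machine you run it from and relies on the /health endpoint served by the MLflow tracking server. The helper names mlflow_tracking_uri and wait_for_mlflow are hypothetical and are not part of MLflow or Cudo Compute.

```shell
# Build the tracking URI from an IP address and a port.
mlflow_tracking_uri() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# Poll the server's /health endpoint until it answers or the attempts run out.
# Arguments: IP address, port, number of attempts (default 10).
wait_for_mlflow() {
  uri=$(mlflow_tracking_uri "$1" "$2")
  attempts=${3:-10}
  while [ "$attempts" -gt 0 ]; do
    if curl --silent --fail --max-time 5 "$uri/health" > /dev/null; then
      echo "MLflow server is up at $uri"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 3
  done
  echo "MLflow server did not respond at $uri" >&2
  return 1
}

# Example (uncomment after setting tracking_ip and tracking_port):
# wait_for_mlflow "$tracking_ip" "$tracking_port"
```

If the check keeps failing, the usual suspects are a firewall rule blocking the port or a server that did not start.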
All of your data is stored on this VM in the ~/mlruns directory and in the mlruns.db SQLite database set by --backend-store-uri.
MLflow UI server on a local machine
Make sure port 5000 of your local machine is publicly accessible.
conda create -n mlflow_env -y
conda activate mlflow_env
conda install -c conda-forge mlflow -y
mlflow server --host PUBLIC_IP_ADDRESS --port 5000
MLflow runner for training ML models
Start another VM on Cudo Compute. This can be a CPU-only or a GPU machine. Use the
Ubuntu 22.04 + Nvidia drivers + Docker image.
The script below pulls a Docker image for MLflow, then MLflow pulls a GitHub repository and runs it. The GitHub
repository is configured as an MLflow Project, so when MLflow runs it, it creates a conda environment and installs the
necessary Python packages. Then it runs the model training.
The training script logs its output to the MLflow UI server set in MLFLOW_TRACKING_URI.
Get the IP address of the Cudo Compute VM that is used for training and replace
runner_ip with it.
Get the IP address of the Cudo Compute VM that is used for the MLflow UI and replace
tracking_ip with it.
On a CPU-only runner:
tracking_ip=xx.xx.xx.xx
tracking_port=5000
runner_ip=yy.yy.yy.yy
# Logging in as root is assumed here; change the user if your image differs.
ssh -o "StrictHostKeyChecking no" root@$runner_ip << EOF
docker run --rm -e MLFLOW_TRACKING_URI=http://$tracking_ip:$tracking_port \
cudoventures/mlflow-runner \
mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=5.0
EOF
On a GPU runner, add --gpus all so that the container can use the GPU:
tracking_ip=xx.xx.xx.xx
tracking_port=5000
runner_ip=yy.yy.yy.yy
ssh -o "StrictHostKeyChecking no" root@$runner_ip << EOF
docker run --gpus all --rm -e MLFLOW_TRACKING_URI=http://$tracking_ip:$tracking_port \
cudoventures/mlflow-runner \
mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=5.0
EOF
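To compare several values of the alpha parameter, one option is to generate one mlflow run command per value and send them to the runner. This is a minimal sketch rather than part of the Cudo Compute tooling: build_run_command and print_alpha_sweep are hypothetical helpers, the alpha values in the example are arbitrary, and the root login is an assumption.

```shell
# Print the docker command that runs the example project with one alpha value.
# Arguments: tracking IP, tracking port, alpha.
build_run_command() {
  printf 'docker run --rm -e MLFLOW_TRACKING_URI=http://%s:%s cudoventures/mlflow-runner mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=%s\n' "$1" "$2" "$3"
}

# Print one command per alpha value.
# Arguments: tracking IP, tracking port, then one or more alpha values.
print_alpha_sweep() {
  ip=$1
  port=$2
  shift 2
  for alpha in "$@"; do
    build_run_command "$ip" "$port" "$alpha"
  done
}

# Example: run the generated commands one after another on the runner VM.
# print_alpha_sweep "$tracking_ip" "$tracking_port" 0.1 0.5 5.0 | ssh "root@$runner_ip" sh
```

Each generated command produces a separate run in the MLflow UI, so the results can be compared side by side.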
Go to http://tracking_ip:5000, replacing tracking_ip with the address of your MLflow UI server, to open the MLflow UI. You should be able to see your training results there.
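The same results can also be read without a browser through the MLflow REST API. The sketch below assumes curl is available and that the runs were logged to the default experiment, whose ID is 0. The helper names runs_search_url and list_runs are hypothetical.

```shell
# Build the URL of the REST endpoint that searches runs.
runs_search_url() {
  printf 'http://%s:%s/api/2.0/mlflow/runs/search\n' "$1" "$2"
}

# Request the runs of one experiment as JSON.
# Arguments: tracking IP, tracking port, experiment ID (default 0).
list_runs() {
  curl --silent --fail --request POST \
    --header 'Content-Type: application/json' \
    --data "{\"experiment_ids\": [\"${3:-0}\"]}" \
    "$(runs_search_url "$1" "$2")"
}

# Example (uncomment after setting tracking_ip and tracking_port):
# list_runs "$tracking_ip" "$tracking_port"
```

The response is JSON that contains the parameters and metrics logged by each run.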
Want to learn more?
You can learn more about using MLflow on Cudo Compute by contacting us. Or you can just get started right away!