Introduction
Have you ever lost track of your ML experiments? Or maybe you’ve lost the hyperparameters and metrics from your best model? You’re not alone. Managing machine learning experiments can be challenging, especially when you’re working on multiple projects or collaborating with a team. That’s where MLflow comes in.
MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It comes with four main components:
- Tracking: Record and query experiments to track metrics and parameters. It’s like Git for your ML code, so you can easily compare and reproduce your experiments.
- Projects: Package code into reproducible runs. It’s like Docker for your ML code, so you can easily share and run your projects.
- Models: Manage and deploy models from a variety of ML libraries. It’s like Docker Hub for your ML models, so you can easily deploy and serve your models.
- Registry: Store, annotate, discover, and manage models in a central repository. It’s like Docker Registry for your ML models, so you can easily version and share your models.
In this article, we'll deploy MLflow with Docker to track and manage your ML experiments. But hey, why Docker? Isn't it easier to just `pip install mlflow`? Well, deploying MLflow with Docker has several advantages:
- Isolation: You can run MLflow in a container without worrying about dependencies or conflicts with your local environment.
- Portability: You can easily share your MLflow environment with your team or deploy it to the cloud.
- Reproducibility: You can run the same MLflow environment on different machines without worrying about compatibility issues.
So, let’s get started!
TL;DR
We provide a GitHub repository with all the code and configurations needed to deploy MLflow with Docker. Please check the ruhyadi/mlflow-docker repository for more details.
Prerequisites
Before we begin, make sure you have the following prerequisites:
- Docker: Install Docker on your machine by following the instructions on the official Docker website.
- MLflow: Familiarize yourself with MLflow by reading the official MLflow documentation.
- Python: Basic knowledge of Python programming language. We’ll use Python to create MLflow experiments.
Setting Up The Services
We’ll use Docker Compose to set up the MLflow, Minio, and PostgreSQL services. Minio is an open-source object storage server compatible with Amazon S3. We’ll use Minio as the artifact store for MLflow. PostgreSQL is an open-source relational database management system. We’ll use PostgreSQL as the backend store for MLflow.
Step 1: Project Directory
Create a new directory named `mlflow-docker` for this project, containing the following files:
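```
mlflow-docker/
├── docker-compose.yaml
├── dockerfile.mlflow
├── dockerfile.python
├── mlflow_experiment.py
└── mlflow_pred.py
```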
Inside the `mlflow-docker` directory, initialize a new Git repository with the following commands:
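```bash
git init
git add .
git commit -m "initial commit"
```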
Step 2: MLflow Dockerfile
Create the `dockerfile.mlflow` file.
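A minimal version looks like this; the image tag here is an assumption, so pin whichever MLflow release you use:

```dockerfile
# Official MLflow image from the GitHub Container Registry.
FROM ghcr.io/mlflow/mlflow:v2.9.2

# boto3 lets MLflow talk to S3-compatible storage (Minio);
# psycopg2-binary lets it talk to PostgreSQL.
RUN pip install --no-cache-dir boto3 psycopg2-binary
```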
We're using the official MLflow Docker image from the GitHub Container Registry. The `dockerfile.mlflow` file installs the `boto3` and `psycopg2-binary` packages, which are required for MLflow to work with Amazon S3 (Minio) and PostgreSQL.
Step 3: Python Dockerfile
Next, create the `dockerfile.python` file.
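A minimal sketch, assuming the experiment script below only needs MLflow, boto3, and scikit-learn:

```dockerfile
FROM python:3.10-slim

# mlflow for tracking, boto3 for the Minio artifact store,
# scikit-learn for the example experiment.
RUN pip install --no-cache-dir mlflow boto3 scikit-learn

WORKDIR /app
```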
We'll use the `dockerfile.python` file to create a Python environment for running MLflow experiments.
Step 4: Docker Compose
Next, create the `docker-compose.yaml` file.
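A working sketch of the three services is below. The credentials (`mlflow`/`mlflow123`) and ports match the rest of this article; the service names, volumes, and image tags are assumptions:

```yaml
services:
  mlflow:
    build:
      context: .
      dockerfile: dockerfile.mlflow
    ports:
      - "5000:5000"
    environment:
      # Credentials for the Minio artifact store.
      MLFLOW_S3_ENDPOINT_URL: http://minio:9000
      AWS_ACCESS_KEY_ID: mlflow
      AWS_SECRET_ACCESS_KEY: mlflow123
    command: >
      mlflow server
      --host 0.0.0.0
      --port 5000
      --backend-store-uri postgresql://mlflow:mlflow123@postgres:5432/mlflow
      --default-artifact-root s3://mlflow/
    depends_on:
      - minio
      - postgres

  minio:
    image: minio/minio
    ports:
      - "9000:9000"   # S3 API
      - "8900:8900"   # web console
    environment:
      MINIO_ROOT_USER: mlflow
      MINIO_ROOT_PASSWORD: mlflow123
    command: server /data --console-address ":8900"
    volumes:
      - minio_data:/data

  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: mlflow
      POSTGRES_PASSWORD: mlflow123
      POSTGRES_DB: mlflow
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  minio_data:
  postgres_data:
```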
We're using Docker Compose to define the MLflow, Minio, and PostgreSQL services. The MLflow service is configured to use Minio as the artifact store and PostgreSQL as the backend store. We're also exposing the MLflow web interface on port `5000`.
Running The Services
With the project directory set up and the services defined in the `docker-compose.yaml` file, you can now run the MLflow, Minio, and PostgreSQL services using Docker Compose with the following command:
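```bash
docker-compose up -d --build
```

The `--build` flag rebuilds the MLflow image when `dockerfile.mlflow` changes, and `-d` runs the stack in the background. (On newer Docker installations, the plugin form `docker compose up -d --build` is equivalent.)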
This command will build the MLflow Docker image, start the Minio and PostgreSQL services, and run the MLflow service. You can access the MLflow web interface at http://localhost:5000 and the Minio console at http://localhost:8900.
*Screenshots: the MLflow web interface and the Minio console.*
Creating Minio Bucket
Because we're using Minio as the artifact store, we need to create a bucket named `mlflow`. Log in to the Minio console with the access key `mlflow` and the secret key `mlflow123`, then create a new bucket named `mlflow`.
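If you prefer the command line, you can create the bucket with the Minio client instead. This sketch assumes the default Compose network name, `mlflow-docker_default` (Compose derives it from the project directory name):

```bash
docker run --rm --network mlflow-docker_default --entrypoint sh minio/mc -c '
  mc alias set local http://minio:9000 mlflow mlflow123 &&
  mc mb local/mlflow
'
```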
Creating and Running MLflow Experiments
Now that the MLflow service is up and running, let's create an MLflow experiment using Python. Create the `mlflow_experiment.py` file.
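A minimal sketch of such a script is below; the tracking URI and hyperparameter values are assumptions:

```python
import os

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Point the client at the MLflow server; the docker run command below
# overrides this via the MLFLOW_TRACKING_URI environment variable.
mlflow.set_tracking_uri(os.getenv("MLFLOW_TRACKING_URI", "http://mlflow:5000"))
mlflow.set_experiment("iris-classification")

# Load the Iris dataset and split it into train and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

params = {"n_estimators": 100, "max_depth": 5, "random_state": 42}

with mlflow.start_run():
    # Train a random forest classifier and predict on the test set.
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))

    # Log hyperparameters, metrics, and the model artifact,
    # registering the model so it can be served later.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(
        model, "model", registered_model_name="iris-classification"
    )

    print(f"Accuracy: {accuracy:.4f}")
```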
This Python script creates an MLflow experiment named `iris-classification`, loads the Iris dataset, trains a random forest classifier, and predicts on the test set. The MLflow experiment logs the hyperparameters, metrics, and artifacts to the MLflow service running in the Docker container.
In order to run the MLflow experiment, we need to build the Python Docker image and run the Python script in the container. You can use the following command to build the Python Docker image:
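The image name `mlflow-python` here is arbitrary:

```bash
docker build -t mlflow-python -f dockerfile.python .
```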
And then run the Python script in the container using the following command:
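This sketch again assumes the default `mlflow-docker_default` Compose network:

```bash
docker run --rm \
  --network mlflow-docker_default \
  -v "$(pwd)":/app \
  -e MLFLOW_TRACKING_URI=http://mlflow:5000 \
  -e MLFLOW_S3_ENDPOINT_URL=http://minio:9000 \
  -e AWS_ACCESS_KEY_ID=mlflow \
  -e AWS_SECRET_ACCESS_KEY=mlflow123 \
  mlflow-python python mlflow_experiment.py
```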
This command mounts the current directory as a volume in the Python container, sets the environment variables required for MLflow to connect to Minio, and runs the `mlflow_experiment.py` script in the container.
The output should look something like this:
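With the sketch script above, roughly (the exact number depends on the train/test split):

```
Accuracy: 1.0000
```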
You can view the MLflow experiment in the MLflow web interface at http://localhost:5000. The experiment should be named `iris-classification` and contain the hyperparameters, metrics, and artifacts logged by the Python script.
*Screenshots: the MLflow experiment view and the artifact store in Minio.*
Serving MLflow Models
Once you've trained and logged your MLflow models, you can serve them for inference. MLflow provides a REST API for serving models, but in this article we'll use Python to load and invoke the model directly.
Create a new Python script named `mlflow_pred.py`.
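A minimal sketch; the sample input values are arbitrary, and the `models:/.../Production` URI assumes a version of the registered model has been transitioned to the `Production` stage (for example, from the model registry page in the MLflow UI):

```python
import os

import mlflow
import mlflow.pyfunc
import numpy as np

# Point the client at the MLflow server, as in the experiment script.
mlflow.set_tracking_uri(os.getenv("MLFLOW_TRACKING_URI", "http://mlflow:5000"))

# Load the registered model from the "Production" stage.
model = mlflow.pyfunc.load_model("models:/iris-classification/Production")

# A sample input: sepal length, sepal width, petal length, petal width (cm).
sample_input = np.array([[5.1, 3.5, 1.4, 0.2]])

prediction = model.predict(sample_input)
print(f"Input: {sample_input.tolist()}")
print(f"Prediction: {prediction}")
```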
This Python script loads the MLflow model named `iris-classification` from the `Production` stage, creates a sample input, and predicts using the model. You can run the Python script in the Python container using the following command:
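The same network and credential assumptions apply as before:

```bash
docker run --rm \
  --network mlflow-docker_default \
  -v "$(pwd)":/app \
  -e MLFLOW_TRACKING_URI=http://mlflow:5000 \
  -e MLFLOW_S3_ENDPOINT_URL=http://minio:9000 \
  -e AWS_ACCESS_KEY_ID=mlflow \
  -e AWS_SECRET_ACCESS_KEY=mlflow123 \
  mlflow-python python mlflow_pred.py
```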
The output should look something like this:
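With the sketch above, roughly:

```
Input: [[5.1, 3.5, 1.4, 0.2]]
Prediction: [0]
```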
The Python script should print the sample input and the prediction made by the MLflow model.
Conclusion
In this article, we explored how to use MLflow with Docker to track and manage your ML experiments. We set up the MLflow, Minio, and PostgreSQL services using Docker Compose, created an MLflow experiment using Python, and served the MLflow model using Python. By deploying MLflow with Docker, you can easily track, manage, and serve your ML experiments in an isolated, portable, and reproducible environment.