Part 1: Install Mlflow on a Local Machine
Part 2: Train an Example Model and Store It in Mlflow on the Local Machine
Part 3: Expose an Example API on the Local Machine
Part 4: API Transform for the Model API
Part 5: Install Mlflow on a GKE Cluster with Helm
Part 6: Store the Model in Mlflow on the Remote Cluster
Part 7: Serve the Model API on the Cluster
Create GKE Cluster
- You can create any GKE cluster or use an existing one
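If you don't have a cluster yet, one can be created with gcloud. This is only a sketch: the project, zone, cluster name, and machine type below are placeholder values, so adjust them to your environment.

```shell
# Hypothetical example: create a small GKE cluster for testing.
# "my-gcp-project", "asia-southeast1-a" and "mlflow-demo" are placeholders.
gcloud container clusters create mlflow-demo \
  --project my-gcp-project \
  --zone asia-southeast1-a \
  --num-nodes 2 \
  --machine-type e2-standard-2

# Fetch credentials so that kubectl talks to the new cluster
gcloud container clusters get-credentials mlflow-demo \
  --project my-gcp-project \
  --zone asia-southeast1-a
```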
Overview Mlflow In GKE
As you may know, I like to install every open-source component with Helm and sync it with ArgoCD, which already has the Vault plugin integrated. So let's start.
Requirement
- When running Mlflow on Kubernetes we need a database, so I chose PostgreSQL
- We will store models in GCS (Google Cloud Storage)
- You can see how to install ArgoCD in my article: https://dounpct.medium.com/argocd-argocd-vault-plugin-20d28f03316c
- Chart here : mlflow 0.7.19 · community-charts/community-charts (artifacthub.io)
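Since the models will live in GCS, the artifact bucket has to exist before Mlflow can write to it. A hedged sketch with gsutil — the bucket name matches the one used in values.yaml below, but the location and service account are placeholder values:

```shell
# Hypothetical example: create the Mlflow artifact bucket.
gsutil mb -l asia-southeast1 gs://mlflow_gke_test_20230314

# Grant the service account used by Mlflow object read/write access
# ("mlflow@my-gcp-project.iam.gserviceaccount.com" is a placeholder).
gsutil iam ch \
  serviceAccount:mlflow@my-gcp-project.iam.gserviceaccount.com:roles/storage.objectAdmin \
  gs://mlflow_gke_test_20230314
```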
Install Postgresql
All source code can be found here dounpct/argocd-deployment (github.com)
- create a kind: AppProject manifest in project.yaml in the applications folder
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: database
spec:
  clusterResourceWhitelist:
    - group: '*'
      kind: '*'
  destinations:
    - namespace: '*'
      server: https://kubernetes.default.svc
  sourceRepos:
    - '*'
- create apps-postgresql.yaml in the applications folder
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: postgresql
spec:
  project: database
  source:
    path: 'postgresql'
    repoURL: 'https://github.com/dounpct/argocd-deployment.git'
    targetRevision: master
    plugin:
      env:
        - name: INIT_ARGS
          value: "helm dep update"
        - name: ARG_PARAMETERS
          value: "helm template postgresql -n postgresql . -f values/values.yaml "
  destination:
    server: https://kubernetes.default.svc
    namespace: postgresql
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
- Chart.yaml
apiVersion: v2
name: helm-postgresql
description: A Helm chart for Kubernetes
# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 1.0.1
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
appVersion: 1.16.0
dependencies:
  - name: postgresql
    version: 12.2.3
    repository: https://charts.bitnami.com/bitnami
cd postgresql
helm dep update
- add values/values.yaml
postgresql:
  global:
    postgresql:
      auth:
        postgresPassword: <path:projects/362159383816/secrets/postgresql-admin-password#postgresql-admin-password>
        database: "mlflow"
- Test-render postgresql the same way ArgoCD will; there should be no errors
helm template postgresql -n postgresql . -f values/values.yaml | argocd-vault-plugin generate -
- Commit code to git and let ArgoCD deploy
- Great
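Before moving on to Mlflow, it's worth checking that PostgreSQL is actually up and that the mlflow database was created. A sketch assuming the release is named postgresql in the postgresql namespace as above (the Bitnami chart then creates a StatefulSet whose first pod is postgresql-0):

```shell
# Check that the PostgreSQL pod is running
kubectl -n postgresql get pods

# List databases from inside the pod; "mlflow" should appear in the output.
# POSTGRES_PASSWORD is assumed to hold the admin password from the secret.
kubectl -n postgresql exec -it postgresql-0 -- \
  env PGPASSWORD="$POSTGRES_PASSWORD" psql -U postgres -c '\l'
```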
Install Mlflow with ArgoCD
All source code can be found here dounpct/argocd-deployment (github.com)
- create a kind: AppProject manifest in project.yaml in the applications folder
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: mlflow
spec:
  clusterResourceWhitelist:
    - group: '*'
      kind: '*'
  destinations:
    - namespace: mlflow
      server: https://kubernetes.default.svc
  sourceRepos:
    - '*'
- create apps-ml.yaml in the applications folder
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: mlflow
spec:
  project: mlflow
  source:
    path: 'mlflow'
    repoURL: 'https://github.com/dounpct/argocd-deployment.git'
    targetRevision: master
    plugin:
      env:
        - name: INIT_ARGS
          value: "helm dep update"
        - name: ARG_PARAMETERS
          value: "helm template mlflow -n mlflow . -f values/values.yaml "
  destination:
    server: https://kubernetes.default.svc
    namespace: mlflow
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
- create a folder mlflow with the same structure as the argocd folder: templates, values/values.yaml, and Chart.yaml
- check the available versions of the mlflow chart
helm repo add community-charts https://community-charts.github.io/helm-charts
helm repo update
helm search repo community-charts/mlflow
- Chart.yaml
apiVersion: v2
name: helm-mlflow
description: A Helm chart for Kubernetes
# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.0.1
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
appVersion: 0.0.1
dependencies:
  - name: mlflow
    version: 0.7.19
    repository: https://community-charts.github.io/helm-charts
cd mlflow
helm dep update
- Add secret.yaml
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: google-credentials-creds
data:
  google_credentials.json: <path:projects/362159383816/secrets/google_credentials_json#google_credentials_json | base64encode>
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: basic-auth
data:
  auth: <path:projects/362159383816/secrets/tdg_ingress_basic_authen#tdg_ingress_basic_authen | base64encode>
  user: <path:projects/362159383816/secrets/tdg_ingress_basic_authen_user#tdg_ingress_basic_authen_user | base64encode>
  password: <path:projects/362159383816/secrets/tdg_ingress_basic_authen_password#tdg_ingress_basic_authen_password | base64encode>
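The auth value stored in Secret Manager must be an htpasswd-style user:hash entry, which nginx reads from the basic-auth secret. If you need to generate one, here is a sketch using openssl; the user name, salt, and password are example values only:

```shell
# Generate an htpasswd-style basic-auth entry (apr1/MD5 format understood by nginx).
# "mlflow-user", "mysalt" and "mypassword" are placeholder values.
AUTH_LINE="mlflow-user:$(openssl passwd -apr1 -salt mysalt mypassword)"
echo "$AUTH_LINE"

# The secret data value must be base64-encoded before it is stored
echo -n "$AUTH_LINE" | base64
```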
- Add ingress.yaml with basic authentication
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-with-auth-mlflow
  annotations:
    # type of authentication
    nginx.ingress.kubernetes.io/auth-type: basic
    # name of the secret that contains the user/password definitions
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    # message to display with an appropriate context why the authentication is required
    nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required'
spec:
  ingressClassName: nginx
  rules:
    - host: mlflow-tracking.domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: mlflow
                port:
                  number: 5000
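With basic authentication on the ingress, every client has to send a standard HTTP Basic Authorization header; MLflow's Python client builds it from the MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD environment variables. A small sketch of what that header contains (the credentials here are placeholders):

```python
import base64

# Example credentials only -- use the values stored in the basic-auth secret
user, password = "mlflow-user", "mypassword"

# HTTP Basic auth: base64("user:password") carried in the Authorization header
token = base64.b64encode(f"{user}:{password}".encode()).decode()
headers = {"Authorization": f"Basic {token}"}
print(headers["Authorization"])  # → Basic bWxmbG93LXVzZXI6bXlwYXNzd29yZA==
```

The same credentials work from the command line, e.g. curl -u mlflow-user:mypassword https://mlflow-tracking.domain.com/.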
- add values/values.yaml
mlflow:
  service:
    # -- Specifies what type of Service should be created
    type: ClusterIP
    # -- Default Service port
    port: 5000
    # -- Default Service name
    name: http
    # -- Additional service annotations
    annotations: {}
  backendStore:
    # -- Specifies if you want to run database migration
    databaseMigration: true
    # -- Add an additional init container, which checks for database availability
    databaseConnectionCheck: true
    postgres:
      # -- Specifies if you want to use postgres backend storage
      enabled: true
      # -- Postgres host address. e.g. your RDS or Azure Postgres Service endpoint
      host: "postgresql.postgresql.svc.cluster.local" # required
      # -- Postgres service port
      port: 5432 # required
      # -- mlflow database name created before in the postgres instance
      database: "mlflow" # required
      # -- postgres database user name which can access to mlflow database
      user: "postgres" # required
      # -- postgres database user password which can access to mlflow database
      password: <path:projects/362159383816/secrets/postgresql-admin-password#postgresql-admin-password> # required
      # -- postgres database connection driver. e.g.: "psycopg2"
      driver: ""
  artifactRoot:
    # -- Specifies if you want to enable proxied artifact storage access
    proxiedArtifactStorage: true
    gcs:
      # -- Specifies if you want to use Google Cloud Storage Mlflow Artifact Root
      enabled: true
      # -- Google Cloud Storage bucket name
      bucket: "mlflow_gke_test_20230314" # required
      # -- Google Cloud Storage bucket folder. If you want to use root level, please don't set anything.
      path: "" # optional
  # -- Extra environment variables
  extraEnvVars:
    GOOGLE_APPLICATION_CREDENTIALS: "/app/config/google/google_credentials.json"
    # MLFLOW_S3_IGNORE_TLS: true
    # MLFLOW_S3_UPLOAD_EXTRA_ARGS: '{"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": "1234"}'
    # AWS_DEFAULT_REGION: my_region
    # MLFLOW_S3_ENDPOINT_URL: http://1.2.3.4:9000
    # AWS_CA_BUNDLE: /some/ca/bundle.pem
    # MLFLOW_GCS_DEFAULT_TIMEOUT - Sets the standard timeout for transfer operations in seconds (Default: 60). Use -1 for indefinite timeout.
    # MLFLOW_GCS_UPLOAD_CHUNK_SIZE - Sets the standard upload chunk size for bigger files in bytes (Default: 104857600 ≙ 100MiB), must be multiple of 256 KB.
    # MLFLOW_GCS_DOWNLOAD_CHUNK_SIZE - Sets the standard download chunk size for bigger files in bytes (Default: 104857600 ≙ 100MiB), must be multiple of 256 KB.
  # -- Extra Volumes for the pod
  extraVolumes:
    - name: google-credentials-creds
      secret:
        secretName: google-credentials-creds
  # -- Extra Volume Mounts for the mlflow container
  extraVolumeMounts:
    - name: google-credentials-creds
      mountPath: /app/config/google
Note: we use the MLflow Tracking Server as described in MLflow Tracking — MLflow 2.4.1 documentation.
This means the Tracking Server acts as a proxy that handles both the backend store (PostgreSQL) and artifact storage access (GCS) on behalf of clients.
- Test-render mlflow the same way ArgoCD will; there should be no errors. Because I connect to Google Secret Manager, export these variables first:
export GOOGLE_APPLICATION_CREDENTIALS="path-to-your-google-service-account-key/key.json"
export AVP_TYPE=gcpsecretmanager
helm template mlflow -n mlflow . -f values/values.yaml | argocd-vault-plugin generate -
- Commit code to git and let ArgoCD deploy
- Map the ingress DNS record to the Nginx external load balancer
- Open the MLflow Tracking Server UI and enter the user and password
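To confirm the whole chain works (tracking server → PostgreSQL for metadata, GCS for artifacts), a quick experiment can be logged from a local machine. A sketch using MLflow's Python client; the tracking URL and credentials are placeholders for your own ingress host and basic-auth values:

```python
import os
import mlflow

# Placeholders: your ingress host and the basic-auth credentials
os.environ["MLFLOW_TRACKING_USERNAME"] = "mlflow-user"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "mypassword"
mlflow.set_tracking_uri("https://mlflow-tracking.domain.com")

# Params and metrics land in PostgreSQL via the tracking server
mlflow.set_experiment("gke-smoke-test")
with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.42)
    # Artifacts are proxied by the tracking server into the GCS bucket
    mlflow.log_text("hello from GKE", "hello.txt")
```

After running it, the run should appear in the UI and hello.txt should show up under the run's artifacts in the GCS bucket.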
- Note: to connect to an AWS S3 bucket (MinIO) instead, use values like these
mlflow:
  artifactRoot:
    # -- Specifies if you want to enable proxied artifact storage access
    proxiedArtifactStorage: true
    s3:
      # -- Specifies if you want to use AWS S3 Mlflow Artifact Root
      enabled: true
      # -- S3 bucket name
      bucket: "mlflow" # required
      # -- S3 bucket folder. If you want to use root level, please don't set anything.
      path: "" # optional
      # -- AWS IAM user AWS_ACCESS_KEY_ID which has attached policy for access to the S3 bucket
      awsAccessKeyId: "" # (awsAccessKeyId and awsSecretAccessKey) or roleArn serviceaccount annotation required
      # -- AWS IAM user AWS_SECRET_ACCESS_KEY which has attached policy for access to the S3 bucket
      awsSecretAccessKey: "" # (awsAccessKeyId and awsSecretAccessKey) or roleArn serviceaccount annotation required
  # -- Extra environment variables
  extraEnvVars:
    # GOOGLE_APPLICATION_CREDENTIALS: "/app/config/google/google_credentials.json"
    MLFLOW_S3_IGNORE_TLS: true
    # MLFLOW_S3_UPLOAD_EXTRA_ARGS: '{"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": "1234"}'
    # AWS_DEFAULT_REGION: my_region
    MLFLOW_S3_ENDPOINT_URL: https://minio-ml-hl.minio-ml.svc.cluster.local:9000
    # AWS_CA_BUNDLE: /some/ca/bundle.pem
    # MLFLOW_GCS_DEFAULT_TIMEOUT - Sets the standard timeout for transfer operations in seconds (Default: 60). Use -1 for indefinite timeout.
    # MLFLOW_GCS_UPLOAD_CHUNK_SIZE - Sets the standard upload chunk size for bigger files in bytes (Default: 104857600 ≙ 100MiB), must be multiple of 256 KB.
    # MLFLOW_GCS_DOWNLOAD_CHUNK_SIZE - Sets the standard download chunk size for bigger files in bytes (Default: 104857600 ≙ 100MiB), must be multiple of 256 KB.
  # -- Extra secrets for environment variables
  extraSecretNamesForEnvFrom:
    - mlflow-secrets
- Create a secret mlflow-secrets containing the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY credentials for connecting to MinIO
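The secret referenced by extraSecretNamesForEnvFrom can be created directly with kubectl; the key names must match the environment variables MLflow expects, and the values below are placeholders for your MinIO credentials:

```shell
# Hypothetical example: create the secret holding the MinIO credentials
kubectl -n mlflow create secret generic mlflow-secrets \
  --from-literal=AWS_ACCESS_KEY_ID=minio-access-key \
  --from-literal=AWS_SECRET_ACCESS_KEY=minio-secret-key
```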
- Have fun!!!
— — — — — — — — — — — — — — — — — — — — — — — — — — — — —
Credit : TrueDigitalGroup
— — — — — — — — — — — — — — — — — — — — — — — — — — — — —