MLflow with Helm and Serving a Trained Model on Kubernetes

Dounpct · 8 min read · Jun 17, 2023

Part 5: Install MLflow on a GKE Cluster with Helm

Create GKE Cluster

  • You can create a new GKE cluster or use an existing one; a minimal gcloud sketch is shown below.
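If you need a brand-new cluster, a rough sketch with gcloud could look like this (the cluster name, zone, node count, and machine type are just examples; adjust them to your project):

# Create a small GKE cluster for the demo (names and sizing are illustrative)
gcloud container clusters create mlflow-demo \
  --zone asia-southeast1-a \
  --num-nodes 2 \
  --machine-type e2-standard-4

# Fetch kubeconfig credentials so kubectl and helm can reach the cluster
gcloud container clusters get-credentials mlflow-demo --zone asia-southeast1-a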

Overview of MLflow in GKE

As you know, I like to install every open-source tool with Helm and sync it with ArgoCD, which already has the Vault plugin integrated. So let's start.

Requirements

Install PostgreSQL

All source code can be found at dounpct/argocd-deployment (github.com).

  • Create a kind: AppProject in project.yaml in the applications folder
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: database
spec:
  clusterResourceWhitelist:
  - group: '*'
    kind: '*'
  destinations:
  - namespace: '*'
    server: https://kubernetes.default.svc
  sourceRepos:
  - '*'
  • Create apps-postgresql.yaml in the applications folder
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: postgresql
spec:
  project: database
  source:
    path: 'postgresql'
    repoURL: 'https://github.com/dounpct/argocd-deployment.git'
    targetRevision: master
    plugin:
      env:
      - name: INIT_ARGS
        value: "helm dep update"
      - name: ARG_PARAMETERS
        value: "helm template postgresql -n postgresql . -f values/values.yaml "
  destination:
    server: https://kubernetes.default.svc
    namespace: postgresql
  syncPolicy:
    syncOptions:
    - CreateNamespace=true
  • Chart.yaml
apiVersion: v2
name: helm-postgresql
description: A Helm chart for Kubernetes

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 1.0.1

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
appVersion: 1.16.0
dependencies:
- name: postgresql
  version: 12.2.3
  repository: https://charts.bitnami.com/bitnami
  • Update the chart dependencies
cd postgresql
helm dep update
  • Add values/values.yaml
postgresql:
  global:
    postgresql:
      auth:
        postgresPassword: <path:projects/362159383816/secrets/postgresql-admin-password#postgresql-admin-password>
        database: "mlflow"
  • Test-render the postgresql chart the same way ArgoCD will; it should produce no errors
helm template postgresql -n postgresql . -f values/values.yaml | argocd-vault-plugin generate -
  • Commit the code to Git and let ArgoCD deploy it
  • Great! PostgreSQL is now running; a quick check is sketched below.
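If you want to double-check the backend store before moving on, a quick sanity check could look like this (assuming the Bitnami chart's default pod name postgresql-0 and the admin password you stored in Secret Manager; adjust if yours differ):

kubectl -n postgresql get pods

# List databases and confirm that "mlflow" exists (placeholder password)
kubectl -n postgresql exec postgresql-0 -- \
  env PGPASSWORD=<postgresql-admin-password> psql -U postgres -c '\l'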

Install MLflow with ArgoCD

All source code can be found at dounpct/argocd-deployment (github.com).

  • Create a kind: AppProject in project.yaml in the applications folder
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: mlflow
spec:
  clusterResourceWhitelist:
  - group: '*'
    kind: '*'
  destinations:
  - namespace: mlflow
    server: https://kubernetes.default.svc
  sourceRepos:
  - '*'
  • Create apps-ml.yaml in the applications folder
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: mlflow
spec:
  project: mlflow
  source:
    path: 'mlflow'
    repoURL: 'https://github.com/dounpct/argocd-deployment.git'
    targetRevision: master
    plugin:
      env:
      - name: INIT_ARGS
        value: "helm dep update"
      - name: ARG_PARAMETERS
        value: "helm template mlflow -n mlflow . -f values/values.yaml "
  destination:
    server: https://kubernetes.default.svc
    namespace: mlflow
  syncPolicy:
    syncOptions:
    - CreateNamespace=true
  • Create a folder mlflow with the same structure as the argocd folder: templates, values/values.yaml, and Chart.yaml
  • Check the available versions of the mlflow chart
helm repo add community-charts https://community-charts.github.io/helm-charts
helm repo update
helm search repo community-charts/mlflow
  • Chart.yaml
apiVersion: v2
name: helm-mlflow
description: A Helm chart for Kubernetes

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.0.1

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
appVersion: 0.0.1
dependencies:
- name: mlflow
  version: 0.7.19
  repository: https://community-charts.github.io/helm-charts
  • Update the chart dependencies
cd mlflow
helm dep update
  • Add secret.yaml
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: google-credentials-creds
data:
  google_credentials.json: <path:projects/362159383816/secrets/google_credentials_json#google_credentials_json | base64encode>
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: basic-auth
data:
  auth: <path:projects/362159383816/secrets/tdg_ingress_basic_authen#tdg_ingress_basic_authen | base64encode>
  user: <path:projects/362159383816/secrets/tdg_ingress_basic_authen_user#tdg_ingress_basic_authen_user | base64encode>
  password: <path:projects/362159383816/secrets/tdg_ingress_basic_authen_password#tdg_ingress_basic_authen_password | base64encode>
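The auth key above must contain an entry in htpasswd format, which is what the NGINX Ingress basic-auth annotations expect. A sketch of how that value could be generated and stored in Google Secret Manager (the user name and password are placeholders; the secret name matches the placeholder path used above):

# Produce a "user:hash" line in htpasswd format (placeholder credentials)
htpasswd -nb mlflow-user 'changeme'

# Store it in Google Secret Manager so argocd-vault-plugin can resolve the placeholder
htpasswd -nb mlflow-user 'changeme' | gcloud secrets create tdg_ingress_basic_authen --data-file=-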
  • Add ingress.yaml with basic authentication
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-with-auth-mlflow
  annotations:
    # type of authentication
    nginx.ingress.kubernetes.io/auth-type: basic
    # name of the secret that contains the user/password definitions
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    # message to display with an appropriate context why the authentication is required
    nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required'
spec:
  ingressClassName: nginx
  rules:
  - host: mlflow-tracking.domain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: mlflow
            port:
              number: 5000
  • Add values/values.yaml
mlflow:
  service:
    # -- Specifies what type of Service should be created
    type: ClusterIP
    # -- Default Service port
    port: 5000
    # -- Default Service name
    name: http
    # -- Additional service annotations
    annotations: {}

  backendStore:
    # -- Specifies if you want to run database migration
    databaseMigration: true

    # -- Add an additional init container, which checks for database availability
    databaseConnectionCheck: true

    postgres:
      # -- Specifies if you want to use postgres backend storage
      enabled: true
      # -- Postgres host address. e.g. your RDS or Azure Postgres Service endpoint
      host: "postgresql.postgresql.svc.cluster.local" # required
      # -- Postgres service port
      port: 5432 # required
      # -- mlflow database name created before in the postgres instance
      database: "mlflow" # required
      # -- postgres database user name which can access to mlflow database
      user: "postgres" # required
      # -- postgres database user password which can access to mlflow database
      password: <path:projects/362159383816/secrets/postgresql-admin-password#postgresql-admin-password> # required
      # -- postgres database connection driver. e.g.: "psycopg2"
      driver: ""

  artifactRoot:
    # -- Specifies if you want to enable proxied artifact storage access
    proxiedArtifactStorage: true
    gcs:
      # -- Specifies if you want to use Google Cloud Storage Mlflow Artifact Root
      enabled: true
      # -- Google Cloud Storage bucket name
      bucket: "mlflow_gke_test_20230314" # required
      # -- Google Cloud Storage bucket folder. If you want to use root level, please don't set anything.
      path: "" # optional

  # -- Extra environment variables
  extraEnvVars:
    GOOGLE_APPLICATION_CREDENTIALS: "/app/config/google/google_credentials.json"
    # MLFLOW_S3_IGNORE_TLS: true
    # MLFLOW_S3_UPLOAD_EXTRA_ARGS: '{"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": "1234"}'
    # AWS_DEFAULT_REGION: my_region
    # MLFLOW_S3_ENDPOINT_URL: http://1.2.3.4:9000
    # AWS_CA_BUNDLE: /some/ca/bundle.pem
    # MLFLOW_GCS_DEFAULT_TIMEOUT - Sets the standard timeout for transfer operations in seconds (Default: 60). Use -1 for indefinite timeout.
    # MLFLOW_GCS_UPLOAD_CHUNK_SIZE - Sets the standard upload chunk size for bigger files in bytes (Default: 104857600 ≙ 100MiB), must be multiple of 256 KB.
    # MLFLOW_GCS_DOWNLOAD_CHUNK_SIZE - Sets the standard download chunk size for bigger files in bytes (Default: 104857600 ≙ 100MiB), must be multiple of 256 KB.

  # -- Extra Volumes for the pod
  extraVolumes:
  - name: google-credentials-creds
    secret:
      secretName: google-credentials-creds

  # -- Extra Volume Mounts for the mlflow container
  extraVolumeMounts:
  - name: google-credentials-creds
    mountPath: /app/config/google
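The values above assume the GCS bucket and the service-account key behind google_credentials_json already exist. If they don't, a rough sketch of creating them could look like this (the service-account name, location, and project ID are placeholders; the bucket and secret names match the values and placeholders used above):

# Service account for artifact access (name is illustrative)
gcloud iam service-accounts create mlflow-artifacts --display-name "MLflow artifact access"

# Bucket used as the artifact root (location is an example)
gsutil mb -l asia-southeast1 gs://mlflow_gke_test_20230314

# Allow the service account to read/write objects in the bucket
gsutil iam ch serviceAccount:mlflow-artifacts@<your-project-id>.iam.gserviceaccount.com:roles/storage.objectAdmin gs://mlflow_gke_test_20230314

# Export a JSON key and store it in Secret Manager for the google-credentials-creds secret above
gcloud iam service-accounts keys create google_credentials.json --iam-account mlflow-artifacts@<your-project-id>.iam.gserviceaccount.com
gcloud secrets create google_credentials_json --data-file=google_credentials.json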

Note: we follow MLflow Tracking — MLflow 2.4.1 documentation.
This means the Tracking Server acts as a proxy that handles both the backend database (PostgreSQL) and artifact storage access (GCS), so clients never need direct credentials for either.
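In practice, a training client therefore only needs the tracking URL and the basic-auth credentials; for example (hostname and credentials are placeholders, the variable names are MLflow's standard client environment variables):

# Point the MLflow client at the proxied tracking server
export MLFLOW_TRACKING_URI=https://mlflow-tracking.domain.com
export MLFLOW_TRACKING_USERNAME=<user>
export MLFLOW_TRACKING_PASSWORD=<password>
# Runs and artifacts logged from here go through the server; no GCS key is needed on the client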

  • Test-render the mlflow chart the same way ArgoCD will; it should produce no errors. Because I connect to Google Secret Manager, set these environment variables first:
export GOOGLE_APPLICATION_CREDENTIALS="path-to-your-google-service-account-key/key.json"
export AVP_TYPE=gcpsecretmanager
helm template mlflow -n mlflow . -f values/values.yaml | argocd-vault-plugin generate -
  • Commit the code to Git and let ArgoCD deploy it
  • Map the Ingress DNS name to the NGINX external load balancer
  • Open the MLflow Tracking Server UI and enter the username and password (a quick curl check is sketched below)
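To verify the whole chain without a browser, you can hit the tracking server's health endpoint through the Ingress with basic auth (hostname and credentials are the ones you configured; /health is the tracking server's built-in health check):

# Expect an HTTP 200 response with body "OK"
curl -i -u <user>:<password> https://mlflow-tracking.domain.com/health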
  • Note: to connect to an AWS S3 bucket (MinIO) instead, use values like these:
mlflow:
  artifactRoot:
    # -- Specifies if you want to enable proxied artifact storage access
    proxiedArtifactStorage: true
    s3:
      # -- Specifies if you want to use AWS S3 Mlflow Artifact Root
      enabled: true
      # -- S3 bucket name
      bucket: "mlflow" # required
      # -- S3 bucket folder. If you want to use root level, please don't set anything.
      path: "" # optional
      # -- AWS IAM user AWS_ACCESS_KEY_ID which has attached policy for access to the S3 bucket
      awsAccessKeyId: "" # (awsAccessKeyId and awsSecretAccessKey) or roleArn serviceaccount annotation required
      # -- AWS IAM user AWS_SECRET_ACCESS_KEY which has attached policy for access to the S3 bucket
      awsSecretAccessKey: "" # (awsAccessKeyId and awsSecretAccessKey) or roleArn serviceaccount annotation required

  # -- Extra environment variables
  extraEnvVars:
    # GOOGLE_APPLICATION_CREDENTIALS: "/app/config/google/google_credentials.json"
    MLFLOW_S3_IGNORE_TLS: true
    # MLFLOW_S3_UPLOAD_EXTRA_ARGS: '{"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": "1234"}'
    # AWS_DEFAULT_REGION: my_region
    MLFLOW_S3_ENDPOINT_URL: https://minio-ml-hl.minio-ml.svc.cluster.local:9000
    # AWS_CA_BUNDLE: /some/ca/bundle.pem
    # MLFLOW_GCS_DEFAULT_TIMEOUT - Sets the standard timeout for transfer operations in seconds (Default: 60). Use -1 for indefinite timeout.
    # MLFLOW_GCS_UPLOAD_CHUNK_SIZE - Sets the standard upload chunk size for bigger files in bytes (Default: 104857600 ≙ 100MiB), must be multiple of 256 KB.
    # MLFLOW_GCS_DOWNLOAD_CHUNK_SIZE - Sets the standard download chunk size for bigger files in bytes (Default: 104857600 ≙ 100MiB), must be multiple of 256 KB.

  # -- Extra secrets for environment variables
  extraSecretNamesForEnvFrom:
  - mlflow-secrets
  • Create a secret mlflow-secrets that contains the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY credentials for connecting to MinIO, for example:
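A minimal sketch, assuming you already have the MinIO access key and secret key at hand (the chart injects every key of this secret as an environment variable via extraSecretNamesForEnvFrom):

kubectl -n mlflow create secret generic mlflow-secrets \
  --from-literal=AWS_ACCESS_KEY_ID=<minio-access-key> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<minio-secret-key>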
  • Have fun!!!

— — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Credit : TrueDigitalGroup

— — — — — — — — — — — — — — — — — — — — — — — — — — — — —
