MLflow with Helm and Serving a Trained Model on Kubernetes

Dounpct · 8 min read · Jun 17, 2023

Part 5: Install MLflow on a GKE Cluster with Helm

Create GKE Cluster

  • You can create a new GKE cluster or use an existing one; a minimal gcloud sketch is shown below.
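If you need a brand-new cluster, a rough sketch with gcloud could look like this (the cluster name, zone, node count, and machine type are just examples; adjust them to your project):

# Create a small GKE cluster for the demo (names and sizing are illustrative)
gcloud container clusters create mlflow-demo \
  --zone asia-southeast1-a \
  --num-nodes 2 \
  --machine-type e2-standard-4

# Fetch kubeconfig credentials so kubectl and helm can reach the cluster
gcloud container clusters get-credentials mlflow-demo --zone asia-southeast1-a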

Overview of MLflow in GKE

As you know, I like to install every open-source tool with Helm and sync it with ArgoCD, which already has the Vault plugin integrated. So let's start.

Requirements

Install PostgreSQL

All source code can be found at dounpct/argocd-deployment (github.com).

  • Create a kind: AppProject in project.yaml in the applications folder
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: database
spec:
  clusterResourceWhitelist:
  - group: '*'
    kind: '*'
  destinations:
  - namespace: '*'
    server: https://kubernetes.default.svc
  sourceRepos:
  - '*'
  • Create apps-postgresql.yaml in the applications folder
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: postgresql
spec:
  project: database
  source:
    path: 'postgresql'
    repoURL: 'https://github.com/dounpct/argocd-deployment.git'
    targetRevision: master
    plugin:
      env:
      - name: INIT_ARGS
        value: "helm dep update"
      - name: ARG_PARAMETERS
        value: "helm template postgresql -n postgresql . -f values/values.yaml "
  destination:
    server: https://kubernetes.default.svc
    namespace: postgresql
  syncPolicy:
    syncOptions:
    - CreateNamespace=true
  • Chart.yaml
apiVersion: v2
name: helm-postgresql
description: A Helm chart for Kubernetes

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 1.0.1

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
appVersion: 1.16.0
dependencies:
- name: postgresql
  version: 12.2.3
  repository: https://charts.bitnami.com/bitnami
  • Update the chart dependencies
cd postgresql
helm dep update
  • Add values/values.yaml
postgresql:
  global:
    postgresql:
      auth:
        postgresPassword: <path:projects/362159383816/secrets/postgresql-admin-password#postgresql-admin-password>
        database: "mlflow"
  • Test-render the postgresql chart the same way ArgoCD will; it should produce no errors
helm template postgresql -n postgresql . -f values/values.yaml | argocd-vault-plugin generate -
  • Commit the code to Git and let ArgoCD deploy it
  • Great! PostgreSQL is now running; a quick check is sketched below.
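If you want to double-check the backend store before moving on, a quick sanity check could look like this (assuming the Bitnami chart's default pod name postgresql-0 and the admin password you stored in Secret Manager; adjust if yours differ):

kubectl -n postgresql get pods

# List databases and confirm that "mlflow" exists (placeholder password)
kubectl -n postgresql exec postgresql-0 -- \
  env PGPASSWORD=<postgresql-admin-password> psql -U postgres -c '\l'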

Install MLflow with ArgoCD

All source code can be found at dounpct/argocd-deployment (github.com).

  • Create a kind: AppProject in project.yaml in the applications folder
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: mlflow
spec:
  clusterResourceWhitelist:
  - group: '*'
    kind: '*'
  destinations:
  - namespace: mlflow
    server: https://kubernetes.default.svc
  sourceRepos:
  - '*'
  • Create apps-ml.yaml in the applications folder
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: mlflow
spec:
  project: mlflow
  source:
    path: 'mlflow'
    repoURL: 'https://github.com/dounpct/argocd-deployment.git'
    targetRevision: master
    plugin:
      env:
      - name: INIT_ARGS
        value: "helm dep update"
      - name: ARG_PARAMETERS
        value: "helm template mlflow -n mlflow . -f values/values.yaml "
  destination:
    server: https://kubernetes.default.svc
    namespace: mlflow
  syncPolicy:
    syncOptions:
    - CreateNamespace=true
  • Create a folder mlflow with the same structure as the argocd folder: templates, values/values.yaml, and Chart.yaml
  • Check the available versions of the mlflow chart
helm repo add community-charts https://community-charts.github.io/helm-charts
helm repo update
helm search repo community-charts/mlflow
  • Chart.yaml
apiVersion: v2
name: helm-mlflow
description: A Helm chart for Kubernetes

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.0.1

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
appVersion: 0.0.1
dependencies:
- name: mlflow
  version: 0.7.19
  repository: https://community-charts.github.io/helm-charts
  • Update the chart dependencies
cd mlflow
helm dep update
  • Add secret.yaml
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: google-credentials-creds
data:
  google_credentials.json: <path:projects/362159383816/secrets/google_credentials_json#google_credentials_json | base64encode>
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: basic-auth
data:
  auth: <path:projects/362159383816/secrets/tdg_ingress_basic_authen#tdg_ingress_basic_authen | base64encode>
  user: <path:projects/362159383816/secrets/tdg_ingress_basic_authen_user#tdg_ingress_basic_authen_user | base64encode>
  password: <path:projects/362159383816/secrets/tdg_ingress_basic_authen_password#tdg_ingress_basic_authen_password | base64encode>
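The auth key above must contain an entry in htpasswd format, which is what the NGINX Ingress basic-auth annotations expect. A sketch of how that value could be generated and stored in Google Secret Manager (the user name and password are placeholders; the secret name matches the placeholder path used above):

# Produce a "user:hash" line in htpasswd format (placeholder credentials)
htpasswd -nb mlflow-user 'changeme'

# Store it in Google Secret Manager so argocd-vault-plugin can resolve the placeholder
htpasswd -nb mlflow-user 'changeme' | gcloud secrets create tdg_ingress_basic_authen --data-file=-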
  • Add ingress.yaml with basic authentication
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-with-auth-mlflow
  annotations:
    # type of authentication
    nginx.ingress.kubernetes.io/auth-type: basic
    # name of the secret that contains the user/password definitions
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    # message to display with an appropriate context why the authentication is required
    nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required'
spec:
  ingressClassName: nginx
  rules:
  - host: mlflow-tracking.domain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: mlflow
            port:
              number: 5000
  • Add values/values.yaml
mlflow:
  service:
    # -- Specifies what type of Service should be created
    type: ClusterIP
    # -- Default Service port
    port: 5000
    # -- Default Service name
    name: http
    # -- Additional service annotations
    annotations: {}

  backendStore:
    # -- Specifies if you want to run database migration
    databaseMigration: true

    # -- Add an additional init container, which checks for database availability
    databaseConnectionCheck: true

    postgres:
      # -- Specifies if you want to use postgres backend storage
      enabled: true
      # -- Postgres host address. e.g. your RDS or Azure Postgres Service endpoint
      host: "postgresql.postgresql.svc.cluster.local" # required
      # -- Postgres service port
      port: 5432 # required
      # -- mlflow database name created before in the postgres instance
      database: "mlflow" # required
      # -- postgres database user name which can access to mlflow database
      user: "postgres" # required
      # -- postgres database user password which can access to mlflow database
      password: <path:projects/362159383816/secrets/postgresql-admin-password#postgresql-admin-password> # required
      # -- postgres database connection driver. e.g.: "psycopg2"
      driver: ""

  artifactRoot:
    # -- Specifies if you want to enable proxied artifact storage access
    proxiedArtifactStorage: true
    gcs:
      # -- Specifies if you want to use Google Cloud Storage Mlflow Artifact Root
      enabled: true
      # -- Google Cloud Storage bucket name
      bucket: "mlflow_gke_test_20230314" # required
      # -- Google Cloud Storage bucket folder. If you want to use root level, please don't set anything.
      path: "" # optional

  # -- Extra environment variables
  extraEnvVars:
    GOOGLE_APPLICATION_CREDENTIALS: "/app/config/google/google_credentials.json"
    # MLFLOW_S3_IGNORE_TLS: true
    # MLFLOW_S3_UPLOAD_EXTRA_ARGS: '{"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": "1234"}'
    # AWS_DEFAULT_REGION: my_region
    # MLFLOW_S3_ENDPOINT_URL: http://1.2.3.4:9000
    # AWS_CA_BUNDLE: /some/ca/bundle.pem
    # MLFLOW_GCS_DEFAULT_TIMEOUT - Sets the standard timeout for transfer operations in seconds (Default: 60). Use -1 for indefinite timeout.
    # MLFLOW_GCS_UPLOAD_CHUNK_SIZE - Sets the standard upload chunk size for bigger files in bytes (Default: 104857600 ≙ 100MiB), must be multiple of 256 KB.
    # MLFLOW_GCS_DOWNLOAD_CHUNK_SIZE - Sets the standard download chunk size for bigger files in bytes (Default: 104857600 ≙ 100MiB), must be multiple of 256 KB.

  # -- Extra Volumes for the pod
  extraVolumes:
  - name: google-credentials-creds
    secret:
      secretName: google-credentials-creds

  # -- Extra Volume Mounts for the mlflow container
  extraVolumeMounts:
  - name: google-credentials-creds
    mountPath: /app/config/google
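The values above assume the GCS bucket and the service-account key behind google_credentials_json already exist. If they don't, a rough sketch of creating them could look like this (the service-account name, location, and project ID are placeholders; the bucket and secret names match the values and placeholders used above):

# Service account for artifact access (name is illustrative)
gcloud iam service-accounts create mlflow-artifacts --display-name "MLflow artifact access"

# Bucket used as the artifact root (location is an example)
gsutil mb -l asia-southeast1 gs://mlflow_gke_test_20230314

# Allow the service account to read/write objects in the bucket
gsutil iam ch serviceAccount:mlflow-artifacts@<your-project-id>.iam.gserviceaccount.com:roles/storage.objectAdmin gs://mlflow_gke_test_20230314

# Export a JSON key and store it in Secret Manager for the google-credentials-creds secret above
gcloud iam service-accounts keys create google_credentials.json --iam-account mlflow-artifacts@<your-project-id>.iam.gserviceaccount.com
gcloud secrets create google_credentials_json --data-file=google_credentials.json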

Note: we follow MLflow Tracking — MLflow 2.4.1 documentation.
This means the Tracking Server acts as a proxy that handles both the backend database (PostgreSQL) and artifact storage access (GCS), so clients never need direct credentials for either.
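In practice, a training client therefore only needs the tracking URL and the basic-auth credentials; for example (hostname and credentials are placeholders, the variable names are MLflow's standard client environment variables):

# Point the MLflow client at the proxied tracking server
export MLFLOW_TRACKING_URI=https://mlflow-tracking.domain.com
export MLFLOW_TRACKING_USERNAME=<user>
export MLFLOW_TRACKING_PASSWORD=<password>
# Runs and artifacts logged from here go through the server; no GCS key is needed on the client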

  • Test-render the mlflow chart the same way ArgoCD will; it should produce no errors. Because I connect to Google Secret Manager, set these environment variables first:
export GOOGLE_APPLICATION_CREDENTIALS="path-to-your-google-service-account-key/key.json"
export AVP_TYPE=gcpsecretmanager
helm template mlflow -n mlflow . -f values/values.yaml | argocd-vault-plugin generate -
  • Commit the code to Git and let ArgoCD deploy it
  • Map the Ingress DNS name to the NGINX external load balancer
  • Open the MLflow Tracking Server UI and enter the username and password (a quick curl check is sketched below)
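To verify the whole chain without a browser, you can hit the tracking server's health endpoint through the Ingress with basic auth (hostname and credentials are the ones you configured; /health is the tracking server's built-in health check):

# Expect an HTTP 200 response with body "OK"
curl -i -u <user>:<password> https://mlflow-tracking.domain.com/health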
  • Note: to connect to an AWS S3 bucket (MinIO) instead, use values like these:
mlflow:
  artifactRoot:
    # -- Specifies if you want to enable proxied artifact storage access
    proxiedArtifactStorage: true
    s3:
      # -- Specifies if you want to use AWS S3 Mlflow Artifact Root
      enabled: true
      # -- S3 bucket name
      bucket: "mlflow" # required
      # -- S3 bucket folder. If you want to use root level, please don't set anything.
      path: "" # optional
      # -- AWS IAM user AWS_ACCESS_KEY_ID which has attached policy for access to the S3 bucket
      awsAccessKeyId: "" # (awsAccessKeyId and awsSecretAccessKey) or roleArn serviceaccount annotation required
      # -- AWS IAM user AWS_SECRET_ACCESS_KEY which has attached policy for access to the S3 bucket
      awsSecretAccessKey: "" # (awsAccessKeyId and awsSecretAccessKey) or roleArn serviceaccount annotation required

  # -- Extra environment variables
  extraEnvVars:
    # GOOGLE_APPLICATION_CREDENTIALS: "/app/config/google/google_credentials.json"
    MLFLOW_S3_IGNORE_TLS: true
    # MLFLOW_S3_UPLOAD_EXTRA_ARGS: '{"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": "1234"}'
    # AWS_DEFAULT_REGION: my_region
    MLFLOW_S3_ENDPOINT_URL: https://minio-ml-hl.minio-ml.svc.cluster.local:9000
    # AWS_CA_BUNDLE: /some/ca/bundle.pem
    # MLFLOW_GCS_DEFAULT_TIMEOUT - Sets the standard timeout for transfer operations in seconds (Default: 60). Use -1 for indefinite timeout.
    # MLFLOW_GCS_UPLOAD_CHUNK_SIZE - Sets the standard upload chunk size for bigger files in bytes (Default: 104857600 ≙ 100MiB), must be multiple of 256 KB.
    # MLFLOW_GCS_DOWNLOAD_CHUNK_SIZE - Sets the standard download chunk size for bigger files in bytes (Default: 104857600 ≙ 100MiB), must be multiple of 256 KB.

  # -- Extra secrets for environment variables
  extraSecretNamesForEnvFrom:
  - mlflow-secrets
  • Create a secret mlflow-secrets that contains the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY credentials for connecting to MinIO, for example:
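A minimal sketch, assuming you already have the MinIO access key and secret key at hand (the chart injects every key of this secret as an environment variable via extraSecretNamesForEnvFrom):

kubectl -n mlflow create secret generic mlflow-secrets \
  --from-literal=AWS_ACCESS_KEY_ID=<minio-access-key> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<minio-secret-key>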
  • Have fun!!!

— — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Credit : TrueDigitalGroup

— — — — — — — — — — — — — — — — — — — — — — — — — — — — —
