MLflow with Helm and Serving a Trained Model on Kubernetes

Dounpct · 5 min read · Aug 6, 2023


Part 7: Serve Model API on Cluster

In this part, I will show how to serve the trained model behind an API server on the cluster.

Build the Model Image

  • Create Dockerfile
FROM python:3.9

ENV HOME="/root"
WORKDIR ${HOME}

# RUN pip install mlflow==2.1.1 google-cloud-storage pathlib==1.0.1 lz4==3.1.3 psutil==5.9.0 typing-extensions==4.3.0 cloudpickle==2.2.1
RUN pip install google-cloud-storage

COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# MODEL_URI and SERVING_PORT are injected at deploy time (see the Kubernetes manifests below)
ENV MODEL_URI ${MODEL_URI}
ENV SERVING_PORT ${SERVING_PORT}

# RUN apt-get install -y git
# RUN git clone --depth=1 https://github.com/pyenv/pyenv.git .pyenv
# ENV PYENV_ROOT="${HOME}/.pyenv"
# ENV PATH="${PYENV_ROOT}/shims:${PYENV_ROOT}/bin:${PATH}"

COPY serving.sh /serving.sh

CMD [ "/bin/bash", "/serving.sh" ]
# CMD [ "sh", "-c", "mlflow models serve --model-uri $MODEL_URI -h 0.0.0.0 -p $SERVING_PORT --no-conda"]
  • Create requirements.txt
mlflow==2.1.1
pathlib==1.0.1
lz4==3.1.3
psutil==5.9.0
typing-extensions==4.3.0
cloudpickle==2.2.1
  • Create serving.sh
#!/bin/sh

mlflow models serve --model-uri "$MODEL_URI" -h 0.0.0.0 -p "$SERVING_PORT" --no-conda
  • Build the image
docker build -t mlflow .
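Before pushing, you can smoke-test the image locally. A minimal sketch, assuming a hypothetical bucket path and a service-account key file in the current directory:

docker run --rm -p 8082:8082 \
  -e MODEL_URI="gs://<your-bucket>/artifacts/model" \
  -e SERVING_PORT=8082 \
  -e GOOGLE_APPLICATION_CREDENTIALS=/etc/secrets/keyfile.json \
  -v "$(pwd)/keyfile.json:/etc/secrets/keyfile.json:ro" \
  mlflow:latest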
  • Push the image to GCR
docker tag mlflow:latest asia.gcr.io/gcp-devops/devops/mlflow_serving:latest
docker push asia.gcr.io/gcp-devops/devops/mlflow_serving:latest

Note: automate these build-and-push steps with the CI tool of your choice, such as Jenkins, Cloud Build, or GitHub Actions.

Deploy the Image with ArgoCD

  • Create application mlflow-app
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: mlflow-app
spec:
  project: mlflow
  source:
    repoURL: 'https://github.com/dounpct/argocd-deployment.git'
    path: mlflow-app
    targetRevision: master
    plugin: {}
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: mlflow-app
  syncPolicy:
    syncOptions:
    - CreateNamespace=true
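To register the application, apply this manifest to the namespace where ArgoCD itself runs (argocd in a default install), assuming you saved it as mlflow-app.yaml:

kubectl apply -n argocd -f mlflow-app.yaml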
  • Create the folder mlflow-app in the deployment repository.
  • Create a plugin folder containing an empty file cmp-plugin-yaml.yaml. This is needed because mlflow-app holds only plain YAML manifests, and the cmp-plugin-yaml.yaml marker lets ArgoCD know which config-management plugin should render the folder. A sketch of the resulting repository layout is shown below.
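Assuming the manifests from the following steps, the deployment repository ends up roughly like this:

argocd-deployment/
└── mlflow-app/
    ├── plugin/
    │   └── cmp-plugin-yaml.yaml   # empty marker file for the ArgoCD plugin
    ├── ingress.yaml
    ├── secret.yaml
    └── application.yaml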
  • Create ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-with-auth-mlflow-app
  annotations:
    # type of authentication
    nginx.ingress.kubernetes.io/auth-type: basic
    # name of the secret that contains the user/password definitions
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    # message to display with an appropriate context why the authentication is required
    nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required'
spec:
  ingressClassName: nginx
  rules:
  - host: nginx-auth-test.local.net
    http:
      paths:
      - path: /invocations
        pathType: Prefix
        backend:
          service:
            name: mlflow-serving
            port:
              number: 8082
  • Create secret.yaml
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: gcsfs-creds
data:
  keyfile.json: <path:projects/362159383816/secrets/google_credentials_json#google_credentials_json | base64encode>
---
apiVersion: v1
kind: Secret
metadata:
  name: gcr-authen
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <path:projects/362159383816/secrets/GCR_DOCKERCONFIGJSON#GCR_DOCKERCONFIGJSON | base64encode>
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: basic-auth
data:
  auth: <path:projects/362159383816/secrets/tdg_ingress_basic_authen#tdg_ingress_basic_authen | base64encode>
  user: <path:projects/362159383816/secrets/tdg_ingress_basic_authen_user#tdg_ingress_basic_authen_user | base64encode>
  password: <path:projects/362159383816/secrets/tdg_ingress_basic_authen_password#tdg_ingress_basic_authen_password | base64encode>
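The auth key of the basic-auth secret must contain an htpasswd-formatted entry, since that is the format the NGINX ingress controller reads. A minimal sketch of generating such an entry for the user ingress_user (store the output in your secret manager; the | base64encode in the placeholder handles the encoding):

htpasswd -nb ingress_user 'your-password'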
  • Go to GCS and copy the gsutil URI of the model artifacts (this becomes MODEL_URI below).
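If you prefer the CLI, you can also find the model path with gsutil. A sketch with a hypothetical bucket, experiment ID, and run ID:

gsutil ls gs://<your-bucket>/<experiment-id>/<run-id>/artifacts/model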
  • Create application.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-serving
  labels:
    app: serve-ML-model-mlflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow-serving
  template:
    metadata:
      labels:
        app: mlflow-serving
    spec:
      containers:
      - name: mlflow-serving
        image: asia.gcr.io/xxxxxxxx/mlflow_serving:latest #change here
        env:
        - name: MODEL_URI
          value: "gs://xxxxxxxxxx/artifacts/model" #change here
        - name: SERVING_PORT
          value: "8082"
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: "/etc/secrets/keyfile.json"
        volumeMounts:
        - name: gcsfs-creds
          mountPath: "/etc/secrets"
          readOnly: true
        resources:
          limits:
            cpu: 1000m
            memory: 600Mi
          requests:
            cpu: 500m
            memory: 300Mi
      imagePullSecrets:
      - name: gcr-authen
      volumes:
      - name: gcsfs-creds
        secret:
          secretName: gcsfs-creds
          items:
          - key: keyfile.json
            path: keyfile.json
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: mlflow-serving
  name: mlflow-serving
spec:
  ports:
  - name: http-web
    port: 8082
    protocol: TCP
    targetPort: 8082
  selector:
    app: mlflow-serving
  type: ClusterIP
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: mlflow-serving
spec:
  minReplicas: 1
  maxReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mlflow-serving
  targetCPUUtilizationPercentage: 60
  • Check application on GKE
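For example, to confirm that the Deployment, its pods, and the HPA are up (resource names as defined in application.yaml):

kubectl get pods -n mlflow-app
kubectl logs deploy/mlflow-serving -n mlflow-app
kubectl get hpa mlflow-serving -n mlflow-app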
  • Check ingress
kubectl get ing -n mlflow-app
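Since nginx-auth-test.local.net is not a public DNS name, you may need to point it at the ingress controller's address yourself. A sketch that appends an /etc/hosts entry from the Ingress status (the field path is standard, but verify it on your cluster):

INGRESS_IP=$(kubectl get ing ingress-with-auth-mlflow-app -n mlflow-app \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "$INGRESS_IP nginx-auth-test.local.net" | sudo tee -a /etc/hosts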
  • Test API
curl -X POST -H "Content-Type: application/json" \
  --data '{"dataframe_split": {"data":[[
    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0]]}}' \
  http://nginx-auth-test.local.net/invocations -u 'ingress_user:xxxxxxxxxxxxxxxxxxxxxx' | jq
### RESULT ###
{
  "predictions": [
    "yes"
  ]
}
  • Great! As you can see, we now have an API, backed by an HPA, that scales to serve prediction traffic. A quick way to watch the autoscaling in action is shown below.
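To see the HPA scale out, generate sustained load and watch it from another terminal. A rough sketch, assuming PAYLOAD holds the JSON body from the test above:

# terminal 1: watch the autoscaler
kubectl get hpa mlflow-serving -n mlflow-app -w

# terminal 2: replay the prediction request in a loop
while true; do
  curl -s -o /dev/null -X POST -H "Content-Type: application/json" \
    --data "$PAYLOAD" -u 'ingress_user:xxxxxxxxxxxxxxxxxxxxxx' \
    http://nginx-auth-test.local.net/invocations
done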
  • Note: I run MLflow on GKE with both GCS and MinIO (S3) artifact stores, so here is the MinIO variant of application.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-serving
  labels:
    app: serve-ML-model-mlflow
spec:
  selector:
    matchLabels:
      app: mlflow-serving
  template:
    metadata:
      labels:
        app: mlflow-serving
    spec:
      containers:
      - name: mlflow-serving
        image: gcr.io/xxxxxxxxxxx/ml-training-api:v0.0.5 #change here
        env:
        - name: MODEL_URI
          value: "s3://mlflow/1/xxxxxxxxxxxxxxxxxxx/artifacts/model" #change here
        - name: SERVING_PORT
          value: "8082"
        - name: MLFLOW_S3_ENDPOINT_URL
          value: "https://minio-ml-hl.minio-ml.svc.cluster.local:9000"
        - name: MLFLOW_S3_IGNORE_TLS
          value: "true"
        resources:
          limits:
            cpu: 1000m
            memory: 600Mi
          requests:
            cpu: 500m
            memory: 300Mi
        envFrom:
        - secretRef:
            name: mlflow-app-secrets
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: mlflow-serving
  name: mlflow-serving
spec:
  ports:
  - name: http-web
    port: 8082
    protocol: TCP
    targetPort: 8082
  selector:
    app: mlflow-serving
  type: ClusterIP
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: mlflow-serving
spec:
  minReplicas: 1
  maxReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mlflow-serving
  targetCPUUtilizationPercentage: 60
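The envFrom above assumes a Secret named mlflow-app-secrets carrying the MinIO credentials; MLflow's S3 client reads the standard AWS variables. A sketch of creating it by hand:

kubectl create secret generic mlflow-app-secrets -n mlflow-app \
  --from-literal=AWS_ACCESS_KEY_ID=<minio-access-key> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<minio-secret-key>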
  • Have fun !!!

This is the end of “MLflow with Helm and Serving a Trained Model on Kubernetes”.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Credit: TrueDigitalGroup

— — — — — — — — — — — — — — — — — — — — — — — — — — — — —
