On-premises MLOps
TASK:
Orchestrate model training, deployment, and monitoring using Kubeflow
Training
Horovod
A framework for efficient distributed multi-GPU training
based on MPI
Supports: PyTorch, TensorFlow, MXNet
MPI operator
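A Horovod job for the MPI operator is usually described as an MPIJob resource with one launcher and several workers. The sketch below builds such a manifest as a plain Python dict. The job name, image, and script path are hypothetical, and the apiVersion is an assumption that depends on the operator release installed in the cluster.

```python
import json


def mpi_job_manifest(name, image, workers, gpus_per_worker):
    """Build an MPIJob manifest (plain dict) for the Kubeflow MPI operator."""

    def replica(replicas, command=None, gpus=0):
        # One container per pod; only workers request GPUs.
        container = {"name": "horovod", "image": image}
        if command:
            container["command"] = command
        if gpus:
            container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
        return {
            "replicas": replicas,
            "template": {"spec": {"containers": [container]}},
        }

    # One MPI process per GPU across all workers.
    total_slots = workers * gpus_per_worker
    # /opt/train.py is a hypothetical path to the Horovod training script.
    launch = ["mpirun", "-np", str(total_slots), "python", "/opt/train.py"]

    return {
        "apiVersion": "kubeflow.org/v1",  # assumption: check the installed CRD version
        "kind": "MPIJob",
        "metadata": {"name": name},
        "spec": {
            "slotsPerWorker": gpus_per_worker,
            "mpiReplicaSpecs": {
                "Launcher": replica(1, command=launch),
                "Worker": replica(workers, gpus=gpus_per_worker),
            },
        },
    }


manifest = mpi_job_manifest(
    "horovod-demo", "example.local/horovod-train:latest",
    workers=2, gpus_per_worker=2,
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with kubectl apply -f, since kubectl accepts JSON as well as YAML.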
Versioning
Artifact lineage
Model
Dataset
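One simple way to picture artifact lineage is a record that ties a model to the exact dataset it was trained on through content hashes. The sketch below uses only the Python standard library. The record layout and function names are hypothetical, and this is not the metadata format of Kubeflow or any other tool.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Content hash used as a version identifier for an artifact."""
    return hashlib.sha256(data).hexdigest()


def lineage_record(dataset: bytes, model: bytes, params: dict) -> dict:
    """Link a model artifact to the dataset and parameters that produced it."""
    return {
        "dataset_sha256": fingerprint(dataset),
        "model_sha256": fingerprint(model),
        "params": params,
    }


# Toy byte strings stand in for a real dataset file and serialized model.
record = lineage_record(b"x,y\n1,2\n", b"fake-model-weights", {"epochs": 3})
print(json.dumps(record, indent=2, sort_keys=True))
```

Because the identifiers come from the content, retraining on a changed dataset produces a different dataset hash, so the record shows which model belongs to which data version.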
Deployment
TRTIS (TensorRT Inference Server)
Transformer
Monitoring
Components to complete
Basic ML model
Small dataset
Working K8s environment to run Kubeflow in
Tools to submit and monitor training jobs
Tools to deploy model
Valid prebuilt pipeline for testing
Kubernetes cluster
Minikube
API
via CLI
via dashboard
Access governance
Web-apps
LoadBalancer
access that bypasses k8s cluster interaction
K8s ingress
uses k8s primitive - the ingress controller
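The two web-app exposure options above can be compared as manifests. The sketch below builds a Service of type LoadBalancer and an Ingress rule as Python dicts. The app name, host, and port numbers are hypothetical, and the Ingress only takes effect if an ingress controller is running in the cluster.

```python
import json


def load_balancer_service(name, app_label, port, target_port):
    """Service exposed directly through an external load balancer."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": app_label},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def ingress_manifest(name, host, service, port):
    """Ingress rule that routes a host name to an in-cluster Service."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {
                        "service": {"name": service, "port": {"number": port}},
                    },
                }]},
            }],
        },
    }


service = load_balancer_service("webapp-lb", "webapp", port=80, target_port=8080)
ingress = ingress_manifest("webapp-ingress", "webapp.example.local", "webapp", 80)
print(json.dumps([service, ingress], indent=2))
```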
Security
All apps running atop K8s
kubectl CLI
Additional security for some use-cases:
Kubeflow -> secure sensitive data operations
Kubeflow on the K8s cluster
Components
Components that extend the Kubernetes API
Custom components must define CRDs
CRD (custom resource definition): describes the appropriate (desired) state of the resource
Access via Kubeflow API
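To make the CRD idea concrete, the sketch below builds a minimal CustomResourceDefinition manifest as a Python dict. The group, kind, and the replicas field are hypothetical examples and do not correspond to a real Kubeflow component.

```python
import json


def crd_manifest(group, kind, plural, version="v1"):
    """Minimal CustomResourceDefinition describing a new resource type."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        # Kubernetes requires the name to be "<plural>.<group>".
        "metadata": {"name": f"{plural}.{group}"},
        "spec": {
            "group": group,
            "scope": "Namespaced",
            "names": {"kind": kind, "plural": plural, "singular": kind.lower()},
            "versions": [{
                "name": version,
                "served": True,
                "storage": True,
                # The schema constrains what a valid desired state looks like.
                "schema": {"openAPIV3Schema": {
                    "type": "object",
                    "properties": {
                        "spec": {
                            "type": "object",
                            "properties": {
                                "replicas": {"type": "integer", "minimum": 1},
                            },
                        },
                    },
                }},
            }],
        },
    }


crd = crd_manifest("example.local", "DemoJob", "demojobs")
print(json.dumps(crd, indent=2))
```

Once a definition like this is applied, the cluster API accepts objects of the new kind, and a controller is expected to reconcile them toward the declared state.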
Components that are applications running atop Kubernetes
Jupyter Notebooks, hyperparameter tuning (Katib), Pipelines, and others
Access via these apps
How to choose between the two component approaches?
Python-proficient users with containerization skills could use the CRD-based extension components, which allow greater flexibility and reuse.
Others may use Jupyter Notebooks on top and build pipelines there.
Security is type-specific, with no clear winner between the two approaches
Note. This EXTENDS the K8s C.
Mind this when provisioning security