Shaunak Joshi

DevOps

AI Inference Orchestrator

A Kubernetes-native control plane for AI model inference workloads, built with Go and controller-runtime. It extends the Kubernetes API with a custom AIDeployment resource that declaratively manages deployment orchestration, autoscaling, drift detection, and observability, and it can be driven through a CLI, an HTTP tool server, or a natural language interface.

[Figure: AI Inference Orchestrator architecture]
Go, Kubernetes, controller-runtime, CRD, Prometheus, HPA, RBAC

Custom Resource Definition

The AIDeployment CRD lets you declare the model name, replica count, service type, resource limits, and HPA configuration; the controller reconciles the cluster to match.
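
To make the shape of such a resource concrete, here is a minimal sketch of what the spec types might look like in Go. Every field name, JSON tag, and validation rule below is an assumption for illustration, not the project's actual schema. A real controller-runtime project would also embed the Kubernetes type and object metadata and generate the CRD manifest from the types; the sketch uses only the standard library so it runs on its own.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// HPASpec is a hypothetical autoscaling block; the field names are assumptions.
type HPASpec struct {
	MinReplicas          int32 `json:"minReplicas"`
	MaxReplicas          int32 `json:"maxReplicas"`
	TargetCPUUtilization int32 `json:"targetCPUUtilization"`
}

// ResourceLimits holds Kubernetes-style quantity strings such as "2" or "8Gi".
type ResourceLimits struct {
	CPU    string `json:"cpu,omitempty"`
	Memory string `json:"memory,omitempty"`
}

// AIDeploymentSpec is the desired state a user declares (illustrative only).
type AIDeploymentSpec struct {
	ModelName   string         `json:"modelName"`
	Replicas    int32          `json:"replicas"`
	ServiceType string         `json:"serviceType"`
	Limits      ResourceLimits `json:"limits"`
	HPA         *HPASpec       `json:"hpa,omitempty"`
}

// Validate rejects specs that a controller could not reconcile sensibly.
func (s AIDeploymentSpec) Validate() error {
	if s.ModelName == "" {
		return errors.New("modelName is required")
	}
	if s.Replicas < 0 {
		return errors.New("replicas must not be negative")
	}
	if s.HPA != nil && s.HPA.MinReplicas > s.HPA.MaxReplicas {
		return errors.New("hpa.minReplicas must not exceed hpa.maxReplicas")
	}
	return nil
}

func main() {
	spec := AIDeploymentSpec{
		ModelName:   "example-model", // hypothetical model name
		Replicas:    2,
		ServiceType: "ClusterIP",
		Limits:      ResourceLimits{CPU: "2", Memory: "8Gi"},
		HPA:         &HPASpec{MinReplicas: 2, MaxReplicas: 6, TargetCPUUtilization: 70},
	}
	if err := spec.Validate(); err != nil {
		fmt.Println("invalid spec:", err)
		return
	}
	// Print the spec as it might appear under `spec:` in a manifest.
	out, err := json.MarshalIndent(spec, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Making the HPA block an optional pointer lets a user opt out of autoscaling entirely, in which case the replica count in the spec is the only source of truth.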

Drift Detection

The controller continuously compares desired and actual state across the Deployment, Service, and HPA resources it owns, and corrects drift without overwriting replica counts that the autoscaler manages.
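
The HPA-safe comparison can be sketched as a pure function. This is an assumption about how such a check might work, not the project's code: `DeploymentState` and `planUpdate` are hypothetical names, and the state is reduced to three fields so the example stays self-contained. The idea is that when an HPA owns scaling, the live replica count is copied into the target before comparing, so a replica difference alone never counts as drift.

```go
package main

import "fmt"

// DeploymentState is a simplified, hypothetical view of the fields the
// controller compares on an owned Deployment.
type DeploymentState struct {
	Image    string
	Replicas int32
	CPULimit string
}

// planUpdate returns the state to apply and whether an update is needed.
// With hpaEnabled, the live replica count is preserved so the controller
// never fights the autoscaler.
func planUpdate(desired, actual DeploymentState, hpaEnabled bool) (DeploymentState, bool) {
	target := desired
	if hpaEnabled {
		target.Replicas = actual.Replicas
	}
	return target, target != actual
}

func main() {
	desired := DeploymentState{Image: "model:v2", Replicas: 2, CPULimit: "2"}
	actual := DeploymentState{Image: "model:v1", Replicas: 5, CPULimit: "2"}

	target, changed := planUpdate(desired, actual, true)
	fmt.Printf("update needed: %v, target: %+v\n", changed, target)
}
```

In the example, the image has drifted and is corrected, while the five replicas chosen by the autoscaler are left alone.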

Observability

Exposes reconciliation-loop metrics and business-level inference metrics to Prometheus. Deployment readiness is propagated into the status conditions of the AIDeployment resource.
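
As a rough illustration of readiness propagation, the sketch below derives a Ready condition from replica counts and upserts it into a condition list. The `Condition` type, the reason strings, and both helper functions are hypothetical stand-ins written against the standard library. Kubernetes projects typically use `metav1.Condition` together with `meta.SetStatusCondition` from apimachinery, which follows the same convention of changing the transition time only when the status flips.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a simplified stand-in for a Kubernetes status condition.
type Condition struct {
	Type               string
	Status             string // "True" or "False"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// readyCondition derives a Ready condition from Deployment replica counts.
func readyCondition(ready, desired int32) Condition {
	c := Condition{
		Type:    "Ready",
		Status:  "False",
		Reason:  "ReplicasNotReady", // hypothetical reason string
		Message: fmt.Sprintf("%d/%d replicas ready", ready, desired),
	}
	if ready >= desired {
		c.Status = "True"
		c.Reason = "AllReplicasReady" // hypothetical reason string
	}
	return c
}

// setCondition inserts or updates c by Type. The transition time changes
// only when Status changes, so repeated reconciles do not churn the status.
func setCondition(conds []Condition, c Condition, now time.Time) []Condition {
	for i := range conds {
		if conds[i].Type != c.Type {
			continue
		}
		if conds[i].Status == c.Status {
			c.LastTransitionTime = conds[i].LastTransitionTime
		} else {
			c.LastTransitionTime = now
		}
		conds[i] = c
		return conds
	}
	c.LastTransitionTime = now
	return append(conds, c)
}

func main() {
	var conds []Condition
	conds = setCondition(conds, readyCondition(1, 3), time.Now())
	conds = setCondition(conds, readyCondition(3, 3), time.Now())
	for _, c := range conds {
		fmt.Printf("%s=%s (%s): %s\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```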

Multi-Interface Control

Manage the model lifecycle via the aictl CLI, a REST tool server, or a natural language interface, supporting both traditional ops and AI agent-driven workflows.

Controller Guarantees

  • Idempotent reconciliation — safe to run repeatedly without side effects
  • HPA-safe replica management — never overwrites autoscaler-managed replicas
  • Conflict-safe status updates via retry-on-conflict
  • Owner references — child resources garbage-collected when CR is deleted
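
The retry-on-conflict guarantee in the list above can be illustrated with a small standard-library sketch. The names `errConflict` and `retryOnConflict` are hypothetical, and the backoff policy is an assumption. In a real controller the equivalent is usually `retry.RetryOnConflict` from client-go, with the update closure re-fetching the latest object before writing its status.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errConflict stands in for the API server's HTTP 409 response, returned
// when an object's resourceVersion is stale.
var errConflict = errors.New("conflict: object has been modified")

// retryOnConflict calls update until it succeeds, fails with a non-conflict
// error, or the attempts run out. The delay doubles after each conflict.
func retryOnConflict(attempts int, delay time.Duration, update func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = update()
		if !errors.Is(err, errConflict) {
			return err // nil on success, or an error that retrying cannot fix
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return err
}

func main() {
	calls := 0
	err := retryOnConflict(5, time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errConflict // simulate two stale writes
		}
		return nil // the third write lands on a fresh object
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Only conflicts are retried. Any other error is returned immediately so the reconcile loop can requeue with its normal backoff.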