§ case / Lumen Labs
Multi-tenant LLM inference platform
Built a GPU-backed inference platform on GKE, autoscaling from 0 to 200 pods with sub-4s cold starts.
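A minimal sketch of one way cold starts that fast can be achieved, assuming serving pods memory-map weights from a shared, read-only model cache mounted on the node at container start; `MODEL_CACHE_DIR`, the file layout, and `preload_weights` are illustrative assumptions, not the platform's actual code.

```python
import mmap
import os
import time

# Hypothetical mount point for a read-only shared model cache
# (e.g. a node-local SSD or read-only volume visible to every pod).
MODEL_CACHE = os.environ.get("MODEL_CACHE_DIR", "/models/cache")


def preload_weights(model_name: str) -> mmap.mmap:
    """Memory-map model weights from the shared cache at startup.

    mmap avoids copying the whole file into the process heap; the kernel
    pages weights in on demand, so startup cost is largely independent of
    model size. (Unix-only: uses mmap's `prot` argument.)
    """
    path = os.path.join(MODEL_CACHE, f"{model_name}.bin")
    fd = os.open(path, os.O_RDONLY)
    try:
        # mmap duplicates the descriptor internally, so fd can be closed after.
        return mmap.mmap(fd, 0, prot=mmap.PROT_READ)
    finally:
        os.close(fd)


if __name__ == "__main__":
    start = time.monotonic()
    weights = preload_weights("demo-model")
    print(f"mapped {len(weights)} bytes in {time.monotonic() - start:.2f}s")
```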
Designed a multi-tenant inference stack with token-based rate limiting, isolated namespaces, and shared model caches. Horizontal autoscaling is driven by custom metrics.
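A minimal sketch of what per-tenant, token-based rate limiting could look like: a token bucket per tenant, where the cost of a request is its LLM token count rather than a flat 1 per request. `TokenBucket`, `TenantLimiter`, and the rates shown are hypothetical, not the platform's actual implementation.

```python
import threading
import time
from collections import defaultdict


class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/sec, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; otherwise reject the request."""
        with self.lock:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False


class TenantLimiter:
    """One bucket per tenant; `cost` is the request's LLM token count,
    so the limit is measured in tokens, not raw requests."""

    def __init__(self, rate: float, capacity: float) -> None:
        self._buckets: dict[str, TokenBucket] = defaultdict(
            lambda: TokenBucket(rate, capacity)
        )

    def allow(self, tenant: str, cost: float = 1.0) -> bool:
        return self._buckets[tenant].allow(cost)


if __name__ == "__main__":
    limiter = TenantLimiter(rate=1000.0, capacity=2000.0)  # 1k tokens/s, 2k burst
    print(limiter.allow("tenant-a", cost=500))   # True: within burst
    print(limiter.allow("tenant-a", cost=5000))  # False: exceeds capacity
```

Charging by token count rather than per request keeps one tenant's long prompts from starving the others, which is the point of rate limiting in a shared inference stack.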