KServe

KServe is a Kubernetes-native platform for serving both generative and predictive AI models at scale, with support for OpenAI-compatible protocols and multiple ML frameworks.

Use it when

  • You need a unified platform for both generative AI (LLMs) and predictive AI models.
  • You want OpenAI-compatible inference protocols for LLM deployment.
  • You're deploying models across multiple frameworks (TensorFlow, PyTorch, scikit-learn, etc.).
  • You need enterprise-scale workload handling with Kubernetes-native design.
  • You want intelligent request routing and advanced deployment options like canary deployments.
  • You require model explainability and advanced monitoring capabilities.
  • You need cost-efficient auto-scaling and request-based scaling.
  • You want native integration with Hugging Face models and GPU acceleration.
  • Your team has strong Kubernetes expertise and wants a CNCF-backed solution.
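
As a rough illustration of the Kubernetes-native design and the canary option mentioned above, the sketch below builds an InferenceService manifest as a plain dictionary and prints it as JSON, which could then be piped to `kubectl apply -f -`. The helper name `build_inference_service`, the service name, the bucket path, and the replica counts are all assumptions made for the example, not values from this page.

```python
import json


def build_inference_service(name, storage_uri, model_format="sklearn",
                            min_replicas=1, max_replicas=3,
                            canary_percent=None):
    """Return an InferenceService manifest as a plain dict (illustrative helper)."""
    predictor = {
        "minReplicas": min_replicas,
        "maxReplicas": max_replicas,
        "model": {
            "modelFormat": {"name": model_format},
            "storageUri": storage_uri,
        },
    }
    if canary_percent is not None:
        # Share of traffic sent to the newest revision during a canary rollout.
        predictor["canaryTrafficPercent"] = canary_percent
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {"predictor": predictor},
    }


manifest = build_inference_service(
    "sklearn-iris",                               # hypothetical service name
    "gs://example-bucket/models/sklearn/iris",    # hypothetical model location
    canary_percent=10,
)
print(json.dumps(manifest, indent=2))
```

Generating the manifest from code rather than hand-editing YAML is only one option; it mainly helps when the same template is reused across many models.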

Watch out

  • Large model deployment timeouts: Large models can take longer than 5 minutes to deploy, which can cause containers to be terminated before they become ready.
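  • Auto-scaling limitations: Scaling on custom metrics needs additional setup (KEDA), and scale-to-zero when idle is not available in every deployment mode (it relies on the Knative-based serverless mode).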
  • Model transition issues: InferenceServices can get stuck in "InProgress" status indefinitely.
  • Multi-node/Multi-GPU limitations: Current design is insufficient for multi-node/multi-GPU use cases.
  • Monitoring gaps: Lacks comprehensive built-in model monitoring tools, requiring external solutions.
  • Community support: Community support is weaker than that of more established platforms.
  • Complex setup requirements: Requires significant Kubernetes expertise.
  • PyTorch engineering effort: Serving PyTorch models takes additional engineering work.
  • Resource management complexity: Scaling can behave unexpectedly, especially when it is driven by Prometheus metrics.
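
One way to guard against slow or stuck rollouts is to poll the InferenceService status with an explicit deadline instead of waiting indefinitely. The sketch below is a minimal example under assumptions: `fetch` is a hypothetical callable that returns the InferenceService as a dictionary (for instance by wrapping `kubectl get inferenceservice NAME -o json`), and the 30-minute default timeout is arbitrary.

```python
import time


def is_ready(isvc):
    """True when status.conditions contains a Ready condition with status "True"."""
    conditions = isvc.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def wait_until_ready(fetch, timeout_s=1800.0, interval_s=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until the service is Ready or the deadline passes.

    Returns True if the service became Ready, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready(fetch()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A `False` return is the signal to inspect pod events and logs rather than keep waiting; the `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.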

Available in stages

Model Serving

Installation

curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.15/hack/quick_install.sh" | bash
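
Once the install has finished and a model is deployed, a prediction call under the V1 inference protocol is a POST to `/v1/models/<name>:predict` with an `instances` payload. The sketch below only builds the request; the base URL, the model name `sklearn-iris`, the sample input, and the helper name `build_predict_request` are assumptions for illustration, and the optional `Host` header applies only to setups that route by hostname through an ingress gateway.

```python
import json
import urllib.request


def build_predict_request(base_url, model_name, instances, host_header=None):
    """Build (but do not send) a V1-protocol predict request."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if host_header:
        # Some ingress setups route on the service hostname.
        headers["Host"] = host_header
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_predict_request(
    "http://localhost:8080",        # hypothetical address, e.g. a port-forward
    "sklearn-iris",                 # hypothetical model name
    [[6.8, 2.8, 4.8, 1.4]],         # sample feature vector
)
print(req.get_method(), req.full_url)
```

To actually send it, pass `req` to `urllib.request.urlopen` and decode the JSON response.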

Example stacks

Example stacks coming soon...