BentoML

BentoML is an open platform that simplifies ML model deployment and lets you serve models at production scale in minutes.

Use it when

  • You want a serving framework that supports a wide range of ML frameworks.
  • You want an end-to-end model serving solution that provides a model API server, model packaging, model management, deployment automation, and offline batch serving.
  • You want to do preprocessing and post-processing in serving endpoints.
  • You want built-in model monitoring features.
  • You want support for adaptive micro-batching.
  • You want model registry features through integration with Yatai.
  • You want to run on Google Colab.
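
The micro-batching idea mentioned in the list above can be sketched in plain Python. This is only an illustration of the general technique, not BentoML's implementation or API: requests are collected until either a batch-size cap or a wait deadline is reached, and the model is then called once on the whole batch. The names `collect_batch`, `predict_batch`, `max_batch_size`, and `max_wait_s` are hypothetical, and the "adaptive" part (tuning the cap and the wait window from observed latency) is left out.

```python
import queue
import time
from typing import List


def collect_batch(
    requests: "queue.Queue[float]", max_batch_size: int, max_wait_s: float
) -> List[float]:
    """Pull up to max_batch_size items, waiting at most max_wait_s in total."""
    batch: List[float] = []
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # wait window elapsed: serve whatever has arrived
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived before the deadline
    return batch


def predict_batch(inputs: List[float]) -> List[float]:
    """Hypothetical stand-in for one vectorized model call on a batch."""
    return [2.0 * x for x in inputs]


# Five requests are already waiting when the server starts batching.
pending: "queue.Queue[float]" = queue.Queue()
for value in [1.0, 2.0, 3.0, 4.0, 5.0]:
    pending.put(value)

first = collect_batch(pending, max_batch_size=3, max_wait_s=0.05)
second = collect_batch(pending, max_batch_size=3, max_wait_s=0.05)
first_out = predict_batch(first)
second_out = predict_batch(second)
```

Here the first batch fills to the size cap, while the second is cut short by the wait window. A real serving framework would run the collection loop concurrently with request handling and adjust both limits at runtime.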

Watch out

  • Python is currently the only supported language; there is no multi-language support.
  • BentoML does not handle horizontal scaling itself. To scale served models, you have to build a Kubernetes-based solution separately or use a cloud platform such as AWS Lambda, AWS ECS, or Google Cloud Run.

Available in stages

Model Registry

Installation

pip install bentoml

Example stacks

Example stacks coming soon...