BentoML is an open platform that simplifies ML model deployment and lets you serve models at production scale in minutes.
Use it when
• You want a serving framework that supports a wide range of ML frameworks.
• You want an end-to-end model serving solution that provides a model API server, model packaging, model management, deployment automation, and offline batch serving.
• You want to do pre-processing and post-processing in serving endpoints.
• You want built-in model monitoring features.
• You want support for adaptive micro-batching.
• You want model registry features through integration with Yatai.
• You want to run it on Google Colab.
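To make the pre-processing, post-processing, and micro-batching items above more concrete, here is a minimal plain-Python sketch of the idea. It does not use BentoML and is not its implementation: the names `micro_batches`, `serve`, and `toy_model` are hypothetical, and the batch size is fixed, whereas an adaptive batcher would also adjust batch size and wait time based on observed latency.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def micro_batches(items: Sequence[float], max_batch_size: int) -> Iterator[List[float]]:
    """Split queued inputs into batches of at most max_batch_size items."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(items), max_batch_size):
        yield list(items[start:start + max_batch_size])


def serve(
    raw_requests: Sequence[str],
    model: Callable[[List[float]], List[float]],
    max_batch_size: int = 4,
) -> List[Dict[str, float]]:
    """Hypothetical endpoint: pre-process, predict in batches, post-process."""
    # Pre-processing: parse each raw request string into a float.
    inputs = [float(raw.strip()) for raw in raw_requests]

    outputs: List[float] = []
    for batch in micro_batches(inputs, max_batch_size):
        # One model call per batch instead of one call per request.
        outputs.extend(model(batch))

    # Post-processing: wrap each raw prediction in a response payload.
    return [{"prediction": round(y, 2)} for y in outputs]


def toy_model(batch: List[float]) -> List[float]:
    """Stand-in for a real model (an assumption for this example)."""
    return [2.0 * x + 1.0 for x in batch]


if __name__ == "__main__":
    print(serve([" 1", "2 ", "3"], toy_model, max_batch_size=2))
```

Grouping requests this way amortizes per-call overhead across a batch, which is the main reason a serving framework offers micro-batching.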
Watch out
⚠ BentoML currently supports only Python; there is no multi-language support.
⚠ BentoML does not handle horizontal scaling. To scale served models, you have to build a separate Kubernetes-based solution or use a cloud platform such as AWS Lambda, AWS ECS, or Google Cloud Run.