A flexible, easy-to-use tool for serving and scaling PyTorch models in production. It supports both eager-mode and TorchScript models, with built-in multi-worker scaling, metrics collection, and API access for high-performance inference.