NannyML is an open-source post-deployment data science library that detects silent model failures in production and estimates ML model performance without ground truth.
Use it when
•Ground truth delays: When actual outcome labels are delayed or completely absent in production.
•Performance estimation needed: When you need to estimate model performance without waiting for target labels.
•Meaningful alerts required: When you want alerts focused on actual performance impact rather than just data drift.
•Silent failure detection: When you need to detect model performance degradation that occurs without obvious warning signs.
•Business impact tracking: When you need to tie model performance to monetary or business-oriented outcomes.
•Multi-model type support: When working with binary classification, multiclass classification, or regression models.
•Production model reliability: When maintaining model reliability and performance in real-world deployments is critical.
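
The idea behind estimating performance without labels can be sketched in a few lines. This is a simplified illustration of confidence-based estimation, not NannyML's implementation: it assumes a binary classifier whose positive-class probabilities are well calibrated, and the function name `estimate_accuracy` and the sample scores are hypothetical. If a prediction with probability p is calibrated, a positive prediction is correct with probability p and a negative one with probability 1 - p, so averaging those values gives an expected accuracy.

```python
def estimate_accuracy(probabilities, threshold=0.5):
    """Estimate accuracy from calibrated positive-class probabilities.

    No ground-truth labels are used. The estimate is only as good as
    the calibration of the probabilities (an assumption of this sketch).
    """
    if not probabilities:
        raise ValueError("need at least one prediction")
    total = 0.0
    for p in probabilities:
        # Predicted positive: correct with probability p.
        # Predicted negative: correct with probability 1 - p.
        total += p if p >= threshold else 1.0 - p
    return total / len(probabilities)


# Hypothetical scores: a confident model and an uncertain one.
confident = [0.95, 0.05, 0.90, 0.10]
uncertain = [0.55, 0.45, 0.60, 0.40]

print(round(estimate_accuracy(confident), 3))  # about 0.925
print(round(estimate_accuracy(uncertain), 3))  # about 0.575
```

A drop in this estimate between a reference period and a production period suggests degraded performance before any labels arrive, provided calibration still holds.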
Watch out
⚠Reference dataset requirements: Requires a stable reference dataset on which the model's performance is known and acceptable; a common mistake is using the training data as the reference.
⚠False alarm potential: Can overwhelm teams with false alarms if not configured properly, even though the library is designed to focus on meaningful alerts.
⚠Chunk size sensitivity: Requires careful chunk size configuration: chunks that are too small produce unreliable statistical results.
⚠Univariate detection limitations: Monitoring individual variables may miss changes that only show up in how variables relate to one another.
⚠Drift-performance misalignment: Not every data drift affects model performance, and performance degradation can result from other causes.
⚠Statistical sensitivity: Drift detection methods can be overly sensitive and require careful configuration.
⚠Multivariate complexity: Detecting multivariate drift is more complex than single variable monitoring.
⚠Outlier sensitivity: May be sensitive to extreme values, leading to false alarms or missed detections.
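
The chunk size warning above can be made concrete with a small simulation. This sketch uses only the Python standard library and does not call NannyML; the model, its 0.9 accuracy, and the helper names are hypothetical. It scores the same stable model in chunks of two sizes and compares the spread of per-chunk accuracy with the binomial standard error sqrt(p(1 - p) / n).

```python
import math
import random
import statistics


def accuracy_standard_error(accuracy, chunk_size):
    """Binomial standard error of accuracy measured on one chunk."""
    return math.sqrt(accuracy * (1.0 - accuracy) / chunk_size)


def chunk_accuracies(outcomes, chunk_size):
    """Split 0/1 correctness flags into full chunks and score each one."""
    return [
        sum(outcomes[i:i + chunk_size]) / chunk_size
        for i in range(0, len(outcomes) - chunk_size + 1, chunk_size)
    ]


random.seed(0)
# Hypothetical stable model: each prediction is correct with probability 0.9.
outcomes = [1 if random.random() < 0.9 else 0 for _ in range(50_000)]

for size in (50, 5_000):
    scores = chunk_accuracies(outcomes, size)
    observed = statistics.pstdev(scores)
    expected = accuracy_standard_error(0.9, size)
    print(f"chunk_size={size}: observed spread={observed:.4f}, "
          f"theoretical standard error={expected:.4f}")
```

Nothing about the model changes during the run, yet the small chunks fluctuate far more than the large ones. Alert thresholds derived from such noisy chunks are likely to fire on sampling noise alone, which is one way the false alarms mentioned above can arise.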