Luigi

Luigi is a lightweight Python workflow scheduler, built at Spotify, for assembling complex pipelines of batch jobs. It handles dependency resolution and workflow management for you.

Use it when

  • Complex batch processing: When you need to run thousands of tasks organized in complex dependency graphs.
  • Hadoop ecosystem integration: When working extensively with Hadoop, Hive, and Pig jobs.
  • File system operations: When you need atomic file system operations for HDFS and local files.
  • Simple dependency management: When you need lightweight workflow management with visual task tracking.
  • Python-centric workflows: When your entire workflow can be defined in Python rather than XML.
  • Traditional batch jobs: When the work divides into sizable batch tasks rather than real-time processing.
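
The dependency resolution and atomic file handling mentioned above can be illustrated with a small sketch in plain Python. It deliberately does not import Luigi: ToyTask, write_atomically, and run_with_deps are hypothetical names invented for this illustration, and only the general shape (a task declares what it requires and what file it outputs, and a finished output means the task is skipped) is borrowed from the Luigi model. Outputs are written to a temporary file and renamed into place, so a crashed task never leaves a half-written file that looks complete.

```python
import os
import tempfile


class ToyTask:
    """Hypothetical stand-in for a workflow task; this is not Luigi's API."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return []  # upstream tasks; none by default

    def output(self):
        raise NotImplementedError  # path of the file this task produces

    def complete(self):
        # A task counts as done once its output file exists.
        return os.path.exists(self.output())

    def run(self):
        raise NotImplementedError

    def write_atomically(self, text):
        # Write to a temp file in the same directory, then rename it into
        # place, so the final path only ever holds a fully written file.
        fd, tmp_path = tempfile.mkstemp(dir=self.workdir)
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        os.replace(tmp_path, self.output())


class Extract(ToyTask):
    def output(self):
        return os.path.join(self.workdir, "raw.txt")

    def run(self):
        self.write_atomically("3\n4\n5\n")


class Total(ToyTask):
    def requires(self):
        return [Extract(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "total.txt")

    def run(self):
        with open(self.requires()[0].output()) as handle:
            numbers = [int(line) for line in handle]
        self.write_atomically(str(sum(numbers)))


def run_with_deps(task, executed):
    """Run upstream tasks first; skip any task whose output already exists."""
    if task.complete():
        return
    for upstream in task.requires():
        run_with_deps(upstream, executed)
    task.run()
    executed.append(type(task).__name__)


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        first, second = [], []
        run_with_deps(Total(workdir), first)
        run_with_deps(Total(workdir), second)  # outputs exist, nothing reruns
        with open(Total(workdir).output()) as handle:
            return first, second, handle.read()
```

Calling demo() runs Extract before Total on the first pass and runs nothing on the second, because both output files already exist. This toy runner has no cycle detection, retries, or parallelism; those are among the things a real scheduler adds.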

Watch out

  • Spotify has moved on: Spotify itself no longer actively maintains Luigi and has migrated to Flyte for better visibility and automation.
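  • Architectural constraints: Luigi has no standalone DAG definition. Each task declares its own dependencies through requires(), so the overall graph is implicit, and pipelines with many dependencies and branches become hard to develop and reason about.
  • Limited scheduling: There is no built-in triggering, so runs must be started externally (for example, from cron), and cloud-native features are limited.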
  • Scalability limits: Not meant to scale beyond tens of thousands of jobs.
  • Language limitations: Only supports Python, while many organizations need Java support.
  • Real-time limitations: Focus is on batch processing, so it's less useful for near real-time pipelines.
  • Basic monitoring: Limited monitoring and observability tools compared to modern alternatives.
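
In Luigi, each task names its own upstream tasks through requires(), so there is no single place where the whole graph is written down. The sketch below shows roughly what that means in practice. It uses hypothetical classes that imitate only the requires() idea and does not import Luigi; collect_edges is likewise an invented helper. The point is that the full edge list has to be recovered by walking the tasks one by one.

```python
class Task:
    """Hypothetical minimal task; only the requires() idea mirrors Luigi."""

    def requires(self):
        return []


class LoadUsers(Task):
    pass


class LoadOrders(Task):
    pass


class JoinTables(Task):
    def requires(self):
        return [LoadUsers(), LoadOrders()]


class Report(Task):
    def requires(self):
        return [JoinTables(), LoadUsers()]


def collect_edges(task, edges=None, seen=None):
    """Walk requires() recursively and return (downstream, upstream) pairs."""
    edges = set() if edges is None else edges
    seen = set() if seen is None else seen
    name = type(task).__name__
    if name in seen:
        return edges  # already expanded this task
    seen.add(name)
    for upstream in task.requires():
        edges.add((name, type(upstream).__name__))
        collect_edges(upstream, edges, seen)
    return edges
```

With four tasks the traversal is trivial, but the same walk is the only way to see the shape of a pipeline spread across many modules, which is where the difficulty with large, branching pipelines comes from.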

Available in stages

Pipeline Orchestration

Installation

pip install luigi

Example stacks

Example stacks coming soon...