Apache Spark

Apache Spark is a framework for processing large datasets by distributing the computation across a cluster of machines.

Use it when

  • You are working with big data (large datasets).
  • You would like to parallelize computation across multiple machines.
  • You want fast large-scale data processing.
  • You want a built-in machine learning API (MLlib) and a wide range of operators for transforming data.

Watch out

  • Needs clusters with more RAM, since it keeps datasets in memory.
  • Infrastructure and setup costs are higher than for single-machine tools.

Available in stages

Runtime Engine
