Spark over IBM Storage Scale

Apache Spark is a fast and general engine for large-scale data processing.

Applications running on Spark can access data in distributed file systems through several URI schemes, such as hdfs:// and file:///.
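
As a minimal sketch of how the two schemes look from application code, the following PySpark example reads the same kind of text file through either URI form. The host name, port, mount point (/gpfs/fs1), and file names are hypothetical and chosen only for illustration; count_lines additionally assumes that pyspark is installed and, for file:///, that the path is visible at the same location on every node that runs Spark tasks.

```python
from urllib.parse import urlparse

# Hypothetical locations, for illustration only (not taken from this document).
HDFS_URI = "hdfs://namenode.example.com:8020/data/events.txt"
LOCAL_URI = "file:///gpfs/fs1/data/events.txt"


def scheme_of(uri: str) -> str:
    """Return the URI scheme that tells Spark which file system client to use."""
    return urlparse(uri).scheme


def count_lines(uri: str) -> int:
    """Count the lines of a text file addressed by an hdfs:// or file:/// URI.

    Assumes pyspark is installed; the import is local so that the helper
    above can be used without a Spark installation.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("uri-scheme-demo").getOrCreate()
    try:
        # spark.read.text accepts a fully qualified URI; the scheme selects
        # the underlying file system implementation.
        return spark.read.text(uri).count()
    finally:
        spark.stop()


if __name__ == "__main__":
    for uri in (HDFS_URI, LOCAL_URI):
        print(scheme_of(uri), uri)
```

Calling count_lines(LOCAL_URI) or count_lines(HDFS_URI) would run the same job against either location; only the URI changes, not the application logic.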