Where Does Big Data Go To Get Data-Intensive?

Concurrent Inc. has gone to market this month with its Cascading 2.0 framework for Java-based “big data” apps running on Apache Hadoop. Cascading is an alternative API to the MapReduce programming model for large data sets, and Concurrent claims to have a “growing” ecosystem of Cascading developers and partners.

Cascading is intended for use by Java developers who are building data processing and data management applications on Apache Hadoop that can be deployed on clusters running in the cloud or within private data centers. Cascading is used to streamline data processing, data filtering, and workflow optimization for large volumes of unstructured and semi-structured data.
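To give a feel for the kind of filter-then-aggregate workflow described above, here is a minimal sketch in plain Java. It deliberately does not use Cascading's own API or Hadoop; it runs in memory with the standard library only. The record layout (a user ID and an event name separated by a tab), the class name `PipelineSketch`, and the method name `countByEvent` are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustrative only: an in-memory stand-in for a filter -> group -> count
// data flow. This is NOT the Cascading API and does not run on Hadoop.
class PipelineSketch {

    // Assumed (hypothetical) record layout: "userId<TAB>event".
    static Map<String, Long> countByEvent(List<String> lines) {
        return lines.stream()
                // Split each raw line into its fields.
                .map(line -> line.split("\t"))
                // Filter step: drop malformed records that lack exactly two fields.
                .filter(fields -> fields.length == 2 && !fields[1].isEmpty())
                // Group by event name and count the records in each group;
                // a TreeMap keeps the result sorted by event name.
                .collect(Collectors.groupingBy(
                        fields -> fields[1], TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "u1\tbooking",
                "u2\tdrop-off",
                "corrupted line",
                "u3\tbooking");
        System.out.println(countByEvent(lines)); // prints {booking=2, drop-off=1}
    }
}
```

On a real cluster the same logical steps would be expressed through a framework such as Cascading and executed across many nodes rather than in a single JVM.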

Speaking directly to Dr. Dobb’s, Florian Leibert, a software engineer and developer at vacation accommodation company Airbnb, said that his company chose to use Cascading on Amazon’s Elastic MapReduce service for the heavy-duty infrastructure work of filtering and combining multiple large data files and reconstructing corrupted files. “The data is used by analysts to determine the factors driving room bookings as well as user drop-offs, to better understand user behavior and business dynamics,” he said.

Cascading is also at the core of language extensions including PyCascading, Scalding, and Cascalog (open source projects sponsored by Twitter) and tools including CloudFront LogAnalyzer (developed by Amazon).

The company’s promise to application developers is an opportunity to build and test applications on their desktops in their language of choice (Java, Jython, Scala, Clojure, or JRuby) with familiar constructs and reusable components and then “instantly deploy them” onto clusters of hundreds of nodes.

Concurrent also says that Hadoop administrators can now seamlessly move and scale application deployments from development to test and production clusters regardless of cluster location or data size.

“We make it easy for developers to build powerful data processing applications for Hadoop, without requiring months spent learning about the intricacies of MapReduce,” said Chris Wensel, CEO and founder of Concurrent Inc.