Open Season On Hadoop Big Data APIs

West Coast big data platform company Concurrent has introduced version 2.0 of its Cascading application framework. Aimed at Java developers who use Apache Hadoop to build data processing and data management applications, Cascading 2.0 offers an alternative API to MapReduce.

The framework can be deployed on clusters running in the cloud or in private data centers. Cascading 2.0 is most likely to interest Java programmers with an eye fixed on Hadoop, but it may also pique the interest of open source community members who have connections with company founder Chris Wensel, author of the Cascading open source project for data processing.

The firm boasts Razorfish and Twitter among its customer references and says that both are using Cascading to streamline data processing, data filtering, and workflow optimization for large volumes of unstructured and semi-structured data.

Cascading is also at the core of language extensions including PyCascading, Scalding, and Cascalog (open source projects sponsored by Twitter) and tools including CloudFront LogAnalyzer (developed by Amazon).

Data Scientists, Hadoop Administrators, and Application Developers

According to the company’s developer blog, the Cascading framework is designed for data scientists, Hadoop administrators, and application developers alike as they collaborate on, develop, and deploy scalable big data applications.

“Building applications on Hadoop, despite its growing adoption in the enterprise, is notoriously difficult. We are driving the future of application development and management on Hadoop, by allowing enterprises to quickly extract meaningful information from large amounts of distributed data and better understand the business implications. We make it easy for developers to build powerful data processing applications for Hadoop, without requiring months spent learning about the intricacies of MapReduce,” said Wensel.

Key for application developers here is the opportunity to build and test applications on their desktops in their language of choice (Java, Jython, Scala, Clojure, or JRuby) with what has been described as “familiar constructs and reusable components.” This, in theory, gives them the ability to “instantly” deploy those applications onto clusters of hundreds of nodes.
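
To give a feel for what building and testing a data flow on the desktop from reusable components might look like, here is a minimal word-count sketch. It deliberately uses only the standard Java streams library and not the Cascading API, so every name in it (WordCountSketch, TOKENIZE, countWords) is hypothetical and chosen purely for illustration. The assumption is simply that a flow is a chain of small, named steps that can be exercised locally on a few lines of text before anything is pointed at a cluster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: plain Java streams, NOT the Cascading API.
class WordCountSketch {

    // Reusable step: split one line of text into lowercase words.
    static final Function<String, Stream<String>> TOKENIZE =
        line -> Arrays.stream(line.toLowerCase().split("\\W+"))
                      .filter(word -> !word.isEmpty());

    // Chain the steps: tokenize each line, group identical words,
    // then count the size of each group (sorted by word via TreeMap).
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                    .flatMap(TOKENIZE)
                    .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // A tiny in-memory sample stands in for a real data set.
        List<String> lines = List.of("Hadoop is big", "big data on Hadoop");
        System.out.println(countWords(lines));
    }
}
```

Run locally, this prints the word counts for the two sample lines. A framework such as Cascading aims to let the same style of step-by-step flow run across a Hadoop cluster. How it does so is not shown here.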

“Microsoft is committed to compatibility with Apache Hadoop for our upcoming Hadoop-based services on Windows Server and in the Windows Azure cloud,” said Bob Baker, director and partner, channel marketing, Microsoft. “In testing, Cascading on Windows Server worked directly out of the box and we are certifying Cascading 2.0 on Windows Server to give Microsoft customers a flexible big data application development framework for Hadoop that lets them build and deploy applications for Apache Hadoop on Windows Server and Windows Azure.”