MapR Releases New Ecosystem Pack Optimized for Apache Spark

MapR Ecosystem Pack 3.0 provides enhanced integrations with Spark 2.1, as well as analytics with Hive 2.1 and business intelligence with Drill 1.10.

MapR Technologies, Inc., a converged data platform provider, has announced the release of the MapR Ecosystem Pack (MEP) program.

MEP is a collection of open source ecosystem products that gives big data applications running on the MapR Converged Data Platform inter-project compatibility. New features of MEP version 3.0 include new Spark connectors for MapR-DB and HBase, integration with Apache Drill, a faster version of Hive and improved security for Spark.

“The adoption of Spark and Drill continues to advance at a fast pace with enterprises worldwide,” said Will Ochandarena, senior director, product management, MapR Technologies. “With a regular cadence of ecosystem updates that make it easier to adopt for production use, our customers immediately benefit from rapid open source innovation with the reliability, scale and performance of the Converged Data Platform.”

According to the company, other key features of the new release include:

Apache Spark 2.1.0
The Spark 2.1 release focuses on improvements in enterprise-ready stability and security including:

Scalable partition handling
Data Type APIs graduate to “stable”
More than 1200 fixes on the Spark 2.X line
Secure connections using MapR-SASL, in addition to Kerberos, for inbound client connections to the Spark Thrift server and for Spark connections to the Hive Metastore
Support for impersonation on SELECT statements

Native Spark Connector for MapR-DB JSON
The Native Spark Connector for MapR-DB JSON makes it easier to build real-time or batch pipelines between data and MapR-DB while leveraging Spark or Spark Streaming within the pipeline, MapR stated. Designed to be highly efficient and simplify code development, the Native Spark Connector includes:

Two new APIs that allow you to load data from a MapR-DB JSON table to a Spark RDD or save a Spark RDD to a MapR-DB JSON table
A custom data partitioner for better performance
Use of MapR-DB data locality when launching Spark executors to read data
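
To make the idea of a custom data partitioner more concrete, here is a minimal sketch in plain Python of range partitioning by row key. It is not the connector's implementation and uses no MapR or Spark API: the function names, the split keys and the sample records are all hypothetical, chosen only to show how keeping neighbouring keys in the same partition might look.

```python
from bisect import bisect_right
from collections import defaultdict


def make_range_partitioner(split_keys):
    """Return a function that maps a row key to a partition index.

    split_keys is a list of hypothetical boundary keys. N boundaries
    produce N + 1 partitions; a key equal to a boundary falls into the
    partition above that boundary.
    """
    boundaries = sorted(split_keys)

    def partition_for(key):
        # Count how many boundaries are <= key; that count is the index.
        return bisect_right(boundaries, key)

    return partition_for


def group_by_partition(records, partition_for):
    """Group (key, value) records by the partition their key maps to."""
    groups = defaultdict(list)
    for key, value in records:
        groups[partition_for(key)].append((key, value))
    return dict(groups)


# Hypothetical boundaries and records, for illustration only.
partition_for = make_range_partitioner(["g", "p"])
records = [("alice", 1), ("mallory", 2), ("zed", 3), ("bob", 4)]
grouped = group_by_partition(records, partition_for)
```

In this sketch, keys that sort below "g" land in partition 0, keys from "g" up to but not including "p" in partition 1, and the rest in partition 2. A partitioner of this general kind keeps records with adjacent keys together, which is one way a writer could reduce the number of storage ranges each task touches.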