At its Data & AI Summit, Databricks announced new contributions to the open-source projects Delta Lake, MLflow, and Apache Spark.
With the launch of Delta Lake 2.0, Databricks will hand the project over to the Linux Foundation and open-source all the APIs associated with the release. Several competitors had questioned whether Delta Lake was truly open source or proprietary, and Databricks says this move should allay those complaints.
The Delta Lake community has 6,400 members, with contributing developers from more than 90 organizations. The number of contributors grew by 60 percent over the past year, and the average number of lines of code committed rose 900 percent year-on-year.
MLflow 2.0 offers developers faster execution at scale and a shorter path to production through standardization, with production-ready templates that data scientists can use without needing production engineers.
The introduction of Spark Connect for Apache Spark aims to provide better stability and allow remote connectivity to Spark from any device. Databricks also announced Project Lightspeed, the next generation of the Spark streaming engine.
“From the beginning, Databricks has been committed to open standards and the open source community. We have created, contributed to, fostered the growth of, and donated some of the most impactful innovations in modern open source technology,” said Ali Ghodsi, co-founder and CEO of Databricks. “Open data lakehouses are quickly becoming the standard for how the most innovative companies handle their data and AI. Delta Lake, MLflow and Spark are all core to this architectural transformation, and we’re proud to do our part in accelerating their innovation and adoption.”
Delta Lake 2.0 is currently at the release candidate stage, and the full release is expected later in the year.