Open catalog for Apache Iceberg helps organizations gain control and flexibility over their enterprise data
Snowflake (NYSE: SNOW), the AI Data Cloud company, today announced Polaris Catalog at its annual user conference, Snowflake Summit 2024. Polaris Catalog is a vendor-neutral, open catalog implementation for Apache Iceberg, the open standard of choice for implementing data lakehouses, data lakes, and other modern architectures. Polaris Catalog will be open sourced in the next 90 days to provide enterprises and the entire Iceberg community with new levels of choice, flexibility, and control over their data, with full enterprise security and Apache Iceberg interoperability with Amazon Web Services (AWS), Confluent, Dremio, Google Cloud, Microsoft Azure, Salesforce, and more.
“Organizations want open storage and interoperable query engines without lock-in. Now, with the support of industry leaders, we are further simplifying how any organization can easily access their data across diverse systems with increased flexibility and control,” said Christian Kleinerman, EVP of Product, Snowflake. “Polaris Catalog extends Snowflake’s commitment to Apache Iceberg as the open standard of choice, and signals the intent from industry leaders in enabling customers and the wider Iceberg community to harness their data through an open and neutral approach, empowering cross-engine interoperability on that data.”
Polaris Catalog Introduces New Levels of Interoperability for Apache Iceberg
Apache Iceberg emerged from incubation to a top-level Apache Software Foundation project in May 2020, and has since surged in popularity to become a leading open source data table format. With Polaris Catalog, users now gain a single, centralized place for any engine to find and access an organization’s Iceberg tables with full, open interoperability. Polaris Catalog relies on Iceberg’s open source REST protocol, which provides an open standard for users to access and retrieve data from any engine that supports the Iceberg REST API, including Apache Flink, Apache Spark, Dremio, Python, Trino, and more.
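As a rough illustration of what a shared REST protocol means for engines, the sketch below builds a few of the endpoint URLs described in the Iceberg REST catalog specification (catalog configuration, namespace listing, table listing, and table loading). It is a minimal sketch, not Polaris Catalog code: the helper name, host, prefix, namespace, and table are all hypothetical, only single-level namespaces are handled, and no network calls are made.

```python
from urllib.parse import quote


def rest_catalog_urls(base_uri, namespace, table, prefix=""):
    """Build a few Iceberg REST catalog endpoint URLs (hypothetical helper).

    base_uri, prefix, namespace, and table are illustrative values supplied
    by the caller; nothing here contacts a real catalog.
    """
    root = base_uri.rstrip("/") + "/v1"
    # The optional prefix scopes requests to a particular catalog.
    scoped = root + ("/" + quote(prefix, safe="") if prefix else "")
    ns = quote(namespace, safe="")
    tbl = quote(table, safe="")
    return {
        "config": root + "/config",
        "list_namespaces": scoped + "/namespaces",
        "list_tables": f"{scoped}/namespaces/{ns}/tables",
        "load_table": f"{scoped}/namespaces/{ns}/tables/{tbl}",
    }


# Example with made-up values: every engine would resolve the same table
# through the same URL, regardless of which engine issues the request.
urls = rest_catalog_urls(
    "https://catalog.example.com/api/catalog",
    namespace="analytics",
    table="events",
    prefix="my_catalog",
)
for name, url in urls.items():
    print(f"{name}: {url}")
```

Because the catalog is addressed through plain HTTP paths like these, any engine that implements the same specification can locate the same tables without engine-specific connectors.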
Organizations can get started running Polaris Catalog hosted in Snowflake’s AI Data Cloud within minutes (Snowflake-hosted in public preview soon), or self-host it in their own infrastructure using containers such as Docker or Kubernetes. Since Polaris Catalog’s backend implementation will be open source, organizations can freely swap the hosting infrastructure while eliminating vendor lock-in.
Leading Organizations Join the Polaris Catalog Community
Part of what makes Apache Iceberg so powerful is its vibrant community of diverse adopters, contributors, and commercial offerings. To ensure Polaris Catalog can meet the evolving needs of the wider community and landscape, Snowflake is collaborating with the Iceberg ecosystem to drive the project forward.
This comes on the heels of Snowflake and Microsoft’s recent partnership expansion, which creates more seamless interoperability between Snowflake and Fabric. This interoperability is possible because of Snowflake’s and Microsoft’s commitment to supporting the industry’s leading open standards for storage formats – Apache Iceberg and Apache Parquet. Now with Polaris Catalog, both organizations continue to partner with a joint mission of enabling all users to harness their enterprise data, regardless of where it is stored, to create AI-powered applications at scale.
“From day one at Microsoft, we’ve been focused on empowering every user on the planet to achieve more, and this starts with a strong data foundation. Through our support and contributions to open data standards, including Delta Parquet, Apache Iceberg, and Apache XTable, we’re furthering this mission by enabling organizations with a new level of open data interoperability, so they can do more with their data,” said Arun Ulagaratchagan, Corporate Vice President, Azure Data, Microsoft. “Snowflake continues to serve as a strategic partner of ours, and we’re excited by their willingness to work with the Iceberg community on an open catalog to empower our joint customers and the wider open-source community with more flexibility and control over their open Iceberg data.”
Snowflake, which serves as the data foundation powering thousands of global customers’ cross-cloud data and AI workloads, and the rapidly growing Iceberg community, with its innovation and open source skill sets, will together continue to simplify the interoperability of data across engines.
Snowflake Continues to Extend Open Source Commitments
Polaris Catalog follows a slew of recent open source commitments from Snowflake, including its investments in Iceberg Tables, which allow Snowflake customers to work with data in their own storage in the Apache Iceberg format, while still benefiting from Snowflake’s ease of use, performance, and unified governance.
Snowflake also recently announced Snowflake Arctic, one of the most open, enterprise-grade large language models (LLMs) on the market. As part of Snowflake’s commitment to open source, it released not only Arctic’s weights under an Apache 2.0 license, but also extensive details of how the model was trained through a series of cookbooks. In addition, Snowflake supports the Streamlit open source community, which now has over 275K monthly active developers and over 6 million monthly application views. Since Snowflake acquired Streamlit in March 2022, the open source community has continued to flourish, growing over 500 percent in the past two years, as Snowflake and Streamlit continue to invest in cutting-edge open source advancements for developers.
Comments On the News from Data Platform Experts
“AWS is committed to working with partners, such as Snowflake, on open source solutions that can accelerate choice for customers,” said Chris Grusz, Managing Director, Technology Partnerships, Amazon Web Services. “We’re pleased to work with Snowflake to continue to make Apache Iceberg stay interoperable across our engines.”
“At Confluent, we’re on a mission to break down data silos to help organizations power their businesses with more real-time insights,” said Shaun Clowes, Chief Product Officer, Confluent. “With Tableflow on Confluent Cloud, organizations will be able to turn data streams from across the business into Apache Iceberg tables with one click. Together, Snowflake’s Polaris Catalog and Tableflow enable data teams to easily access these tables for critical application development and downstream analytics.”
“Customers want thriving open ecosystems and to own their storage, data and metadata. They don’t want to be locked in,” said Tomer Shiran, Founder, Dremio. “We’re committed to supporting open standards, such as Apache Iceberg and the open catalogs Project Nessie and Polaris Catalog. These open technologies will provide the ecosystem interoperability and choice that customers deserve.”
“We are actively involved in the open source community, particularly across the data space,” said Neema Raphael, Chief Data Officer and Head of Data Engineering at Goldman Sachs. “We open sourced our data platform, Legend, which enables us to work with open source table formats like Iceberg that will provide more interoperability across query engines like Snowflake. The launch of an open source Iceberg Catalog like Polaris is an exciting next step in furthering that commitment to interoperability.”
“Apache Iceberg’s popularity has established an open storage standard that simplifies zero copy data access for organizations across their ecosystem,” said Raveendrnathan Loganathan, Executive Vice President of Software Engineering at Salesforce. “Our Salesforce Data Cloud has been built from the ground up with Open Standards Apache Parquet for files & Apache Iceberg for tables, fostering zero copy innovations to unlock trapped data, derive insights, and orchestrate actions across the Customer 360. We’re thrilled to have Snowflake as a member of our Zero Copy Partner Network, and we’re excited to see how this new open catalog standard will further zero copy access in the enterprise.”
Learn More:
- Learn more about Polaris Catalog by registering for this webinar.
- Dive into Polaris Catalog’s unique differentiation in this blog post.
- See how you can get started with Polaris Catalog hosted on Snowflake’s AI Data Cloud within minutes, here.
- Stay on top of the latest news and announcements from Snowflake on LinkedIn and Twitter / X.
Forward Looking Statements
This press release contains express and implied forward-looking statements, including statements regarding (i) Snowflake’s business strategy, (ii) Snowflake’s products, services, and technology offerings, including those that are under development or not generally available, (iii) market growth, trends, and competitive considerations, and (iv) the integration, interoperability, and availability of Snowflake’s products with and on third-party platforms. These forward-looking statements are subject to a number of risks, uncertainties and assumptions, including those described under the heading “Risk Factors” and elsewhere in the Quarterly Reports on Form 10-Q and the Annual Reports on Form 10-K that Snowflake files with the Securities and Exchange Commission. In light of these risks, uncertainties, and assumptions, actual results could differ materially and adversely from those anticipated or implied in the forward-looking statements. As a result, you should not rely on any forward-looking statements as predictions of future events.
© 2024 Snowflake Inc. All rights reserved. Snowflake, the Snowflake logo, and all other Snowflake product, feature and service names mentioned herein are registered trademarks or trademarks of Snowflake Inc. in the United States and other countries. All other brand names or logos mentioned or used herein are for identification purposes only and may be the trademarks of their respective holder(s). Snowflake may not be associated with, or be sponsored or endorsed by, any such holder(s).