Welcome to the "truly portable" future of data architecture

"With the recent incubation of Apache Polaris, an open-source lakehouse catalog implementation for tracking Apache Iceberg tables, we are moving toward a world where..."


Open standards are rapidly becoming the foundation for scalable business value, driving innovation, momentum and action. With the recent incubation of Apache Polaris, an open-source lakehouse catalog implementation for tracking Apache Iceberg tables, we are moving toward a world where data and its governance are truly portable, writes Alex Merced, Senior Tech Evangelist at Dremio. This means you can use a wide range of data tools without having to duplicate data or compromise governance.

For years, enterprises relied on proprietary data warehouses like Teradata and Oracle, which, despite their robust performance, created costly vendor lock-in that constrained innovation and flexibility. As a result, moving data or integrating different technologies was both cumbersome and expensive.

Apache Iceberg – The Disruptor

The rise of data lakes offered a new way of storing data — in its raw form on inexpensive storage. However, data lakes struggled to match traditional data warehouses' performance and management capabilities.

Enter Apache Iceberg, an open table format that enables data warehouse-like tables with all the same ACID (atomicity, consistency, isolation, durability) guarantees that traditional data warehouses offer. This gives you the performance of data warehouses with the flexibility and lower price point of a data lake – hence, a data lakehouse.

Apache Iceberg’s unique ability to provide features like time travel and schema evolution—once exclusive to expensive, proprietary data warehouses—without locking companies into a single vendor's ecosystem has set it apart. As companies increasingly realise the importance of controlling their data independently, Iceberg’s open-source nature means you can integrate it into your existing data infrastructure without being locked into a particular technology stack. It’s about embracing freedom and flexibility.

Lakehouse Catalogs Enter Stage Left

Iceberg is just one component of your data lakehouse architecture, alongside your storage layer (your data lake) and your lakehouse catalog (a tool that tracks your Iceberg tables so that other tools can discover them). Unlike traditional metadata catalogs, or enterprise data catalogs like Collibra or Alation, which provide context that helps humans understand the available data, a lakehouse catalog serves a different purpose. It acts as a directory of where table metadata exists, enabling tools to discover and use the table. Essentially, one kind of catalog is for data discovery by humans, and the other is for data discovery by systems.
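To make the "directory" idea concrete, here is a minimal sketch in Python. It is a toy, not how Apache Polaris or Nessie are implemented: the class `ToyLakehouseCatalog`, its methods, the table name and the storage path are all hypothetical and chosen purely for illustration. The only point it shows is that the catalog holds a pointer to each table's metadata, and that any tool asking the same catalog gets the same answer.

```python
class ToyLakehouseCatalog:
    """Hypothetical in-memory stand-in for a lakehouse catalog (illustration only)."""

    def __init__(self):
        # Maps a table name to the location of that table's current metadata file.
        self._metadata_locations = {}

    def register_table(self, name, metadata_location):
        """Record where the metadata for a table lives."""
        self._metadata_locations[name] = metadata_location

    def list_tables(self):
        """Let a tool discover which tables exist."""
        return sorted(self._metadata_locations)

    def load_table(self, name):
        """Return the metadata location a tool should read to use the table."""
        if name not in self._metadata_locations:
            raise KeyError(f"unknown table: {name}")
        return self._metadata_locations[name]


catalog = ToyLakehouseCatalog()
catalog.register_table(
    "sales.orders",  # made-up table name
    "s3://example-bucket/sales/orders/metadata/v1.metadata.json",  # made-up path
)

# Two different (imaginary) engines resolve the table through the same catalog,
# so both are directed to the same metadata and no data is copied.
location_seen_by_engine_a = catalog.load_table("sales.orders")
location_seen_by_engine_b = catalog.load_table("sales.orders")
```

A real catalog does far more than this, but the lookup above is the part that lets systems, rather than humans, discover tables.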

Indeed, catalogs are becoming more than just a listing of tables for your favourite tools. They are evolving into hubs of universal governance, where you can set access rules that are honoured by any tool accessing your tables. This is tremendously valuable because, in the past, setting access roles individually for each tool led to inconsistent governance. When catalogs are the centre of your table governance, it becomes crucial for them to be built on open standards as far as possible, to avoid vendor lock-in at the catalog level.
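The governance idea can be sketched in the same spirit. The snippet below is a simplified illustration in Python and does not reflect the actual access-control model of any particular catalog: the `GRANTS` table, the principals, and the functions `is_allowed` and `engine_read` are all hypothetical. It shows the rules being defined once, centrally, and every tool consulting the same rules instead of keeping its own copy.

```python
# Hypothetical access rules, stored once at the catalog level rather than per tool.
GRANTS = {
    ("analyst", "sales.orders"): {"read"},
    ("etl_job", "sales.orders"): {"read", "write"},
}


def is_allowed(principal, table, action):
    """Central check: the single place where the access rules are evaluated."""
    return action in GRANTS.get((principal, table), set())


def engine_read(engine_name, principal, table):
    """Any (imaginary) engine asks the central check before touching the table."""
    if not is_allowed(principal, table, "read"):
        raise PermissionError(f"{principal} may not read {table}")
    return f"{engine_name} reads {table} as {principal}"


# Two different engines get the same decision for the same principal,
# because neither of them defines its own rules.
result_a = engine_read("engine_a", "analyst", "sales.orders")
result_b = engine_read("engine_b", "analyst", "sales.orders")
analyst_can_write = is_allowed("analyst", "sales.orders", "write")
```

Because the decision is made in one place, changing a grant changes the behaviour of every tool at once, which is the consistency the per-tool approach lacked.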

As more companies adopt Apache Iceberg and open lakehouse catalogs like Apache Polaris (incubating) and Nessie, the focus will increasingly shift towards enhancing these open standards to support a variety of specialised compute engines.

The goal is clear: create an ecosystem that maximises flexibility and minimises lock-in. For businesses, this means investing in open technologies that meet today’s needs and allow for future growth and adaptation. It’s not just about keeping up with the competition; it’s about setting the stage for the next wave of data innovation.

The Bottom Line

As we move further into the age of artificial intelligence, the importance of open data architectures will only grow. AI and machine learning algorithms thrive on data. Put simply, the more data they have, and the more varied it is, the better they perform.

To give AI and machine learning projects the data they need, you need a data architecture that is flexible and open enough to deliver that data efficiently. Data lakehouses that use table formats like Apache Iceberg alongside open catalogs like Apache Polaris and Nessie are opening the door to this world.

The future of data is open. As companies continue to recognise the limitations of proprietary systems, they will turn to solutions like Apache Iceberg and open lakehouse catalogs to give them the control and flexibility they need. The days of being locked into a single vendor's ecosystem are numbered. The shift towards open standards is more than a trend—it’s necessary for any business looking to thrive in the digital age. The choice is clear: adapt or be left behind.
