Composable data systems: The future of intelligent data management

In a diverse and dynamic technology landscape, how can companies create a more intelligent approach to data management? Composable data systems based on open standards may be the next big thing for infrastructure modernization.

Organizations are seeking new ways to build out today's modern data stacks, which have become increasingly diverse. A recent survey of 105 joint Databricks Inc. and Snowflake Inc. customers, conducted in partnership with Enterprise Technology Research, revealed two key trends: More than a third of respondents said they use at least one modern data platform other than Databricks or Snowflake, and half said they continue to rely on on-premises or hybrid cloud platforms. These findings highlight the need for a multi-platform approach to the modern data stack.

Big data frameworks typically already include storage and compute layers, but some companies are pushing composability further by separating the application programming interface layer, according to Josh Patterson, co-founder and chief executive officer of Voltron Data Inc.

"Composability is really about freedom -- freedom to take your code and run it across a myriad of different engines but also have your data use different engines as well," Patterson added.

Patterson and Rodrigo Aramburu, co-founder and field chief technology officer of Voltron Data, spoke with theCUBE Research's Rob Strechay, principal analyst, and George Gilbert, senior analyst, during an AnalystANGLE segment on theCUBE, SiliconANGLE Media's livestreaming studio. They discussed how data platforms are being reshaped by the growing adoption of composable architectures, open standards and leading-edge execution engines.

Even companies such as Snowflake and Databricks are evolving toward more composable, open standards, according to Aramburu. Databricks, for instance, was an early evangelist for the open-source Apache Arrow project as the de facto standard for in-memory tabular data representation.

"This really big movement allows companies with all these vendor products to choose the right tools for the right job," he said.

The complexity of today's data landscape, with its proliferation of data products and apps, requires a more modular data stack, according to Aramburu. To manage multiple engines, many companies have built their own internal abstraction layers and domain-specific languages, which are hard to maintain.

"A project like Ibis really takes [complexity] out of the hands of the independent corporate company and puts it [into] an open-source community that allows everyone to really benefit off of that labor," Aramburu said.

Companies are starting to use open table formats such as Apache Iceberg with both Snowflake and Databricks, standardizing on a common data lake across the two platforms. With standardized APIs, organizations can generate structured query language queries that run across different systems.
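A hedged sketch of that SQL generation, using Ibis again: one logical query is rendered as dialect-specific SQL for different engines. The table definition is hypothetical, and the dialect argument reflects recent Ibis releases.

```python
import ibis

# A schema-only table definition; no live connection is needed
# just to generate SQL. (Hypothetical table and columns.)
t = ibis.table({"region": "string", "amount": "float64"}, name="orders")
expr = t.group_by("region").aggregate(total=t.amount.sum())

# The same logical query, rendered for two different engines.
print(ibis.to_sql(expr, dialect="duckdb"))
print(ibis.to_sql(expr, dialect="snowflake"))
```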

Along with standardized APIs, accelerated hardware is essential for modern data platforms, particularly for artificial intelligence, according to Patterson. Training large language models requires immense graphics processing unit power, which directly impacts energy consumption. Theseus, a distributed query engine developed by Voltron Data, uses GPUs to process large data volumes with less energy.

"With our current architecture using A100s ... [Theseus] is able to do really large-scale data analytics for about 80% less power," Patterson said.

Modular, interoperable and composable data systems lower the barrier to entry for adopting these AI-related technologies, according to Patterson. Another benefit is that people can use Theseus without having to change their APIs or data formats, so they can achieve faster performance with fewer servers.

"[Users] can actually shrink their data center footprint and ... save energy, or they can transfer that energy that they were using for big data into AI," Patterson added.

In addition to separating compute from data, a composable data system can also have a separate storage layer, which enables independent scaling, according to Patterson. With a decomposed execution engine, multiple APIs can be supported and multiple engines can access the same data. And because everything runs on accelerated hardware, companies can see better price performance and energy performance, which opens up new possibilities at the data management level.

"It makes it possible [for organizations] to just start building domain-specific data systems that are otherwise prohibitively expensive to build," Patterson said.

When networking, storage and data management are built for speed from the ground up, the rest of the stack can keep pace with the compute engine, Patterson noted. Theseus is an example of that level of performance.

"It acts as a query engine that is meant to be [original equipment manufactured] by others so they can build these domain-specific applications on top of it where you can have a much smaller footprint, faster, [and with] less energy, and you can go after business use cases that were otherwise prohibitively expensive," he added.

As data analytics improve with products such as Voltron's Theseus query engine, networking will become far more important, and companies will start to see denser and faster storage, Patterson predicted. High-speed networking and faster storage will also pave the way for both AI and data analytics, shrinking big data problems into a smaller footprint.

"Where there is denser storage, [you have] faster storage, with more throughput," Patterson said. "I actually see a convergence of AI and big data."

Here's theCUBE's complete AnalystANGLE with Josh Patterson and Rodrigo Aramburu:

