How OvalEdge Supports Column-Level Lineage With Dremio

OvalEdge connects with Dremio to support multiple data governance functions, including lineage building. In this blog, we'll explain how the process works and how to enable it.

Understanding the Dremio/OvalEdge Connector

Dremio, the unified lakehouse platform built for self-service AI and analytics, enables users to perform data analysis tasks easily and affordably. Dremio queries data in object storage platforms such as AWS S3, Google Cloud Storage, and Azure Data Lake Storage, as well as on-premises storage from vendors like NetApp, Pure Storage, VAST, MinIO, and many more. It also has data virtualization capabilities that let it query data in numerous other databases.

The OvalEdge/Dremio connector provides Dremio customers with comprehensive, end-to-end data governance capabilities at various stages of integration. OvalEdge is democratizing access to Dremio by making it easier for end users to adopt Dremio infrastructure.

Crawling

Using the built-in OvalEdge connector, Dremio customers can crawl all of their associated files, tables, and columns across multiple data sources and centralize the metadata in the OvalEdge data catalog.

Profiling

OvalEdge's automated profiling features use Dremio's push-down capabilities to profile customer data with Dremio's processing power. Dremio serves as the processing layer, executing the profiling queries on the data lake.
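
To make the push-down idea concrete, here is a minimal Python sketch that builds a single aggregate profiling query for Dremio to execute, so that only summary statistics, not raw rows, leave the lake. The helper names (`quote_identifier`, `build_profile_query`), the chosen statistics, and the table path are assumptions for illustration; this is not OvalEdge's actual profiling query.

```python
def quote_identifier(name: str) -> str:
    """Double-quote an identifier, doubling any embedded quotes (standard SQL)."""
    return '"' + name.replace('"', '""') + '"'


def build_profile_query(table_path: list[str], columns: list[str]) -> str:
    """Build one aggregate query that profiles `columns` of a Dremio table.

    Hypothetical helper: the statistics chosen here (row count, null count,
    distinct count, min, max) are illustrative, not OvalEdge's actual profile.
    """
    table = ".".join(quote_identifier(part) for part in table_path)
    select_items = ["COUNT(*) AS row_count"]
    for column in columns:
        col = quote_identifier(column)
        # All aggregation happens inside Dremio; only one result row comes back.
        select_items += [
            f"COUNT(*) - COUNT({col}) AS {quote_identifier(column + '_nulls')}",
            f"COUNT(DISTINCT {col}) AS {quote_identifier(column + '_distinct')}",
            f"MIN({col}) AS {quote_identifier(column + '_min')}",
            f"MAX({col}) AS {quote_identifier(column + '_max')}",
        ]
    return "SELECT " + ", ".join(select_items) + " FROM " + table
```

The resulting string would be submitted to Dremio through whichever client the connector uses (for example, a JDBC or Arrow Flight connection); that step is omitted from the sketch.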

Data Quality Rules

Our connector enables Dremio customers to adopt OvalEdge's data quality management program using Dremio's push-down capabilities. This means all of the data quality checks can be performed in Dremio's system using Dremio's processing power, making it more cost-effective for the customer. 
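
A data quality check can follow the same pattern: Dremio computes the counts, and only the small result is compared against a threshold. The sketch below assumes a simple "maximum NULL ratio" rule; the `NullRatioRule` class, its fields, and the table path are hypothetical and do not describe OvalEdge's rule engine.

```python
from dataclasses import dataclass


def _quote(name: str) -> str:
    """Double-quote a SQL identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


@dataclass
class NullRatioRule:
    """Hypothetical rule: at most `max_null_ratio` of a column's values may be NULL."""
    table_path: list[str]
    column: str
    max_null_ratio: float = 0.0

    def to_sql(self) -> str:
        # The check runs inside Dremio; only two counts come back.
        table = ".".join(_quote(part) for part in self.table_path)
        col = _quote(self.column)
        return (f"SELECT COUNT(*) AS total_rows, "
                f"COUNT(*) - COUNT({col}) AS null_rows FROM {table}")

    def passes(self, total_rows: int, null_rows: int) -> bool:
        # Compare the observed NULL ratio with the threshold; an empty table passes here.
        if total_rows == 0:
            return True
        return null_rows / total_rows <= self.max_null_ratio
```

For example, a rule with `max_null_ratio=0.05` passes when 50 of 1,000 rows are NULL and fails at 51.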

Data Lineage

OvalEdge automatically builds column-level lineage for Dremio customers by crawling source code from Dremio and parsing the queries run on Dremio.

What is Column-Level Lineage? 

Data lineage can be tracked in three core ways: at the system, object, and column levels. For this blog, we'll focus on column-level lineage.

OvalEdge connects and displays all table columns, file columns, and report attributes in a data estate. Column-level data lineage enables users to track and document how individual data points flow between them. This process is critical for compliance and impact analysis tasks.
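
Conceptually, column-level lineage can be modeled as directed edges between fully qualified column names, and impact analysis then becomes a traversal of everything downstream of a column. The `ColumnLineage` class below is a hypothetical, minimal model using breadth-first search, not OvalEdge's internal representation, and the column names in the example are made up.

```python
from collections import defaultdict, deque


class ColumnLineage:
    """Minimal column-level lineage graph (illustrative, not OvalEdge's model).

    Nodes are fully qualified column names such as "lake.sales.orders.amount";
    an edge source -> target means the target column is derived from the source.
    """

    def __init__(self) -> None:
        self._downstream: dict[str, set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        self._downstream[source].add(target)

    def impacted_by(self, column: str) -> set[str]:
        """Return every column that directly or transitively depends on `column`."""
        seen: set[str] = set()
        queue = deque([column])
        while queue:
            current = queue.popleft()
            # .get() avoids creating empty entries in the defaultdict.
            for nxt in self._downstream.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen
```

Calling `impacted_by` on a source column returns every view column and report attribute derived from it, which is the question both impact analysis ("what breaks if this changes?") and compliance ("where does this sensitive field end up?") ask.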

Tracking Lineage From Dremio PDS to VDS

OvalEdge has developed a new connector, Dremio IceBerg, which helps end users better understand the data flow between Dremio PDS (physical datasets, i.e., Iceberg objects accessed directly) and VDS (virtual datasets, i.e., views created on top of Iceberg objects).

Figure: Data flow and lineage framework from Dremio PDS to VDS

Step-by-Step Guide to Column-Level Lineage

1. Navigate to the Administration module and click on Connectors.

2. Click the + icon (New Connector). The Add Connector pop-up will appear. Search for Dremio IceBerg.

Figure: Searching for the Dremio IceBerg connector in OvalEdge

The connector details page will be displayed.

Figure: Filling in connector details for Dremio IceBerg

3. Fill in the required connection details. Dremio IceBerg requires the same details as the Dremio connector. Validate, save, and crawl the newly created Dremio IceBerg connector to access PDS.

4. Create and crawl the Dremio connector to access VDS.

5. Now, build the lineage for Dremio VDS.

Figure: Building the lineage for Dremio VDS

6. When a user initiates lineage building for Dremio views, the OvalEdge algorithm identifies the referenced Iceberg objects from the Dremio IceBerg connector and builds the lineage.

Figure: Selecting a Dremio view and initiating build lineage

Figure: Sample view code

7. Verify table-level lineage.

Figure: Verifying table-level lineage

Because Dremio's APIs do not directly support extracting column-level lineage, we enhanced our lineage algorithm to pick up columns from the view code itself. This allows OvalEdge to build lineage from PDS columns (Iceberg columns) to VDS columns (Dremio view columns).

What's Coming Next?

Moving forward, OvalEdge will support Dremio's access controls, PII identification, and data privacy enforcement policies. PII detection will be pushed down to Dremio, so the processing takes place on the Dremio platform.

Regarding access controls, Dremio customers will benefit from OvalEdge's access management capabilities, which are administered through OvalEdge and integrated through the connector. Finally, data retention policies, critical for regulatory compliance, can also be managed in OvalEdge.

Conclusion

Figure: Column-level mapping

With its comprehensive support for Dremio, OvalEdge drives crucial governance processes that enable Dremio customers to operate securely and efficiently on the platform. The partnership between Dremio and OvalEdge continues to evolve, with new and enhanced features on the horizon.

Book a call with us to find out:

  1. How OvalEdge supports Dremio customers in executing column-level lineage.
  2. Why integrated data governance tools enhance the Dremio platform.
  3. How OvalEdge and Dremio continue to build innovative solutions for their joint customers.