OvalEdge connects with Dremio to support multiple data governance functions, including lineage building. In this blog, we’ll explain how the process works and how to enable it.
Dremio, the unified lakehouse platform built for self-service AI and analytics, enables users to analyze data easily and affordably. Dremio queries data from object storage platforms such as AWS S3, Google Cloud Storage, and Azure Data Lake Storage, as well as on-prem storage from vendors like NetApp, Pure Storage, VAST, MinIO, and many more. It also has data virtualization capabilities to support data processing from numerous other databases.
The OvalEdge/Dremio connector provides Dremio customers with comprehensive, end-to-end data governance capabilities at various stages of integration. OvalEdge is democratizing access to Dremio by making it easier for end users to adopt Dremio infrastructure.
Using the built-in OvalEdge connector, Dremio customers can crawl all of their associated files, tables, and columns, across multiple data sources and centralize the metadata in the OvalEdge data catalog.
OvalEdge uses Dremio's push-down capabilities to profile customer data with its automated profiling features. Dremio serves as the processing layer, executing the profiling queries on the data lake.
Our connector enables Dremio customers to adopt OvalEdge's data quality management program using Dremio's push-down capabilities. This means all of the data quality checks can be performed in Dremio's system using Dremio's processing power, making it more cost-effective for the customer.
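To make the push-down idea concrete, here is a minimal sketch of how a column profile can be expressed as a single aggregate query, so the engine returns one summary row and no raw data has to leave Dremio. The function names (`quote_path`, `build_profile_query`) and the table path are hypothetical and used for illustration only; this is not OvalEdge's actual implementation.

```python
def quote_path(path: str) -> str:
    """Double-quote each part of a dotted dataset path (hypothetical helper)."""
    parts = path.split(".")
    return ".".join('"' + part.replace('"', '""') + '"' for part in parts)


def build_profile_query(table: str, column: str) -> str:
    """Build one aggregate query that summarizes a column inside the engine.

    Assumption: the engine accepts standard SQL aggregates and
    double-quoted identifiers.
    """
    col = quote_path(column)
    return (
        "SELECT COUNT(*) AS row_count, "
        f"COUNT(*) - COUNT({col}) AS null_count, "
        f"COUNT(DISTINCT {col}) AS distinct_count, "
        f"MIN({col}) AS min_value, "
        f"MAX({col}) AS max_value "
        f"FROM {quote_path(table)}"
    )


if __name__ == "__main__":
    # Hypothetical dataset path and column name.
    print(build_profile_query("lake.sales.orders", "amount"))
```

The resulting statement would be submitted to Dremio through whichever client the connector uses. Only the one-row summary comes back, which is what keeps the check inexpensive.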
OvalEdge executes automated column-level lineage building for Dremio customers by crawling the source code from Dremio and parsing queries on Dremio.
Related Post: How OvalEdge and Dremio Support Data Agility
Data lineage can be tracked in three core ways: at the system, object, and column levels. For this blog, we'll focus on column-level lineage.
OvalEdge connects and displays all table columns, file columns, and report attributes in a data estate. Column-level data lineage enables users to track and document precise data points. This process is critical for compliance and impact analysis tasks.
Related Post: Data Lineage Explained with Examples
OvalEdge has developed a new connector, Dremio IceBerg, which helps end users better understand the data flow between Dremio physical datasets (PDS, the direct IceBerg objects) and virtual datasets (VDS, views created on top of IceBerg objects).
Data flow from Dremio PDS to VDS
1. Navigate to the Administration module and click on Connectors.
2. Click the + icon (New Connector). The Add Connector pop-up will appear. Search for Dremio IceBerg.
Search for Dremio IceBerg
The specific connector details page will be displayed.
Fill connector details for Dremio IceBerg
3. Fill in the required details. Dremio IceBerg requires the same details as the Dremio connector. Validate, save, and crawl the newly created IceBerg connector to access PDS.
4. Create and crawl the Dremio connector to access VDS.
5. Now, build the lineage for Dremio VDS.
Build the lineage for Dremio VDS
6. When a user initiates build lineage for Dremio views, the OvalEdge algorithm identifies the IceBerg objects from the IceBerg connector and builds the lineage.
Selecting a Dremio view and initiating build lineage
Sample view code
7. Verify table-level lineage.
Verify table-level lineage
As no direct support is available from Dremio APIs to extract column-level lineage, we enhanced our lineage algorithm to pick up columns from the view code itself and succeeded in building the lineage from PDS columns (IceBerg Columns) to VDS columns (Dremio View Columns).
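The idea of reading lineage out of the view code can be illustrated with a small sketch. It is a simplified illustration, not the OvalEdge algorithm: it assumes a single `SELECT` whose columns are all alias-qualified (`o.amount AS total`), and it ignores expressions, subqueries, quoted identifiers, and `SELECT *`. The view definition and all names in it are hypothetical.

```python
import re

# Hypothetical view definition, used only to exercise the sketch.
SAMPLE_VIEW = """
CREATE VIEW sales.v_orders AS
SELECT o.order_id, o.amount AS total, c.name AS customer_name
FROM iceberg.orders o
JOIN iceberg.customers c ON o.customer_id = c.id
"""

# Keywords that may follow a table name and must not be read as an alias.
_NOT_ALIAS = r"(?!(?:JOIN|INNER|LEFT|RIGHT|FULL|CROSS|ON|WHERE|GROUP|ORDER)\b)"
_TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+([\w.]+)(?:\s+(?:AS\s+)?" + _NOT_ALIAS + r"(\w+))?",
    re.IGNORECASE,
)
_SELECT = re.compile(r"\bSELECT\s+(.+?)\s+FROM\s+(.+)$", re.IGNORECASE)
_COLUMN = re.compile(r"(\w+)\.(\w+)(?:\s+(?:AS\s+)?(\w+))?", re.IGNORECASE)


def _parse(view_sql: str) -> tuple[str, dict[str, str]]:
    """Return the raw select list and an alias -> source table map."""
    flat = " ".join(view_sql.split())  # collapse newlines and extra spaces
    match = _SELECT.search(flat)
    if match is None:
        raise ValueError("expected a SELECT ... FROM ... statement")
    select_list, from_clause = match.groups()
    aliases = {}
    for table, alias in _TABLE_REF.findall("FROM " + from_clause):
        # Without an explicit alias, the last path segment acts as the alias.
        aliases[alias or table.rsplit(".", 1)[-1]] = table
    return select_list, aliases


def source_tables(view_sql: str) -> list[str]:
    """Table-level lineage: the source objects the view reads from."""
    _, aliases = _parse(view_sql)
    return sorted(set(aliases.values()))


def column_lineage(view_sql: str) -> dict[str, str]:
    """Column-level lineage: view column -> 'source_table.source_column'."""
    select_list, aliases = _parse(view_sql)
    lineage = {}
    for item in select_list.split(","):
        match = _COLUMN.fullmatch(item.strip())
        if match is None:
            continue  # expressions are out of scope for this sketch
        alias, source_col, view_col = match.groups()
        table = aliases.get(alias, alias)
        lineage[view_col or source_col] = f"{table}.{source_col}"
    return lineage


if __name__ == "__main__":
    print(source_tables(SAMPLE_VIEW))
    for view_col, source in column_lineage(SAMPLE_VIEW).items():
        print(f"{source} -> {view_col}")
```

For the sample view, `total` maps back to `iceberg.orders.amount` and `customer_name` to `iceberg.customers.name`. A production parser would need a full SQL grammar to handle expressions and nested queries; the regular expressions here only cover the simple case.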
Moving forward, OvalEdge will support Dremio's access controls, PII identification, and data privacy enforcement policies. OvalEdge will push down Dremio's PII detection capabilities with processing taking place on the Dremio platform.
Regarding access controls, Dremio customers will benefit from OvalEdge's access management capabilities, which are administered through OvalEdge and integrated through the connector. Finally, data retention policies, critical for regulatory compliance, can also be managed in OvalEdge.
Column-level Mapping
With its comprehensive support for Dremio, OvalEdge drives crucial governance processes that enable Dremio customers to operate securely and efficiently on the platform. The partnership between Dremio and OvalEdge continues to evolve, with new and enhanced features on the horizon.