Organizations are investing heavily in data governance initiatives to help ensure their data delivers business value. A successful data governance initiative depends on careful planning and the right people, including business executives and data stewards.
However, the right technology and tools are also essential for effective and sustainable data governance. Companies are sometimes overwhelmed by the variety of solutions on the market, but buying a data governance tool becomes much simpler once you understand your needs, because those needs determine which features and functionality you require.
The tool should be versatile and should have the following features:
A data governance tool with a built-in, integrated data catalog makes data discovery a smooth process for users.
Data discovery is crucial for BI and analytics. Most data engineers and data scientists spend 20% of their time finding the data for their specific business problem, and getting to the relevant data can take weeks or even months.
Data engineers and data scientists should have a Google- or Amazon-style search experience in which they can both find the data and understand it.
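To make the idea of "search for the data" concrete, here is a minimal sketch of keyword search over catalog metadata. It assumes each data set is described by a name, a description, an owner, and tags; the `DatasetEntry` and `CatalogIndex` names are hypothetical, and real catalogs add relevance ranking, synonyms, and usage signals on top of a simple inverted index like this one.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    # Hypothetical metadata record for one data set in the catalog
    name: str
    description: str
    owner: str
    tags: list[str] = field(default_factory=list)


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit,
    # so "customer_transactions" yields "customer" and "transactions"
    return re.findall(r"[a-z0-9]+", text.lower())


class CatalogIndex:
    """A tiny inverted index: search term -> names of matching data sets."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}
        self._index: dict[str, set[str]] = defaultdict(set)

    def add(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry
        searchable = " ".join([entry.name, entry.description, *entry.tags])
        for token in tokenize(searchable):
            self._index[token].add(entry.name)

    def search(self, query: str) -> list[DatasetEntry]:
        # Score each data set by how many distinct query terms it matches
        scores: dict[str, int] = defaultdict(int)
        for token in set(tokenize(query)):
            for name in self._index.get(token, ()):
                scores[name] += 1
        # Highest score first; ties broken alphabetically for stable output
        ranked = sorted(scores, key=lambda n: (-scores[n], n))
        return [self._entries[n] for n in ranked]
```

Indexing the description and tags, not just the table name, is what lets a user search in business terms rather than needing to know physical table names in advance.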
Data quality is another significant driving force in most activities of data governance.
Software that helps enterprises achieve better data quality includes data mining tools, data editors, data differencing utilities, data link tools, workflow and project management systems, and version control.
Data cleansing, or data scrubbing, is also part of any data quality initiative. Among other tasks, it identifies and correlates duplicate occurrences of the same data points and removes them.
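The deduplication step can be sketched as follows. This is a minimal illustration, assuming records are Python dictionaries and that a chosen set of key fields (hypothetical here) identifies the same real-world entity. It matches only exact values after trimming whitespace and ignoring case; production cleansing tools typically add fuzzy matching on top of this.

```python
from typing import Iterable


def normalize_key(record: dict, key_fields: tuple[str, ...]) -> tuple[str, ...]:
    # Collapse repeated whitespace, trim, and lowercase each key field so that
    # trivially different spellings of the same value compare as equal
    return tuple(" ".join(str(record.get(f, "")).split()).lower() for f in key_fields)


def deduplicate(records: Iterable[dict], key_fields: tuple[str, ...]) -> list[dict]:
    """Keep the first occurrence of each entity and drop later duplicates."""
    seen: set[tuple[str, ...]] = set()
    unique: list[dict] = []
    for record in records:
        key = normalize_key(record, key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Keeping the first occurrence is only one possible survivorship rule; a governed process might instead keep the most recently updated or most complete record.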
If participants are rewarded for their data and tribal knowledge, they are more likely to share it. If they are recognized for data quality, they are more likely to maintain it. A data catalog integrated with the data governance tool lets those in the know easily share their data and knowledge and be rewarded for it.
Otherwise, data owners tend to hold on to their data tightly, which most people call corporate politics. In most cases, application owners become data owners by default, because they are the only ones with access to and knowledge of the data. Some organizations make the CDO (Chief Data Officer) the owner of the data, but the CDO's office usually lacks adequate information about the data, which creates another problem. Other organizations define ownership at the functional level, for example the VP of Sales owns customer data while the Chief Procurement Officer owns supplier data.
Consider a hypothetical scenario in the banking industry. Two teams, the risk analytics team and the customer insights team, each aggregate customers' transactions in their own data warehouse. Now assume a third business unit, such as compliance, also needs the customers' aggregate transactions. In the real corporate world, both teams would hesitate to direct the third team to the leading data source, because doing so would add unnecessary workload to their data warehouses.
To solve these kinds of issues, internal data exchanges within the company should be priced. The data governance solution should offer a mechanism for the whole process, neatly documenting each data set along with its owner and its price. The better the quality of the data, the higher the price.
Now assume the aggregate customer transactions carry a price tag, paid from the compliance team's budget. In this scenario, both business units would be willing to go the extra mile to provide access to their aggregate customer transactions. Pricing is one of the easiest ways to turn corporate politics into a real data-sharing culture, and it ultimately creates more value for the entire company.
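The internal pricing mechanism described above could be modeled roughly as follows. This is a hypothetical sketch, not a feature of any particular tool: the `PricedDataset` and `DataExchange` names are invented, and the rule that price scales linearly with a quality score between 0 and 1 is an assumption chosen only to illustrate "the better the quality, the higher the price."

```python
from dataclasses import dataclass


@dataclass
class PricedDataset:
    # Hypothetical catalog entry documenting a data set, its owner, and its price
    name: str
    owner: str
    base_price: float
    quality_score: float  # assumed to be 0.0 to 1.0, produced by data quality checks

    @property
    def price(self) -> float:
        # Assumption: price scales linearly with measured quality
        return round(self.base_price * self.quality_score, 2)


class DataExchange:
    """Charges the consuming team's budget and credits the owning team."""

    def __init__(self, budgets: dict[str, float]) -> None:
        self.budgets = dict(budgets)
        self.ledger: list[tuple[str, str, str, float]] = []

    def request_access(self, dataset: PricedDataset, consumer: str) -> float:
        cost = dataset.price
        if self.budgets.get(consumer, 0.0) < cost:
            raise ValueError(f"{consumer} cannot afford {dataset.name}")
        # Move budget from the consumer to the data owner and record the exchange
        self.budgets[consumer] -= cost
        self.budgets[dataset.owner] = self.budgets.get(dataset.owner, 0.0) + cost
        self.ledger.append((consumer, dataset.owner, dataset.name, cost))
        return cost
```

In the banking scenario, the compliance team would call `request_access` on the risk analytics team's data set, and the credit to the risk team's budget is what offsets the extra workload on its warehouse.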
The data governance tool should also make it easy to maintain data ownership and carry out data stewardship.
Data ownership is not about holding on to the data but about providing access to it so that other business units can also benefit from it. Data stewardship is about managing data quality in terms of accessibility, accuracy, completeness, consistency, and how up to date the data is.
Teams of data stewards are typically formed to carry out the data security and usage policies set by the organization's data governance initiatives. Put simply, they exist to safeguard the data governance implementation. Team members may include business analysts, database administrators, and business staff who are familiar with specific areas of data within the enterprise.
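Two of the quality dimensions a steward monitors, completeness and timeliness, are easy to express as simple ratios. The sketch below assumes records are dictionaries and that a `last_updated`-style field holds a `datetime.date`; the function names and the 30-day freshness window in the usage note are hypothetical choices for illustration.

```python
from datetime import date, timedelta


def completeness(records: list[dict], field: str) -> float:
    """Share of records in which `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def timeliness(records: list[dict], field: str, as_of: date, max_age: timedelta) -> float:
    """Share of records whose date in `field` is no older than `max_age` at `as_of`."""
    if not records:
        return 0.0
    fresh = sum(
        1
        for r in records
        if r.get(field) is not None and as_of - r[field] <= max_age
    )
    return fresh / len(records)
```

For example, calling `timeliness(customers, "last_updated", date.today(), timedelta(days=30))` would report what fraction of customer records were refreshed in the last 30 days; a steward could track such scores over time and flag data sets that fall below an agreed threshold.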
A business glossary is an essential aspect of data governance, hence the tool should be able to support the building of one.
When it comes to running a business, leaders need to understand what is going on in each department, be it sales or finance. How is this possible when, in many cases, the marketing or IT unit speaks a different language? Or after mergers and acquisitions, when terminology is not uniform? These are the situations in which a business glossary becomes important.
A business glossary solves these problems by creating a common vocabulary across the entire organization. It also keeps these terms consistent by synthesizing information about the organization's data assets from an array of data dictionaries and presenting it in a simpler, more understandable format.
To create a useful business glossary, organizations should choose a data governance tool that can connect data quality, data lineage, and data definitions.
Data lineage describes where data originated, how it was processed, and where it ends up. It provides visibility and helps trace errors back to their root cause in a typical BI process. Data lineage is vital for creating trust in the data.
Lineage is usually depicted graphically, so that anyone with data acumen can easily understand it.
The solution should not only show lineage graphically but also build it automatically, because creating data lineage manually is still time-consuming. Some of the techniques used to build it automatically are:
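Behind the graphical view, lineage is a directed graph, and tracing an error to its root cause amounts to walking that graph upstream. The sketch below assumes lineage is stored as a dictionary mapping each data asset to the assets it is derived from; the asset names in the usage note are hypothetical.

```python
from collections import deque


def upstream(lineage: dict[str, list[str]], node: str) -> list[str]:
    """Return every asset that feeds into `node`, nearest first (breadth-first)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(lineage.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # an asset can feed several others; visit it only once
        seen.add(current)
        order.append(current)
        queue.extend(lineage.get(current, []))
    return order


def root_sources(lineage: dict[str, list[str]], node: str) -> list[str]:
    """Upstream assets with no inputs of their own, i.e. the original sources."""
    return [n for n in upstream(lineage, node) if not lineage.get(n)]
```

Given `{"revenue_dashboard": ["sales_mart"], "sales_mart": ["orders_clean", "fx_rates"], "orders_clean": ["orders_raw"]}`, a wrong number on `revenue_dashboard` can be traced through `sales_mart` and `orders_clean` back to the sources `fx_rates` and `orders_raw`. Breadth-first order lists the closest suspects first, which is usually where investigation starts.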
OvalEdge is an easy-to-use, versatile data governance tool and data catalog. Its open, agile architecture lets companies customize the tool to their business needs.
It indexes your metadata by crawling various databases and storage systems. You can organize data using tags, usage statistics, user names, and other markers, so it is easy to retrieve using everyday language.
Collibra offers an enterprise-oriented data governance platform known for automating data operations and keeping cross-functional teams on the same page.
It offers natural language search and automates data governance and data stewardship.
Informatica lets business and IT collaborate easily and provides an enterprise data governance solution that can run on-premises or in the cloud, for both traditional and big data use cases. Informatica breaks down silos and engages IT, security, and business teams to ensure that data is high quality and meets compliance requirements.
IBM offers another tool that comes with an integrated data catalog. It also assesses the value of data, helping identify meaningful data while securing critical data and supporting GDPR compliance.