A Step-by-Step Guide to Metadata Management
“For me, context is the key – from that comes the understanding of everything.”
– Kenneth Noland, American painter
Need for Metadata Management
Effective metadata management in an enterprise gives data the right context and description. To understand and trust data, we also need to understand its background: how the data originated and how we have used it so far. Further, we need to know what decisions have been made based on this data and how we can leverage it for a competitive advantage.
For success in this new digital age, organizations need to create meticulous data products. Data products are not just reports or analytics but comprehensive solutions. They present analytical, comparative, and insightful information to the right people, at the right time, on the right device. Without a complete metadata management solution, it's difficult to create these data products.
With the growing amount of data and the explosion of big data technologies, CDOs (Chief Data Officers) must look at managing their data more efficiently through metadata. According to one recent market estimate, the metadata management industry will reach about USD 7.85 billion by 2022, growing at roughly 27% per year.
What is Metadata?
Metadata is “data [information] that provides information about other data. This understanding comes from setting the data in context, allowing it to be reused and retrieved for multiple business uses and times.”
According to Indiana University, "metadata is data about data. It is descriptive information about a particular data set, object, or resource, including how it is formatted, and when and by whom it was collected. Although metadata most commonly refers to web resources, it can also be about physical or electronic resources. It may be created automatically using software or entered by hand."
Some typical metadata elements for structured or unstructured data are:
– Title, description, and abstract
– Tags and categories
– When it was created and by whom
– Who last modified it and when
– Who can access or update it
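The elements above map naturally onto a simple record type. The following Python sketch is illustrative only: the class name `MetadataRecord`, its field names, and the reader/editor access model are assumptions, not the schema of any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class MetadataRecord:
    """Descriptive metadata for one data asset (field names are illustrative)."""
    title: str
    description: str
    tags: list[str] = field(default_factory=list)
    created_by: str = ""
    created_at: Optional[datetime] = None
    modified_by: str = ""
    modified_at: Optional[datetime] = None
    readers: set[str] = field(default_factory=set)  # who can access it
    editors: set[str] = field(default_factory=set)  # who can update it

    def can_read(self, user: str) -> bool:
        # Anyone allowed to update an asset is also allowed to read it.
        return user in self.readers or user in self.editors

    def can_update(self, user: str) -> bool:
        return user in self.editors

    def record_change(self, user: str, when: datetime) -> None:
        """Fill in 'who last modified and when', rejecting users without update rights."""
        if not self.can_update(user):
            raise PermissionError(f"{user} may not update {self.title!r}")
        self.modified_by = user
        self.modified_at = when
```

A real catalog would persist such records and keep a full change history rather than only the last modification; the sketch keeps just the fields listed above.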
Beyond these elements, we categorize metadata in an enterprise as:
Metadata for Structured Data
It includes – the column structure of a database table, the header row of a CSV file, and column definitions from JSON, XML, and Avro files.
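As a rough sketch of how this structural metadata can be harvested automatically, the Python functions below use only the standard library. SQLite stands in for 'a database table', the function names are hypothetical, and `json_columns` assumes the file is a JSON array of flat objects and infers types from the first record only. XML and Avro are omitted; Avro files carry an explicit schema, so their column definitions can be read directly rather than inferred.

```python
import csv
import io
import json
import sqlite3


def table_columns(conn: sqlite3.Connection, table: str) -> list[tuple[str, str]]:
    """Column names and declared types of a SQLite table (table name must be trusted)."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]


def csv_columns(text: str) -> list[str]:
    """Column names taken from the header row of CSV text."""
    return next(csv.reader(io.StringIO(text)))


def json_columns(text: str) -> dict[str, str]:
    """Column names and Python value types inferred from the first object of a JSON array."""
    records = json.loads(text)
    return {key: type(value).__name__ for key, value in records[0].items()}
```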
It includes – security levels, privacy levels, and acronym levels.
Both IT and business need quality metadata to understand the information at hand. Without useful metadata, the organization is at risk of making wrong decisions based on faulty data.
What is Metadata Management?
The library catalog is one of the oldest and most classic examples of metadata management. To find a book, one would look up its author or topic in the library catalog and then locate the desired book. Next came the Yahoo search engine, which indexed the metadata from various websites. Finally, the revolution came with Google, which derived metadata by processing the actual data. This gave users an in-depth search experience like never before and let them search within the desired context.
Enterprise metadata management, however, is still either at the library catalog level (done manually) or at the Yahoo level (done using various metadata management products). An ideal metadata management program should be data-driven and derived from context. Metadata management means answering all the common questions about data: who, what, when, where, and why.
How Should We Do Effective Metadata Management?
Here are a few steps to ensure it:
Lay Out Policies & Procedures
Effective metadata management starts with policies, procedures, tools, and human curation of metadata. Employees are at the center of metadata management, so a company needs tools that let employees interact smoothly about data and metadata. The following roles are essential for effective metadata management.
Role of CDO & Executives
CDOs and executives should define rules for metadata management and use tools to enforce them. These rules should cover security aspects and the methodology for changing metadata.
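One way a tool might enforce such rules is to validate every proposed metadata change before applying it. The sketch below assumes a hypothetical policy in which analysts may edit descriptions and tags, only stewards and the CDO may change a security or privacy classification, and every change must state a reason. The roles, field names, and `validate_change` function are illustrative, not taken from any product.

```python
from dataclasses import dataclass


@dataclass
class MetadataChange:
    asset: str
    field_name: str  # metadata field being changed, e.g. "classification"
    new_value: str
    user: str
    reason: str


# Hypothetical policy: which roles may change which metadata fields.
FIELD_PERMISSIONS: dict[str, set[str]] = {
    "description": {"analyst", "steward", "cdo"},
    "tags": {"analyst", "steward", "cdo"},
    "classification": {"steward", "cdo"},  # security/privacy level
}


def validate_change(change: MetadataChange, roles: dict[str, set[str]]) -> list[str]:
    """Return every rule the change violates; an empty list means it may be applied."""
    violations = []
    allowed = FIELD_PERMISSIONS.get(change.field_name)
    if allowed is None:
        violations.append(f"unknown metadata field {change.field_name!r}")
    elif not roles.get(change.user, set()) & allowed:
        violations.append(f"{change.user} lacks permission to change {change.field_name}")
    if not change.reason.strip():
        violations.append("every metadata change must state a reason")
    return violations
```

Returning all violations at once, rather than stopping at the first, lets a tool show the user everything that needs fixing in one pass.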
Role of an analyst and other data citizens
Analysts should follow the metadata management rules. When they ask probing questions about data and metadata, those questions and comments should be saved so that other analysts researching the same data later can benefit from them.
There should be robust tools that provide access to metadata and enforce all the rules defined by executives. Some of the features these tools can provide are:
1. Sample Data
Here we turn the tables: instead of using metadata to describe data, we generate sample data that gives context to the metadata, enriching our understanding of it.
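The article does not say how sample rows are chosen. One common option, shown here as a swapped-in technique rather than any specific tool's method, is reservoir sampling: it picks k rows uniformly at random in a single pass, so it works even when a table is too large to load into memory. The function name and the optional `seed` parameter are assumptions of this sketch.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> list[T]:
    """Pick k rows uniformly at random in one pass (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    sample: list[T] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)  # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)  # inclusive bounds: i + 1 possible values
            if j < k:
                sample[j] = row  # keep row i with probability k / (i + 1)
    return sample
```

Fixing `seed` makes the sample reproducible, which helps when several analysts discuss the same sample rows. If the table has fewer than k rows, all of them are returned.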
2. Data Stats (Profiles)
Statistics answer common questions such as row count, number of distinct values, most frequently used values, null count, and maximum and minimum values.
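A minimal profiling sketch in Python that computes the statistics listed above for one column. It assumes the column arrives as a Python list, that `None` represents a null, and that non-null values are mutually comparable so that `min` and `max` are defined; `profile_column` is a hypothetical name.

```python
from collections import Counter
from typing import Any


def profile_column(values: list[Any], top_n: int = 3) -> dict[str, Any]:
    """Basic profile of one column; None stands for a null value."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct": len(counts),
        "top_values": counts.most_common(top_n),  # [(value, frequency), ...]
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }
```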
3. Data Lineage
Lineage helps you understand where data originated, how it traveled, and what transformations were applied before it reached you. It also shows where else this data is being used.
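Lineage is naturally modeled as a directed graph in which an edge from A to B means that B is produced from A by some transformation. The sketch below (class and method names are hypothetical) answers both questions from the paragraph above: `origins` walks the edges backwards to find where a dataset came from, and `used_by` walks them forwards to find where else it is used, which is also the basis of impact analysis. How real tools capture these edges (for example by parsing ETL jobs or SQL) is outside the scope of this sketch.

```python
from collections import deque


class LineageGraph:
    """Directed graph of datasets; an edge means 'target is produced from source'."""

    def __init__(self) -> None:
        self.downstream: dict[str, set[str]] = {}
        self.upstream: dict[str, set[str]] = {}
        self.transforms: dict[tuple[str, str], str] = {}

    def add_edge(self, source: str, target: str, transform: str) -> None:
        self.downstream.setdefault(source, set()).add(target)
        self.upstream.setdefault(target, set()).add(source)
        self.transforms[(source, target)] = transform

    @staticmethod
    def _reachable(start: str, neighbours: dict[str, set[str]]) -> set[str]:
        # Breadth-first search; 'seen' also protects against cycles.
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours.get(node, set()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def origins(self, dataset: str) -> set[str]:
        """Every dataset this one was directly or indirectly derived from."""
        return self._reachable(dataset, self.upstream)

    def used_by(self, dataset: str) -> set[str]:
        """Every dataset that directly or indirectly consumes this one."""
        return self._reachable(dataset, self.downstream)
```

For example, after `g.add_edge("crm.customers", "staging.customers", "deduplicate")` and `g.add_edge("staging.customers", "mart.customer_360", "join")`, calling `g.origins("mart.customer_360")` returns the set `{"crm.customers", "staging.customers"}`.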
4. Previous Communication
Communication is the key to effective metadata management, so it's important to tie all conversations related to metadata together in one place. All comments and remarks about that metadata should also be available there.
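A minimal sketch of tying conversations to the metadata they concern: comments are grouped per data asset and can reply to earlier comments, so an analyst who later researches the same asset sees the whole discussion in order. The class names and the index-based reply scheme are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Comment:
    author: str
    text: str
    posted_at: datetime
    reply_to: Optional[int] = None  # index of the comment being answered, if any


@dataclass
class MetadataDiscussion:
    """All conversations about metadata, grouped by the data asset they concern."""
    threads: dict[str, list[Comment]] = field(default_factory=dict)

    def post(self, asset: str, author: str, text: str,
             posted_at: datetime, reply_to: Optional[int] = None) -> int:
        comments = self.threads.setdefault(asset, [])
        if reply_to is not None and not 0 <= reply_to < len(comments):
            raise ValueError("reply_to must point to an existing comment on this asset")
        comments.append(Comment(author, text, posted_at, reply_to))
        return len(comments) - 1  # index that later comments can reply to

    def history(self, asset: str) -> list[Comment]:
        """Comments on one asset in chronological order."""
        return sorted(self.threads.get(asset, []), key=lambda c: c.posted_at)
```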
5. Relationship with Other Metadata
For a metadata management tool, it is crucial to find relationships among data so that data search becomes possible. There are various ways to achieve this: manually through human curation, automatically through semantic matching of metadata, or automatically through data matching.
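The two automatic approaches can be sketched in a few lines of Python. Here semantic matching is approximated by string similarity of normalized column names using the standard library's `difflib.SequenceMatcher`, a deliberate simplification (a real tool might also use synonyms or a business glossary), and data matching uses the Jaccard overlap of distinct values. The function names and the 0.8 and 0.5 thresholds are arbitrary assumptions; the output is a list of candidates for human curation, not confirmed relationships.

```python
import re
from difflib import SequenceMatcher
from typing import Any


def normalize(name: str) -> str:
    """'CustomerID', 'customer_id' and 'Customer-Id' all become 'customer_id'."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)  # split camelCase
    return re.sub(r"[^a-z0-9]+", "_", spaced.lower()).strip("_")


def name_similarity(a: str, b: str) -> float:
    """Metadata matching: similarity of normalized column names, from 0 to 1."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def value_overlap(a_values: list[Any], b_values: list[Any]) -> float:
    """Data matching: Jaccard similarity of the distinct values in two columns."""
    a, b = set(a_values), set(b_values)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_relationships(left: dict[str, list[Any]], right: dict[str, list[Any]],
                            name_threshold: float = 0.8,
                            overlap_threshold: float = 0.5) -> list[tuple[str, str, float, float]]:
    """Column pairs whose names or values match well enough to suggest a relationship."""
    found = []
    for lname, lvals in left.items():
        for rname, rvals in right.items():
            ns, vo = name_similarity(lname, rname), value_overlap(lvals, rvals)
            if ns >= name_threshold or vo >= overlap_threshold:
                found.append((lname, rname, round(ns, 2), round(vo, 2)))
    return found
```

Reporting both scores lets a curator see whether a suggestion rests on the names, the data, or both.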
Various Metadata Management Tools
Based on Gartner's analysis and our own research, these are some of the metadata management tools available in the market:
OvalEdge is a comprehensive metadata management tool that also includes ETL. According to its customers, it provides a state-of-the-art UI that makes collaboration efficient. It has a patent-pending relationship algorithm that finds all the relationships among data. To facilitate compliance, it lets users predefine rules and procedures at its very core.
Alation's metadata management solution is the Alation Data Catalog. Despite being a small company, Alation has ample brand recognition in the market and has gained some traction with its data catalog. However, its core metadata management functionalities, such as data lineage and impact analysis, are very limited.
Collibra offers Collibra Connect for metadata management, focused on data governance use cases and support for regulatory requirements. However, Gartner customers have given Collibra mixed reviews for impact analysis, lineage, and semantic frameworks.
Informatica's metadata management solutions are Metadata Manager, Business Glossary, Axon, and Enterprise Information Catalog. The company's challenge is to quickly demonstrate that it can bring Axon, obtained through its acquisition of Diaku, together with its other metadata management products into a seamlessly integrated solution.
Some companies are still using spreadsheets for metadata management.