Just the phrase “data governance” sends many running scared. People have long rebelled against being governed. Yet data-driven organizations are learning: data governance enforces quality data, and the data will not govern itself! How can leaders win support?
Why Do You Need a Data Catalog?
I was recently asked, “Which comes first, the data governance program or the data catalog?” This is very similar to the chicken-and-egg question. Not to get too philosophical, but the answer depends on your view of evolution: the evolution of governing data within your organization. Certainly, by inventorying its data assets, an organization can decide what needs to be governed and to what extent. The data catalog brings efficiency to the governance process.
A data catalog is the tool of choice for organizations that need to build confidence in the data that is most critical. The data catalog supplies ample information about the data to assist in data search & discovery, data stewardship, data analytics, and to deliver the backbone of a data governance program. As such, some people may say that the data catalog needs to be in place first in order to implement a data governance program.
If the purpose of your data governance program is to embrace data-driven decision-making, a data catalog will enable you to be successful. A data catalog can help you apply consistent data quality standards, or strategically manage data as an asset to achieve accurate, trusted, and secure data that delivers business intelligence (two examples from recent clients). The words that resonate throughout these statements are consistent, standard, strategic, and trusted.
Consistent and trusted data comes from an improved understanding of the data. Improvements in understanding come from information that is made available about the data. A data catalog is the place to collect, maintain, and make metadata available to the people in your organization who must trust the data to maximize its value and activate your data governance program.
Beyond the reasons just shared, Gartner describes a data catalog as a tool that is used to “maintain an inventory of data assets through the discovery, description, and organization of datasets. The catalog provides context to enable data analysts, data scientists, data stewards, and other data consumers to find and understand a relevant dataset for the purpose of extracting business value.”
Since the metadata in the data catalog itself needs to be governed, other organizations believe that the practice of data governance must already be in place in order to fully implement a data catalog. I state often that “data and metadata will not govern themselves.”
The Data Catalog Requires Governance of Metadata
The implementation of a data catalog requires that specific people within your organization are held formally accountable for the metadata. They must be responsible for defining the metadata that will be collected in the tool, producing the metadata that will be made available to people within your organization, and using the metadata that is available to assist them to complete their job functions.
Starting quickly with a data catalog requires that the metadata stewards be recognized and activated. Starting quickly with a data catalog to support a data governance program requires that the metadata entered into the tool is validated, kept up to date, and made available. Without meeting these two requirements, the likelihood of sustainable success with your data governance program, and potentially the data catalog, is immediately reduced and the risk of failure is increased.
[Note: While I firmly state that “Everybody is a Data Steward,” everybody is NOT a metadata steward. Metadata stewards are specific people with metadata responsibilities in the organization.]
While many organizations see the benefits that a data catalog brings to their analytical capabilities, some organizations will tell you that it is challenging to get people to learn and use a data catalog for the first time. But it is significantly more difficult to get people to return to the data catalog if they are disenchanted with their first experience with the tool. For example, if the metadata is not up to date or the information being provided is incomplete, users are unlikely to return a second time. Therefore, it makes sense to ensure that the data catalog is fit for use when it is initially made available. Fit for use requires that the metadata in the tool is well defined, that change control is in place to keep the metadata accurate, and that people are educated on how to get the most value out of the tool.
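To make "fit for use" a little more concrete, here is a minimal sketch of the kind of check a metadata steward might run before opening the catalog to users. Everything in it is an assumption made for illustration: the `CatalogEntry` fields, the `fit_for_use_issues` function, and the 180-day review window are hypothetical and do not reflect the schema or rules of any particular catalog product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class CatalogEntry:
    """Hypothetical catalog entry; field names are illustrative assumptions."""
    name: str
    description: str
    steward: Optional[str]   # accountable metadata steward, if one is assigned
    last_reviewed: date      # date the metadata was last validated


def fit_for_use_issues(entry: CatalogEntry, today: date,
                       max_age_days: int = 180) -> List[str]:
    """Return the reasons an entry is not yet fit for use (empty list if none)."""
    issues = []
    if not entry.description.strip():
        issues.append("missing description")
    if not entry.steward:
        issues.append("no metadata steward assigned")
    # The review window is an assumed policy, not an industry standard.
    if today - entry.last_reviewed > timedelta(days=max_age_days):
        issues.append("metadata review is out of date")
    return issues


today = date(2024, 6, 1)
entries = [
    CatalogEntry("customer", "One row per customer.", "jdoe", date(2024, 5, 1)),
    CatalogEntry("orders", "  ", None, date(2023, 1, 1)),
]
for entry in entries:
    print(entry.name, fit_for_use_issues(entry, today))
```

An entry with no issues is ready to publish; anything else goes back to the responsible metadata steward before users see it.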
Use a Data Catalog to Quickly and Effectively Start Your Program
Specific facets of implementing a data catalog can be done quickly and effectively. The same can be said for data governance. However, to fully implement and sustain both requires commitment, patience, and the persistent application of resources.
Successful implementation of data governance and a data catalog requires that the leadership of the organization support, sponsor, and understand the value that comes from both, and the relationship between the two. Success also requires that the disciplines associated with both are implemented, following an approach that aligns with the work culture of your organization. In previous blogs in the series, the approach was referred to as Non-Invasive Data Governance.
Facets of a data governance program that can be implemented quickly and effectively include:
- The recognition of roles and responsibilities that align with the culture of your organization.
- The application of governance to data processes that improve the definition, production, and use of data.
- The development and delivery of effective socialization and communications of governing best practices.
- The activation of data stewards to improve the understanding, quality, and protection of critical data.
Facets of a data catalog that can be implemented quickly and effectively include:
- The automation of ingesting metadata into the tool (in other words: automate, don’t hesitate).
- The utilization of machine learning to improve data management, governance, and consumption.
- The delivery of an effective metadata hub combining a traditional glossary, stewardship, and a centralized marketplace for data intelligence.
- The activation of metadata stewards to improve the definition, production, and usage of metadata.
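As a rough illustration of what automated metadata ingestion can look like, the sketch below harvests table and column metadata from a database instead of asking people to type it in. It uses an in-memory SQLite database from Python's standard library purely as a stand-in source; the `harvest_metadata` function and the sample tables are hypothetical, and a commercial data catalog would rely on its own connectors rather than code like this.

```python
import sqlite3


def harvest_metadata(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for user tables."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [(col[1], col[2]) for col in columns]
    return catalog


# Sample source system; the tables exist only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

catalog = harvest_metadata(conn)
for table, columns in catalog.items():
    print(table, columns)
```

Harvesting like this captures only the technical metadata. Business definitions, ownership, and quality rules still come from the metadata stewards described above.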