Value of a Maturity Model
TDWI Research indicates that organizations see modernizing their warehouse as an opportunity that can lead to improvements in decision making, analytics, real-time data usage, and business operations. A maturity model can help guide business and IT professionals on their data warehouse modernization journey. It provides a framework for an enterprise to understand where it is, where it has been, and where it still needs to go to support capabilities and requirements for its modernization efforts. The model can also guide an organization at the beginning of its journey by helping it understand best practices used by companies that are further along in their deployments.
A great feature of TDWI maturity models is the interactive benchmark assessment. At the end of the survey, you will be able to quantify how mature/modern your data warehouse is in an objective way, understand your progress, and identify what it will take to get to the next level of maturity. This guide will help you understand the phases of maturity in modern data warehousing and interpret your benchmarking scores.
The Modern Data Warehouse Maturity Model assessment asks approximately 50 questions across five categories that form the dimensions of the TDWI Modern Data Warehouse Maturity Model (see Figure 1).
- DATA DIVERSITY. A data warehouse should be able to manage and support large amounts of multistructured data. Does the warehouse support a variety of data types? Can the organization make decisions based on current data in the warehouse? Is that data fresh? Is the data accessible in real time? Is the schema flexible for new data sources? Can the warehouse scale up and down easily to support varying amounts of disparate data? Can users get a holistic view of data across sources, whether internal or external to the company, including data that resides outside the data warehouse?
- INFRASTRUCTURE AGILITY. Warehouse infrastructure needs to be agile to support the various needs of the organization. How integrated (e.g., supports multiple IT components) is the data warehouse architecture in support of business use cases? Does the warehouse provide the freedom to query data from anywhere? Can the warehouse ingest data in real time? Can data be processed in the warehouse and analyzed? Does the warehouse separate compute from storage such that customers can scale each independently? Can compute be brought to the data rather than moving the data?
- ANALYTICS SUPPORT. Analytics is the key use case for the data warehouse. What is the scope of analytics supported by the warehouse? Does it support more advanced analytics such as machine learning or predictive analytics? Does the warehouse support geospatial data types and functions? Is performance acceptable? Can organizations analyze “new” forms of data in the warehouse? Does the warehouse support production analytics? Is it easy to explore data in the warehouse?
- SHARING AND COLLABORATION. Organizational data strategies that allow for sharing and collaboration are also a sign of data warehouse maturity. Does the data strategy align with the business strategy? Can the organization share its data both internally and externally? Does operational overhead stymie the data warehouse? Does the data warehouse allow users to work collaboratively by sharing data and workloads/queries?
- SECURITY AND GOVERNANCE. Governance is critical to data warehouse maturity. How easy to understand is the company’s data governance strategy in support of the data warehouse? Are policies and processes in place, e.g., for data access? Are data quality processes deployed and measured? Are there preferred standards in place? Is tooling in place to support governance? How does the organization secure the data in the warehouse and across warehouse platforms?
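To make the idea of a per-dimension benchmark concrete, the following is a minimal sketch of how answers might be rolled up into one score per dimension. Only the five dimension names come from the model. The 1-to-5 answer scale, the simple averaging, and the function names `dimension_scores` and `weakest_dimension` are assumptions made for illustration; the scoring method used by the actual assessment is not described in this guide.

```python
from statistics import mean

# Dimension names are taken from the model. Everything else in this
# sketch (1-5 answer scale, plain averaging) is a hypothetical choice.
DIMENSIONS = (
    "Data Diversity",
    "Infrastructure Agility",
    "Analytics Support",
    "Sharing and Collaboration",
    "Security and Governance",
)


def dimension_scores(answers):
    """Average the answers recorded for each dimension.

    `answers` maps a dimension name to a list of numeric answers on an
    assumed 1-5 scale. Dimensions with no answers are left out.
    """
    unknown = set(answers) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return {d: mean(answers[d]) for d in DIMENSIONS if answers.get(d)}


def weakest_dimension(scores):
    """Return the name of the dimension with the lowest average score."""
    return min(scores, key=scores.get)
```

Under these assumptions, an organization would pass its answers grouped by dimension to `dimension_scores` and use `weakest_dimension` to see where modernization effort might be focused first.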
The Modern Data Warehouse Maturity Model consists of four stages plus an inflection point where the data warehouse becomes more modern (see Figure 2).
Data Warehouse Modernization Overview
Data warehouse platforms are constantly evolving. Almost 10 years ago, TDWI wrote about what we called “generational changes” that adapt the data warehouse to changing business and technology requirements.
Back then, a majority of respondents to our surveys were using a centralized enterprise data warehouse. The reality is that many of the traditional, older warehouse environments cannot meet the requirements (e.g., iterative analytics with good performance) for sophisticated analysis of real-time data at scale. As organizations move from batch to real-time, from reporting to advanced analytics, they are extending their platforms to become more responsive to business needs.
Today, many organizations are implementing what TDWI terms a multiplatform modern data architecture to cope with changes in data and analytics. These multiplatform environments include both on-premises and cloud deployments. Our research shows that the cloud is a major growth area for data warehouse modernization; organizations are looking to move all or part of their warehouses to the cloud as well as other platforms, which form the extended warehouse environment. This evolving environment supports diverse data as well as advanced and real-time analytics. These two big capabilities are described below.
Support for Diverse Data
Although many organizations are still dealing with structured data in their data warehouse, TDWI research indicates that companies have increasing interest in disparate kinds of data. This includes text data, images, streaming data, geospatial data, and machine-generated data—to name a few.
This data comes from both internal and external sources and is becoming more critical for driving business value. For instance, many organizations are looking to analyze sensor data for predictive maintenance. They are collecting data about customers, including social media data, to better understand and improve the customer experience. The current warehouse environment may not be able to support this variety of data.
Likewise, organizations want to make decisions using a holistic view of data from both internal and external sources. That means users want a unified view of potentially distributed data. They want the data to be current, which means that the warehouse environment needs to refresh potentially large amounts of data in real time or near real time. This might involve continuous ingestion, which is quite fast and frequent compared to batch loads.
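The difference in data freshness between batch loads and continuous ingestion can be illustrated with simple arithmetic. The sketch below assumes a simplified model in which loads start at a fixed interval and data becomes visible only when a load finishes; the function name `worst_case_staleness_s` and the example intervals are hypothetical and not drawn from TDWI research.

```python
def worst_case_staleness_s(load_interval_s, load_duration_s):
    """Upper bound, in seconds, on how old a record can be before it
    becomes visible in the warehouse.

    Simplified model (an assumption): loads start every
    `load_interval_s` seconds, and a record that arrives just after a
    load starts must wait for the next load to start and then finish.
    """
    if load_interval_s <= 0 or load_duration_s < 0:
        raise ValueError("interval must be positive, duration non-negative")
    return load_interval_s + load_duration_s


# Hypothetical nightly batch: one load per day that runs for one hour.
nightly_batch = worst_case_staleness_s(24 * 60 * 60, 60 * 60)

# Hypothetical continuous ingestion: a small load every minute
# that takes five seconds to complete.
continuous = worst_case_staleness_s(60, 5)
```

With these illustrative numbers, the nightly batch can leave data up to 25 hours old, while the one-minute ingestion cycle keeps the worst case at just over a minute.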