The data dividend: Fueling generative AI
If your data isn’t ready for generative AI, your business isn’t ready for generative AI.
This article is a collaborative effort by Joe Caserta, Holger Harreis, Kayvaun Rowshankish, Nikhil Srinidhi, and Asin Tavakoli, representing views from McKinsey Digital.
Our latest research estimates that generative AI could add the equivalent of $2.6 trillion to $4.4 trillion in annual economic benefits across 63 use cases.1 Pull the thread on each of these cases, and it will lead back to data. Your data and its underlying foundations are the determining factors in what’s possible with generative AI.
That’s a sobering proposition for most chief data officers (CDOs), especially when 72 percent of leading organizations note that managing data is already one of the top challenges preventing them from scaling AI use cases.2 The challenge for today’s CDOs and data leaders is to focus on the changes that can enable generative AI to generate the greatest value for the business.
The landscape is still rapidly shifting, and there are few certain answers. But in our work with more than a dozen clients on large generative AI data programs, discussions with about 25 data leaders at major companies, and our own experiments in reconfiguring data to power generative AI solutions, we have identified seven actions that data leaders should consider as they move from experimentation to scale:
In determining a data strategy for generative AI, CDOs might consider adapting a quote from President John F. Kennedy: “Ask not what your business can do for generative AI; ask what generative AI can do for your business.” Focus on value is a long-standing principle, but CDOs must particularly rely on it to counterbalance the pressure to “do something” with generative AI. To provide this focus on value, CDOs will need to develop a clear view of the data implications of the business’s overall approach to generative AI, which will play out across three archetypes: Taker, Shaper, and Maker.
The CDO has the biggest role to play in supporting the Shaper approach, since the Maker approach is currently limited to only those large companies willing to make major investments and the Taker approach essentially accesses commoditized capabilities. One key function in driving the Shaper approach is communicating the trade-offs needed to deliver on specific use cases and highlighting those that are most feasible. While hyperpersonalization, for example, is a promising generative AI use case, it requires clean customer data, strong guardrails for data protection, and pipelines to access multiple data sources. The CDO should also prioritize initiatives that can provide the broadest benefits to the business, rather than simply support individual use cases.
As CDOs help shape the business’s approach to generative AI, it will be important to take a broad view of value. As promising as generative AI is, it’s just one part of the broader data portfolio (Exhibit 1). Much of the potential value to a business comes from traditional AI, business intelligence, and machine learning (ML). If CDOs find themselves spending 90 percent of their time on initiatives related to generative AI, that’s a red flag.
The big change when it comes to data is that the scope of value has gotten much bigger because of generative AI’s ability to work with unstructured data, such as chats, videos, and code. This represents a significant shift because data organizations have traditionally had capabilities to work with only structured data, such as data in tables. Capturing this value doesn’t require a rebuild of the data architecture, but the CDO who wants to move beyond the basic Taker archetype will need to focus on two clear priorities.
The first is to fix the data architecture’s foundations. While this might sound like old news, the cracks in the system a business could get away with before will become big problems with generative AI. Many of the advantages of generative AI will simply not be possible without a strong data foundation. To determine the elements of the data architecture on which to focus, the CDO is best served by identifying the fixes that provide the greatest benefit to the widest range of use cases, such as data-handling protocols for personally identifiable information (PII), since any customer-specific generative AI use case will need that capability.
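To make the PII example concrete, the sketch below masks obvious identifiers before documents flow into a generative AI use case. The patterns and the mask_pii helper are illustrative assumptions only; a production protocol would rely on a vetted PII-detection capability with far broader coverage.

```python
import re

# Illustrative patterns only; real PII handling needs far broader coverage
# (names, addresses, account numbers) and a vetted detection service.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b(?:\+?\d{1,2}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace recognizable identifiers with labeled placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

if __name__ == "__main__":
    print(mask_pii("Reach the customer at jane.doe@example.com or 555-123-4567."))
    # -> Reach the customer at [EMAIL] or [PHONE].
```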
The second priority is to determine which upgrades to the data architecture are needed to fulfill the requirements of high-value use cases. The key issue here is how to cost-effectively manage and scale the data and information integrations that power generative AI use cases. If they are not properly managed, there is a significant risk of overstressing the system with massive data compute activities, or of teams doing one-off integrations, which increase complexity and technical debt. These issues are further complicated by the business’s cloud profile, which means CDOs must work closely with IT leadership to determine compute, networking, and service use costs.
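One way to limit one-off integrations, sketched here as an assumption rather than a pattern prescribed above, is to put data sources behind a small shared connector contract so that access, logging, and reuse are handled once rather than per team.

```python
from abc import ABC, abstractmethod

class DataConnector(ABC):
    """Shared contract that each data-source integration implements once."""

    @abstractmethod
    def fetch(self, query: str) -> list[str]:
        """Return records or documents relevant to the query."""

class CrmConnector(DataConnector):
    """Hypothetical source; a real connector would call the CRM's API."""

    def fetch(self, query: str) -> list[str]:
        return [f"CRM record matching '{query}'"]

# A central registry lets use-case teams reuse connectors instead of
# building one-off pipelines that add complexity and technical debt.
CONNECTORS: dict[str, DataConnector] = {"crm": CrmConnector()}

def retrieve(source: str, query: str) -> list[str]:
    return CONNECTORS[source].fetch(query)

if __name__ == "__main__":
    print(retrieve("crm", "top accounts by churn risk"))
```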
In general, the CDO will need to prioritize the implementation of five key components of the data architecture as part of the enterprise tech stack (Exhibit 2):
Data quality has always been an important issue for CDOs. But the scale and scope of data that generative AI models rely on have made the “garbage in/garbage out” truism much more consequential and expensive, as training a single LLM can cost millions of dollars.3 One reason pinpointing data quality issues is much more difficult in generative AI models than in classical ML models is that there’s so much more data and much of it is unstructured, making it difficult to use existing tracking tools.
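To illustrate the kind of check an observability pipeline might run on unstructured content, the sketch below admits documents only if they clear two deliberately simple hurdles, a minimum length and duplicate detection; the thresholds and checks are our own illustrative assumptions, not criteria from the article.

```python
import hashlib

MIN_WORDS = 50  # illustrative admission threshold, not a recommended value

def quality_check(documents: list[str]) -> list[str]:
    """Keep documents that meet a minimum length and are not exact duplicates."""
    seen: set[str] = set()
    admitted: list[str] = []
    for doc in documents:
        if len(doc.split()) < MIN_WORDS:
            continue  # too short to carry useful context for a model
        digest = hashlib.sha256(doc.strip().lower().encode()).hexdigest()
        if digest in seen:
            continue  # exact duplicate of a document already admitted
        seen.add(digest)
        admitted.append(doc)
    return admitted

if __name__ == "__main__":
    docs = ["short note", "word " * 60 + "from a long onboarding policy document"]
    print(len(quality_check(docs)))  # 1: the short note is filtered out
```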
CDOs need to do two things to ensure data quality: extend their data observability programs4 for generative AI applications to better spot quality issues, such as by setting minimum thresholds for unstructured content to be included in generative AI applications; and develop interventions across the data life cycle to fix the issues teams find, mainly in four areas:
Some 71 percent of senior IT leaders believe generative AI technology is introducing new security risk to their data.5 Much has been written about security and risk when it comes to generative AI, but CDOs need to consider the data implications in three specific areas:
As enterprises increasingly adopt generative AI, CDOs will have to focus on the implications for talent. Some coding tasks will be done by generative AI tools—41 percent of code published on GitHub is written by AI.6 This requires specific training on working with a generative AI “copilot”—a recent McKinsey study showed that senior engineers work more productively with a generative AI copilot than do junior engineers.7 Data and AI academies need to incorporate generative AI training tailored to specific expertise levels.
CDOs will also need to be clear about what skills best enable generative AI. Companies need people who can integrate data sets (such as writing APIs connecting models to data sources), sequence and chain prompts, wrangle large quantities of data, apply LLMs, and work with model parameters. This means that CDOs should focus more on finding data engineers, architects, and back-end engineers, and less on hiring data scientists, whose skills will be increasingly less critical as generative AI allows people with less advanced technical capabilities to use natural language to do basic analysis.
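To make that skill set concrete, the sketch below chains two prompt steps and grounds them in a retrieved document. The call_llm and retrieve_context stubs are placeholders for whatever model API and data-source integration an organization actually uses; they are assumptions, not a specific vendor interface.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a vendor-specific model API call."""
    return f"<model response to: {prompt[:60]}...>"

def retrieve_context(question: str) -> str:
    """Placeholder for a data-source lookup, e.g. through an internal API."""
    return "Excerpt from an internal document relevant to the question."

def answer_with_context(question: str) -> str:
    # Step 1: ground the model in retrieved enterprise data.
    context = retrieve_context(question)
    summary = call_llm(f"Summarize the key facts in: {context}")
    # Step 2: chain the summary into the final answering prompt.
    return call_llm(f"Using these facts: {summary}\nAnswer the question: {question}")

if __name__ == "__main__":
    print(answer_with_context("How did the region perform last quarter?"))
```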
In the near term, talent will remain in short supply, and we project that the gap will widen further,8 creating more incentives for CDOs to build up their training programs.
Data leaders have a huge opportunity to harness generative AI to improve their own function. In our analysis, eight primary use cases have emerged along the entire data value chain where generative AI can both accelerate existing tasks and improve how tasks are performed (Exhibit 3).
Many vendors are already rolling out products, requiring CDOs to identify which capabilities they can rely on vendors to provide and which they should build themselves. One rule of thumb is that for data governance processes that are unique to the business, it’s better to build your own tool. Note that many tools and capabilities are new and may work well in experimental environments but not at scale.
There are more unknowns than knowns in the generative AI world today, and companies are still learning their way forward. It is therefore crucial for CDOs to set up systems to actively track and manage progress on their generative AI initiatives and to understand how well data is performing in supporting the business’s goals.
In practice, effective metrics are made up of a set of core KPIs and operational KPIs (which track the underlying activities that drive the core KPIs), helping leaders track progress and identify the root causes of issues.
A core set of KPIs should include the following:
Operational KPIs should include tracking which data are being used most, how models are performing, where data quality is poor, how many requests are being made against a given data set, and which use cases are generating the most activity and value.
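A minimal sketch of how such operational KPIs could be computed from a request log follows; the log fields and counts are hypothetical, since what gets captured will vary by organization.

```python
from collections import Counter

# Hypothetical request log: which use case touched which data set.
request_log = [
    {"dataset": "customer_profiles", "use_case": "support_copilot"},
    {"dataset": "customer_profiles", "use_case": "hyperpersonalization"},
    {"dataset": "product_catalog", "use_case": "support_copilot"},
]

requests_per_dataset = Counter(entry["dataset"] for entry in request_log)
requests_per_use_case = Counter(entry["use_case"] for entry in request_log)

print(requests_per_dataset.most_common())   # most-requested data sets
print(requests_per_use_case.most_common())  # most active use cases
```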
This information is critical in providing a fact base for leadership to not just track progress but also make rapid adjustments and trade-off decisions against other initiatives in the CDO’s broader portfolio. By knowing which data sources are most used for high-value models, for example, the CDO can prioritize investments to improve data quality at those sources.
Effective investment, budgeting, and reallocation will depend on CDOs developing a FinOps-like capability to manage the entire new cost structure growing around generative AI. CDOs will need to track a new range of costs, including the number of generative AI model requests, API consumption charges from vendors (both quantity and size of calls), and compute and storage charges from cloud providers. With this information, the CDO can determine how best to optimize costs, such as routing requests by priority level or moving certain data to the cloud to cut down on networking costs.
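As a sketch of that FinOps-like tracking, the example below attributes per-call costs to use cases and routes requests by priority level; the rates, tier names, and fields are illustrative assumptions, and actual charges depend on the vendor and cloud contracts in place.

```python
from dataclasses import dataclass

@dataclass
class ModelCall:
    use_case: str
    priority: str        # e.g. "high" for interactive work, "batch" otherwise
    input_tokens: int
    output_tokens: int

# Illustrative per-1,000-token rates; real rates are set by the vendor.
RATES = {"premium": 0.03, "economy": 0.002}

def route_tier(call: ModelCall) -> str:
    """Send high-priority requests to the premium tier, the rest to economy."""
    return "premium" if call.priority == "high" else "economy"

def cost(call: ModelCall) -> float:
    return RATES[route_tier(call)] * (call.input_tokens + call.output_tokens) / 1000

calls = [
    ModelCall("support_copilot", "high", 1200, 400),
    ModelCall("nightly_tagging", "batch", 8000, 2000),
]
print(round(sum(cost(c) for c in calls), 4))  # total spend under this routing policy
```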
The value of these metrics is only as great as the degree to which CDOs act on them. CDOs will need to establish data-performance metrics that can be reviewed in near real time and protocols to make rapid decisions. Effective data governance programs should remain in place but be extended to incorporate generative AI–related decisions.
Data cannot be an afterthought in generative AI. Rather, it is the core fuel that powers the ability of a business to capture value from generative AI. But businesses that want that value cannot afford CDOs who merely manage data; they need CDOs who understand how to use data to lead the business.
Joe Caserta is a partner in McKinsey’s New York office, where Kayvaun Rowshankish is a senior partner; Holger Harreis is a senior partner in the Düsseldorf office, where Asin Tavakoli is a partner; and Nikhil Srinidhi is an associate partner in the Berlin office.
The authors wish to thank Sven Blumberg, Stephanie Brauckmann, Carlo Giovine, Jonas Heite, Vishnu Kamalnath, Simon Malberg, Rong Parnas, Bruce Philp, Adi Pradhan, Alex Singla, Saravanakumar Subramaniam, Alexander Sukharevsky, and Kevin-Morris Wigand for their contributions to this article.