Data strategies for an AI-powered government – Atlantic Council
October 11, 2023
The public sector’s increasing demand for tools that can apply artificial intelligence (AI) to government data poses significant challenges for federal chief information officers (CIOs), chief data officers (CDOs), and other information technology (IT) stakeholders in the data ecosystem. The technical applications of AI built on federal data are extensive, including hyper-personalization of information and service delivery, predictive analytics, autonomous systems, pattern and anomaly detection, and more.
This community must simultaneously manage growing data lakes (on premises and cloud-based), ensure they follow best practices in governing and stewarding their data, and address demand from both within and outside government for equitable and secure access to data, while maintaining strong privacy protections.
These demands require each data owner to have a data infrastructure appropriate for AI applications. However, many federal IT systems do not yet have the infrastructure to support such applications—or a strategy to establish one—and many stakeholders may not yet recognize what data infrastructure and resources are required, or whom to ask for help developing strategies and plans to make AI and machine-learning (ML) applications possible. Moreover, the necessary resources are often not controlled by CIOs and CDOs, or are undervalued and overlooked by those who set budgets. Finally, not all agencies have a workforce with the skills necessary to build, maintain, and apply an AI/ML-ready data mesh and data fabric.
In two private webinars, the GeoTech Center explored:
Key findings, to date, can be structured into four categories:
Human capital and workforce challenges are foundational: it is critically important to integrate humans into the AI and data management process across the ecosystem and application lifecycles and obtain leadership buy-in on strategic approaches to leveraging data that balance other concerns such as security. Solutions include creating cross-functional task forces and working groups, embedding technology with operational users for immediate feedback, and rewarding (limited) risk-taking on AI projects.
There is a broad need to improve AI literacy across the enterprise, especially at the leadership level, to enable meaningful conversations on how to move forward. With ML at the forefront, there is a tendency, especially in the field, to mistake ML for the only form of AI that currently exists. To improve AI literacy, agencies need to focus on human and organizational behavior; for example, incentivizing actual uptake of training courses and making it part of everyone’s job description to learn about AI. It is also important to develop greater acceptance of risk related to AI applications; users are not inherently accepting of automated systems with the potential to take on significant aspects of their work, but they will find value in tools that augment their capabilities without taking over their decision making.
For organizations that have not routinely leveraged data for analysis or policy insights (with or without AI), identifying and socializing mission-specific needs and insights that can be addressed helps establish an initial stakeholder community—for example, priority and/or long-standing personnel, financial, operational, or policy questions where existing or new data and AI might reveal actionable insights.
Agencies should consider:
Federal agencies maintain and/or have access to an overwhelming quantity of data—structured and unstructured, qualitative and quantitative, inputs and outputs—that create unique data governance challenges. Data is often poorly structured and not organized in a way amenable to equity assessments or application/use by AI tools. Therefore, it is important to consider up front the data management pipeline, including how to efficiently obtain, clean, organize, and deploy data sets; i.e., getting the data “right” before using it in an AI application. Similarly, when possible, proactively consider what applications might arise from a data set before collection, which will improve the subsequent usability of that data and reduce ‘application drift’ (changes in use and scope beyond the original intention).
The pipeline includes not just the technical aspects of data management but also the need to treat data management as a business problem. Moreover, data is often siloed and generally inaccessible to those outside of the organization in which it was created, preventing its use in machine learning applications outside of this closed ecosystem. Data may also be separated between networks, locations, and classifications. These silos hamper the efficient use of information.
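The obtain-clean-organize-deploy pipeline described above can be sketched in code. This is a minimal illustration, not a prescribed federal standard: the record format, field names, and cleaning rules are all assumptions chosen to show the shape of the workflow.

```python
RAW_RECORDS = [
    {"name": " Alice ", "agency": "GSA", "year": "2021"},
    {"name": "Bob", "agency": None, "year": "2022"},       # missing field
    {"name": " Alice ", "agency": "GSA", "year": "2021"},  # duplicate
]

def obtain():
    """Stand-in for pulling records from a source system."""
    return list(RAW_RECORDS)

def clean(records):
    """Normalize whitespace and drop records with missing required fields."""
    cleaned = []
    for r in records:
        if any(v is None for v in r.values()):
            continue
        cleaned.append({k: v.strip() if isinstance(v, str) else v
                        for k, v in r.items()})
    return cleaned

def organize(records):
    """De-duplicate and order records so downstream AI tools receive
    consistently structured input."""
    unique = {tuple(sorted(r.items())): r for r in records}
    return sorted(unique.values(), key=lambda r: r["name"])

def deploy(records):
    """Hand the curated set to a model-training or analytics consumer."""
    return {"record_count": len(records), "records": records}

curated = deploy(organize(clean(obtain())))
```

Getting the data "right" happens in the `clean` and `organize` stages, before any model ever sees it; a record with a missing field or a stray duplicate never reaches `deploy`.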
AI relies on data, but senior leaders tend to look at AI as a capability rather than a technology that can create a capability when applied to the right data and/or problem—if agencies don’t have an application in mind, they need to start thinking about getting their data AI-ready—including thinking about getting their infrastructure ready. Digital modernization across the US government is an ongoing challenge, so infrastructure is often not being built fast enough or is being outsourced to the private sector, creating additional challenges, including privacy and security.
It is important to consider the value of curated or specialized data and the tension between quantity and quality. The challenge lies in choosing between high-precision, function-specific applications and more generalized data that can be applied to a broader range of solutions.
The White House Office of Science and Technology Policy (OSTP) is working to help agencies turn data into action by collecting data purposefully in such a way that they can more easily parse it and achieve equitable outcomes. OSTP views equitable data as data that allows for the rigorous assessment of the extent to which government programs yield fair, just outcomes for all individuals.
Some agencies are finding value in AI-generated synthetic data, which can be higher quality and more representative than human-labeled data for selected ML applications while addressing privacy concerns associated with real data (even when anonymized). However, recursive use of synthetic data—i.e., using information generated from synthetic data in repeated cycles of training—should be avoided, as it leads to spurious output.
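One practical safeguard against the recursive-training problem is to tag synthetic records with provenance metadata so later training runs can exclude model-generated data by default. The field names and structure below are illustrative assumptions, not an established standard.

```python
def tag_synthetic(record, generator_id):
    """Mark a generated record so it is traceable to its generator."""
    return {**record, "_provenance": {"synthetic": True,
                                      "generator": generator_id}}

def training_pool(records, allow_synthetic=False):
    """Filter a candidate training set; by default keep only real data,
    so synthetic output is never silently fed back into training."""
    def is_real(r):
        return not r.get("_provenance", {}).get("synthetic", False)
    return [r for r in records if allow_synthetic or is_real(r)]

real = [{"x": 1}, {"x": 2}]
synth = [tag_synthetic({"x": 3}, "gen-v1")]
pool = training_pool(real + synth)   # synthetic record excluded by default
```

Making exclusion the default means a deliberate `allow_synthetic=True` is required to mix generated data back in, which keeps the decision visible and auditable.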
In the health sector, a major challenge continues to be the need to convert images (such as faxes, which are still widely used) into structured data suitable for AI applications.
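The fax-to-structured-data problem typically breaks into two steps: an OCR pass over the image, then field extraction from the raw text. The sketch below shows one common arrangement; `pytesseract` is one widely used OCR choice, and the form layout and field names are illustrative assumptions.

```python
import re

def ocr_fax(image_path):
    """OCR a fax image to raw text (requires Pillow, pytesseract, and a
    local Tesseract install; shown here as one common option)."""
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(image_path))

# Illustrative field layout for a hypothetical intake form.
FIELD_PATTERNS = {
    "patient": re.compile(r"Patient:\s*(.+)"),
    "dob": re.compile(r"DOB:\s*([\d/]+)"),
}

def parse_fax_text(text):
    """Extract known fields from OCR output into a structured record."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        m = pattern.search(text)
        if m:
            record[field] = m.group(1).strip()
    return record

sample = "Patient: Jane Doe\nDOB: 01/02/1960\n"
record = parse_fax_text(sample)
```

In practice the extraction step is the hard part: OCR errors, inconsistent form layouts, and handwriting mean the patterns above would need per-form tuning and human review of low-confidence fields.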
Agencies should consider:
As in the planning stage, managing and maintaining the data pipeline is key: obtaining, cleaning, and organizing the data, then deploying it. Treating data as a business problem is just as important as treating it as a hardware or infrastructure problem. Ontology is essential to getting data right and must evolve as the uses of the data evolve. Once there is a common ontology, the data can be released to model trainers and industry partners. The order of the workflow matters: get the data “right” first, then deploy models utilizing that data. However, it is difficult to get program managers to think strategically about data up front, resulting in myriad challenges down the road. “Think about data first!”
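The "common ontology before release" step can be enforced mechanically: check each record against the agreed schema and release only conforming records. The ontology below is a deliberately simplified illustration (required fields and types only); a real one would also encode controlled vocabularies and relationships.

```python
# Hypothetical shared ontology: required fields and their types.
ONTOLOGY = {
    "case_id": str,
    "opened": str,     # ISO date, kept as a string for simplicity
    "amount": float,
}

def conforms(record, ontology=ONTOLOGY):
    """True if the record has every ontology field with the right type."""
    return all(
        field in record and isinstance(record[field], ftype)
        for field, ftype in ontology.items()
    )

def release(records):
    """Release only conforming records; return the rest for remediation."""
    good = [r for r in records if conforms(r)]
    bad = [r for r in records if not conforms(r)]
    return good, bad

good, bad = release([
    {"case_id": "A-1", "opened": "2023-01-05", "amount": 120.0},
    {"case_id": "A-2", "opened": "2023-02-10"},   # missing amount
])
```

Because the ontology is data rather than code, it can evolve as uses of the data evolve: adding a field to `ONTOLOGY` immediately tightens the release gate without touching the pipeline logic.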
When it comes to more specialized or narrowly focused data sets, one must prioritize quality over quantity. There is a tension between solving a particular problem with high precision versus solving a general problem with many solutions. Quantity may be a quality all its own that can be addressed separately, and there may be pressure to “go big or not at all.”
During pilots it is important to integrate the application with human systems, getting it into the hands of users and continuously obtaining feedback, reexamining the data, and updating the software in real time.
Agencies should consider:
It is common in the US government to consider the commercial sector ahead of the government in adopting new technology, including AI. Although AI-enabled applications have matured enough to be readily adopted for US government applications, commercial providers require data of sufficient quality to engender trust in the insights or outputs from deployed applications. Partnerships with the private sector are needed to move the needle across the US government; the current attention and momentum around data and AI in both the commercial and government sectors are exciting.
A promising area to scale is leveraging the ability of large language models (LLMs) to write code and find bugs. ChatGPT and other LLMs can now be as effective as previous bespoke tools. (A common non-result from ChatGPT is a request for more information, which, when supplied, can lead to useful results.) These technologies will help produce tools that find and fix bugs quickly; even applied only to “easier,” shallower bugs, this would be a huge win.
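The bug-finding workflow just described can be sketched as a thin review loop around a model. Everything here is an assumption for illustration: the prompt wording, the `ask` callable (which in practice would wrap a provider's chat API), and the offline stub standing in for a real model. Note the stub also models the "request for more information" non-result mentioned above.

```python
# Illustrative prompt; real deployments would tune this extensively.
REVIEW_PROMPT = (
    "You are a code reviewer. Identify bugs in the following code and "
    "suggest a fix. If you need more context, say what is missing.\n\n{code}"
)

def review_code(code, ask):
    """Send code to an LLM callable and return its findings."""
    return ask(REVIEW_PROMPT.format(code=code))

def fake_llm(prompt):
    """Offline stub in place of a real model, so the flow can be tested.
    Mimics asking for more information when no code is present."""
    if "def " in prompt:
        return "Possible off-by-one: xs[1] is the second element, not the head."
    return "Need more information: please include the function definition."

finding = review_code("def head(xs): return xs[1]", fake_llm)
```

Keeping the model behind a plain callable makes the pipeline testable offline and lets the underlying provider be swapped without touching the review logic.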
A challenge to scaling LLMs/generative AI at this time is that hallucination rates can approach 30 percent—this rate needs to be brought down before widespread use. Although the capabilities of these systems will ultimately lead to valuable applications, getting hallucination rates down will be difficult. The promise is great, but we have not yet reached the full potential, technology-wise.
Generative AI also introduces new threats that must be acknowledged and rapidly addressed, especially for misinformation from sound/video/image production. Moreover, AI agents will be connected to the Internet—and therefore the physical world. In combination with reinforcement learning such agents could be capable of autonomously causing harm in the physical world.
Agencies should consider:
These findings and recommendations were produced by the Atlantic Council GeoTech Center following private discussions with IT, data science, and AI leaders and experts in both the public and private sectors. This effort has been made possible through the generous support of Accenture Federal Services and Amazon Web Services.
Championing positive paths forward that societies can pursue to ensure new technologies and data empower people, prosperity, and peace.
© 2024 Atlantic Council
All rights reserved.

