IARPA looks to next round of AI cybersecurity research – Federal News Network
IARPA wants to make sure that intel agencies can use generative AI without worrying about whether it will expose classified data.
The intelligence community already has plenty of challenges with unauthorized disclosures, and its lead research arm wants to make sure ChatGPT isn’t the next leaker to make news headlines.
That’s one of the challenges the Intelligence Advanced Research Projects Activity is considering under its next round of artificial intelligence research. IARPA’s current program for AI cybersecurity, called “TrojAI,” is wrapping up this year. The effort was launched in 2019 to develop means of detecting adversarial attacks on AI systems. It was established prior to the widespread advances in large language models that power generative artificial intelligence.
IARPA Director Rick Muller said LLMs will be a major focus area for the next program.
“What we want to be able to do is understand in the next round, what kind of training skews are brought into a large language model that might give unintended consequences? What type of hallucinations are going on?” Muller said April 22 during an event hosted by the Intelligence and National Security Alliance in Arlington, Va.
“And then how can we make sure that those models can be trained on classified data and not spew out that data if you ask them nicely?” Muller continued. “If you read the literature in jailbreaking large language models, sometimes it really just takes asking them in the right way.”
In the world of LLMs, “jailbreaking” refers to convincing a system to ignore its built-in safeguards. A related concern is “prompt injections” that disguise malicious instructions as benign inputs, in order to manipulate a generative AI system into leaking sensitive data or taking other nefarious actions.
Meanwhile, intelligence officials believe AI can be used to speed up intelligence gathering and analysis. An IT roadmap released by the Office of the Director of National Intelligence last year called for adopting “AI at scale” across the intelligence community.
Defense and intelligence agencies have been exploring the use of generative AI to analyze open source information. And vendors like Microsoft and Palantir have said they are working to bring large language models to classified networks as well.
IARPA’s TrojAI program, launched in 2019, has focused on building defenses against “Trojan horse”-style attacks on AI systems. The program has worked on detecting attacks across a range of vectors, from training data to the AI model itself.
The research has focused on a range of AI domains, including image recognition, natural language processing, and reinforcement learning. IARPA has published much of the research in conjunction with the National Institute of Standards and Technology.
Muller said the goal is to fill gaps in the market for AI safety.
“IARPA doesn’t have the billions of dollars that are required to train a foundation model,” Muller said. “What we want to be able to do is give the intelligence community tools to help them understand when these models are safe, when they’ve been compromised and so on.”
While the TrojAI program is wrapping up this year, Muller said the last competition under the program focused on large language models.
“The IC wants to be able to use these tools when people’s lives are on the line,” Muller said. “And so if we’re going to train it on classified data, how do we make sure that that data isn’t compromised down the road in a way that threatens our resources?”
Copyright © 2025 Federal News Network. All rights reserved. This website is not intended for users located within the European Economic Area.
Follow @jdoubledayWFED
This article was autogenerated from a news feed from CDO TIMES selected high quality news and research sources. There was no editorial review conducted beyond that by CDO TIMES staff.


