Defense Priorities in the Open-Source AI Debate

Brief by Masao Dahlgren
Published August 19, 2024
Generative artificial intelligence (AI) has increasingly become a U.S. Department of Defense (DOD) priority. Software powered by generative AI foundation models—generalist systems that emulate human reasoning—might process reams of raw intelligence, automate Pentagon paperwork, or allow aircraft, trucks, and ships to navigate themselves.1 Many advancements in this sector originate in commercial and academic research.2 If generative AI sees wide adoption across the DOD, this base of commercial foundation model developers will become a critical part of the defense industrial base.3 The Joint Force thus has a stake in the commercial foundation model ecosystem and how it evolves.
Indeed, DOD AI strategies hinge on continued commercial innovation in AI.4 To that end, the Pentagon has assigned new funding to acquire AI-powered systems, such as its Replicator drones and Joint All-Domain Command and Control battle network, and has empowered new organizations to manage them, including the Chief Digital and AI Office (CDAO), Task Force Lima, and others.5
Amid this institutional buildup, the Pentagon should appraise proposed commercial foundation model market regulations. As with spectrum auctions or shipbuilding, civil sector policymaking will shape the DOD’s future choices.6 Policies that create a competitive ecosystem of market players could improve the supply chain for future DOD programs. Conversely, policies that accelerate consolidation—like the Jones Act or the 1993 “Last Supper”—might threaten it.7 The 2023 Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence has made unprecedented use of the Defense Production Act to regulate the commercial AI market.8 It is now worth asking where regulation of commercial markets could affect defense production.

Emerging civil society debates over AI safety—especially over open foundation models—merit particular attention. Unlike closed models such as GPT-4, open foundation models like Llama, Mistral, or Qwen have their underlying parameters (“weights”) openly published, allowing end users to inspect, modify, and operate them.10 With the performance of open models approaching their closed counterparts (Figure 2), some have suggested that open model distribution could pose “extreme risks” for misuse.11 Others, meanwhile, have highlighted open models’ benefits for research, security, and national competitiveness.12 Though outcomes remain uncertain, proposals to limit the distribution of open models—such as through California Senate Bill (SB) 1047—have recently gained legislative traction.13
How the open foundation model debate is resolved will have direct implications for the defense industrial base. As detailed in later sections, there are preliminary reasons to believe that a diverse open model ecosystem might benefit the DOD. The widespread availability of high-performance, open-source foundation models could improve the DOD’s ability to (1) competitively source and sustain AI systems, (2) deploy AI securely, and (3) address novel use cases. Considering these impacts, the open model debate represents a test case for how civil society evaluates defense priorities in AI policy decisions.
Outlining these implications might also clarify, in White House Office of Science and Technology Policy director Arati Prabhakar’s words, an often “garbled conversation about the implications, including safety implications, of AI technology.”14 Indeed, in its flagship report on the subject, the Biden administration suggested that “the government should not restrict the wide availability of model weights” but that “extrapolation based on current capabilities and limitations is too difficult to conclude whether open foundation models, overall, pose more marginal risks than benefits.”15 The administration has not endorsed open model restrictions nor foreclosed future regulation. An accounting of defense industrial benefits might therefore contribute to this ongoing conversation.
Open-source software and standards are already widespread in U.S. national security applications.16 Army smartphones, Navy warships, and Space Force missile-warning satellites run on Linux-derived operating systems.17 AI-powered F-16s run on open-source orchestration frameworks like Kubernetes, which is regularly updated, maintained, and tested by industry and the broader public.18 Open-source software is ubiquitous, permeating over 96 percent of civil and military codebases, and will remain a core piece of defense infrastructure for years to come.19
What constitutes an “open” foundation model is less well defined. Developers can distribute foundation models at different levels of “openness”—from publishing white papers and basic technical information to releasing models entirely, including their underlying weights, training data, and the code used to run them.20 By contrast, developers of closed models, including GPT-4 or Claude, release fewer details or data, only allowing user access through proprietary application programming interfaces.21 In general, this brief defines “open” models as those with widely available weights, consistent with relevant categories in the 2023 AI executive order.22 Many of the risks and benefits discussed here flow from these definitions.
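To make this distinction concrete, the sketch below contrasts the two access modes. It is a minimal illustration, assuming the open-source Hugging Face transformers library and local access to an open-weight model (Mistral 7B is used as one example from this brief); the closed-model call is shown only schematically, since each vendor defines its own proprietary interface.

```python
# Minimal sketch contrasting "open" and "closed" model access, assuming
# the Hugging Face transformers library and a locally cached open-weight
# model. Model names are illustrative examples, not recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Open model: weights are downloaded and run entirely on local hardware,
# so the operator can inspect, modify, fine-tune, or air-gap the model.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

inputs = tokenizer("Summarize the following report:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Closed model: the weights never leave the vendor, and every query
# transits the vendor's servers through a proprietary API (shown
# schematically, since each vendor defines its own interface):
#
#   response = requests.post("https://api.vendor.example/v1/generate",
#                            json={"prompt": "..."}, headers=auth_headers)
```

The practical consequence is the one the brief turns on: an open-weight model can be audited, adapted, and operated inside the user’s own security boundary, while a closed model cannot.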
Claims of extraordinary risk have motivated several recent proposals surrounding open-source AI. Analysts have expressed concern that malicious users might modify open foundation models to discover cybersecurity vulnerabilities or instruct users in the creation of chemical and biological weapons.23 Others have argued that public distribution of model weights could aid adversaries in advancing their AI capabilities.24 Given these apprehensions, some observers have proposed export controls, licensing rules, and liability regimes that would limit the distribution of open foundation models.25
A competing school of thought has emphasized the societal benefits of open foundation models.26 Open distribution of weights, some argue, accelerates innovation and adoption: indeed, the key frameworks and innovations underpinning today’s large language models (LLMs), like PyTorch and the transformer architecture itself, were distributed openly.27 Others contend that the public scrutiny of model weights enables rapid discovery and repair of vulnerabilities, improves public transparency, and reduces the concentration of political and economic power as AI systems increase in importance.28
What is most clear, however, is that this risk-benefit assessment remains incomplete. First, the U.S. Department of Commerce’s initial assessment is inconclusive, and the AI safety literature has thus far lacked clear frameworks for identifying relative risks and benefits and whether they are unique to open models.29 Despite concerns over AI models instructing untrained users in biological weapon development, for instance, recent red-teaming exercises concluded that LLM-equipped teams performed similarly to those without.30 The evidence on AI-assisted cyber vulnerability discovery is similarly mixed, with some arguing that enhanced vulnerability detection may benefit cyber defenders over attackers, or that the balance of advantage would be case-dependent.31 Malicious use, meanwhile, continues to take place with closed models.32 In brief, more research is necessary to unpack where the relative risks and benefits lie.33 The purportedly catastrophic harms of tomorrow’s foundation models have not yet come into clear view.34
Second, the pace of technical change has been so uncertain that evaluating future benefits, harms, and policy interventions can be challenging.35 Whether a licensing regime is effective, for example, depends on how readily foundation model technologies will diffuse.36 And whether export controls benefit national security hinges on which analogy becomes relevant: Is restricting open models like restricting nuclear weapons exports, or is it akin to Cold War bans (now repealed) on public-key cryptography, a technology which now underpins online banking, e-commerce, and a multi-trillion-dollar digital economy?37 In the absence of a U.S. market presence, will Chinese open models take their place?38
Finally, questions remain on how to implement AI policy. Definitional challenges abound; early AI policy approaches, including the EU AI Act, the AI executive order, and California SB 1047, apply thresholds for “systemic risk” to models exceeding a certain amount of computing power or cost used in their development.39 However, such thresholds for triggering government review, such as the 10²⁶ floating-point-operation threshold in the AI executive order, may incompletely capture the capabilities they aim to regulate.40 How to balance resourcing for AI policy implementation against other cyber and biological threat mitigations, such as material monitoring or new cyberdefense capabilities, remains another open question.41
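To illustrate why compute thresholds may lag the capabilities they target, consider a back-of-envelope calculation using the rule of thumb from the scaling-law literature that training compute is roughly six floating-point operations per parameter per training token. This is an approximation, not an official regulatory methodology, and the model size and token count below are illustrative.

```python
# Back-of-envelope check against the executive order's 10^26 FLOP
# reporting threshold, using the common ~6 FLOPs-per-parameter-per-token
# estimate from the scaling-law literature (an approximation, not an
# official methodology). All figures below are illustrative.
def training_flops(parameters: float, tokens: float) -> float:
    """Approximate total training compute in floating-point operations."""
    return 6 * parameters * tokens

THRESHOLD = 1e26  # reporting threshold in the 2023 AI executive order

# A hypothetical 70-billion-parameter model trained on 15 trillion tokens:
flops = training_flops(70e9, 15e12)
print(f"{flops:.1e} FLOPs; above threshold? {flops > THRESHOLD}")
# ~6.3e24 FLOPs: well under 10^26, even though models at this scale
# already exhibit many of the capabilities regulators aim to capture.
```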

A defense industrial assessment could thus contribute a valuable perspective to the AI risk debate. With AI industry trends favoring consolidation, the open foundation model ecosystem may become an increasingly important source of competition in the industrial base.43 Because end users can modify and run open models directly, they have become increasingly relevant for developing local, secure applications and embedded systems—needed by military users demanding low power usage, security, and reliability. And because open models can be publicly inspected, red teamed, and verified, they may present defense-related cybersecurity advantages.44
To date, however, the DOD has largely focused on AI adoption.45 In its flagship data, responsible AI, and adoption strategies, the DOD has focused on harnessing private sector innovations for national security end uses.46 It has embedded chief data officers in combatant commands; tested AI use cases in major experimentation initiatives, like the Global Information Dominance Experiment, Project Convergence, and others; and developed the Responsible AI Framework, emphasizing the use of traceable, transparent AI systems.47
In August 2023, the DOD established Task Force Lima, an element within the CDAO tasked with “responsibly pursu[ing] the adoption” of generalist models.48 Alongside the CDAO and Responsible AI Working Council, Lima was chartered to “accelerate” AI initiatives, “federate disparate developmental and research efforts,” and engage across the interagency on the “responsible development and use of generative AI,” with a final strategy due for release in early 2025.49
Clarifying potential AI use cases within the DOD is a valuable first step in “mak[ing] life easier for program offices who want to do AI or add AI.”50 A valuable second step would be to identify the broader trajectory of the AI industrial base. The DOD will often rely on industry expertise to develop and identify more generative AI use cases; a broad ecosystem of model and application developers will be critical for this process.51
In short, an assessment of defense industrial impacts is conspicuously missing from the broader debate on open foundation models. Arguments over model regulation are couched in national security language but should involve a broader swath of national security practitioners, including defense acquisition professionals.52 Accordingly, DOD elements, including the CDAO, should independently assess the national capacity to develop AI-powered systems and the impact of open foundation models.
A defense industrial accounting is needed because of preliminary evidence that open foundation models—and their supporting ecosystem—could be useful for the DOD. AI adoption remains a DOD priority, and open release has historically accelerated the rate of technology adoption.53 Open-source development may have positive competitive implications for defense acquisition. And open-source communities are accelerating developments in on-premises deployment, fine-tuning for specialized applications, model reliability, and other desirable characteristics for defense end users.54
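As one concrete illustration of the fine-tuning pathway noted above, the sketch below shows parameter-efficient adaptation (LoRA) of an open-weight model using the open-source transformers and peft libraries. The model name, target modules, and hyperparameters are illustrative assumptions, not a documented DOD workflow; the point is that open weights make this kind of local, specialized adaptation possible at all.

```python
# Minimal sketch of parameter-efficient fine-tuning (LoRA) on an
# open-weight model, assuming the open-source transformers and peft
# libraries. Model name and hyperparameters are illustrative only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# LoRA trains small low-rank adapter matrices instead of all weights,
# so adaptation can run on modest, even air-gapped, on-premises hardware.
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of weights

# A standard training loop over a domain corpus (e.g., doctrine or
# maintenance manuals) can then adapt the model without the data ever
# leaving the operator's enclave.
```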
Preliminary evidence suggests that open foundation models might benefit the defense industrial base. What is now needed is a quantitative assessment of the open ecosystem’s fiscal impacts. Beyond assessing pathways to adoption, defense policymakers should review the changing competitive landscape for foundation models and potential implications for the defense industrial base. Three recommendations follow:
There is considerable risk in disregarding defense industrial impacts in the debate over open foundation models. Major consolidation in other sectors, such as shipbuilding and aerospace, has coincided with steep declines in defense acquisition speed and capacity.91 If open foundation models are indeed “dual-use”—and therefore critical to national security—the potential for consolidation deserves national security attention.
Civil sector policy decisions have created a shipbuilding base outproduced by China 230 to 1.92 Other deliberate choices over the nation’s industrial base have meant that demand now outpaces supply for missile defense systems, artillery shells, and AI talent.93 If production is deterrence, these are the stakes of the open-source AI debate.94
Others will not wait for the United States if it falls behind.95 China and other states have made vast investments in stimulating their domestic AI industries, with top models “not far behind” Western counterparts.96 The United States’ competitors view open foundation models as a means of capturing global market share and advancing scientific and economic development.97 While the development of AI is not an arms race, it is a broader economic and social competition—one where U.S. priorities on democracy, transparency, and security should define global standards.98 The technology and value system to do so are already in place. Attention is all it needs.99
Masao Dahlgren is a fellow with the Missile Defense Project at the Center for Strategic and International Studies in Washington, D.C.
The project team held two private roundtables, on July 10 and August 1, 2024, to inform the conclusions of this brief. Participants included subject matter experts from the U.S. Department of Defense, U.S. Department of Energy, AI and defense firms, investors, universities, nonprofits, and other stakeholder groups. In addition, the author interviewed multiple government and industry experts to support the research process.
Special thanks to Patrycja Bazylczyk for her assistance in roundtable facilitation, report review, and project performance. Further acknowledgments to Shaan Shaikh, Wes Rumbaugh, Tom Karako, Cynthia Cook, Greg Sanders, Charles Yang, Kevin Li, Michelle Fang, and the many external reviewers involved in refining this report.
This brief was made possible with support from the Omidyar Network and general support to the CSIS Missile Defense Project.
Please consult the PDF for footnotes.
CSIS Briefs are produced by the Center for Strategic and International Studies (CSIS), a private, tax-exempt institution focusing on international public policy issues. Its research is nonpartisan and nonproprietary. CSIS does not take specific policy positions. Accordingly, all views, positions, and conclusions expressed in this publication should be understood to be solely those of the author(s).
© 2024 by the Center for Strategic and International Studies. All rights reserved.
The following table compiles self-reported Massive Multitask Language Understanding (MMLU) scores, one of many benchmarks used to evaluate foundation model performance. Typical scores reported are “5-shot,” where five examples are provided before models are prompted with a question. “Open” models listed are those with publicly available model weights.
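For readers unfamiliar with the protocol, the sketch below shows how a 5-shot MMLU prompt is typically assembled. The development example is a placeholder, as the actual benchmark draws its five examples from a fixed development set.

```python
# Sketch of how a 5-shot MMLU prompt is typically assembled: five worked
# examples precede the question being scored. The development example
# below is a placeholder; the real benchmark uses a fixed dev set.
DEV_EXAMPLES = [
    ("What is the SI unit of force?",
     ["Joule", "Newton", "Watt", "Pascal"], "B"),
    # ...four more (question, choices, answer) development examples...
]

def format_example(question, choices, answer=None):
    lines = [question]
    lines += [f"{'ABCD'[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append(f"Answer: {answer}" if answer else "Answer:")
    return "\n".join(lines)

def five_shot_prompt(question, choices):
    shots = [format_example(q, c, a) for q, c, a in DEV_EXAMPLES[:5]]
    shots.append(format_example(question, choices))
    return "\n\n".join(shots)

# The model's predicted next token (A, B, C, or D) after the final
# "Answer:" is compared against ground truth to compute accuracy.
print(five_shot_prompt("Which planet is largest?",
                       ["Mars", "Venus", "Jupiter", "Mercury"]))
```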
While MMLU is an incomplete representation of model performance, this benchmark was selected because it is among the oldest, allowing for consistent comparison over time. Writ large, the foundation model industry increasingly suffers from a benchmarking crisis, facing issues of model overfitting—internalizing existing benchmark results—and obsolescence.100 Developing trusted benchmarks has thus become a major Department of Commerce priority.101
