Senior staffers at CISA have backed the development of open source AI models, saying they can help strengthen cyber security, increase competition, and promote innovation.
But unless developers disclose training data, users will be hamstrung in their ability to understand models and mitigate vulnerabilities, CISA technical advisor Jack Cable and open source security section chief Aeva Black wrote.
They also called on developers of open AI models to heed the lessons of open source software to improve the security and safety of their models.
Cable and Black’s statement follows concerns in some circles that open source AI represents a massive potential threat and that AI development should be left in the hands of big tech.
But Cable and Black said: “At CISA, we see significant value in open foundation models to help strengthen cybersecurity, increase competition, and promote innovation.”
READ MORE: Linux Foundation tool reveals the truth about "open" AI models
Are open AI models secure?
The open source software world “faced similar debates during the 1990s, and we know that there are many lessons to be learned from the history of OSS,” CISA's blog advised.
But, it continued, when it comes to dual-use cybersecurity tools, the general consensus is that “the benefits of open sourcing security tools for defenders outweigh the harms that might be leveraged by adversaries – who, in many cases, will get their hands on tools whether or not they are open sourced.”
Building on this, they predicted, “While we cannot anticipate all the potential use cases of AI, lessons from cybersecurity history indicate that we can stand to benefit from dual-use open source tools.”
In practical terms, they said, “operators of package repositories – such as platforms that distribute AI source code, models, weights, or training data – should work towards the items in the Principles for Package Repository Security framework.”
Beyond that, Cable and Black identified two main classes of harms. The first centres on the intent of whoever deploys the model – for example, using it to orchestrate cyberattacks. Countering this requires a “multipronged risk reduction approach” drawing on existing trust and safety work. But developers should also consider “domain specific risk mitigations, such as discouraging training models to, for example, produce ‘non consensual intimate imagery’”.
The second class is unintentional harms, such as cybersecurity vulnerabilities, which, they said, can be countered through a secure-by-design approach. This could also include training the model “in a publicly verifiable way, or on publicly available data, thereby allowing others to more fully study the model’s behaviour and gain confidence that it does not contain vulnerabilities or backdoors.”
The staffers added that when a model is released “without disclosure of training data or pre-training, even though it may be modifiable by users in some ways, users of that model have only a limited ability to understand, verify, or mitigate any vulnerabilities in the model.”
That question of what really counts as open is being tackled by the LF AI and Data Foundation, which last month unveiled its isitopen.AI tool, laying out a three-tier hierarchy of openness.
The top level, Class I (open science), means all artifacts, including weights and training data, have been disclosed. So far, no models have reached the top tier.