Security Measures of Meta and Google's AI Models Bypassed Within Minutes

Deep News 05-25 21:21

Multiple software tools can remove the safety guardrails from artificial intelligence models developed by companies such as Meta and Google, and thousands of modified versions stripped of their original constraints have already been created.

Using a tool found on the code platform GitHub, a reporter was able to break through the security defenses of Meta's Llama 3.3 model in less than ten minutes, without the need for specialized hardware.

The modified versions answered questions that the original model had refused on policy grounds.

This revelation has heightened concerns among regulators and AI companies: as open-source models become increasingly powerful, the safety measures implemented by developers are becoming harder to maintain.

Kevin Etaajav, an assistant professor of AI applications at the University of Chicago Booth School of Business, stated, "Previously, only specialized, experienced personnel could bypass these safety measures. Now, even average users can do it easily."

Researchers note that as top-tier AI models advance in capability, related security vulnerabilities are becoming more severe. In April, Anthropic indicated that its Claude Mythos model had identified widespread security flaws in mainstream operating systems and web browsers.

The widespread dissemination of tampered models poses challenges for governments and companies seeking to control AI from the development stage. These tools can be freely copied and modified, operating beyond the control of the original developers.

Major AI laboratories invest heavily in building safety barriers to prevent model misuse. However, because open-source models can be downloaded and modified at will, techniques like ablation cracking (commonly known as "abliteration") can quickly strip away their safety restrictions.

This method is difficult to apply to closed-source models like Claude and ChatGPT, as their underlying code is not publicly available. Meanwhile, open-source models often match the performance of top closed-source products within six months to a year.

While specialized technical groups could previously bypass the protections of high-end closed-source models, now even novice internet users can easily obtain tampered models online.

For its open-source GPT models, OpenAI uses training datasets from which dangerous data has been removed.

Etaajav disputes this approach, arguing that removing dangerous content can leave a model with a limited understanding and unable to recognize malicious use. Simply deleting harmful data does not ensure that a model is compliant or safe.

The Alice laboratory did not notify Meta, Google, or GitHub before disclosing these findings to the media.

Google responded that ablation cracking is a common technical challenge for all open-source models, adding that its open-source models undergo rigorous internal safety evaluations before release to mitigate various risk scenarios.

GitHub stated that the platform strictly prohibits content that directly aids illegal attacks or the spread of malicious programs. However, source code related to malware research that has educational value or contributes positively to the cybersecurity industry is not subject to bans.

Meta did not respond to requests for comment. Informed sources indicated that companies assess security risks before releasing open-source models based on advanced AI development frameworks. Versions with significant catastrophic risks are not made public until protective measures are improved.

