Permanently deleting sensitive data from the large language models (LLMs) that power chatbots such as ChatGPT is extremely difficult, as is verifying whether the data has actually been deleted, scientists from the University of North Carolina have found.
Worryingly, GPT-J – the model the researchers used for this study – is much, much smaller than the likes of GPT-3.5, the LLM powering the free version of ChatGPT. Theoretically, this means that permanently deleting sensitive data from the chatbot's language model is even trickier than it is with GPT-J.
Large Language Models: Hard to Scrub
Vaidehi Patil, Peter Hase, and Mohit Bansal authored a recent study published by the University of North Carolina, Chapel Hill, focusing on whether sensitive information can ever really be deleted from large language models such as those behind ChatGPT and Bard.
They contend that the primary approach to deleting sensitive information from LLMs while retaining the model’s informativeness – Reinforcement Learning from Human Feedback (RLHF) – has a number of issues. Most LLMs, the researchers say, are still vulnerable to “adversarial prompts” even after RLHF.
Even after RLHF, models “may still know… sensitive information. While there is much debate about what models truly ‘know’ it seems problematic for a model to, e.g., be able to describe how to make a bioweapon but merely refrain from answering questions about how to do this.”
The scientists say their experiments show that even “state-of-the-art model editing methods such as ROME struggle to truly delete factual information from models like GPT-J”, an open-source LLM developed by EleutherAI in 2021.
By simulating white-box attacks – during which attackers know everything about the deployed model, including its parameters – the researchers were able to extract supposedly deleted facts 38% of the time. Black-box attacks – during which attackers can only feed the model inputs and observe its outputs – worked 29% of the time.
Why Data Might Be Even Harder to Remove from ChatGPT
GPT-J is a large language model similar to GPT-3, with around 6 billion parameters.
Compared to the LLMs already being used to power popular chatbots, however, this is a very small model. It would be much easier, in theory, to scrub data from its model weights than it would be with its comparatively massive cousins.
The size difference is stark, too. GPT-3.5 has over 170 billion parameters, making it roughly 28 times the size of the model used in the University of North Carolina study. The model behind Google's Bard is slightly smaller, at 137 billion parameters, but still much, much larger than GPT-J.
GPT-4, on the other hand, which is already being used by ChatGPT Plus customers, is reported to combine eight different models, each with 220 billion parameters – a total of 1.76 trillion parameters. OpenAI has not confirmed these figures.
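For readers who want to check the size comparisons, here is a minimal sketch of the arithmetic in Python. It uses only the approximate parameter counts quoted in this article. The dictionary and the `size_ratio` helper are illustrative names, not part of the study or of any library.

```python
# Approximate parameter counts quoted in the article, in billions.
# The GPT-4 figure is a reported, unconfirmed estimate.
PARAMS_BILLIONS = {
    "GPT-J": 6,
    "Bard": 137,
    "GPT-3.5": 170,
    "GPT-4 (reported)": 8 * 220,  # eight models of 220 billion each
}


def size_ratio(model: str, baseline: str = "GPT-J") -> float:
    """Return how many times larger `model` is than `baseline`."""
    return PARAMS_BILLIONS[model] / PARAMS_BILLIONS[baseline]


if __name__ == "__main__":
    # Print each model's size and its ratio to GPT-J.
    for name, params in PARAMS_BILLIONS.items():
        print(f"{name}: {params}B parameters, {size_ratio(name):.1f}x GPT-J")
```

Using 170 billion as the GPT-3.5 figure gives a ratio of just over 28, which matches the comparison above. Eight models of 220 billion parameters each come to 1,760 billion, or 1.76 trillion.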
Be Careful With Your Chatbot Chat
After ChatGPT hit the market back in November 2022, OpenAI’s login page quickly became one of the most visited websites on the internet. Since then, a number of other chatbots have become well-known names, like Character AI, Bard, Jasper AI, and Claude 2.
While their capabilities and powers have been talked about at great length, less focus has been placed on the privacy ramifications of these platforms, many of which are trained using your data (unless you specify otherwise).
The average user may not be thinking about the potential consequences of a hack or attack on the servers of ChatGPT creator OpenAI when they discuss personal topics with ChatGPT.
Tech workers at Samsung posted confidential source code into ChatGPT not long after its release, while in March, some ChatGPT users were shown the chat history of others using the chatbot, rather than their own.
What’s more, Cyberhaven estimated earlier this year that around 11% of the data employees were inputting into ChatGPT was either sensitive or confidential.
While we’re not suggesting giving up on using LLM-powered chatbots, it’s good to keep in mind that they’re not bulletproof, nor are your conversations with them necessarily confidential.