Researchers at ETH Zurich have just published a study on how artificial intelligence (AI) tools – such as generative AI chatbots – can accurately infer sensitive information about people, based solely on what they type online. This includes details such as race, gender, age, and location. It means that every time you type a prompt into ChatGPT, you could be unknowingly revealing personal information about yourself.
According to the study’s authors, the concern lies in hackers and fraudsters exploiting this capability in social engineering attacks, as well as the obvious worries around data privacy.
Concerns about AI’s capabilities aren’t new, but they do seem to be rising in line with the rate of innovation. This month alone there have already been significant security issues raised, with the US Space Force banning ChatGPT and similar platforms over data security concerns. With data breaches seemingly everywhere this year, fears around new technologies such as AI are somewhat inevitable.
Privacy Regulations Aren't Enough to Stop Chatbots
The study on large language models (LLMs) set out to find if AI tools could violate an individual’s privacy by inferring personal information from things they’ve written online.
To do this, researchers created a dataset from 520 real Reddit profiles and were able to show that LLMs correctly inferred a wide range of personal attributes including location, job, gender, and race, all of which would typically be protected under privacy regulations.
Mislav Balunovic, a PhD student at ETH Zurich and one of the authors of the study, said: “The key observation of our work is that the best LLMs are almost as accurate as humans, while being at least 100x faster and 240x cheaper in inferring such personal information”.
This opens up massive privacy concerns, particularly as the information was inferred “at a previously unattainable scale”. With such functionality, hackers could target unsuspecting users by asking them seemingly mundane questions.
AI Deduces User's Location Correctly
Balunovic went on to say, “Individual users, or basically anybody who leaves textual traces on the internet, should be more concerned as malicious actors could abuse the models to infer their private information.”
Researchers tested four models in total, with GPT-4 scoring 84.6% accuracy and coming out on top in inferring personal details. Meta's Llama 2, Google's PaLM, and Anthropic's Claude were also tested and followed closely behind.
An example of data inference from the study showed the researchers’ model had inferred that a Reddit user was from Melbourne because of their use of the term “hook turn”. This phrase is commonly used in Melbourne to describe a traffic maneuver. All in all, it highlights how benign a piece of information can be and still allow an LLM to deduce something meaningful from it.
A slight glimmer of privacy recognition was seen when Google's PaLM refused to answer around 10% of the researchers’ privacy-invasive prompts. Other models followed suit, but to a lesser extent.
However, this isn't enough to resolve the concerns. Martin Vechev, a professor at ETH Zurich and one of the study's authors, said, “It's not even clear how you fix this problem. This is very, very problematic.”
Think Before You Type
With the rise of LLM-powered chatbot use in everyday life, privacy concerns aren’t a risk that will simply disappear through innovation. All users should be aware that the threat of privacy-invasive chatbots is tipping from ‘emerging’ into ‘very real’.
Earlier this year, a study found that AI could decipher text with 93% accuracy, based on the sound of typing that had been recorded over Zoom. This poses a problem for the input of sensitive data, such as passwords.
While this latest news is worrisome, it’s essential knowledge, because it lets individuals start taking proactive privacy measures themselves. Being mindful of what you’re inputting into chatbots, and knowing it likely won’t remain confidential, can help you tailor your usage and protect your data.