Elon Musk’s X already used your data to train its own artificial intelligence. Soon other companies will be able to do the same.
As of November 15, the social media site formerly known as Twitter will share user data – including posts, likes, bookmarks and reposts – with third-party platforms that can use the information to train AI models.
The company updated its privacy policy on Wednesday to detail the changes. When the policy takes effect, users will be automatically opted in until they opt out.
“Depending on your settings, or if you decide to share your information, we may share or disclose your information with third parties,” the updated policy reads.
“If you do not opt out, in some cases the recipients of the information may use it for their own independent purposes in addition to those stated in X’s privacy policy, for example to train their artificial intelligence models, whether generative or otherwise.”
As user data becomes an increasingly valuable resource, social media platforms are sitting on a gold mine, and selling that information to artificial intelligence companies is a lucrative business.
“This is the latest arms race. Everyone is working toward AI supremacy,” said Ritesh Kotak, a cybersecurity and technology analyst based in Toronto.
“The more data sets you have, the more people the data is being collected from, the more accurate your model will be.”
Why sites like Reddit sell data to AI companies
The change comes just a few months after X quietly changed its privacy policy and gave itself permission to train the company’s Grok chatbot on user data.
But that led to an investigation by the European Union’s privacy regulator, which ended with X agreeing to stop collecting user data from that region for the purpose of training Grok.
LinkedIn has also given itself permission to train its artificial intelligence models on user data, and Meta used public Instagram and Facebook posts to train its own AI virtual assistant.
“The traditional processes they have used [for] monetization, whether through advertising or through subscription methods, are not working well,” said Shrestha.
The deals include:
- Reddit reportedly closed one such deal with Google this year, with Reuters reporting the deal is worth $60 million a year.
- Stack Overflow, an online community for developers, began charging AI companies last year for scraping its data to train their bots.
- Tumblr and WordPress reportedly made a deal with generative AI companies Midjourney and OpenAI to sell user data to train their AI tools.
Some news publishers and stock photo companies have made similar deals: Shutterstock’s licensing business, for example, generated more than $100 million last year. Many others have sued AI giants for scraping their content without permission, or warned them against doing so.
And what’s in it for the big technology companies? Social media posts are a valuable form of data because they can convey emotions and reflect how people actually speak and think, Kotak said.
“Social media posts may contain very little high-quality content from a technical perspective or about what’s happening in the world, but… [they are] rich in sentiment analysis,” he said.
Can you opt out?
As of Friday, it appeared that X had not updated its settings with an option to opt out of the change before the November 15 start date. CBC News has contacted the company.
“As a user, you may simply not want your messages or personal information used to train algorithms for the rest of the world to use,” said Kotak.
“These platforms literally defaulting to using your data to train these algorithms means you no longer have a choice. Unless you go in and prevent that from happening.”
Normally, users can opt out of such changes by going to settings, then privacy and security, and, under the heading data sharing and personalization, turning off the option “share data with business partners.”
But opt-outs aren’t always clear-cut, Kotak said, noting that an AI model cannot necessarily unlearn data it has already received if a user opts out after training has started.
“There’s no way to undo that and ensure that the data you’ve already published is actually pulled out of the learning model,” he said.
“If you don’t pay for the product, you are the product. And in this case, the data is the product.”