Apple Intelligence Reportedly Trained on YouTube Videos Without Consent

Soon after OpenAI unveiled ChatGPT in November 2022, a debate broke out among creatives: what data did the company use to train its AI model? The first lawsuit followed, in which two authors alleged that the company had used their work without authorization to train its model.

Today, some of the most influential brands, including Apple, Salesforce, Nvidia, and Anthropic, are in the spotlight for a similar reason: reportedly training their AI models on YouTube videos without consent or proper authorization. A report from WIRED, in collaboration with Proof News, details what happened.

The investigation “found that subtitles from 173,536 YouTube videos, siphoned from more than 48,000 channels, were used by Silicon Valley heavyweights, including Anthropic, Nvidia, Apple, and Salesforce.”

The supplier that provided the tech giants with the data is EleutherAI, which put together a dataset called the Pile that Apple reportedly used to train its LLM. A portion of this dataset, named YouTube Subtitles, consists of subtitles taken from YouTube videos without permission. That’s not only unethical but also an apparent violation of YouTube’s terms of service.

The Mac Observer reached out to Apple for comment on this story, but Apple had not responded as of publication. We will update this story as soon as we receive a response.

While Apple Intelligence has been late to the AI party, as I have often said, I’ve always argued that the company has been ethical in its practices (take, for instance, when it approached publishers to strike deals to train its AI models on their archives). Despite those intentions, however, it appears that Apple Intelligence has been trained on YouTube subtitles without proper authorization, which doesn’t leave a good impression.
