Scandalous AI training: Apple, NVIDIA and Anthropic used YouTube without permission
Leading technology companies including Apple, NVIDIA, and Anthropic have used YouTube video transcripts to train their AI models without proper authorization, raising concerns about the ethics and legality of such practices.
Training AI models
Unauthorized use of YouTube data
According to an investigation by Proof News, tech giants including Apple, NVIDIA, and Anthropic used subtitles from more than 173,000 YouTube videos to train their AI systems. The subtitles were taken from more than 48,000 channels, in violation of the platform's rules, which prohibit collecting content without permission.
Content sources and diversity
The material came from a wide range of sources, including educational channels (Khan Academy, MIT, Harvard), major news outlets (The New York Times, BBC, ABC News), entertainment shows, and popular YouTubers. It even included videos promoting dubious theories, such as the claim that the Earth is flat.
Reaction of content owners
Many channel owners whose videos were used for AI training were never informed. Some worry that AI could generate content similar to theirs, or even produce exact copies.
The role of EleutherAI and The Pile dataset
EleutherAI, the organization that created the YouTube Subtitles dataset, has not commented on the allegations of video misuse. Its larger collection, The Pile, contains not only YouTube subtitles but also material from other sources, including the European Parliament, Wikipedia, and even emails from Enron employees.
Data collection methodology
Sid Black, a co-founder of EleutherAI, developed a tool that automatically downloads subtitles through YouTube's API. The tool ran around 500 search queries to gather varied content, on topics ranging from science to cooking.
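The query-driven collection described above can be illustrated with a minimal sketch. It is not Black's actual code: the `collect_subtitles` function, the `search` and `fetch_subtitles` callables, and the sample data are all hypothetical stand-ins, and nothing here contacts YouTube. It only shows the general idea of running many search queries, deduplicating the videos they return, and keeping the subtitle text.

```python
from typing import Callable, Dict, Iterable, List


def collect_subtitles(
    queries: Iterable[str],
    search: Callable[[str], List[str]],
    fetch_subtitles: Callable[[str], str],
) -> Dict[str, str]:
    """Map each unique video ID found by the queries to its subtitle text."""
    collected: Dict[str, str] = {}
    for query in queries:
        for video_id in search(query):
            if video_id in collected:
                continue  # the same video can match several queries
            text = fetch_subtitles(video_id)
            if text.strip():  # skip videos without usable subtitles
                # Collapse line breaks and repeated spaces into single spaces.
                collected[video_id] = " ".join(text.split())
    return collected


# Hypothetical stand-in data so the sketch runs offline.
FAKE_INDEX = {
    "physics lecture": ["vid1", "vid2"],
    "cooking pasta": ["vid2", "vid3"],
}
FAKE_SUBTITLES = {
    "vid1": "Welcome to the\nlecture.",
    "vid2": "Boil the   water first.",
    "vid3": "",
}

dataset = collect_subtitles(
    FAKE_INDEX.keys(),
    search=lambda q: FAKE_INDEX.get(q, []),
    fetch_subtitles=lambda v: FAKE_SUBTITLES.get(v, ""),
)
print(dataset)
# {'vid1': 'Welcome to the lecture.', 'vid2': 'Boil the water first.'}
```

Passing the search and download steps in as functions keeps the sketch independent of any particular service. With around 500 queries instead of two, the same loop would produce a topic-diverse set of transcripts of the kind the article describes.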
Ethical and legal issues
Although YouTube's terms of service prohibit automated access to its videos, thousands of GitHub users have bookmarked or endorsed Black's code. This raises questions about the ethics and legality of such practices in AI development.
Transparency in AI development
AI companies are often not transparent about the data used to train their models. Apple, for example, has recently been criticized for a lack of transparency about the data sources behind Apple Intelligence.
YouTube as a resource for AI
As the world's largest video repository, YouTube is an extremely valuable resource for training AI models, offering a huge volume of transcripts, audio, video, and images. This makes the platform particularly attractive to AI developers, but it also raises questions about whether this data is being used ethically and legally.
Glossary
- Apple is an American technology company known for its innovative products and services
- NVIDIA is a leading developer of graphics processors and artificial intelligence technologies
- Anthropic is a company specializing in the development of safe and ethical AI
- YouTube is the world's largest video sharing platform
- EleutherAI is an organization engaged in open AI research
Discussion of the topic – Scandalous AI training: Apple, NVIDIA and Anthropic used YouTube without permission
Oleksandr
Wow, this is just a shock! 😱 It turns out that giants like Apple and NVIDIA used our data without permission? This is a violation of privacy! I wonder how this will affect the development of AI?
Mariia
Yes, Oleksandr, it really is striking. But let's think: isn't this inevitable in a world where data is becoming the new oil? 🤔 Maybe we need new laws to regulate the use of data in AI training?
Pietro
Mariia, you are right about the laws. But I'm more concerned about the use of conspiracy theory content. Imagine if AI starts generating fakes based on this information! 😨 This can become a real problem for society.
Sophie
Pietro, I agree with you. But don't forget that AI is just a tool. It all depends on how we use it. Maybe we need to focus more on ethical AI training and data validation? 🧐
Helmut
Phew, that AI chatter again. All this is just fashionable nonsense. We've lived just fine without these smart machines, and we'll continue to do so. It would be better to deal with real problems instead of inventing new ones.
Oleksandr
Helmut, I understand your skepticism, but AI is already here and actively developing. 🚀 Ignoring it is not an option. Sophie is right about ethical training. Maybe we should focus on how to make AI useful and safe for everyone?
Mariia
I agree with Oleksandr! 👍 And I'm also interested in how it will affect content creators. Imagine if AI could create videos in the style of popular YouTubers? This could change the entire industry!
Pietro
Interesting point, Mariia! 🤔 Perhaps this will lead to new forms of creativity and collaboration between humans and AI. But we definitely need to resolve the questions of copyright and the ethics of data use. This could really be a revolution in the content industry! 🎬🤖