OpenAI deleted book datasets used to train GPT-3: a literary scandal
OpenAI deleted huge datasets used to train its GPT-3 model. The deletion has become an issue in a lawsuit from the Authors Guild, which accuses the company of copyright infringement.
Lawsuit against OpenAI
The essence of the claim
The Authors Guild, a professional organization that protects the rights of authors, filed a lawsuit against OpenAI. According to the lawsuit, the company used more than 100,000 copyrighted books to train artificial intelligence models, including GPT-3, without permission from the copyright holders. The Guild argues that this infringes copyright law.
Ambiguous situation
High-quality data for training AI models is essential. Tech giants are sourcing this data from the internet, often without the consent of the content creators. The latter demand compensation for the use of their work, while companies seek to avoid additional costs. This confrontation leads to litigation.
The importance of controversial datasets
In 2020, OpenAI disclosed that the "books1" and "books2" datasets accounted for 16% of all training data for GPT-3. They contained about 50 billion words from books collected from the internet. The company stopped using these datasets at the end of 2021 and deleted them completely in 2022.
OpenAI's lack of transparency
OpenAI refuses to disclose details about the researchers who created the controversial datasets, or further information about the datasets themselves, despite requests from the Authors Guild. In a statement, the company says that current models, including ChatGPT, were not trained on this data.
Glossary
- OpenAI is a leading artificial intelligence company, the developer of GPT-3 and ChatGPT.
- GPT-3 is a powerful language model with 175 billion parameters created by OpenAI.
- The Authors Guild is the oldest professional organization for writers in the United States; it advocates for authors' rights, including copyright.
Discussion of the topic: OpenAI deleted book datasets used to train GPT-3
OpenAI deleted two huge datasets, "books1" and "books2", containing over 100,000 published books that were used to train the GPT-3 model. The Authors Guild has sued the company for copyright infringement.
Latest comments
8 comments
AndreeBellamy
You have no idea how serious this is! Using someone else's copyrighted content without permission is not only a violation of the law, but also a blow to the creative industry 😡 Writers spend years creating their work and are entitled to fair compensation.
AlexanderFischer
Andree is right. This is a matter of respect for intellectual property. Large companies like OpenAI should lead by example and not violate copyrights 💁‍♂️ While I understand their desire to innovate, they should find legal ways to obtain data.
MariaSolari
Look, on the one hand, I understand OpenAI's desire to use the highest quality data possible to train their models 🤖 But on the other hand, they really should have gotten permission from the authors before using their work. After all, intellectual property is sacred.
GrzegorzNowak
Hmm, what if we offer a compromise solution? 🤔 OpenAI could enter into agreements with publishers and pay royalties for the use of books in its data sets. This way, the authors will receive a reward, and AI technologies will be able to develop further.
VictorGrumpyOld
These newfangled technologies again! 😠 In my day, people simply read books; they did not use them to train some kind of AI model. All this noise around neural networks and artificial intelligence is just hype and a waste of time. What's wrong with ordinary reading and studying literature?
SofiaBorges
Hey, old Victor, don't be such a grump! 🙃 Technology is moving forward, and we must keep up with the times. AI can bring many benefits if it is developed responsibly and respects copyrights. You just need to find a balance between innovation and intellectual property protection.
PabloSanchez
Don't forget that AI models trained on books can ultimately help authors with their creativity 💡 Imagine how much easier a writer's job would be if AI generated plot ideas, character ideas, and even rough drafts of text. Of course, provided that the rights of the authors are protected.
AnnaPawlak
Very interesting question! 🧐 On the one hand, using copyrighted works without permission is wrong. But on the other hand, if this data helps develop AI that could benefit society in the future, isn't that worth the cost? 🤔 We need to find a reasonable compromise.