PARIS, FRANCE - JUNE 14: The Meta logo is displayed during the Viva Technology conference at Parc des Expositions Porte de Versailles on June 14, 2023 in Paris, France. Viva Technology, the biggest tech show in Europe but also in a unique digital format, for 4 days of reconnection and relaunch thanks to innovation. The event brings together startups, CEOs, investors, tech leaders and all of the digital transformation players who are shaping the future of the Internet. The annual technology conference, also known as VivaTech, was founded in 2016 by Publicis Groupe and Groupe Les Echos and is dedicated to promoting innovation and startups. (Photo by Chesnot/Getty Images)

Meta’s AI relies on millions of pirated books. How much longer can billionaires get away with large-scale theft?

A new exposé by The Atlantic has revealed the extent of Meta’s despicable piracy, as millions of stolen books and research papers were used to train its flagship generative AI program, Llama 3.


When Mark Zuckerberg’s Meta was dipping its toes into the waters of Gen AI for the first time, the company very briefly considered obtaining its data set legally. After all, not a single generative AI program can exist without being fed the blood, sweat, and tears of thousands of artists and writers, so why risk their wrath when you can play by the rules? (The sarcasm should be apparent there, as billionaires like Zuckerberg and OpenAI’s Sam Altman do not care one iota about who they’re stealing from and how it’s hurting them, as this latest story proves.)

According to The Atlantic and newly revealed court documents, Meta employees briefly looked into officially licensing certain bodies of work, including works of fiction and research papers, to train the company’s model. They quickly ran into a few roadblocks, however. Some considered the process “unreasonably expensive” or “incredibly slow.” Since, again, generative AI cannot exist without other people’s work, they decided, with alleged permission from Zuckerberg himself, to download an entire library’s worth of content from a piracy site known as LibGen, which hosts more than 7.5 million books and 81 million academic papers, with more stolen works being added every day. The site has been around for 17 years and has evaded shutdown by being hosted across numerous countries and using a peer-to-peer sharing system.

The extent of Meta’s theft is so massive that the writer of The Atlantic’s article, Alex Reisner, took it upon themselves to create an easily accessible resource for writers of all backgrounds to check if their work is part of LibGen’s pirated library. Reisner acknowledges that the data set might not be complete, as it’s difficult to know exactly how much Meta downloaded and whether LibGen’s metadata is accurate, but still. Authors all over social media are now justifiably outraged. On Threads (owned by Meta), which hosts a large writers’ and readers’ community, authors shared their heartbreak over their work being used to train something that is ultimately, as much as Gen AI bros and “tech disruptors” would like to deny it, meant to put them out of business. Art and creative human expression are meaningless to these techies—all they see is “content” and dollar signs.

“Authors have been told for years that piracy of our work was justified because of accessibility issues and that readers who pirate our work would never have bought legal copies. But the issue isn’t just a loss of sales, it’s a loss of control over our IP altogether, which has directly led to this,” writes Alexandra Bracken, a New York Times bestselling author.

“89 of my books (and 19 foreign editions) on that piracy site that Meta has scraped to feed their AI on. No words,” commented Karina Halle, another NYT bestseller and indie author.

A. K. Caggiano, a fantasy and romantic comedy writer, wrote, “Meta pushing their AI ‘help me write’ bullsh*t on the posts I create is even funnier now because WHAT DO YOU MEAN? I ALREADY WROTE IT! NOW YOU WANT ME TO PLAGIARIZE MYSELF?”

“Where’s my compensation, @meta? It would take ‘too long’ and be ‘too expensive’ for you to get the data legally, but what about the time, work, and money I spent creating and marketing those books?” another writer, Francesca Zappia, wrote.

Thus far, lawmakers have been unable to keep pace with the rapidly changing Gen AI landscape, but clearly, something needs to change. Copyright laws are being circumvented or ignored altogether left and right. While tech autocrats like Zuckerberg get richer, those of us creating the work these companies need to train their often delusional, inaccurate, and creatively empty AI products are left with nothing, scrambling to make ends meet. Hopefully, those authors spreading the word online and discussing a class-action lawsuit will change the course of Gen AI’s history. Does anyone with even a shred of decency truly want to live in a world where every story or article we read is just regurgitated nonsense with no humanity behind it whatsoever?


Author
El Kuiper
El (she/her) is The Mary Sue's U.K. and weekend editor and has been working as a freelance entertainment journalist for over two years, ever since she completed her Ph.D. in Creative Writing. El's primary focus is television and movie coverage for The Mary Sue, including British TV (she's seen every episode of Midsomer Murders ever made) and franchises like Marvel and Pokémon. As much as she enjoys analyzing other people's stories, her biggest dream is to one day publish an original fantasy novel of her own.