For years, the people building powerful artificial intelligence systems have used enormous troves of text, images and videos pulled from the internet to train their models.
Now, that data is drying up. Over the past year, many of the most important web sources used for training A.I. models have restricted the use of their data, according to a study published this week by the Data Provenance Initiative, an M.I.T.-led research group.
https://www.nytimes.com/2024/07/19/technology/ai-data-restrictions.html
Those restrictions are set up through the Robots Exclusion Protocol, a decades-old convention that lets website owners use a file called robots.txt to tell automated bots which pages they may not crawl.
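As a minimal sketch of how the protocol works: a site publishes a robots.txt file listing per-crawler rules, and a well-behaved bot checks those rules before fetching a page. The snippet below uses Python's standard-library `urllib.robotparser` against a hypothetical robots.txt that blocks one A.I. crawler (the user-agent names and URLs here are illustrative, not drawn from any particular site).

```python
from urllib import robotparser

# Hypothetical robots.txt: block the "GPTBot" user agent entirely,
# allow everything else. (Illustrative example only.)
sample_robots_txt = """
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(sample_robots_txt)

# A compliant crawler calls can_fetch() before requesting a page.
print(parser.can_fetch("GPTBot", "https://example.com/article"))    # False
print(parser.can_fetch("SearchBot", "https://example.com/article")) # True
```

Note that robots.txt is purely advisory: nothing technically stops a crawler from ignoring it, which is part of why publishers have also turned to terms-of-service changes and paywalls.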
@ecksmc I am an author working under a pseudonym in adult fiction, and I am not going to publish any of my completed works immediately.
I intend to sit on them until the overall climate for publishing improves.
Right now, crappy AI novels are appearing everywhere. It will take some time for the masses to work out how crappy they really are.
The study, which examined 14,000 web domains included in three commonly used A.I. training data sets, found an "emerging crisis in consent," as publishers and online platforms have taken steps to prevent their data from being harvested.
The researchers estimate that in the three data sets -- called C4, RefinedWeb and Dolma -- 5 percent of all data, and 25 percent of data from the highest-quality sources, has been restricted.