The shady world of Brave selling copyrighted data for AI training
I'm fairly certain that I was not the only person in the world who thought to himself, "Did they just yoink the entire Internet and bundle it together into a glorified copy and paste machine?" upon the release of ChatGPT.
And even though there are some concerns about the type of data that was used to train OpenAI's latest model, it seems that the overall stance of OpenAI and other companies working on similar projects is that it is fair use. Whether or not that is going to hold up in the long run remains to be seen.
After Google published an announcement saying they're interested in exploring alternatives to robots.txt to provide broader control over AI-related content issues, I was curious to see what other search engines are doing in regard to AI, both in dealing with AI-generated content and in handling data.
Personally, I'm not a big fan of these conglomerates ingesting other people's work and then reselling it, which also leads me to the story I'm going to talk about today.
https://stackdiary.com/brave-selling-copyrighted-data-for-ai-training/
Tetrachloride (8,443 posts)
usonian (13,550 posts)
That's why El0n is walling in twitter. Now, it's his personal and secret trove of training data.
There are too many articles to post on the scraping of copyrighted works beyond "fair use", and on the stripping of copyright notices during the data harvest. Lawsuits and more to come. The repurposing of the works is also at issue: ripping off content to "create" a flood of similar and competing content.
Those are the current arguments being raised.
https://venturebeat.com/ai/what-sarah-silvermans-lawsuit-against-openai-and-meta-really-means-the-ai-beat/
A giant shitshow. These are broad strokes. Techies can scan Hacker News (https://news.ycombinator.com/newest) and other sites for more. Lots more. HN has a search box whose results you can sort by popularity or date. For the most popular items at any given time: https://news.ycombinator.com/best
And then, those cyber criminals:
WormGPT - The Generative AI Tool Cybercriminals Are Using to Launch BEC Attacks | SlashNext
https://slashnext.com/blog/wormgpt-the-generative-ai-tool-cybercriminals-are-using-to-launch-business-email-compromise-attacks/
Did your boss write that email? Maybe not. Sure looks genuine.