12 points

AI scrapers illegally harvesting data are destroying smaller and open source projects. Copyright law is not the only victim

https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/

0 points

In this case they just need to publish the code as a torrent. You wouldn’t set up a crawler if all the data were available in a torrent swarm.


I’ve heard that BitTorrent doesn’t work well when the data is often updated or changed.

I might be totally wrong, I’ve only ever used it once when downloading Wikipedia

0 points

That article is overblown. People need to configure their websites to be more robust against traffic spikes, news at 11.

Disrespecting robots.txt is bad netiquette, but honestly this sort of gentleman’s agreement is always prone to cheating. At the end of the day, when you put something on the net for people to access, you have to assume anyone (or anything) can try to access it.
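For reference, here is a rough sketch of what respecting robots.txt looks like on the crawler side, using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the `example.org` URLs, and the rules themselves are made up for illustration; `is_allowed` is a hypothetical helper, not part of any real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a real crawler would fetch them
    # from the site's /robots.txt first.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("ExampleAIBot", "https://example.org/repo/"))
    print(is_allowed("SomeBrowser", "https://example.org/repo/"))
```

The check runs entirely inside the crawler, which is why it is only a gentleman's agreement: nothing on the server enforces it, and a crawler that skips the call is not stopped by anything.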

2 points

You think Red Hat & friends are all just bad sysadmins? SourceHut maybe…

I think there’s a bit of both: poorly optimized/antiquated sites and a gigantic spike in unexpected and persistent bot traffic. The typical mitigations do not work anymore.

Not every site is, and not every site should have to be, optimized for hundreds of thousands of requests or more every day. Just because they can be doesn’t mean it’s worth the time, effort, or cost.


Technology

!technology@lemmy.world


This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech-related news or articles.
  3. Be excellent to each other!
  4. Mod-approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below; this includes using AI responses and summaries. To ask if your bot can be added, please contact a mod.
  9. Check for duplicates before posting; duplicates may be removed.
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


Community stats

  • 22K monthly active users
  • 15K posts
  • 631K comments