815 points

The Internet Archive is under attack, with a popup claiming a ‘catastrophic’ breach (www.theverge.com)

Posted 11 months ago by misk@sopuli.xyz in technology@lemmy.world

84 comments

You are viewing a single thread.

7fb2adfb45bafcc01c80@lemmy.world

-69 points

11 months ago

I just sent a DMCA takedown last week to remove my site. They’ve claimed to follow meta tags and robots.txt since 1998, but no, they had over 1,000,000 of my pages going back that far. They even had the robots.txt configured for them archived from 1998.

I’m tired of people linking to archived versions of things that I worked hard to create. Sites like Wikipedia were archiving URLs and then linking to the archive, effectively removing branding and blocking user engagement.

Not to mention that I’m losing advertising revenue if someone views the site in an archive. I have fewer problems with archiving if the original site is gone, but to mirror and republish active content with no supported way to prevent it short of legal action is ridiculous. Not to mention that I lose control over what’s done with that content – are they going to let Google train AI on it with their new partnership?

I’m not a fan. They could easily allow people to block archiving, but they choose not to. They offer a way to circumvent artist or owner control, and I’m surprised that they still exist.

So… That’s what I think is wrong with them.

From a security perspective it’s terrible that they were breached. But it is kind of ironic – maybe they can think of it as an archive of their passwords or something.

Duamerthrax@lemmy.world

45 points

11 months ago

> Not to mention that I’m losing advertising revenue if someone views the site in an archive.

No one is using Internet Archive to bypass ads. Anyone who would think of doing that already has ad blockers on.

7fb2adfb45bafcc01c80@lemmy.world

-12 points

11 months ago

You misunderstood. If they view the site at Internet Archive, our site loses on the opportunity for ad revenue.

Duamerthrax@lemmy.world

15 points

11 months ago

I completely understood. No one is going to IA as their first stop. They’re only going there if they want to see a history change or if the original site is gone.

7fb2adfb45bafcc01c80@lemmy.world

-3 points

11 months ago

Yes, some Wikipedia editors are submitting the pages to archive.org and then linking to that instead of to the actual source.

So when you go to the Wikipedia page it takes you straight to archive.org – that is their first stop.

ikidd@lemmy.world

9 points

11 months ago

Because if you’re referencing something specific, why would you take the chance that someone changes that page? Are you going to monitor that from then on and make sure it’s still correct/relevant? No, you take what is effectively a screenshot and link to that.

You aren’t really thinking about this from any standpoint except your advertising revenue.

7fb2adfb45bafcc01c80@lemmy.world

-6 points

11 months ago

I’m thinking about it from the perspective of an artist or creator under existing copyright law. You can’t just take someone’s work and republish it.

It’s not allowed with books, it’s not allowed with music, and it’s not even allowed with public sculpture. If a sculpture shows up in a movie scene, they need the artist’s permission and may have to pay a licensing fee.

Why should the creation of text on the internet have lesser protections?

But copyright law is deeply rooted in damages, and if advertising revenue is lost that’s a very real example.

And I have recourse; I used it. I used current law (DMCA) to remove over 1,000,000 pages because it was my legal right to remove infringing content. If it had been legal, they wouldn’t have had to remove it.

Adanisi@lemmy.zip

30 points

11 months ago

Deleted by creator

7fb2adfb45bafcc01c80@lemmy.world

-4 points

11 months ago

> What do you mean by “engagement”, exactly? Clicking on ads?

In SEO terms user engagement refers to how people interact with the website. Do they click on another link? Does a new blog posting interest them?

> Lmao you think Google needs to go through Archive to scrape your site? Delusional.

Any activity from Google is easier to track, and I have a record of who downloaded content if it’s coming from my servers.

> The mechanisms used to serve ads over the internet nowadays are nasty in a privacy sense, and a psychological manipulation sense. And you want people to be affected by them just to line your pockets? Are you also opposed to ad blockers by any chance?

I agree that many sites use advertising in a different way. I use it in the older internet sense – someone contacts me to sponsor a page or portion of the site, and that page gets a single banner, created in-house, with no tracking. I’ve been using the internet for 36 years. I’m well aware of many uses that I view as unethical, and I take great pains not to replicate them on my own site.

I disapprove of ad blockers. I approve of things that block tracking.

As far as “lining my own pockets” goes, I want to recoup my hosting costs. I spend hours researching for each article/showcase, make the content free to view, and then I’m expected to pay to share it with anyone who’s interested? I have a day job. This is my hobby, but it’s also my blood, sweat, and tears.

> And how do you suggest a site which has been wiped off the face of the internet gets archived? Maybe we need to invest in a time machine for the Internet Archive?

archive.org could archive the content and only publish it if the page has been dark for a certain amount of time.
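As a rough illustration of that proposal, here is a minimal Python sketch of a publish-after-dark-period rule. Everything in it is hypothetical: the `Snapshot` record, the `last_seen_live` field (assumed to come from periodic checks of the original URL), the `is_publishable` helper, and the 90-day default are assumptions for the example, not anything archive.org is known to implement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snapshot:
    # Hypothetical archive record; field names are illustrative only.
    url: str
    captured_at: datetime
    last_seen_live: datetime  # assumed: last time the original URL was confirmed reachable


def is_publishable(snap: Snapshot, now: datetime,
                   dark_period: timedelta = timedelta(days=90)) -> bool:
    """Hypothetical embargo rule: keep the capture private until the
    original page has been unreachable for at least `dark_period`."""
    return now - snap.last_seen_live >= dark_period


now = datetime(2024, 1, 31)  # arbitrary reference time for the example
live = Snapshot("https://example.com/a", datetime(2020, 1, 1), now - timedelta(days=3))
gone = Snapshot("https://example.com/b", datetime(2020, 1, 1), now - timedelta(days=120))
print(is_publishable(live, now))  # False: page was reachable 3 days ago
print(is_publishable(gone, now))  # True: dark for 120 days, past the 90-day window
```

One limitation, raised elsewhere in the thread, is that a rule keyed only on reachability does nothing for pages that stay up while their content changes.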

Adanisi@lemmy.zip

6 points

11 months ago

Deleted by creator

7fb2adfb45bafcc01c80@lemmy.world

0 points

11 months ago

> It’s user-driven. Nothing would get archived in this case. And what if the content changes but the page remains up? What then? Fairly sure this is why Wikipedia uses archives.

That’s a good point.

> Pretty sure mainstream ad blockers won’t block a custom in-house banner. And if it has no tracking, then it doesn’t matter whether it’s on Archive or not, you’re getting paid the same, no?

Some of them do block those kinds of ads – I’ve tried it out with a few. If it’s at archive.org I lose the ability to report back to the sponsor that their ad was viewed ‘n’ times (unless, ironically, I put a tracker in). It also means that if sponsorship changes, the main drivers of traffic like Wikipedia may not see that. It makes getting new sponsors more difficult because they want something timely for seasonal ads. Imagine sponsoring a page, but Wikipedia only links to the archived one. Your ad for gardening tools isn’t reflected by one of the larger drivers of traffic until December, and nobody wants to buy gardening tools in December.

Yes, I could submit pages to archive.org as sponsorship changes if this model continues.

It was a much bigger deal when we used Google ads a decade ago, but we stopped in early 2018 because tracking was getting out of hand.

If I was submitting pages myself I’d be all for it because I could control when it happened. But there have been times when I’ve edited a page and totally screwed it up, and archive.org just happened to grab it at that moment when the formatting was all weird or the wrong picture was loaded. I usually fix the page and forget about it until I see it on archive.org later.

I asked for pages like that to be removed, but archive.org was unresponsive until I used a DMCA takedown notice.

StopJoiningWars@discuss.online

4 points

11 months ago

SEO killed the internet. You’re literally part of the reason why people go look for alternatives to viewing your website, no one wants ads.

7fb2adfb45bafcc01c80@lemmy.world

-1 points

11 months ago

I don’t think you know what SEO is. I think you know what bad SEO is.

Anyhow, Wikipedia is always free to link somewhere else if they can find better content.

MonkderVierte@lemmy.ml

24 points

11 months ago

Wait, people prefer the archived version? Too many ads?

7fb2adfb45bafcc01c80@lemmy.world

-27 points

11 months ago

Removed by mod

NιƙƙιDιɱҽʂ@lemmy.world

36 points

11 months ago

Did you just draw comparison between redistribution of publicly available content and…rape? Dang.

theherk@lemmy.world

8 points

11 months ago

Hey, if they choose to wrap their comments in completely inane reasoning they should be allowed to.

NιƙƙιDιɱҽʂ@lemmy.world

3 points

11 months ago

I 100% agree with you. I’m also allowed to call them out on their bullshit haha

GHiLA@sh.itjust.works

1 point

11 months ago

Deleted by creator

Adanisi@lemmy.zip

11 points

11 months ago

Removed by mod

7fb2adfb45bafcc01c80@lemmy.world

-6 points

11 months ago

Someone asked a question and I answered honestly. I’m sorry that you can’t understand my perspective.

MonkderVierte@lemmy.ml

5 points

11 months ago

Meaning, your content changes often?

I’m only trying to understand why you seem to be especially affected.

Red Army Dog Cooper@lemmy.ml

17 points

11 months ago

How do you expect an archive to happen if they are not allowed to archive while it is still up? How are you supposed to track changes or see how the world has shifted? This is a very narrow and, in my opinion, selfish way to view the world.

7fb2adfb45bafcc01c80@lemmy.world

-2 points

11 months ago

> How do you expect an archive to happen if they are not allowed to archive while it is still up?

I don’t want them publishing their archive while it’s up. If they archive but don’t republish while the site exists then there’s less damage.

I support the concept of archiving and screenshotting. I have my own Linkwarden server set up and I use it all the time.

But I don’t republish anything that I archive because that dilutes the value of the original creator’s work.

KyuubiNoKitsune@lemmy.blahaj.zone

8 points

11 months ago

What if I’m looking for something but the page has changed?

7fb2adfb45bafcc01c80@lemmy.world

-4 points

11 months ago

Shouldn’t that be the content creator’s prerogative? What if the content had a significant error? What if they removed the page because someone living in the EU requested it under their laws? What if the page was edited because someone accidentally made their address and phone number public in a forum post?

Landsharkgun@midwest.social

3 points

11 months ago

Nah. It just lets slimy gits claim they never said XYZ, or that such and such a thing never happened. With a storage medium as volatile as the internet, hard backups are absolutely necessary. Put it this way: would you have the same complaint about a newspaper? A TV show? Post your opinion piece to a newspaper and it’s fixed in ink forever. Yet somehow you complain when that same opinion piece is on a website? Get outta here.

7fb2adfb45bafcc01c80@lemmy.world

1 point

11 months ago

Like I said, I have no problems with individuals archiving it and not republishing it.

If I take a newspaper article and republish it on my site I guarantee you I will get a takedown notice. That will be especially true if I start linking to my copy as the canonical source from places like Wikipedia.

It’s a fine line. Is archive.org a library (wasn’t there a court case about this recently…) or are they republishing?

Either way, it doesn’t matter for me any more. The pages are gone from the archive, and they won’t archive any more.

zarkanian@sh.itjust.works

0 points

11 months ago

A couple of good examples are lifehacker.com and lifehack.org. Both sites used to have excellent content. The sites are still up and running, but the first one has turned into a collection of listicles and the second is an ad for an “AI-powered life coach”. All of that old content is gone and is only accessible through the Internet Archive.

In fact, many domains never shut down, they just change owners or change direction.

7fb2adfb45bafcc01c80@lemmy.world

0 points

11 months ago

Again, isn’t that the site’s prerogative?

I think there should at least be a recognized way to opt out that archive.org actually follows. For years they told people to put

User-agent: ia_archiver
Disallow: /

in robots.txt, but they still archived content from those sites. They refuse to publish what IP addresses they pull content down from, but that would be a trivial thing to do. They refuse to use a User-Agent string that you can filter on.

If you want to be a library, be open and honest about it. There’s no need to sneak around.
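For site owners who want to check what a robots.txt group actually says to a particular crawler, Python’s standard-library `urllib.robotparser` can parse the rules offline. The minimal sketch below compares `Disallow: /`, which blocks every path for the named agent, with an empty `Disallow:`, which under the robots.txt convention blocks nothing. The `is_blocked` helper and the `example.com` URL are illustrative assumptions; the parser only reports what the file requests, and whether a given crawler honors it is up to that crawler.

```python
import urllib.robotparser


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt asks `user_agent` not to fetch `url`.
    (Hypothetical helper for illustration.)"""
    parser = urllib.robotparser.RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


BLOCK_ALL = "User-agent: ia_archiver\nDisallow: /"
EMPTY_DISALLOW = "User-agent: ia_archiver\nDisallow:"

page = "https://example.com/some/article.html"
print(is_blocked(BLOCK_ALL, "ia_archiver", page))       # True: "/" matches every path
print(is_blocked(EMPTY_DISALLOW, "ia_archiver", page))  # False: an empty Disallow allows everything
print(is_blocked(BLOCK_ALL, "Googlebot", page))         # False: the group only names ia_archiver
```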

jqubed@lemmy.world

7 points

11 months ago

About the only thing I can agree with you on here is I don’t like when people on Wikipedia archive a link and then list that as the primary source in the reference instead of the original link. Wikipedia (at least in English) has a proper method to follow for citations with links and the archived version should only become the primary if the original source is dead or has changed and no longer covers the reference.

They should also honor a DMCA takedown and robots.txt, but at least with the DMCA I’m sure there’s a backlog. Personally I’ve always appreciated the archive’s existence, though, and would think their impact is small enough that it’s better to have them than block them.
