Archive.today CAPTCHA page executes DDoS; Wikipedia considers banning site

Aatube@thriv.social · edit-2 4 months ago

Archive.today CAPTCHA page executes DDoS; Wikipedia considers banning site

TheTechnician27@lemmy.world · edit-2 4 months ago

As a longtime editor who makes heavy use of archive.today (it’s often much more effective than the Wayback Machine), I’m deeply conflicted about this, and this is disgusting behavior on the part of archive.today; regardless of what a piece of shit the blog owner is, I hope they see prison time for abusing their trust to perpetrate this DDoS.

Right now, the Wikipedia RfC seems pretty split. This is a complicated issue, so I’m going to need to read and think more before I chime in. Just wild.

VonReposti@feddit.dk · 4 months ago

I don’t really see it as a complicated issue. Archive[.]today is now an unreliable source that uses its user traffic to engage in malicious activities. By using it, Wikipedia will become unreliable by proxy.

The best course of action is to distance yourself from it as quickly as possible.

FaceDeer@fedia.io · 4 months ago

Is it really an “unreliable source”, though? The owner of the site is acting maliciously with regard to this DDoS, of course, but that doesn’t necessarily mean he’s going to act maliciously about the contents of archive.today itself.

One could make the case that the owner of archive.today was already flagrantly flouting copyright law, and therefore a criminal, and therefore “unreliable” right from the get-go. Let’s not leap to conclusions here.

Wildmimic@anarchist.nexus · 4 months ago

Using visiting clients for attacking makes the site malicious, and it’s because the owner decided it should be, not because it was hacked or got served “spicy” ads or something.

Since this jarhead has no qualms about weaponizing his site, dragging every visitor into this, and threatening the owner of a small blog with creating a whole category of AI porn just for a blog post from 2 years ago: what if he decides he could use visiting clients for other purposes, like crypto mining? If my wiki had 700k links pointing there, I’d think hard about my choices, and would want to reduce my dependency on such a source.

betterdeadthanreddit@lemmy.world · 4 months ago

Haven’t seen anything to indicate that Masha Rabinovich / Denis Petrov / [whoever runs the site] is a jarhead. Where’s that coming from?

Wildmimic@anarchist.nexus · edit-2 4 months ago

deleted by creator

TheTechnician27@lemmy.world · edit-2 4 months ago

> I don’t really see it as a complicated issue.

That makes sense from (what I think is) an “outsider’s” perspective. From an “insider’s” perspective*, here’s the problem:

1. Wikipedia has a strict verifiability policy.
   - This policy states that “Each fact or claim in an article must [correspond to reliable sources]”.
   - This policy is the bedrock of Wikipedia. The project is fundamentally unsustainable without it, and we’re still undoing damage from decades ago when the policy either didn’t exist or was too loosely enforced.
   - I’m making a third bullet point because I cannot emphasize enough how much “just ignore it lol” cannot work and has never worked.
2. Hundreds of thousands of articles have citations sourced to archive.today.
   - This is despite the fact that the Internet Archive is prioritized whenever possible. We even have a prolific Internet Archive bot that (when possible) automatically recovers citations.
   - The Internet Archive complies with blanket takedown requests of a domain very easily. Even if we ignore the ones going forward because now both resources are unreliable, archive.today would have untold millions of webpages archived which the IA does not, many of which are used on Wikipedia.
   - Archive.today will archive material that the Internet Archive will simply fail to archive because, on a technical level, it’s just better at capturing a static snapshot of an article (which is what we want). It’s especially true for paywalled articles, which often stymie the Wayback Machine.
3. This would also make the Internet Archive the only remaining avenue for archiving URLs, meaning Wikipedia effectively collapses if something happens to the IA (granted, that would already be catastrophic even with archive.today around, much more so than archive.today’s hypothetical removal).
4. Archiving URLs isn’t just some incidental thing.
   - Citations are the backbone of Wikipedia. Casual readers might find them comforting to have. Researchers will rely on them. But editors cannot operate without them. We might actually use them more than readers do, because they help us a) check what’s already there, b) better understand the subject ourselves, and c) expand out the article.
   - Link rot is so much more pervasive than I think people fully grasp. When I’m writing an article, if possible, I archive every single source I use at both the Wayback Machine and archive.today, because relying on the link staying up is objectively a mistake (and relying on just one is negligent).
   - The security that archives offer incalculably reduces the workload and mental load for editors.

If you’ve ever tried to add a citation on Wikipedia to a sentence that says “citation needed”, you’ve rubbed up against Brandolini’s law. A corollary is that it’s much, much harder to cite an uncited statement than it is to create one. If you remove archive.today, you flood Wikipedia with hundreds of thousands of these. It’s dampened a bit by the fact that the citation metadata is still there and that some URLs will still be live, but I cannot emphasize enough (as an editor of nearly 10 years, with over 25,000 contributions, and who’s authored two featured articles) that you’d introduce a workload that could never be done, whose repercussions would be felt for decades, at a time when Wikipedia is already on shaky footing.

Even if you somehow poofed away all that work, there are bound to be tens of thousands of statements in articles you have to get rid of because they simply cannot be reasonably sourced anywhere else. For many, many statements, this is not incidental information independent from the rest of the article; many of these removals would require you to fundamentally restructure the surrounding prose or even the entire article.

It’s hard for me to explain that you just have to “trust me bro” that those people voting “Option C” take what archive.today did very seriously and recognize that either option is going to mean major, irreparable damage to the project. Wikipedia is a lot different from the editing side than it is on the reading one; sometimes it’s liberating, sometimes it’s horrifying, and in this case it’s “I could use a hug”.

* “Outsider” and “insider” used to denote experience editing; most anyone can do anything on Wikipedia from the get-go.

floofloof@lemmy.ca · edit-2 4 months ago

We need an open-source internet archive site that isn’t based in the USA and isn’t run by someone who’ll jeopardize the whole enterprise to attack someone’s blog. Archive.today is a great thing to exist on the Internet and I hope it continues, but we need one that we know isn’t going to host malware or vanish on us.

That said, I don’t appreciate the blogger’s urge to doxx whoever runs the archive. It’s exactly the kind of site where the admins would need security and anonymity so the US Government or another power doesn’t shut them down. If you doxx the owner you could kill the site.

bamboo@lemmy.blahaj.zone · 4 months ago

Regarding the USA point, from the article, there are many indications that the site was founded by someone from Russia:

> But in October 2025, the FBI sent a subpoena to domain registrar Tucows seeking “subscriber information on [the] customer behind archive.today” in connection with “a federal criminal investigation being conducted by the FBI.” We wrote about the subpoena, and our story included a link to Patokallio’s 2023 blog post in a sentence that said, “There are several indications that the [Archive.today] founder is from Russia.”

This is the link to the 2023 blog post: https://gyrovague.com/2023/08/05/archive-today-on-the-trail-of-the-mysterious-guerrilla-archivist-of-the-internet/

nullroot@lemmy.world · 4 months ago

Honestly this situation is wild. The whole article is a hundred percent worth a read. It’s just… So bizarre. Good luck to you wiki contributors navigating this situation.

Strawberry@sh.itjust.works · 4 months ago

I think the future of Wikipedia looks a bit bleak if they drop archive.today now. They need a decent archiver to function. The Internet Archive is good, but it’s a single group hosted in the US, plus any site with a paywall isn’t surviving on the Internet Archive very well.

They’ve needed a good alternative for a while and the need is just growing. I wish public libraries could fill the gap, but it’s probably not realistic. We’ve had legal deposit requirements for non-print media in various jurisdictions for a while, but I’m doubtful how effective they are, and the deposited material isn’t convenient to access or use for Wikipedia.

onehundredsixtynine@sh.itjust.works · edit-2 3 months ago

deleted by creator

4 months ago

what the fuck

I’m so confused

is archive.today dead now?

So archive.today owners got doxxed and they DDoSed the Doxxer as retaliation? Is that what happened?

Wildmimic@anarchist.nexus · edit-2 4 months ago

No, the original blog post did not dox the .today owner, it just unearthed some other alias and the general idea that the owner might be based in Russia.

2 years pass.

Now Tucows (the domain registrar for .today) got a demand from the FBI for all data they have on .today, which caused news pieces where the blog post was linked.

The .today owner wanted the blog post not reachable from those news articles, and sent an email to the blog owner with the request to “take the blog post down for a few months” so that the news articles wouldn’t link there anymore. Sadly, that mail went into the spam folder and the blogger didn’t see it.

Because there was no reaction to his mail, the owner of .today put code into his captcha page, DDoS-ing the blog. The blogger and the .today owner did exchange emails later, but the .today owner seems to be a pretty unreasonable and rude person.

Wikipedia is now split: on the one side, .today is the actual best archive site, because it doesn’t care about copyright or censorship and employs advanced scraping techniques, which can bypass a lot of paywalls (which the Internet Archive does not do). This makes it great for citing sources. On the other side, inserting code in your captcha page that makes visitors’ computers part of a DDoS attack is not very trustworthy.

So now there are 3 options for Wikipedia.

a) remove all archive.today links: this would be very, very disruptive, since around 700k links on Wikipedia would go dead
b) phase out archive.today, so that no new links get added in the future; that implies looking for an alternative, which could even be the Wikimedia Foundation itself
c) do nothing

Hope it helps with the confusion!

inari@piefed.zip · edit-2 4 months ago

It would be pretty incredible if the Wikimedia Foundation started a project to archive the web

FaceDeer@fedia.io · 4 months ago

I think that’d go pretty far beyond Wikimedia’s mandate, but having something whose purpose was specifically archiving just the sources for their articles would be pretty awesome.

inari@piefed.zip · edit-2 4 months ago

It supports the goal of free knowledge, so I think it wouldn’t veer far off its mission

Beep@lemmus.org · edit-2 4 months ago

Got downvoted when I posted it before.