Reddit has blocked the Internet Archive from archiving full pages and comments after discovering that some AI companies were using the Archive’s Wayback Machine to get around the platform’s restrictions and obtain its data, in violation of Reddit’s content usage policy and user privacy principles.

The Internet Archive will now only be able to archive snapshots of Reddit’s main page, without comments or full threads. This significantly reduces its role as a source for preserving deleted content or researching user activity.
According to Reddit spokesperson Tim Rathschmidt, AI companies that are prohibited from collecting data directly from Reddit have in some cases used archived copies from the Wayback Machine to train their models. Reddit is demanding that the Internet Archive take additional technical measures against such collection, which could be a condition for lifting the restrictions in the future.
Another reason was privacy: the Wayback Machine preserves even content that users have deleted, which is against the platform’s policies.
The Internet Archive and its Wayback Machine have been useful for preserving Reddit’s history for many years, especially during mass deletions or policy changes. For example, in 2023, the archive helped preserve content after a restriction on access to the Reddit API led to the closure of some popular subreddits.
There is speculation that the restrictions also have a financial motive: by reducing free access, Reddit may push AI companies toward licensing agreements. It has already signed such contracts with OpenAI and Google, with the Google deal valued at $60 million, and in total Reddit expects more than $200 million in revenue from such partnerships over several years.
Blocking the Internet Archive from preserving Reddit’s content marks a new stage in platforms’ fight against unauthorized data collection, especially as artificial intelligence develops. At the same time, the decision raises questions about the balance between protecting user privacy, preserving digital history, and monetizing data.