Reddit is restricting its availability to the Internet Archive’s Wayback Machine

The Internet Archive’s Wayback Machine is the latest victim of Reddit’s crackdown on data access. The company has begun restricting what the archive site can access, a move that will significantly limit the Wayback Machine’s ability to preserve information from Reddit.
With the change, the Wayback Machine, a project run by the nonprofit Internet Archive, will only be able to crawl Reddit’s homepage. It will no longer be able to access comments, subreddit pages, post details, profiles and other data.
The move is the latest step Reddit has taken in its quest to limit AI companies’ ability to use its data to train large language models without paying licensing fees. It’s also a notably different stance from the one the company took last year, when it explicitly said that it would not limit “good faith actors,” including the Internet Archive. It’s not clear what exactly has changed since then. Reddit seems to believe that AI companies are circumventing its rules by scraping data via the Wayback Machine. We’ve reached out to the Internet Archive for comment.
Data licensing has become a significant business for Reddit. The company has struck multimillion-dollar deals with OpenAI and Google that allow them to use Reddit posts to help train their AI models. At the same time, Reddit has taken an increasingly hardline stance against companies that attempt to use its data without such arrangements. Earlier this year, the company sued Anthropic, alleging it scraped Reddit for years without permission.