Bellingcat Online Investigation Kit: Archive and Download

22 April 2023 3 minutes Author: Cyber Witcher

Archiving websites

Managing files is an important part of running a website, and there are several effective ways to do it. FTP (File Transfer Protocol) may seem like the most obvious choice, but it is not always the best one. If you are not comfortable with the command line, you need a graphical FTP client, and one is not always at hand, especially on a mobile device or someone else’s computer. Some browser-based solutions are also more flexible and give you more options for organizing your website data. Long gone are the days when websites were written by hand in HTML. Now they are dynamic and built “on the fly” using the latest JavaScript, PHP or Python frameworks. As a result, sites have become more fragile: a database failure, a faulty update or a security vulnerability can lead to data loss.

Drupal’s content management system has proven particularly challenging in this regard: major updates intentionally break compatibility with third-party modules, which implies an expensive upgrade process that customers can rarely afford. The solution is to archive these sites: take a live, dynamic website and turn it into simple HTML files that any web server can serve forever. This process is useful for your own dynamic sites, as well as for third-party sites that are outside your control and that you want to preserve. The sad truth behind mirroring and archiving is that data on the web keeps disappearing. Fortunately, amateur archivists have tools at their disposal for saving interesting content from the Internet.
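The idea of turning a dynamic page into a plain file can be sketched in a few lines of Python. This is only an illustration, not the workflow of any tool listed below: the function names, the `mirror` directory and the rule for mapping URLs to `index.html` files are assumptions made for the example, and query strings are ignored.

```python
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


def url_to_static_path(url: str) -> str:
    """Map a page URL to a relative file path that a plain web server could serve."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if path.endswith("/"):
        # Directory-style URLs ("/", "/blog/") are stored as index.html.
        path += "index.html"
    elif "." not in last_segment:
        # Extensionless pages ("/about") become "/about/index.html".
        path += "/index.html"
    return parts.netloc + path


def save_static_copy(url: str, root: str = "mirror") -> Path:
    """Fetch one page and write the HTML it returns under `root` (assumed layout)."""
    target = Path(root) / url_to_static_path(url)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target
```

Calling `save_static_copy("https://example.com/blog/")` would write the page to `mirror/example.com/blog/index.html`. A real mirror also has to follow links and rewrite them, which this sketch leaves out.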

Archiving and downloading

Archive.org

A non-profit digital library containing millions of free books, movies, software, music, websites and more.
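To check whether a page has already been archived, the Internet Archive exposes a Wayback Machine availability endpoint that returns JSON. The sketch below builds the query and reads the closest snapshot from the response. The helper names are made up for this example, and the response handling assumes the `archived_snapshots` / `closest` layout the endpoint returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str, timestamp: str = "") -> str:
    """Build the query URL; `timestamp` (YYYYMMDD...) is optional."""
    query = {"url": page_url}
    if timestamp:
        query["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(query)


def closest_snapshot(payload: dict):
    """Return the archived URL from a decoded response, or None if nothing is available."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(page_url: str, timestamp: str = ""):
    """Ask the Wayback Machine for the capture closest to `timestamp`."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

`lookup("example.com", "20060101")` would return the address of the nearest capture, or `None` when the page has never been saved.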



Hunch.ly

A web capture tool designed for online research that automatically collects, documents and annotates the web pages you visit.



Archive.today

A free web page archiving service that stores page content, including images, but does not support dynamic content.



Dumpster Diver

A tool that can analyze large volumes of data for hard-coded secrets such as keys (AWS, Azure or SSH) or passwords.
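What searching for hard-coded secrets can look like is shown in the sketch below. It is not Dumpster Diver's own code: the patterns, the 20-character token rule and the entropy threshold of 4.0 are assumptions chosen for illustration. It combines two common checks, known key formats and unusually random-looking strings.

```python
import math
import re
from collections import Counter

# Assumed patterns for illustration; real scanners ship far larger rule sets.
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")
PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (?:RSA |OPENSSH |EC |DSA )?PRIVATE KEY-----")
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character; random keys score higher than ordinary words."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def scan_text(text: str, min_entropy: float = 4.0):
    """Return (line number, finding type) pairs for suspicious lines."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if AWS_ACCESS_KEY_ID.search(line):
            findings.append((lineno, "aws-access-key-id"))
        if PRIVATE_KEY_HEADER.search(line):
            findings.append((lineno, "private-key"))
        # Flag a line once if any long token looks random enough.
        if any(shannon_entropy(tok) >= min_entropy for tok in TOKEN.findall(line)):
            findings.append((lineno, "high-entropy-string"))
    return findings
```

Entropy checks produce false positives (hashes, minified code), so findings from a scan like this still need a manual review.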



DMCA

A database that collects and analyzes legal requests to remove online material, helping users know their rights and understand the law.



Wayback Machine Downloader

Designed to download the latest version of every file of a website found on the Wayback Machine, recreating the directory and page structure.
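The "latest version of each file" logic can be sketched against the Wayback Machine's CDX index, which lists captures as rows of fields. The code below is a Python illustration, not the downloader's own implementation: the function names and the chosen query fields are assumptions. It keeps the newest timestamp per original URL and builds the address of the unmodified capture.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str) -> str:
    """Build a CDX query listing successful captures under `domain`."""
    params = {
        "url": domain + "/*",
        "output": "json",
        "fl": "timestamp,original",
        "filter": "statuscode:200",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def latest_captures(rows):
    """Map each original URL to its newest timestamp.

    `rows` is the decoded JSON output, where the first row is the header.
    """
    if not rows:
        return {}
    header, *records = rows
    ts_i, url_i = header.index("timestamp"), header.index("original")
    latest = {}
    for record in records:
        ts, original = record[ts_i], record[url_i]
        # 14-digit timestamps sort correctly as plain strings.
        if original not in latest or ts > latest[original]:
            latest[original] = ts
    return latest


def raw_capture_url(timestamp: str, original: str) -> str:
    """Address of the capture as originally served (the `id_` flag skips link rewriting)."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"
```

Fetching each `raw_capture_url` and saving it under a path derived from the original URL would then rebuild the site's directory structure locally.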



Perma.cc

Helps organizations create a permanent record of the web resources they link to. Created and maintained by libraries.



Gitrob

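A tool that helps you find potentially sensitive files pushed to public GitHub repositories. Gitrob clones the repositories belonging to a user or organization and flags files that match signatures of sensitive content.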



Arweave

A new type of storage that backs data with sustainable, perpetual endowments, allowing users to store data forever.


