In Linux, compression and archiving let users manage their files efficiently and save disk space. Together they make it easier to store, transfer, and back up data by making files smaller and easier to handle. Let’s take a look at why they matter and how they affect your day-to-day work in Linux.

File Compression: Compression is the process of reducing the size of a file or directory. This saves disk space and speeds up file transfer over the network. Compression can be either lossy or lossless, depending on whether data quality is lost during compression.
Archiving Files: Archiving is the process of combining multiple files and directories into a single archive file that can be easily saved, transferred, and unpacked. Archives let you bundle data into a single package, which simplifies the organization and storage of information.
Space Saving: Compression and archiving help users store significantly more data on limited disk space. This is especially important for servers and systems with limited hard disk space.
Ensuring Data Integrity: Many archive and compression formats store checksums alongside the data, which helps detect damaged files when they are unpacked.
Efficient File Sharing: Compressed and archived files take up less space and transfer faster over the network, making them ideal for data sharing.
Backup: Archives make it easy to back up important data and restore it in case of loss or damage.

Linux offers many utilities and commands for compressing and archiving files, and knowing how to use them is essential for effective data management. Understanding compression and archiving helps users make the most of their resources and keep data secure and available.
WHAT IS COMPRESSION?
The fascinating topic of compression could fill a whole book by itself, but for this book we only need a basic understanding of the process. Compression, as the name suggests, makes data smaller, thereby requiring less storage capacity and making data transfer easier. For your purposes as a novice hacker, it will be sufficient to classify compression as lossy or lossless. Lossy compression is very effective at reducing file size, but the integrity of the information is lost. In other words, the compressed file is not exactly the same as the original.
This type of compression works well for graphics, video, and audio files, where small differences in the file are inconspicuous (.mp3, .mp4, and .jpg are all lossy formats; .png, by contrast, is lossless). If a pixel in a .jpg file or a single note in an .mp3 file is changed, your eye or ear will hardly notice the difference, although of course music fans will say they can definitely tell the difference between an .mp3 and a losslessly compressed .flac file. The strengths of lossy compression are its efficiency and effectiveness: the compression ratio is very high, which means the resulting file is much smaller than the original.
However, lossy compression is not acceptable when you are sending files or software and data integrity is critical. For example, if you’re sending a script or document, the unpacked file must be identical to the original. This chapter focuses on lossless compression, which is available from a number of utilities and algorithms. Unfortunately, lossless compression is not as efficient as lossy compression, as you can imagine, but for a hacker integrity is often much more important than compression ratio.
Usually, the first thing you do when compressing files is to combine them into an archive. In most cases, the tar command is used when archiving files. Tar stands for tape archive, a reference to the prehistoric days of computing when systems used tape to store data. The tar command creates a single file from many files, which is then called an archive, tar file, or tarball.
For example, say you had three script files like the ones we used in Chapter 8, named hackersarise1.sh, hackersarise2.sh, and hackersarise3.sh. If you go to the directory that contains them and do a long listing, you can clearly see the files and the details you expect, including the size of the files, as shown here:
kali >ls -l
-rwxr-xr-x 1 root root 22311 Nov 27 2018 13:00 hackersarise1.sh
-rwxr-xr-x 1 root root 8791 Nov 27 2018 13:00 hackersarise2.sh
-rwxr-xr-x 1 root root 3992 Nov 27 2018 13:00 hackersarise3.sh
Let’s say you want to send all three of these files to another hacker you’re working on a project with. You can combine them and create a single archive file using the command in Listing 9-1.
kali >tar -cvf HackersArise.tar hackersarise1.sh hackersarise2.sh hackersarise3.sh
hackersarise1.sh
hackersarise2.sh
hackersarise3.sh
Listing 9-1: Creating a tarball of three files
Let’s break this command down to better understand it. The archive command is tar, and we use it here with three options. The c option means create; v (verbose, and optional) lists the files tar is working on; and f means the archive file is the one named next. The f option is also used when reading from an existing archive. After the options we give the name of the archive we want to create from the three scripts: HackersArise.tar.
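To try the options described above without touching real files, here is a minimal sketch that builds a tarball in a scratch directory. The file contents are placeholders invented for illustration, not the scripts from the book.

```shell
#!/bin/sh
# Sketch: build a small tarball in a throwaway directory.
# The file contents below are stand-ins, not the original scripts.
set -eu

workdir=$(mktemp -d)            # scratch directory so nothing real is touched
cd "$workdir"

# Create three stand-in script files.
for n in 1 2 3; do
    printf '#!/bin/sh\necho "script %s"\n' "$n" > "hackersarise$n.sh"
done

# c = create, v = verbose (optional), f = the archive file is named next.
tar -cvf HackersArise.tar hackersarise1.sh hackersarise2.sh hackersarise3.sh

# Count the members that ended up in the archive.
member_count=$(tar -tf HackersArise.tar | wc -l | tr -d ' ')
echo "archive members: $member_count"
```

Dropping the v from -cvf produces the same archive without printing the file names.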
In its entirety, this command will take all three files and create a single HackersArise.tar file out of them. When you make another long directory listing, you’ll see that it also contains a new .tar file, as shown below:
kali >ls -l
--snip--
-rw-r--r-- 1 root root 40960 Nov 27 2018 13:32 HackersArise.tar
--snip--
kali >
Note the size of the tarball here: 40,960 bytes. Archiving adds overhead: the three files totaled 35,094 bytes before archiving, but the tarball is 40,960 bytes. In other words, the archiving process added 5,866 bytes. While this overhead can be significant for small files, it matters less and less as the files get larger.
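Where the extra bytes come from can be reproduced with a little arithmetic. The sketch below assumes GNU tar defaults, which the text does not state: each file gets one 512-byte header, file data is padded to a multiple of 512 bytes, the archive ends with two empty 512-byte blocks, and the whole archive is padded to a multiple of 10,240 bytes (20 blocks).

```shell
#!/bin/sh
# Assumption: GNU tar defaults (512-byte blocks, 20-block records).
block=512
record=$((block * 20))          # 10240 bytes

total=0
for size in 22311 8791 3992; do
    data_blocks=$(( (size + block - 1) / block ))      # round data up to whole blocks
    total=$(( total + block + data_blocks * block ))   # one header block + padded data
done

total=$(( total + 2 * block ))                         # end-of-archive marker: two empty blocks
archive_size=$(( (total + record - 1) / record * record ))   # pad to a whole record

echo "payload: $((22311 + 8791 + 3992)) bytes"
echo "archive: $archive_size bytes"
```

Under those assumptions the result is 40,960 bytes, matching the listing. Because the padding is bounded per file and per archive, it shrinks in relative terms as files grow.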
We can display these files from the tarball without extracting them using the tar command with the -t list-of-contents switch, as shown below:
kali >tar -tvf HackersArise.tar
-rwxr-xr-x 1 root root 22311 Nov 27 2018 13:00 hackersarise1.sh
-rwxr-xr-x 1 root root 8791 Nov 27 2018 13:00 hackersarise2.sh
-rwxr-xr-x 1 root root 3992 Nov 27 2018 13:00 hackersarise3.sh
Here we see our three original files and their original sizes. You can then extract these files from the tarball using the tar command with the -x (extract) switch as follows:
kali >tar -xvf HackersArise.tar
hackersarise1.sh
hackersarise2.sh
hackersarise3.sh
Since you are still using the -v switch, this command will show which files are extracted in the output. If you want to extract the files and do it “silently”, i.e. without showing any output, you can simply remove the -v (verbose) switch, as shown here:
kali >tar -xf HackersArise.tar
The files have been extracted to the current directory; you can do a long listing of the directory to check. Note that by default, if an extracted file already exists, tar will overwrite the existing file with the extracted one.
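Because extraction overwrites existing files by default, it can be safer to unpack into a separate directory. The following sketch lists an archive and then extracts it elsewhere with tar's -C (change directory) option. The file names and contents are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: list an archive, then extract it into a separate directory
# so that a newer working copy is not overwritten.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

printf 'original\n' > notes.txt
tar -cf demo.tar notes.txt

# t = list the contents without extracting anything.
listing=$(tar -tf demo.tar)
echo "archive contains: $listing"

# Edit the working copy after the archive was made.
printf 'edited after archiving\n' > notes.txt

# x = extract; -C switches to the target directory before extracting.
mkdir restored
tar -xf demo.tar -C restored

current=$(cat notes.txt)
restored=$(cat restored/notes.txt)
echo "working copy:  $current"
echo "restored copy: $restored"
```

Running tar -xf demo.tar without -C in the same directory would instead replace the edited notes.txt with the archived version.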
We now have one archive file, but this file is larger than the sum of the original files. What if you want to compress these files for easy transport? Linux has several commands that can create compressed files. We will consider the following:
gzip, which uses the extension .tar.gz or .tgz
bzip2, which uses the extension .tar.bz2
compress, which uses the extension .tar.Z
All of them are capable of compressing our files, but they use different compression algorithms and have different compression ratios. Therefore, we will consider each of them and what it is capable of.
In general, compress is the fastest, but its resulting files are the largest; bzip2 is the slowest, but its resulting files are the smallest; and gzip is somewhere in the middle. The main reason why you, as a novice hacker, should know all three methods is that you will encounter different types of compression when working with other tools. Therefore, this section shows how to deal with the main compression methods.
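The size trade-off described above can be checked on your own data. This sketch compresses the same tarball with whichever of the three tools is installed and prints the resulting sizes. The sample data is invented and highly repetitive, so the ratios will differ from the figures in this chapter; the -c option (write to standard output and keep the original) is used so each tool starts from the same file.

```shell
#!/bin/sh
# Sketch: compare output sizes of gzip, bzip2, and compress on one tarball.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Placeholder sample data: 2000 similar lines of text.
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.txt
tar -cf sample.tar sample.txt

orig_size=$(wc -c < sample.tar | tr -d ' ')
echo "tar: $orig_size bytes"

gzip_size=0
for tool in gzip bzip2 compress; do
    if command -v "$tool" >/dev/null 2>&1; then
        # -c writes the compressed stream to stdout and leaves sample.tar alone.
        "$tool" -c sample.tar > "out.$tool"
        size=$(wc -c < "out.$tool" | tr -d ' ')
        echo "$tool: $size bytes"
        if [ "$tool" = gzip ]; then gzip_size=$size; fi
    else
        echo "$tool: not installed, skipped"
    fi
done
```

On many current distributions compress is not installed by default, which is why the sketch checks for each tool before using it.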
Let’s try gzip (GNU zip) first, as it is the most commonly used compression utility in Linux. You can compress the HackersArise.tar file by typing the following (make sure you’re in the directory where the archive is stored):
kali >gzip HackersArise.*
Note that we used the wildcard * for the file extension; this tells Linux that the command should apply to any file that starts with HackersArise with any file extension. You will use similar notation for the following examples. When we do a long directory listing, we see that HackersArise.tar has been replaced with HackersArise.tar.gz, and the file size has been compressed to just 3,299 bytes!
kali >ls -l
--snip--
-rw-r--r-- 1 root root 3299 Nov 27 2018 13:32 HackersArise.tar.gz
--snip--
We can then unzip the same file using the gunzip command, short for GNU unzip.
kali >gunzip HackersArise.*
After decompression, the file no longer has the .tar.gz extension but the .tar extension again. Also notice that it is back to its original size of 40,960 bytes. Try doing a long listing to confirm it. It’s worth noting that gunzip can also extract some .zip files, but only those containing a single member compressed with the deflate method; for general .zip archives, use unzip.
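Since the point of lossless compression is that the file comes back unchanged, it is worth verifying a round trip at least once. This sketch compresses and decompresses a placeholder file and compares the result byte for byte with cmp. The file name and contents are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that gzip followed by gunzip returns an identical file.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

printf 'echo "integrity matters"\n' > script.sh
cp script.sh script.orig         # keep a reference copy for comparison

gzip script.sh                   # replaces script.sh with script.sh.gz
gunzip script.sh.gz              # restores script.sh and removes the .gz

# cmp -s is silent and succeeds only if the files are byte-for-byte equal.
if cmp -s script.sh script.orig; then
    roundtrip=identical
else
    roundtrip=different
fi
echo "round trip: $roundtrip"
```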
Another widely used Linux compression utility is bzip2, which works similarly to gzip but achieves better compression ratios, meaning the resulting file will be even smaller. You can compress the HackersArise.tar file by typing the following:
kali >bzip2 HackersArise.*
When you do a long list, you can see that bzip2 compressed the file to only 2,081 bytes! Also note that the file extension is now .tar.bz2.
To decompress a compressed file, use bunzip2, for example:
kali >bunzip2 HackersArise.*
kali >
The file will then return to its original size and its extension will revert to .tar.
Finally, you can use the compress command to compress the file. This is probably the least commonly used compression utility, but it’s easy to remember. To use it, just type compress followed by the file name, for example:
kali >compress HackersArise.*
kali >ls -l
--snip--
-rw-r--r-- 1 root root 5476 Nov 27 2018 13:32 HackersArise.tar.Z
Note that the compress utility reduced the file size to 5,476 bytes, more than double the size of the bzip2 result. Also note that the file extension is now .tar.Z (with a capital Z).
To decompress the same file, use uncompress:
kali >uncompress HackersArise.*
You can also use the gunzip command with files compressed using compress.
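The archive-then-compress workflow shown so far takes two commands. As an aside that goes slightly beyond the text, tar can also call the compressor itself: the z option filters the archive through gzip and the j option through bzip2. The sketch below uses placeholder files and runs the bzip2 step only if bzip2 is installed.

```shell
#!/bin/sh
# Sketch: create and list compressed tarballs in a single tar command.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

printf 'demo a\n' > a.txt
printf 'demo b\n' > b.txt

# c = create, z = filter through gzip, f = archive file name follows.
tar -czf demo.tar.gz a.txt b.txt
gz_members=$(tar -tzf demo.tar.gz | wc -l | tr -d ' ')
echo "members in demo.tar.gz: $gz_members"

# j = filter through bzip2 (skipped when bzip2 is not available).
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf demo.tar.bz2 a.txt b.txt
    tar -tjf demo.tar.bz2
fi
```

Extraction works the same way with x in place of c, for example tar -xzf demo.tar.gz.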
In the world of information security and hacking, one Linux archiving command stands above the rest in its usefulness. The dd command makes a bit-by-bit copy of a file, a file system, or even an entire hard drive. This means that even deleted files are copied (yes, it’s important to know that deleted files can be recovered), making them easier to find and restore. Deleted files will not be copied by most logical copy utilities such as cp.
Once a hacker has control of the target system, the dd command allows them to copy the entire hard drive or storage device to their own system. Likewise, the people whose job it is to catch hackers, namely forensic investigators, will most likely use this command to make a physical copy of the hard drive, including deleted files and other artifacts that may be useful in finding evidence against the hacker.
It is important to note that the dd command should not be used for routine day-to-day copying of files and storage devices, as it is very slow; other commands get the job done faster and more efficiently. It is, however, great when you need a copy of a storage device without a file system or other logical structures, such as during a forensic investigation.
The basic syntax of the dd command is as follows:
dd if=inputfile of=outputfile
So, if you wanted to make a physical copy of your flash drive, assuming the flash drive is sdb (we’ll discuss this notation in more detail in Chapter 10), you’d type the following:
kali >dd if=/dev/sdb of=/root/flashcopy
1257441+0 records in
1257440+0 records out
7643809280 bytes (7.6 GB) copied, 1220.729 s, 5.2 MB/s
Let’s break this command down: dd is your physical “copy” command; if designates your input file, with /dev/sdb representing your flash drive in the /dev directory; of designates your output file; and /root/flashcopy is the name of the file you want to copy the physical copy to.
There are many options available for use with the dd command and you can explore them on your own, but among the most useful are the noerror option and the bs (block size) option. As the name suggests, the noerror option keeps copying even if errors occur. The bs option lets you define the block size (the number of bytes read and written per block) of the data being copied. The default value is 512 bytes, but it can be changed to speed up the process. It is usually set to the device’s sector size, most commonly 4 KB (4,096 bytes). With these options, your command will look like this:
kali >dd if=/dev/media of=/root/flashcopy bs=4096 conv=noerror
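Pointing dd at the wrong device can destroy data, so it helps to practice the syntax on an ordinary file first. The sketch below uses a 1 MiB file of zero bytes as a stand-in for a device (the file names are hypothetical), copies it with the same bs and conv=noerror options, and checks that the copy is identical.

```shell
#!/bin/sh
# Sketch: practice dd on a regular file instead of a real device.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in "device": 256 blocks of 4096 bytes = 1048576 bytes of zeros.
dd if=/dev/zero of=fake_device.img bs=4096 count=256 2>/dev/null

# Same shape as the command in the text: input, output, block size, noerror.
# dd prints its record counts on stderr, which is discarded here.
dd if=fake_device.img of=flashcopy bs=4096 conv=noerror 2>/dev/null

copy_size=$(wc -c < flashcopy | tr -d ' ')
if cmp -s fake_device.img flashcopy; then
    copy_ok=yes
else
    copy_ok=no
fi
echo "copied $copy_size bytes, identical: $copy_ok"
```

Remove the 2>/dev/null redirections to see the "records in" and "records out" summary that dd normally prints.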
As mentioned, it’s worth doing a bit more research on your own, but this is a good introduction to the command and its common uses.
Linux has a number of commands that allow you to combine and compress files for easier transfer. For combining files, tar is the command of choice, and you have at least three file compression utilities (gzip, bzip2, and compress), all with varying degrees of compression. The dd command goes above and beyond: it creates a physical copy of a storage device without relying on logical structures such as a file system, allowing the recovery of artifacts such as deleted files.
We used materials from the book “Linux Basics for Hackers” by OccupyTheWeb (No Starch Press).