Advanced dictionary creation techniques

4 May 2023 15 minutes Author: Cyber Witcher

Digital dictionary from A to Z

In my opinion, one of the most useful programs on a PC or smartphone is an electronic dictionary. In the old days, when I was studying a foreign language, every word had to be looked up in a paper dictionary. I performed this trivial operation hundreds of times, and some stubborn words had to be looked up again and again because I kept forgetting their meaning. How tedious it was! Not so now: the translation is right in front of my eyes on the monitor screen, and there is a search history in case the word has not yet moved from short-term to long-term memory. Unlike climbing, when digitizing a dictionary the most difficult step is not the last but the first. I recommend keeping everything in plain text files, because advanced search and error correction, tagging, sorting, conversions, and other bulk text operations are not practical with a binary file. At this stage, it is important to decide on the structure of the dictionary entries.

In the simplest case, there will be only two fields: key and value. This is enough, but if you need to highlight different elements of the entries, you must mark all such elements in a consistent way. We will consider not only the tools we are already familiar with, but also several new ones. Not every task calls for a specialized tool: some actions are easier to do with standard Linux utilities or our own scripts. Since we won’t cover the basics of dictionary generation here, we’ll start with a list of sources where you can read about them. It is recommended that you read them if you have not already done so.

A rule-based attack

A rule-based attack modifies an existing dictionary according to a specified set of rules. If you want to change the output of a mask with a rule-based attack, you must first generate a dictionary from the mask and then work with that dictionary. The easiest way is to use the Mentalist GUI program. John the Ripper’s rule-based attack is much more powerful than hashcat’s, so of these two programs I recommend John for this attack.
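To make the idea concrete, here is a minimal Python sketch of what "modifying a dictionary according to rules" means. The rule set is hypothetical and chosen only for illustration; it is not the rule syntax of John the Ripper or hashcat.

```python
def apply_rules(word, rules):
    """Yield one candidate per rule; each rule is a function str -> str."""
    for rule in rules:
        yield rule(word)


# Hypothetical rules for illustration only (not John the Ripper or hashcat syntax).
RULES = [
    lambda w: w,                    # keep the word unchanged
    lambda w: w.capitalize(),       # first letter upper case, the rest lower case
    lambda w: w + "1",              # append a digit
    lambda w: w[::-1],              # reverse the word
    lambda w: w.replace("a", "@"),  # simple character substitution
]


def mangle(words, rules=RULES):
    """Apply every rule to every word, skipping duplicate candidates."""
    seen = set()
    for word in words:
        for candidate in apply_rules(word, rules):
            if candidate not in seen:
                seen.add(candidate)
                yield candidate


print(list(mangle(["pass"])))
```

Each extra rule multiplies the size of the resulting dictionary, which is why a small base dictionary is usually enough for this kind of attack.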

Generation of dictionaries based on information about a person

If a password is based on the user’s data, for example a combination of first name, last name, date of birth, children’s names, phone number, or the same data of the closest relatives, then it can be considered weak. The tools discussed above are not well suited to building dictionaries from such information, except for the combinator attack in Hashcat, and that accepts only two dictionaries at a time. This is the problem the CUPP utility solves.
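The general idea of such a generator can be sketched in a few lines of Python. This is only an illustration under simple assumptions (join up to two facts in every order, with basic case variants); it is not CUPP's actual algorithm, and the profile data is invented.

```python
from itertools import permutations, product


def case_variants(token):
    """Return the token as given, in lower case and capitalized, without duplicates."""
    return sorted({token, token.lower(), token.capitalize()})


def personal_candidates(tokens, max_parts=2):
    """Join 1..max_parts different tokens in every order, with case variants."""
    result = set()
    for n in range(1, max_parts + 1):
        for combo in permutations(tokens, n):
            for parts in product(*(case_variants(t) for t in combo)):
                result.add("".join(parts))
    return sorted(result)


# Hypothetical facts about a user, for illustration only.
profile = ["anna", "1990"]
print(personal_candidates(profile))
```

With real data the list of tokens would include names, dates split into day, month and year, phone numbers and so on, and the output grows quickly with max_parts.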

Installing CUPP on Kali Linux

Installing CUPP in BlackArch

Run the program in interactive mode and enter known user data:

Example of generated passwords:

If you need a translation of the questions, you will find it on the program card page.

Compilation of word lists and username lists based on website content

Let’s get acquainted with another tool, CeWL. This program crawls the specified site (you can set the crawl depth) and lists all the words found on its pages, sorted by how often they occur. Why is such a dictionary needed? The author suggests using it for brute force. In addition, the program can collect e-mail addresses and extract the names of the authors of office documents; Word and PDF files are supported. This data can be used to create a list of usernames. The program also comes with the FAB utility, which extracts author names from already downloaded Word and PDF documents; these can also be used as usernames for brute force.
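What such a tool does with a single downloaded page can be approximated with the Python standard library. The sketch below is not CeWL's code: it works on an HTML string that is already in memory (no crawling), and the minimum word length of 3 and the e-mail pattern are assumptions made for the example.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def words_by_frequency(html, min_length=3):
    """Return (word, count) pairs, most frequent first (case-sensitive)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[^\W\d_]+", " ".join(parser.chunks))
    counts = Counter(w for w in words if len(w) >= min_length)
    return counts.most_common()


def find_emails(html):
    """Return the e-mail-like strings found anywhere in the page source."""
    return sorted(set(re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", html)))
```

Words that appear often on a company's own site (product names, slogans, locations) are exactly the ones that tend to end up in its employees' passwords, which is the reasoning behind this kind of dictionary.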

Installing CeWL in Kali Linux

Installation in BlackArch

Collect words from the site, following only the links found on the page at the specified address (-d 1), and save the resulting dictionary to the file dic.txt (-w dic.txt):

Collect words from the site, following the links found at the specified address and also the links on the pages downloaded that way (-d 2), and save the dictionary to the specified file (-w dic.txt). The frequency of each word is shown (-c), a list of the e-mail addresses found is compiled (-e) and saved to the specified file (--email_file emails.txt), and a list based on the information found in the metadata of documents is created (-a) and saved to the specified file (--meta_file meta.txt):

Running FAB: all *.doc documents in the /home/mial/Downloads/ directory are checked, the field containing the document author’s name is extracted from their metadata, and the data is displayed on the screen:

How to create a mask dictionary with variable length

Let’s consider generating word lists of varying length, using Hashcat and maskprocessor as examples. The following options are available for generating passwords of different lengths:

The -i option is optional. When it is used, the length of the password candidates is not fixed but grows one character at a time. The --increment-min option is also optional. It defines the minimum length of the password candidates; if -i is used, --increment-min defaults to 1. The --increment-max option is optional as well. It defines the maximum length of the password candidates; if -i is specified but --increment-max is omitted, it defaults to the mask length.

Rules for using the mask increment options:

  • the -i option must be specified before --increment-min and --increment-max can be used

  • the value of --increment-min can be less than or equal to the value of --increment-max, but cannot exceed it

  • the length of the mask can be greater than or equal to the value of --increment-max, but cannot be less than it.
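The effect of these rules can be modelled in Python. The sketch below is an illustration of incremental mask expansion, not hashcat's implementation: it understands only the ?l, ?u and ?d placeholders plus literal characters, and the function and parameter names are made up for the example.

```python
from itertools import product
import string

# Only three built-in placeholders are modelled in this sketch.
CHARSETS = {
    "?l": string.ascii_lowercase,
    "?u": string.ascii_uppercase,
    "?d": string.digits,
}


def parse_mask(mask):
    """Split a mask such as '?d?d' into one character set per position."""
    positions = []
    i = 0
    while i < len(mask):
        token = mask[i:i + 2]
        if token in CHARSETS:
            positions.append(CHARSETS[token])
            i += 2
        else:
            positions.append(mask[i])  # literal character
            i += 1
    return positions


def expand_incremental(mask, min_len=1, max_len=None):
    """Yield candidates for every length from min_len to max_len."""
    positions = parse_mask(mask)
    if max_len is None:
        max_len = len(positions)  # default: the full mask length
    if not 1 <= min_len <= max_len <= len(positions):
        raise ValueError("need 1 <= min_len <= max_len <= mask length")
    for length in range(min_len, max_len + 1):
        # A shorter candidate uses only the first `length` mask positions.
        for combo in product(*positions[:length]):
            yield "".join(combo)


print(len(list(expand_incremental("?d?d", 1, 2))))  # 10 + 100 candidates
```

The range check mirrors the three rules above: the minimum cannot exceed the maximum, and the maximum cannot exceed the length of the mask.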

So the command to generate passwords between six and ten characters long is:

maskprocessor has the following increment option:

The following command will build a dictionary of numbers from 1 to 9999:

When nothing is known about the password (all characters)

If you want to run a full brute-force search, where the password can contain uppercase and lowercase Latin letters as well as digits and is 1 to 12 characters long, use the following options and mask:

To display all password candidates or save them in the dictionary:

If you need a full brute-force search where the password can contain uppercase and lowercase Latin letters, digits, and the symbols !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ and is 1 to 12 characters long, use the following options and mask:

To display all password candidates or save them in the dictionary:
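Before saving such a dictionary it is worth estimating how large it would be. The following Python calculation is a rough sketch: it assumes 62 characters for letters and digits and adds Python's 32 ASCII punctuation characters for the symbols (the space character is not counted, which is an assumption).

```python
import string


def keyspace(charset_size, min_len, max_len):
    """Number of candidates of every length from min_len to max_len."""
    return sum(charset_size ** length for length in range(min_len, max_len + 1))


letters_and_digits = len(string.ascii_letters + string.digits)   # 62 characters
with_punctuation = letters_and_digits + len(string.punctuation)  # 94 characters

print(keyspace(letters_and_digits, 1, 12))
print(keyspace(with_punctuation, 1, 12))
```

For letters and digits alone the total exceeds 3 × 10^21 candidates, so a search like this is realistic to pipe into a cracking program in small slices, but not to store on disk as a whole.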

Creating dictionaries that must use symbols and strings

In the comments to articles about generating passwords by masks, people sometimes ask how to create a dictionary whose words contain certain symbols or strings that can appear anywhere. Masks are not really suitable for this. The task can be solved with a rule-based attack, especially when it comes to individual symbols or groups of symbols; the solution for such cases has already been linked above. But when it comes to strings, a rule-based attack becomes either too complex and confusing because of the large number of rules required, or simply impossible. Let’s consider several examples.

Suppose it is known that a password consisting of uppercase and lowercase letters and digits must contain the word “Alexey”, which can be anywhere in the password and in any case. Instead of creating an insane number of rules, you can generate all the variants and simply keep only the words that contain the string, for example:

In my opinion, this is the best solution. It is also suitable if you do not want to create a dictionary but want to use a mask attack, since many brute-force programs can accept password candidates from standard input. Another variant: the word can be in any case but must be located exactly at the beginning of the password:

By the way, the last example is not particularly well chosen: since we know that the first character can only be “A” or “a”, it is better to use a custom character set that contains just these two characters. The same goes for the other characters, at least for four of them (the number of available custom character sets). How do you create a dictionary whose words must contain the characters “e”, “g”, “D” and “t”? Use a command of the form:

You can extend the chain with further grep calls and so filter passwords by any number of required characters. How do you create a dictionary in which the passwords contain the word Alexey or the word MiAl, anywhere and in any case? Use a command of the form:

The number of strings to search for can be arbitrary:

An example of a command that creates a dictionary in which password candidates consist of only numbers, but the password must contain the sequence “12345” located anywhere:

I think the idea is clear: instead of trying to construct an impossible mask, we generate everything possible and keep only what we need.
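The same generate-then-filter approach can be sketched in Python, independent of any particular cracking tool. The helper names below are invented for the example; the filters correspond to the cases discussed in this section (any of several strings in any case, and a set of required characters).

```python
from itertools import product
import string


def candidates(alphabet, length):
    """Generate every string of the given length over the alphabet."""
    for combo in product(alphabet, repeat=length):
        yield "".join(combo)


def contains_any(words, needles):
    """Keep words that contain at least one of the needles, ignoring case."""
    lowered = [n.lower() for n in needles]
    for word in words:
        w = word.lower()
        if any(n in w for n in lowered):
            yield word


def contains_all_chars(words, chars):
    """Keep words that contain every one of the given characters (case-sensitive)."""
    for word in words:
        if all(c in word for c in chars):
            yield word


# Three-digit candidates that contain the sequence "12" anywhere.
print(list(contains_any(candidates(string.digits, 3), ["12"])))
```

Because everything is a generator, nothing is stored in memory: candidates are produced, tested and either passed on or dropped one at a time, just as in a shell pipeline.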

How to create combined dictionaries

The term “combined dictionaries” usually refers to dictionaries that include both a username and a password, separated by a certain character (usually a colon or a tab). Here, however, I mean dictionaries made by combining words from different dictionaries. We will return to “normal” combined dictionaries later. The point is that every word from the second dictionary is appended to every word from the first dictionary.
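In Python terms this is simply a Cartesian product of two lists. The sketch below illustrates the idea with made-up words; it is not hashcat's implementation.

```python
from itertools import product


def combinator(left_words, right_words):
    """Append every word of the second list to every word of the first."""
    for left, right in product(left_words, right_words):
        yield left + right


print(list(combinator(["red", "blue"], ["car", "bike"])))
```

Note that the order of the arguments matters: words from the first list always come first in the result.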

Dictionary 1 (dict1.txt)

Dictionary 2 (dict2.txt)

Launching a combinator attack (-a 1):


For some reason, it seemed to me that the words should also be combined in the reverse order (that is, the word from the second dictionary comes first), but, as you can see, this does not happen. Therefore, to obtain the described effect, you need to launch the attack again, swapping the dictionaries:

How to combine more than two dictionaries

Next, an example of a combination of three dictionaries is shown – the point is that each new word obtained is composed of one word from each of the three dictionaries:

How do you combine four or more dictionaries in a similar way? It’s hard for me to imagine this being useful in a real-world situation, but you would probably have to write your own script to automate the algorithm shown above. If you know programs that can do this, write in the comments. And… then I remembered the combinator3 program, which comes in the hashcat-utils package. It combines three dictionaries (to combine two dictionaries, use combinator). Usage:

This program combines the three specified dictionaries, but again, words from the dictionary given third will always be at the end. To get all possible combinations of three words in any order, you need to use the following commands:
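A script of the kind mentioned above, for any number of dictionaries and every order of them, might look like the following Python sketch. It assumes the dictionaries are small enough to be held in memory as lists, and the function name is invented for the example.

```python
from itertools import permutations, product


def combine_all_orders(*dictionaries):
    """Take one word from each dictionary and join them in every dictionary order."""
    seen = set()
    for ordering in permutations(dictionaries):
        for parts in product(*ordering):
            candidate = "".join(parts)
            if candidate not in seen:  # different orders can give the same string
                seen.add(candidate)
                yield candidate


print(sorted(combine_all_orders(["a"], ["b"], ["c"])))
```

The number of candidates is the product of the dictionary sizes multiplied by the number of orderings (6 for three dictionaries, 24 for four), so this only makes sense for short lists.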

How to create all possible combinations for a short list of strings

The combipow utility generates all “unique combinations” from a short input list. This program is also included in hashcat-utils. Usage:

An example of the contents of a dictionary named wordlist:

Running combipow with this dictionary:

Will give the following results:
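What combipow computes can be approximated in a few lines of Python. This is a sketch of the idea (every non-empty subset of the input lines, joined in their original order), not the tool's code, and the real utility may print the combinations in a different order.

```python
from itertools import combinations


def combipow(words):
    """Yield every non-empty subset of words, joined in input order."""
    for size in range(1, len(words) + 1):
        for combo in combinations(words, size):
            yield "".join(combo)


# Hypothetical three-line word list.
print(list(combipow(["a", "b", "c"])))
```

A list of n lines produces 2^n - 1 combinations, which is why the tool is meant for short input lists only.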

PrinceProcessor

The PrinceProcessor program implements the PRINCE algorithm. You can learn more about this algorithm on the program card page, which also describes the essence of the program and its options. Here are some PrinceProcessor usage examples. To create all possible chains from the contents of the dict1.txt file:

Using words from the specified dictionary (dict1.txt), build chains with a minimum length of 2 elements (--elem-cnt-min=2) and a maximum length of 2 elements (--elem-cnt-max=2), so that each chain consists of exactly 2 words:

A hybrid attack is a combination of a combinatorial attack and a mask attack

This attack combines a dictionary attack and a mask attack: it takes a dictionary and a mask as input and produces hybrid password candidates. If your example.dict contains:


then the following password candidates are generated:

It works in the opposite direction too! Options:

The following password candidates are generated:

All the possibilities of a hybrid attack can be realized with the help of a rule-based attack – so if you like it more, then use it.
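The mechanics of the hybrid attack are easy to model in Python: every dictionary word is extended with every string a mask can produce, on one side or the other. The sketch below is an illustration only; the mask is passed as a list of character sets, and all names are invented for the example.

```python
from itertools import product
import string


def mask_strings(charsets):
    """Generate every string with one character from each set, in order."""
    for combo in product(*charsets):
        yield "".join(combo)


def hybrid(words, charsets, mask_first=False):
    """Attach every mask string after (or, with mask_first, before) each word."""
    for word in words:
        for piece in mask_strings(charsets):
            yield piece + word if mask_first else word + piece


two_digits = [string.digits, string.digits]
print(list(hybrid(["hello"], two_digits))[:3])
print(list(hybrid(["hello"], two_digits, mask_first=True))[:3])
```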

How to create a combined dictionary containing username and password separated by character

Now we return to combined dictionaries containing both username and password. As an example, look at the fragment of the dictionary (auth_basic.txt file) of the Router Scan by Stas’M program – in it, credentials are separated by a tab character:

And this is an example of a combined dictionary in which the username and password are separated by a colon:

To create a combined dictionary, use a command of the form:

In this command:

  • users.txt and passwords.txt – dictionaries from which usernames and passwords will be taken and all possible combinations will be compiled.

  • SEPARATOR – the character that will separate the login and the password

For example, in the following command, the delimiter is a colon:

If you need a tab character as the separator, press Ctrl-v and then Tab:

By the way, if you try to understand the above hashcat command, you will find that it uses a combinator attack with a rule from the rule-based attack added to it. Let’s consider a special case: how to create a file with a paired dictionary of logins and passwords in which the login is always the same, followed by a tab and the password.
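Both cases, every login paired with every password and a single constant login, can be sketched in Python as follows. The function names are made up for the example, and the word lists are passed in as plain Python lists rather than read from files.

```python
from itertools import product


def credential_pairs(users, passwords, separator=":"):
    """Pair every username with every password, joined by the separator."""
    for user, password in product(users, passwords):
        yield f"{user}{separator}{password}"


def constant_login(login, passwords, separator="\t"):
    """Special case: the same login in front of every password."""
    return credential_pairs([login], passwords, separator)


print(list(credential_pairs(["admin", "root"], ["1234", "qwerty"])))
print(list(constant_login("admin", ["1234", "qwerty"])))
```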

Of course, you can create a file containing a single line, the login, and use it as the first dictionary. But there is another option that uses the very powerful sed command:

In this command:

  • superadmin – the string to be inserted before each password

  • \t – a tab character that will separate the login and password

  • pass.txt – the file from which passwords are read

  • login_pass.txt – the new file where the result will be saved

If you do not want to create a new file, but want to modify an existing one, then remove the redirection and add the -i option:

How to extract usernames and passwords from a combined dictionary into regular dictionaries

Suppose we need to extract only the usernames and/or only the passwords from a combined dictionary. For this, we will use the (also powerful) awk program.

To extract usernames:

To extract passwords:

In these commands:

  • SEPARATOR – the character that separates logins and passwords. If you need a tab character, write “\t”.

  • GLOSSARY.txt – the combined dictionary from which the word lists are extracted

In principle, the commands only differ in $1 (the first field before the separator) and $2 (the second field after the separator).
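The same field splitting can be sketched in Python. This is an illustration rather than a translation of the awk commands: it splits each line only at the first separator, which is a deliberate choice so that a password that itself contains the separator is not cut short.

```python
def split_credentials(lines, separator=":"):
    """Return (usernames, passwords) from lines of the form user<sep>password."""
    users, passwords = [], []
    for line in lines:
        line = line.rstrip("\n")
        if separator not in line:
            continue  # skip lines that are not in the expected format
        user, password = line.split(separator, 1)  # split at the first separator only
        users.append(user)
        passwords.append(password)
    return users, passwords


users, passwords = split_credentials(["admin:1234\n", "root:pa:ss\n"])
print(users)
print(passwords)
```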

How, with the help of Hashcat, you can generate a dictionary of MD5 hashes of all six-digit numbers from 000000 to 999999

Hashcat is not intended for generating tables of precomputed hashes such as rainbow tables. With the help of PHP, however, the required lines can be generated:

Execution time is 1 to 4 seconds. In that time, the MD5 hashes of all strings 000000…999999 are generated. Save the above code to the file md5-rb-gen.php and run it like this:

To save the hashes to a file:

An interesting observation about the speed of achieving the task. The next two commands do the same thing:

But on an average computer their execution times differ by a factor of about three: PHP turns out to be faster than the native Linux commands.
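For comparison, the task of this section can also be sketched in Python with the standard hashlib module. The output format (the number together with its hash) is an assumption made for the example; no claim is made here about how its speed compares with the PHP or shell versions.

```python
import hashlib


def md5_of_numbers(start=0, stop=1_000_000, width=6):
    """Yield (text, md5 hex digest) for the zero-padded numbers start..stop-1."""
    for n in range(start, stop):
        text = f"{n:0{width}d}"
        yield text, hashlib.md5(text.encode("ascii")).hexdigest()


# Print only the first three entries; iterate over the whole generator for all of them.
for text, digest in md5_of_numbers(0, 3):
    print(text, digest)
```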

Doubling words

How do you create a dictionary of 12-character words consisting only of decimal digits (?d) in the format abcdefabcdef, that is, a six-digit number written twice? You can use a rule-based attack, or you can write a small Bash script (here every word in the user.txt file is written twice):

For our task of doubling six-digit numbers, we can use the following command, which will generate six-digit numbers and write each number twice:
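The doubling itself is simple enough to show without any cracking tool. The Python sketch below is an illustration with invented function names: one helper doubles the words of an existing list, the other generates zero-padded numbers and writes each one twice.

```python
def doubled(words):
    """Write every word twice in a row, ignoring trailing newlines."""
    for word in words:
        word = word.rstrip("\n")
        yield word + word


def doubled_numbers(width=6):
    """Yield every zero-padded number of the given width, written twice."""
    for n in range(10 ** width):
        text = f"{n:0{width}d}"
        yield text + text


print(list(doubled(["abc\n", "12"])))
print(next(doubled_numbers()))
```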

How to create a dictionary with a list of dates

How do you create a list of dates in the format DD-MM-YYYY, that is, matching the mask ?d?d-?d?d-?d?d?d?d, but with the values ranging over 01-31, 01-12 and 1900-2021 respectively rather than 00-99? The pydictor program can create such dictionaries.
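The restricted ranges are easy to express as nested loops. The Python sketch below is an illustration, not pydictor's code; the optional valid_only switch, which drops dates that do not exist in the calendar (such as 31-02), is an addition made for the example.

```python
from datetime import date


def date_candidates(first_year=1900, last_year=2021, valid_only=False):
    """Yield DD-MM-YYYY strings for days 01-31, months 01-12 and the given years."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            for day in range(1, 32):
                if valid_only:
                    try:
                        date(year, month, day)
                    except ValueError:
                        continue  # this day does not exist in this month
                yield f"{day:02d}-{month:02d}-{year:04d}"


print(next(date_candidates()))
```

For the full range 1900-2021 the unfiltered list has 31 × 12 × 122 = 45,384 entries, far fewer than the 100 million strings the plain mask would produce.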

But it is even easier to make a dictionary like this (it will be saved in the dates.txt file):

If you want to do without creating a dictionary, then pass the output of the previous commands to the hashcat standard input:

How to break the generated dictionaries into parts

Can the dictionary generated by maskprocessor be divided into several parts, for example parts of 1 GB each? Yes, both the output of maskprocessor and ready-made dictionaries can be split into parts. In Linux, it is convenient to use the split utility for this, for example:
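The principle of splitting on line boundaries, so that no password candidate is cut in half, can be sketched in Python as follows. This is an illustration of the idea rather than a replacement for split; the function name and the in-memory lists are assumptions made for the example.

```python
def split_wordlist(lines, max_bytes):
    """Group lines into parts of at most max_bytes each.

    A single line longer than max_bytes still forms a part of its own.
    """
    part, size = [], 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        if part and size + line_size > max_bytes:
            yield part  # the current part is full, start a new one
            part, size = [], 0
        part.append(line)
        size += line_size
    if part:
        yield part


print(list(split_wordlist(["aa\n", "bb\n", "cc\n"], 6)))
```

Each part could then be written to its own numbered file, or passed to a cracking program one after another.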
