Open source intelligence (OSINT) is an integral part of today's business information-gathering strategy. Open data sources such as the internet, social networks, public databases, and news portals let companies obtain valuable information about their competitors, markets, potential customers, and other aspects of their business. The purpose of business OSINT is to collect, analyze, and interpret a variety of data so that enterprises can make informed decisions, identify new opportunities, and reduce risks. It helps businesses gain a competitive advantage, improve marketing and sales strategies, identify market trends, and anticipate changes. Doing it well requires a high level of expertise and specialized tools for data collection and analysis.
It is important to adhere to ethical standards and to protect the confidentiality of personal information. Used responsibly, OSINT lets companies of all sizes draw conclusions about their customers, competitors, and market from the large volume of publicly available information: avoiding unforeseen risks, spotting new market opportunities, understanding consumer needs, and building effective marketing strategies that keep them a step ahead of competitors. The main activities are searching for and collecting information from websites, social networks, forums, blogs, and other sources, and then analyzing and interpreting what you find. Specialized tools can automate much of this collection and analysis, making it faster and more efficient. This chapter presents three categories of OSINT: business, people, and cyber threat intelligence. Then I'll cover some business OSINT tools for useful tasks like finding company executive names, discovering public files, harvesting email addresses, and reading document metadata.
In 2017, I won the DerbyCon Social Engineering Capture the Flag (SECTF) challenge. In this exercise, five other participants and I took on an unsuspecting Fortune 500 company in Louisville, Kentucky. We spent three weeks gathering OSINT and then 20 minutes in a soundproof booth calling employees of the target company. While researching my target, I checked the social media accounts of one of its executives and learned that he was late for a business meeting in Amsterdam because the airline had delayed his flight to Newark. This seemingly innocent detail gave me a great pretext for contacting him.
With this in mind, I added the airline's phone number to my list of numbers. I also recorded the executive's name, email address, and phone number and added him to my list of targets. If I had had a contract with this firm that allowed phishing, I would have sent an apology email mimicking the airline's style and then called, claiming to be an airline employee. I could then confirm the information I already knew and ask a "security question" to convince him that I was a legitimate source. I could even play a few Windows system sounds in the background to make the call more convincing. Finally, I would ask him potentially damaging questions about the company's operating environment, such as the status of equipment upgrades, work schedules, or other sensitive company data.
This attack would not have been possible if I had not first found the executive's post about the flight delay. Few effective social engineering attacks happen without OSINT on the target. The better the OSINT, the better the social engineering.
Types of OSINT
OSINT can refer to an organization, an individual, or a piece of code. When collecting business OSINT, we look for information about the company as a whole (technology in use, suppliers, customers, operations and location).
To gather OSINT related to people, we can go in two directions. We can target the person themselves, hunting for information such as their likes and dislikes, personal connections, answers to password reset questions, and clues to how they choose passwords. Alternatively, we can use a person to learn about the business they work for. This type of OSINT includes photos of the person at work, resumes, complaints or bragging about work, and trips taken for work, and that is only a small portion of the potentially useful information.
NOTE I generally do not directly use a person's personal accounts as part of an attack on an organization. I may collect information from them for later use, but I will not try to contact the person through their personal social media or messaging accounts.
OSINT can also be used for cyber threat intelligence (CTI), which usually focuses on a piece of code or a specific adversary. Here we use OSINT to identify the attacker and their motives. For example, you might trace elements of the code to identify its author or country of origin, or track the email address or phone number from which your organization was contacted. People debate how effective OSINT is for threat analysis: some organizations do it very well, while others try to make a quick buck at their customers' expense.
Collecting OSINT data about your organization
This section will help you get started collecting OSINT about organizations. The question to keep in mind is: what context can you use to build rapport with the target's employees? Here I will cover some OSINT collection tools.
Various aggregator sites can provide a selection of company information. Although most of them charge for detailed information, some give you a limited amount for free or without creating an account. An example of such a site in the US is Crunchbase (https://www.crunchbase.com/). Similar services exist in other countries; they all work in roughly the same way and provide similar information, and you can find them with a search engine.
Crunchbase has a free tier that meets most needs of the casual OSINT enthusiast. If you plan to use it heavily or as a professional consultant, I recommend paying for the Pro tier. A Crunchbase search for Walmart brings up a multi-tabbed profile. Figure 4.1 shows the Summary tab, which gives you the address of the company's headquarters. Before you even scroll down, you can see the number of mergers, acquisitions, and exits the company has been involved in. You can also see its stock ticker (if it's a public company), the latest news about the company, and basic historical information. Crunchbase compiles this information from analysts, web crawls, and corporate reports, which vary in accuracy.
The Financials tab provides specific information about investments and fundraising (Figure 4.2).
If the company's shares are traded on a stock exchange, you will find information about its initial public offering (IPO) and share price. If you are researching a privately held company, you will see little or nothing in this section, or perhaps learn about fundraising efforts, including amounts raised, investors, and dates. If the company has invested money or made donations, this is noted below (Figure 4.3), followed by exits and completed acquisitions (Figure 4.4).
Next comes the People tab, which lists important employees. These are usually managers who oversee certain key areas or people who have influenced the history of the organization. For example, in Figure 4.5, Sam Walton, the founder of Walmart, is listed as the “founder and leader” of the current team and a member of the board of directors, despite the fact that he passed away in 1992.
The contents of the Technology tab are mostly hidden unless you have a Pro account. If you do, this tab shows web traffic statistics, mobile app metrics, and limited information about the company's patents and other intellectual property. This information can be found elsewhere on the internet, so the paywall is not a big problem. Try BuiltWith (https://www.builtwith.com/), Wappalyzer (https://www.wappalyzer.com/), or Shodan (https://www.shodan.io/).
The last tab, Signals & News, contains the latest news and management changes (Figure 4.6).
This tab also lists events the organization is involved in, either as a sponsor or through employees speaking on its behalf. It is a good starting point but not a substitute for other sources of information, including public documents, press releases, and media reports. (We'll discuss these sources in the next few sections.) It can also give you ideas for queries to enter into the search engine of your choice.
The name WHOIS comes from the question "who is?", and WHOIS is a directory of registered domain names, their network addresses, their owners, and contact information. Its purpose is to help people with a legitimate business need contact a company's web team about its web presence.
You can perform a WHOIS lookup with DomainTools, as shown in Figure 4.7. The whois command is included in both the Offensive Security and Trace Labs versions of Kali, and you can add it to other Linux systems with apt-get or the equivalent package manager command on other distributions.
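If you want to script lookups rather than use a website, the WHOIS protocol itself (RFC 3912) is simple: open a TCP connection to port 43, send the domain name followed by CRLF, and read until the server closes the connection. The Python sketch below assumes a .com domain, whose registry WHOIS server is whois.verisign-grs.com; other TLDs use other servers (whois.iana.org can point you to the right one), and the registry may refer you on to a registrar's server, a referral the whois command follows for you. parse_whois is a hypothetical helper that only groups "Key: Value" lines, since response formats vary by server.

```python
from __future__ import annotations

import socket


def whois_query(domain: str, server: str = "whois.verisign-grs.com", port: int = 43) -> str:
    """Raw WHOIS lookup (RFC 3912): send one query line, read until the server closes."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text: str) -> dict[str, list[str]]:
    """Group 'Key: Value' lines; keys such as 'Name Server' may repeat."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skip banners, blank lines, and keys with empty values
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields
```

For example, parse_whois(whois_query("walmart.com")).get("Name Server") would return the name servers listed in the registry's response, if it includes them.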
At the top of the DomainTools results page, you can see domains similar to the target's domain that are for sale. They can be useful for typosquatting and later phishing attempts. Email spoofing is easy to detect, and most email clients protect against it, which weakens your position as a social engineer. Emails sent from legitimately registered domains that resemble the target's domain are more likely to pass through filters and end up in inboxes.
Note that in this case the domain's status is transfer prohibited, which means you most likely will not be able to transfer it to another registrar, something red teamers sometimes attempt. Also pay attention to the age of the domain, which helps confirm that you have selected the correct target. Defenders can use the same attribute to flag the domains you use as suspicious, which is why it's recommended to buy domains and wait six months to a year before using them.
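The age check is easy to automate once you have a creation date from WHOIS. This is a minimal sketch, assuming the Creation Date field is in ISO 8601 form (as .com registry responses typically are) and treating dates without an offset as UTC. The 180-day default in is_aged simply encodes the lower end of the six-months-to-a-year guideline; both function names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timezone


def domain_age_days(creation_date: str, now: datetime | None = None) -> int:
    """Whole days since a WHOIS creation date such as '2020-01-01T00:00:00Z'."""
    created = datetime.fromisoformat(creation_date.replace("Z", "+00:00"))
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)  # assumption: naive dates are UTC
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - created).days


def is_aged(creation_date: str, min_days: int = 180, now: datetime | None = None) -> bool:
    """True if the domain is at least min_days old (about six months by default)."""
    return domain_age_days(creation_date, now) >= min_days
```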
Next come the name servers used by the domain. These can sometimes reveal software and services the company uses. For example, Walmart uses Akamai and UltraDNS. Akamai also provides content delivery network (CDN) services (faster page loads and DoS attack mitigation), as well as web protection and load balancing (further DoS mitigation). This is important to know if you are preparing for a penetration test.
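You can turn name server host names into provider hints with a simple lookup table. The markers below are a small, hand-picked sample based on naming conventions I believe these providers use (for example, Akamai's akam.net name servers). Treat the table as an assumption to verify, not a complete fingerprint database; ns_providers is a hypothetical helper.

```python
from __future__ import annotations

# Hypothetical, hand-picked markers; provider naming conventions can change.
NS_HINTS = {
    "akam.net": "Akamai",
    "ultradns": "UltraDNS",
    "ns.cloudflare.com": "Cloudflare",
    "awsdns": "Amazon Route 53",
}


def ns_providers(nameservers: list[str]) -> set[str]:
    """Map name server host names to the DNS providers they suggest."""
    found = set()
    for ns in nameservers:
        host = ns.lower().rstrip(".")  # normalize case and the trailing root dot
        for marker, provider in NS_HINTS.items():
            if marker in host:
                found.add(provider)
    return found
```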
Be aware that since May 25, 2018, the EU General Data Protection Regulation (GDPR) has changed how personal data in WHOIS databases is handled within its jurisdiction. This prompted the Internet Corporation for Assigned Names and Numbers (ICANN), which oversees WHOIS, to change what information is published about companies and contacts located in the EU.
Recon-ng is a Linux command-line tool written by Tim Tomes specifically for OSINT gathering. It works much like Metasploit: you load a module, set its inputs (such as targets), and then use the run command to perform the search.
Recon-ng has a variety of built-in OSINT collection modules for both businesses and individuals, ranging from breached email addresses from Have I Been Pwned (discussed in Chapter 6) and DNS records to hosts and ports from Shodan (discussed in Chapter 5). You can find most of what you want to know about a company with Recon-ng.
Recon-ng comes pre-installed in both Offensive Security and Trace Labs Kali versions. To use Recon-ng on another Linux system, you’ll need Python 3, the pip3 package management tool, and Git. You can then install it in the /opt directory using the following commands:
root@se-book:/opt# git clone https://github.com/lanmaster53/recon-ng
Cloning into 'recon-ng'...
--snip--
Resolving deltas: 100% (4824/4824), done.
root@se-book:/opt# cd recon-ng/
root@se-book:/opt/recon-ng# ls -la
--snip--
-rw-r--r-- 1 root root   97 Sep 25 18:37 REQUIREMENTS
--snip--
-rwxr-xr-x 1 root root 2498 Sep 25 18:37 recon-ng
-rwxr-xr-x 1 root root   97 Sep 25 18:37 recon-web
root@se-book:/opt/recon-ng# python3 -m pip install -r REQUIREMENTS
Requirement already satisfied: pyyaml in /usr/lib/python3/dist-packages (from -r REQUIREMENTS (line 2))
Collecting dnspython (from -r REQUIREMENTS (line 3))
Downloading https://files.pythonhosted.org/packages/ec/d3/3aa0e7213ef72b8585747aa0e271a9523e713813b9a20177ebe1e939deb0/dnspython-1.16.0-py2.py3-none-any.whl (188kB)
100% |████████████████████████████████| 194kB 5.6MB/s
Recon-ng lets you define separate workspaces, which are ideal for segmenting the information you collect. You can choose a workspace when you launch Recon-ng, and each workspace stores its data in its own SQLite database. If I'm researching several organizations or companies for the same contract, I give each one a separate workspace so there's no confusion when reviewing what I've collected. If you don't specify a workspace, Recon-ng writes all results to the default workspace and its associated database.
To open a workspace when launching Recon-ng, use the -w option:
recon-ng -w workspace_name
For example, if I were researching Walmart, I might enter:
recon-ng -w walmart
The prompt then shows the active workspace:
[recon-ng][walmart] >
If Recon-ng is already running, you can list the available workspaces by entering workspaces list.
NOTE You cannot do this while a module is loaded; in that case, enter the back command first to return to the main prompt.
To load an existing workspace, enter the following command:
workspaces load workspace_name
You can also create a workspace with this command:
workspaces create workspace_name
When you no longer need the information in a workspace, you can delete it, for example to meet data retention requirements:
workspaces remove workspace_name
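Because each workspace is an ordinary SQLite database, you can also read collected results outside Recon-ng, for example to feed them into a report or another script. The sketch below assumes the database lives at ~/.recon-ng/workspaces/<name>/data.db and contains a contacts table with first_name, last_name, email, and title columns. Both are my assumptions about current Recon-ng versions, so confirm them with the sqlite3 command-line tool (.schema contacts) before relying on the code; workspace_db and list_contacts are hypothetical names.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path


def workspace_db(name: str, home: Path | None = None) -> Path:
    """Assumed location of a workspace database: ~/.recon-ng/workspaces/<name>/data.db."""
    base = home if home is not None else Path.home()
    return base / ".recon-ng" / "workspaces" / name / "data.db"


def list_contacts(db_path: Path) -> list[tuple]:
    """Return (first_name, last_name, email, title) rows from the assumed contacts table."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT first_name, last_name, email, title FROM contacts ORDER BY last_name"
        ).fetchall()
    finally:
        conn.close()  # always release the database file
```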
Next, you need to find and install modules. To see which modules are available, search the marketplace:
[recon-ng][walmart] > marketplace search
+--------------------------------------------------------------------------------------------+
| Path                                        | Version | Status        | Updated    | D | K |
+--------------------------------------------------------------------------------------------+
| discovery/info_disclosure/cache_snoop       | 1.0     | not installed | 2019-06-24 |   |   |
| discovery/info_disclosure/interesting_files | 1.0     | not installed | 2019-06-24 |   |   |
| exploitation/injection/command_injector     | 1.0     | not installed | 2019-06-24 |   |   |
| exploitation/injection/xpath_bruter         | 1.1     | not installed | 2019-08-19 |   |   |
| import/csv_file                             | 1.1     | not installed | 2019-08-09 |   |   |
| import/list                                 | 1.0     | not installed | 2019-06-24 |   |   |
There are two ways to install modules: one at a time or all at once. To install a single module, enter the following command, replacing import/csv_file with the full path to the module:
[recon-ng][walmart] > marketplace install import/csv_file
[*] Installed module: import/csv_file
[*] Reloading modules...
To install all available modules, use the following command:
[recon-ng][walmart] > marketplace install all
[*] Module installed: discovery/info_disclosure/cache_snoop
[*] Module installed: discovery/info_disclosure/interesting_files
[*] Module installed: exploitation/injection/command_injector
--snip--
[*] Module installed: reporting/xml
[*] Reloading modules...
[!] 'google_api' key not set. pushpin module will likely fail at runtime. See 'keys add'.
[!] 'bing_api' key not set. bing_linkedin_cache module will likely fail at runtime. See 'keys add'.
[!] 'censysio_id' key not set. censysio module will likely fail at runtime. See 'keys add'.
Some modules need API keys to access external resources. Each website has its own process for issuing these keys, and the procedures change frequently. You can find my up-to-date guide to obtaining them at https://www.theosintion.com/practical-social-engineering/, or check the API key pages on each tool's website.
Once you have the keys, add them in Recon-ng with the following syntax, where key_name is the name Recon-ng uses for the key (for example, google_api or bing_api):
keys add key_name key_value
Verify that Recon-ng has the key in the database with the following command:
keys list
There are five types of Recon-ng modules: Discovery, Exploitation, Import, Recon, and Reporting. In this book, we will use the Discovery, Recon, and Reporting modules.
To see the modules that belong to a specific type, use the search command followed by the type name, for example:
modules search discovery
If you know part of the module name, you can use the search function to find it, for example:
modules search hibp
You can also call the module directly using the modules load command if you know the name of the module or the beginning of its name:
modules load metacr
This command loads the metacrawler module. Now let's look at some of these modules in more detail.
To set a target for a module, you need to know which inputs it accepts. Find out by entering the info command. When you are ready to enter a target or value for one of the fields, use the options set field_name field_value command.
The metacrawler module searches the target site or sites for Microsoft PowerPoint, Word, Excel and PDF files. This is equivalent to a long Google Dork search term like this:
site:nostarch.com filetype:xls* OR filetype:doc* OR filetype:ppt* OR filetype:pdf
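If you want to run the same search by hand, or for several domains, a tiny helper can build the query. This Python sketch (filetype_dork is a hypothetical name) joins one filetype: clause per extension with OR. With the extension list used in the usage example, it reproduces the query string that metacrawler reports in its output for nostarch.com.

```python
from __future__ import annotations


def filetype_dork(site: str, filetypes: list[str]) -> str:
    """Build a Google query restricted to one site and a set of file extensions."""
    clauses = " OR ".join(f"filetype:{ext}" for ext in filetypes)
    return f"site:{site} {clauses}"
```

Usage: filetype_dork("nostarch.com", ["pdf", "docx", "xlsx", "pptx", "doc", "xls", "ppt"]).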
For example, to find all of these file types on the nostarch.com site, use the following commands:
[recon-ng][default][metacrawler] > options set SOURCE nostarch.com
SOURCE => nostarch.com
[recon-ng][default][metacrawler] > run
------------ NOSTARCH.COM -----------
[*] Searching Google for: site:nostarch.com filetype:pdf OR filetype:docx OR filetype:xlsx OR filetype:pptx OR filetype:doc OR filetype:xls OR filetype:ppt
[*] https://www.nostarch.com/download/WGC_Chapter_3.pdf
[*] Producer: Acrobat Distiller 6.0 (Windows)
[*] Title: Write Great Code
[*] Author: (c) 2004 Randall Hyde
[*] Creator: PScript5.dll Version 5.2
[*] Moddate: D:20041006112107-07'00'
[*] Creationdate: D:20041006111512-07'00'
[*] https://www.nostarch.com/download/wcss_38.pdf
[*] Producer: Acrobat Distiller 5.0 (Windows)
[*] Title: wcss_book03.book
[*] Author: Riley
[*] Creator: PScript5.dll Version 5.2
[*] Moddate: D:20040206172946-08'00'
[*] Creationdate: D:20040116180100Z
If the Extraction option is set to True, this command lists every PDF and Microsoft Office (Excel, Word, or PowerPoint) document publicly available on the target website, with a link to the file and its metadata, including the author, modification date, the software that created the document, and the creation date. If Extraction is set to False, the output contains only the file name and link.
This information lets you do many things. From the metadata, you can extract usernames, operating system names, and the software in use. From the files themselves, you may find information the target intended to keep private, including names, email addresses, phone and fax numbers, locations, and important business matters.
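The Moddate and Creationdate values in the metacrawler output use the PDF date format: D:YYYYMMDDHHmmSS followed by Z or an offset such as -07'00'. If you collect many documents, normalizing these values helps you build a timeline or estimate an author's time zone. The following is a minimal Python sketch; parse_pdf_date is a hypothetical helper of my own, not part of Recon-ng, and it handles only the common forms shown in that output.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone

# D:YYYY[MM[DD[HH[mm[SS]]]]] then optional Z or +/-HH'mm'
PDF_DATE = re.compile(
    r"D:(?P<y>\d{4})(?P<mo>\d{2})?(?P<d>\d{2})?(?P<h>\d{2})?(?P<mi>\d{2})?(?P<s>\d{2})?"
    r"(?P<tz>[Zz+\-])?(?P<tzh>\d{2})?'?(?P<tzm>\d{2})?'?"
)


def parse_pdf_date(value: str) -> datetime:
    """Convert a PDF metadata date string into a datetime (aware if an offset is given)."""
    m = PDF_DATE.match(value.strip())
    if not m:
        raise ValueError(f"not a PDF date: {value!r}")
    g = m.groupdict()
    dt = datetime(int(g["y"]), int(g["mo"] or 1), int(g["d"] or 1),
                  int(g["h"] or 0), int(g["mi"] or 0), int(g["s"] or 0))
    if g["tz"] in ("+", "-"):
        offset = timedelta(hours=int(g["tzh"] or 0), minutes=int(g["tzm"] or 0))
        return dt.replace(tzinfo=timezone(offset if g["tz"] == "+" else -offset))
    if g["tz"] in ("Z", "z"):
        return dt.replace(tzinfo=timezone.utc)
    return dt  # no offset recorded: leave the datetime naive
```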
Searching for domain contact information using whois_pocs
The whois_pocs module lists all known contacts for the specified domain. It is more reliable for this function than the whois_miner module and works even against targets with domain privacy enabled. Here’s an example of running this module against Walmart:
[recon-ng][default] > modules load whois_pocs
[recon-ng][default][whois_pocs] > options set SOURCE walmart.com
SOURCE => walmart.com
[recon-ng][default][whois_pocs] > info
Name: Whois POC Harvester
Path: modules/recon/domains-contacts/whois_pocs.py
Author: Tim Tomes (@LaNMaSteR53)
Description:
  Uses the ARIN Whois RWS to harvest POC data from whois queries for the given domain. Updates the 'contacts' table with the results.
Options:
  Name    Current Value  Required  Description
  ------  -------------  --------  -----------
  SOURCE  walmart.com    yes       source of input (see 'show info' for details)
Source Options:
  default      SELECT DISTINCT domain FROM domains WHERE domain IS NOT NULL
  <string>     string representing a single input
  <path>       path to a file containing a list of inputs
  query <sql>  database query returning one column of inputs
[recon-ng][default][whois_pocs] > run
----------- WALMART.COM -----------
[*] URL: http://whois.arin.net/rest/pocs;domain=walmart.com
[*] URL: http://whois.arin.net/rest/poc/ABUSE327-ARIN
[*] Country: United States
[*] Email: [email protected]
[*] First_Name: None
[*] Last_Name: Abuse
[*] Middle_Name: None
[*] Notes: None
[*] Phone: None
[*] Region: Brisbane, CA
[*] Title: Whois contact
[*] --------------------------------------------------
Be aware that some organizations do not disclose WHOIS information.
The mx_spf_ip module retrieves the mail exchange (MX) DNS records for a domain, along with its Sender Policy Framework (SPF) record. MX records specify which mail servers accept email for the domain, and the SPF record lists the IP address ranges and domains that are authorized to send email on the domain's behalf.
An attacker can use the information in these records to plan a more successful email spoofing attack. For example, the SPF record reveals the IP address ranges and associated domains allowed to send mail for the organization, which may also provide clues about business relationships, suppliers, or technology in use.
The command below retrieves the MX and SPF records for nostarch.com. The result confirms that the site uses Google's mail servers, and the absence of an SPF record indicates that No Starch has not implemented SPF:
[recon-ng][book][mx_spf_ip] > options set SOURCE nostarch.com
SOURCE => nostarch.com
[recon-ng][book][mx_spf_ip] > run
[*] Retrieving MX records for nostarch.com.
[*] [host] alt1.aspmx.l.google.com (<blank>)
[*] [host] aspmx.l.google.com (<blank>)
[*] [host] alt3.aspmx.l.google.com (<blank>)
[*] [host] alt2.aspmx.l.google.com (<blank>)
[*] [host] alt4.aspmx.l.google.com (<blank>)
[*] Retrieving SPF records for nostarch.com.
[*] nostarch.com => No record found.
By contrast, the following output shows that Walmart uses SPF:
[recon-ng][book][mx_spf_ip] > options set SOURCE walmart.com
SOURCE => walmart.com
[recon-ng][book][mx_spf_ip] > run
[*] Retrieving MX records for walmart.com.
[*] [host] mxb-000c7201.gslb.pphosted.com (<blank>)
[*] [host] mxa-000c7201.gslb.pphosted.com (<blank>)
[*] Retrieving SPF records for walmart.com.
[*] TXT record: "dtOeNuIs42WbSVe3Zf2qizxLw9LSQpFd6bWqCr166oTRIuJ9yKS+etPsGGNOvaiasQk2C6GV0/5PjT9CI2nNAg=="
[*] TXT record: "google-site-verification=ZZYRwyiI6QKg0jVwmdIha68vuiZlNtfAJ90msPo1i7E"
[*] TXT record: "adobe-idp-site-verification=7f3fb527466337ac0ac0752c569ca2ac48926dc6c6dad3636d581aa131a1cf3e"
[*] TXT record: "v=spf1 ip4:161.170.248.0/24 ip4:161.170.244.0/24 ip4:161.170.236.31 ip4:161.170.238.31 ip4:161.170.241.16/30 ip4:161.170.245.0/24 ip4:161.170.249.0/24 include:Walmart.com include:_netblocks.walmart.com include:_vspf1.walmart.com include:_vspf2.walmart.co" "m include:_vspf3.walmart.com ~all"
[*] [netblock] 161.170.248.0/24
[*] [netblock] 161.170.244.0/24
[*] [host] <blank> (161.170.236.31)
[*] [host] <blank> (161.170.238.31)
[*] [netblock] 161.170.241.16/30
[*] [netblock] 161.170.245.0/24
[*] [netblock] 161.170.249.0/24
[*] TXT record: "facebook-domain-verification=ximom3azpca8zph4n8lu200sos1nrk"
[*] TXT record: "adobe-idp-site-verification=5800a1970527e7cc2f5394a2bfe99bcda4e5938e132c0a19139fda9bf6e30704"
[*] TXT record: "docusign=5bdc0eb1-5fb2-471c-99a0-d0d9cc5fdac8"
[*] TXT record: "MS=E4F53D5B1A485B7BA06E0D36A9D38654A16609F3"
The TXT records include domain verification entries for Adobe, Facebook, DocuSign, Microsoft, and Google. A TXT record beginning with MS= indicates that Walmart uses Microsoft Office 365. The adobe-idp-site-verification records verify the domain for Adobe enterprise products such as Creative Cloud. The facebook-domain-verification record proves domain ownership to Facebook, restricting who can edit the official Facebook content associated with the domain. A TXT record beginning with docusign= indicates that the organization uses DocuSign to sign official documents.
Note that pphosted.com appears in the MX host names. This indicates the use of Proofpoint, an anti-spoofing technology that adds a custom tag, often the string [EXTERNAL], to the subject line of incoming emails, making phishing and business email compromise attempts easier to spot.
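Reading these records by eye works for one domain; for many domains, a prefix lookup is quicker. The Python sketch below groups TXT values by the verification prefixes seen in Walmart's records and maps MX host names to mail providers. The tables are illustrative and incomplete (the mail.protection.outlook.com suffix for Microsoft 365 is my addition, not something in the output above), and classify_txt and classify_mx are hypothetical names.

```python
from __future__ import annotations

# Illustrative prefix and suffix tables; extend them as you meet new records.
TXT_PREFIXES = {
    "MS=": "Microsoft 365",
    "google-site-verification=": "Google",
    "adobe-idp-site-verification=": "Adobe",
    "facebook-domain-verification=": "Facebook",
    "docusign=": "DocuSign",
}

MX_SUFFIXES = {
    "pphosted.com": "Proofpoint",
    "aspmx.l.google.com": "Google",
    "mail.protection.outlook.com": "Microsoft 365",
}


def classify_txt(records: list[str]) -> dict[str, list[str]]:
    """Group TXT record values by the service their verification prefix points to."""
    hits: dict[str, list[str]] = {}
    for rec in records:
        for prefix, service in TXT_PREFIXES.items():
            if rec.startswith(prefix):
                hits.setdefault(service, []).append(rec)
    return hits


def classify_mx(hosts: list[str]) -> set[str]:
    """Return the mail providers suggested by MX host names."""
    found = set()
    for host in hosts:
        name = host.lower().rstrip(".")  # normalize case and the trailing root dot
        for suffix, service in MX_SUFFIXES.items():
            if name.endswith(suffix):
                found.add(service)
    return found
```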
The SPF record also lists several netblocks. These are the target's public IP address ranges, and the two individual hosts listed are likely its primary mail servers. You can confirm this with other tools.
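To enumerate what an SPF record authorizes, you can parse it yourself. This is a minimal sketch, assuming you already have the record as the list of quoted strings your DNS tool prints. Per RFC 7208, those strings are concatenated with no separator, which is why Walmart's "walmart.co" "m include:..." split rejoins cleanly. The sketch handles only the ip4, ip6, include, and all mechanisms and ignores a, mx, exists, and modifiers such as redirect=; parse_spf is a hypothetical name.

```python
from __future__ import annotations

import ipaddress


def parse_spf(txt_strings: list[str]) -> dict:
    """Extract networks, includes, and the 'all' qualifier from one SPF TXT record."""
    record = "".join(txt_strings)  # RFC 7208: character-strings join with no spaces
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    result = {"networks": [], "includes": [], "all": None}
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            result["networks"].append(ipaddress.ip_network(value, strict=False))
        elif name == "include":
            result["includes"].append(value)
        elif name == "all":
            result["all"] = qualifier
    return result
```

Summing num_addresses over the returned networks gives a quick sense of how much address space the organization has authorized to send mail.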
Like Recon-ng, theHarvester is a Linux-based OSINT command-line tool available for free as part of Kali and Buscador. You can also find it on GitHub. theHarvester, written by Christian Martorella, requires API keys for Shodan and Google Custom Search Engine (CSE).
These keys can be entered in the following files:
path_to_theHarvester/discovery/googleCSE.py
and:
path_to_theHarvester/discovery/shodansearch.py
On the command line, you use switches to tell theHarvester which tasks to perform. Choosing theHarvester or Recon-ng is a matter of personal preference. Even if Recon-ng is your primary tool, you can get a second opinion from theHarvester to see whether Recon-ng missed anything.
The OSINT Framework (https://osintframework.com/) is a graphical collection of OSINT tools. Created by Justin Nordine, it groups resources by what you're looking for (Figure 4.8).
You'll often need to look up company email addresses and their syntax (the format a company uses for its employees' email addresses). Hunter (https://hunter.io/) is a great tool for building such a list. Without logging in, you can get the basic email address syntax the company uses; after creating an account and logging in, you can get the most common email address syntax, full company email addresses, and sometimes the person's job title.
Figure 4.9 shows the results of an unauthenticated search.
Figure 4.10 shows the results of an authenticated search, which returns valid email addresses for our target domain and where they were found.
Looking at these results, you can infer the syntax of company email addresses. You can then go to LinkedIn and the company website to get more names and then collect more email addresses yourself if you want to send phishing emails to these people.
Hunter offers several service tiers. As of this writing, they range from free (100 requests per month, no CSV export) to $399 per month, which includes 50,000 requests and CSV export.
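If you have gathered a few real addresses and the matching names (from Hunter, LinkedIn, or the company website), you can infer the format yourself. The sketch below votes among a small, hypothetical set of common formats and then applies the winner to new names. The sample names and the example.com domain are invented for illustration, the code assumes non-empty first and last names, and real organizations may use formats (middle initials, numbers) that this table does not cover.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical set of common local-part formats; real organizations may use others.
PATTERNS = {
    "{first}.{last}": lambda first, last: f"{first}.{last}",
    "{f}{last}": lambda first, last: f"{first[0]}{last}",
    "{first}{l}": lambda first, last: f"{first}{last[0]}",
    "{first}_{last}": lambda first, last: f"{first}_{last}",
    "{first}{last}": lambda first, last: f"{first}{last}",
    "{first}": lambda first, last: first,
}


def infer_pattern(samples: list[tuple[str, str, str]]) -> str | None:
    """Return the format that matches the most known (first, last, email) samples."""
    votes = Counter()
    for first, last, email in samples:
        local = email.split("@", 1)[0].lower()
        for name, build in PATTERNS.items():
            if build(first.lower(), last.lower()) == local:
                votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None


def make_address(pattern: str, first: str, last: str, domain: str) -> str:
    """Apply an inferred format to a new name."""
    return f"{PATTERNS[pattern](first.lower(), last.lower())}@{domain}"
```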
You’ve probably used Google Maps or Bing Maps to navigate maps, satellite images, and street views. When it comes to collecting OSINT data, satellite and street view modes are generally the most valuable.
Satellite view can show gates, dumpsters, satellite dishes, entrances and exits, parking patterns, and nearby properties. In some areas, you can zoom in far enough to make out awnings, entrances, and smoking areas.
Street view lets you see the building and its surroundings as if you were walking or driving by. From this vantage point, you can identify the following:
Names of companies that maintain barriers and gates or that empty the dumpsters (useful information that can help you gain access to the building's premises or go dumpster diving);
The presence and location of gates, doors, and fences, and whether they are usually left open (and sometimes whether security guards are present);
Delivery companies whose trucks are parked outside;
Specific building names, such as Walmart Innovation Center, Walmart People Center, or Walmart Home Office, which may help you blend in with the organization;
Other tenants in the building.
During the DerbyCon SECTF competition mentioned at the beginning of this chapter, I used Google Maps to identify my target company's delivery services by checking whose trucks were near the gate. I could have used this information to gain physical access to the building, perhaps by finding a similar uniform at a thrift store, or as a pretext for calling about a delivery.
Using both Google Maps and Bing Maps can give you more complete information because the services use different data sources. Their images were also taken on different days, so you might, for example, spot a delivery truck in one service but not the other, see a new dumpster in a more recent image, or get a clearer view of a service company's logo.
This article uses material from the book Practical Social Engineering: A Primer for the Ethical Hacker by Joe Gray.