3. Gathering Information and Reconnaissance

Collecting information about the target is the first and one of the most crucial steps in the vulnerability testing process. During reconnaissance, we gather as much information as possible, including:

  • Server Technologies:
    • Operating System of the server (Windows, Linux).
    • Web server software (Apache, IIS, Nginx).
    • Content Management System (CMS), if one is used (e.g., Joomla, Drupal, WordPress).
    • Frameworks (e.g., CakePHP, Laravel).
    • Programming language used for the application (e.g., PHP, Python).
    • Templating engine, if one is used (e.g., Twig, Jinja2).
  • Client-side Technologies:
    • JavaScript libraries (e.g., jQuery).
    • JavaScript frameworks (e.g., Vue, Angular).
    • Stylesheets used by the website (embedded in the page or loaded from an external resource).
    • Content Delivery Networks (CDNs).
  • Open Ports and Running Services:
    • Identify open ports and the network services other than the web server running on the host, since these could also be exploited.
  • Website Structure:
    • Determine visible and hidden directories, links, and files on the website.
    • Understand the navigation structure of the pages.
  • Entry Points:
    • Identify where the application accepts input for specific actions, such as displaying, storing, or deleting data. These entry points are usually HTML forms or prepared links on a page, and the input data is transmitted as parameters in GET, POST, or PUT requests.
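
As a minimal sketch of locating entry points in a page's HTML, the Python code below uses the standard library's html.parser to list each form's action URL, method, and named fields, plus links that carry query parameters. The names EntryPointParser and find_entry_points are hypothetical, chosen for illustration. The sketch only sees inputs present in the static HTML, so fields that JavaScript adds at runtime would be missed.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse


class EntryPointParser(HTMLParser):
    """Hypothetical helper: collects forms (action, method, field names)
    and links whose URLs carry query parameters."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.forms = []   # dicts: {"action": ..., "method": ..., "fields": [...]}
        self.links = []   # tuples: (absolute URL, sorted parameter names)
        self._current_form = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current_form = {
                # Resolve relative actions against the page URL.
                "action": urljoin(self.base_url, attrs.get("action") or ""),
                # HTML forms default to GET when no method is given.
                "method": (attrs.get("method") or "get").upper(),
                "fields": [],
            }
        elif tag in ("input", "textarea", "select") and self._current_form is not None:
            name = attrs.get("name")
            if name:  # unnamed inputs (e.g. plain submit buttons) send no parameter
                self._current_form["fields"].append(name)
        elif tag == "a" and attrs.get("href"):
            url = urljoin(self.base_url, attrs["href"])
            params = sorted(parse_qs(urlparse(url).query, keep_blank_values=True))
            if params:
                self.links.append((url, params))

    def handle_endtag(self, tag):
        if tag == "form" and self._current_form is not None:
            self.forms.append(self._current_form)
            self._current_form = None


def find_entry_points(html, base_url):
    """Return (forms, links) found in an HTML document."""
    parser = EntryPointParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.forms, parser.links
```

Each form's fields and each link's parameters are candidate inputs to probe in later testing stages.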

Methods for finding this information include:

  • Analysis of Loaded HTML/JS Code in the Browser: Examining the code that the browser downloads.
  • Analysis of Server Response HTTP Headers: Inspecting the headers sent by the server.
  • Spidering/Crawling: Automated web browsing to determine the site's structure, directories, files, and links.
  • Dictionary Brute-Forcing: Searching for hidden directories, files, and parameters using a large list of common names.
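  • Online Services: Services such as Shodan and Censys continuously scan internet-facing hosts and index what they find, such as open ports, service banners, and TLS certificates.
  • Google Dorking: Using specific Google search operators (such as site: or filetype:) to find information about the target, such as indexed files, directories, and exposed devices.
  • Internet Archive: The archive's Wayback Machine stores periodic snapshots of web pages, so older or deleted pages of the target can often still be found there.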
  • Analysis of Error and Debugging Pages: Sending malformed or incorrect requests to trigger error messages, which may contain valuable information.
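
Spidering can be sketched as a breadth-first traversal that extracts links from each page and queues those that stay on the target's host. In this minimal Python sketch, crawl and LinkParser are hypothetical names, and fetch is an assumed caller-supplied function that returns a page's HTML as a string (or None on failure). Keeping fetch separate leaves the traversal logic independent of any particular HTTP client.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects raw href/src/action attribute values from one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl that never leaves start_url's host.

    fetch(url) must return the page HTML as a string, or None if it failed.
    Returns every same-host URL discovered (visited or still queued).
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited += 1
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for raw in parser.links:
            # Resolve relative links and drop "#fragment" parts, which point to the same page.
            absolute, _fragment = urldefrag(urljoin(url, raw))
            parsed = urlparse(absolute)
            if (parsed.scheme in ("http", "https")
                    and parsed.netloc == host
                    and absolute not in seen):
                seen.add(absolute)
                queue.append(absolute)
    return seen
```

In a real test, fetch might wrap urllib.request.urlopen and return None for non-HTML responses; the max_pages limit keeps the sketch from running indefinitely on large sites.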

Together, these methods collect the information listed above, which is needed for vulnerability testing and penetration testing.
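
Dictionary brute-forcing reduces to requesting base_url + word for every entry in a wordlist and recording the status codes that suggest the resource exists. The following Python sketch assumes a plain-text wordlist and treats 200, 204, redirects, 401, and 403 as interesting; brute_force_paths, http_status, and the User-Agent string are hypothetical. Redirects are deliberately not followed, so a 301 pointing to a directory stays visible instead of being replaced by its target's status.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes are reported as-is."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


_OPENER = urllib.request.build_opener(_NoRedirect)


def http_status(url, timeout=5.0):
    """Return the status code of a GET request to url, or None on network errors."""
    req = urllib.request.Request(url, headers={"User-Agent": "recon-sketch/0.1"})
    try:
        with _OPENER.open(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        err.close()
        return err.code  # 3xx/4xx/5xx responses still carry a status code
    except (urllib.error.URLError, OSError):
        return None


def brute_force_paths(base_url, wordlist, extensions=("",), fetch=http_status,
                      interesting=(200, 204, 301, 302, 307, 308, 401, 403)):
    """Try each word (with each extension) under base_url; return {url: status} hits."""
    hits = {}
    for word in wordlist:
        word = word.strip().lstrip("/")
        if not word or word.startswith("#"):
            continue  # skip blank lines and comments in the wordlist
        for ext in extensions:
            url = urljoin(base_url, word + ext)
            status = fetch(url)
            if status in interesting:
                hits[url] = status
    return hits
```

Passing a stub fetch makes the logic testable offline. Some servers answer every path with 200 and a custom "not found" page; in that case comparing response lengths is a more reliable signal than status codes alone.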

You may also come across the terms Web Fingerprinting and Web Enumeration. Web Fingerprinting aims to identify the type and version of the server software, while Web Enumeration aims to discover the web technologies used on the server. This chapter focuses on these two tasks.
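
A basic fingerprinting pass reads the response headers and the page source for markers that reveal software or versions. The Python sketch below checks a few well-known headers (Server, X-Powered-By, X-AspNet-Version, X-AspNetMvc-Version, X-Generator), default session cookie names, and the HTML meta generator tag. The function name fingerprint and the small hint table are illustrative assumptions, not an exhaustive signature database. Administrators can remove or fake these values, so the results are hints to confirm rather than proof.

```python
import re

# Response headers that commonly reveal server software or versions.
FINGERPRINT_HEADERS = ("Server", "X-Powered-By", "X-AspNet-Version",
                       "X-AspNetMvc-Version", "X-Generator")

# Illustrative default session-cookie names and the technology they suggest.
COOKIE_HINTS = {
    "PHPSESSID": "PHP",
    "JSESSIONID": "Java (servlet container)",
    "ASP.NET_SessionId": "ASP.NET",
    "laravel_session": "Laravel",
}

# Matches <meta name="generator" content="...">; assumes name comes before content.
GENERATOR_RE = re.compile(
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def fingerprint(headers, html=""):
    """Return a dict of technology hints from response headers and page HTML.

    headers: mapping of header name to value; if several Set-Cookie headers
    were received, join them into one string before calling.
    """
    findings = {}
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    for name in FINGERPRINT_HEADERS:
        value = lowered.get(name.lower())
        if value:
            findings[name] = value
    cookies = lowered.get("set-cookie", "")
    hints = sorted({tech for cookie, tech in COOKIE_HINTS.items() if cookie + "=" in cookies})
    if hints:
        findings["cookie-hints"] = hints
    match = GENERATOR_RE.search(html)
    if match:
        findings["meta-generator"] = match.group(1)
    return findings
```

For example, fingerprint({"Server": "nginx/1.18.0"}) returns {"Server": "nginx/1.18.0"}.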