Identify, in a continuous way, your web attack surface exposed on the Internet, using Open-Source Software

Tags: Thales Cyber Solutions Belgium, Risk and threat evaluation, Detect and respond
21 May 2024

Glossary

📖 This section contains the collection of terms used in this post:

  • Attack surface: The set of points on the boundary of a system, a system element, or an environment where an attacker can try to enter, cause an effect on, or extract data from it (source).
  • Attack vector: An attack vector, or threat vector, is a way for attackers to enter a network or system (source).
  • Shadow IT: It is the usage of IT-related hardware or software by a department or individual without the knowledge of the IT or security group within the organization (source).
  • Reconnaissance: Techniques and methodology necessary to gather information about your target system secretly (source).
  • Red Team exercises: An exercise, reflecting real-world conditions, that is conducted as a simulated adversarial attempt to compromise organizational missions and/or business processes to provide a comprehensive assessment of the security capability of the information system and organization (source).
  • Domain name: A label that identifies a network domain using the Domain Naming System (source).
  • Certificate transparency: A framework for publicly logging the existence of Transport Layer Security (TLS) certificates as they are issued or observed in a manner that allows anyone to audit CA activity and notice the issuance of suspect certificates as well as to audit the certificate logs themselves (source).
  • External Attack Surface Management: External attack surface management (EASM) helps organizations identify and manage risks associated with Internet-facing assets and systems. The goal is to uncover threats that are difficult to detect, such as shadow IT systems, so you can better understand your organization’s true external attack surface (source).
  • Content Management System: A content management system (CMS) is computer software used to manage the creation and modification of digital content (source).
  • Network ACL: A network access control list (ACL) is made up of rules that either allow access to a computer environment or deny it. In a way, an ACL is like a guest list at an exclusive club (source).
  • Vulnerability scanner: A tool (hardware and/or software) used to identify hosts/host attributes and associated vulnerabilities (source).
  • Configuration management database: Also called CMDB, it is a file that clarifies the relationships between the hardware, software, and networks used by an IT organization (source).

Context 

🌏 Over the past few years, it has become easy to deploy new services to seize a new business opportunity. Popular cloud providers offer cheap and scalable services to quickly deploy a web-based application. As a direct consequence, every company today exposes more and more services on the Internet. From a business perspective, cloud-based services are a great lever to capitalize on an opportunity or to quickly deploy a new service to consumers (clients or partners). This context has also boosted Shadow IT activities.

🤔 From a security perspective, this has led to a loss of visibility over the attack surface exposed by the company. All cloud providers do offer features to secure and monitor exposed services… but only if the company knows that the service exists and on which cloud provider it is deployed.

Objective of the post

🎯 This post proposes an idea, alongside a technical proof of concept, to identify assets exposed on the Internet (here, an asset is a web-based service) and build an attack surface inventory, in a continuous, maintainable, and, as far as possible, automated way.

💡 The proposed idea used only open-source tools and sources of information. The goal was to help a company get started in taking back control over its exposed attack surface.

💬 Note that such activity is also called the identification of the “External Attack Surface”. Below is a visual representation of an attack surface exposed by a company:

[Figure: Attack surface]

Overview of the idea

📋 The proposed idea was based on the following elements:

  • Use public sources of information.
  • Use the reconnaissance techniques used by attackers.
  • Use the same free and open-source set of tools as those used by attackers.

⚒ These elements were used to create an Assets Collection Pipeline (called ACP from now on), following the flow below (read from left to right):

[Figure: ACP flow]

🤝 This flow was adapted from the initial concept created by Moses Frost for Red Team exercises.

💻 The flow above took a company's base domain name as input data, named the seed, and, once executed, gathered the following information:

  • Domains & subdomains used or owned.
  • Cloud providers used.
  • Services exposed.

🔬 Such information was used to:

  • Build a real, and up-to-date, inventory of the services exposed to the Internet.
  • Validate any unexpected services identified.
  • Feed the next run.
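The “feed the next run” loop can be sketched as a fixed-point expansion: every newly found name is fed back into discovery until a round finds nothing new. This is a minimal sketch, not MyACP's code: `discover` is a hypothetical callback standing in for the real collection steps, and the scope rule (keep the seed and its subdomains only) is an assumption.

```python
from typing import Callable, Iterable, Set


def in_scope(name: str, seed: str) -> bool:
    """Assumed scope rule: keep the seed itself and any of its subdomains."""
    return name == seed or name.endswith("." + seed)


def expand_inventory(
    seed: str,
    discover: Callable[[str], Iterable[str]],  # hypothetical discovery step
    known: Iterable[str] = (),                 # assets kept from a previous run
    max_rounds: int = 5,
) -> Set[str]:
    """Feed newly found names back into discovery until nothing new appears."""
    inventory = {seed, *known}
    frontier = set(inventory)
    for _ in range(max_rounds):
        found = set()
        for name in frontier:
            for candidate in discover(name):
                candidate = candidate.lower().rstrip(".")  # normalize DNS names
                if in_scope(candidate, seed) and candidate not in inventory:
                    found.add(candidate)
        if not found:
            break  # fixed point: this round discovered nothing new
        inventory |= found
        frontier = found
    return inventory


# Usage with a fake discovery table instead of real tools
fake_results = {
    "example.com": ["www.example.com", "API.example.com.", "cdn.other.net"],
    "api.example.com": ["v2.api.example.com"],
}
print(sorted(expand_inventory("example.com", lambda n: fake_results.get(n, []))))
# ['api.example.com', 'example.com', 'v2.api.example.com', 'www.example.com']
```

Capping the number of rounds keeps a run bounded even if a source keeps returning new names, and passing the previous run's assets as `known` is one way to reuse earlier results.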

Source of information and tools

📡 The ACP used the following public sources of information:

⚒ The following free and open-source tools were used to process the gathered data and reach the objective of the ACP:

  • Curl: To perform raw HTTP requests.
  • Nmap: To discover open ports.
  • JQ: To process JSON data.
  • Dnsx & SubFinder: To discover new subdomains.
  • Nuclei: To discover exposed services on domains & subdomains.
  • Shell scripting and Python for the logic.

Proof of concept

👨‍💻 To validate the proposed idea, a proof of concept of an ACP (named MyACP) was created, using the Thales base domain name as the seed: thalesgroup.com.

⚒ MyACP used the following flow:

[Figure: MyACP flow]

🏭 Important point to keep in mind: creating an effective ACP is an incremental process, and the ratio of missed items decreases over successive iterations. Indeed, the results of each iteration are used to tune the ACP and feed the next one.

💬 CSV was used for the output data because its flat, column-based format allows quick consultation in Microsoft Excel. This was a personal choice; you can use any format you want to represent the data gathered by your ACP.
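Writing such a flat file only needs Python's standard `csv` module. A minimal sketch, assuming hypothetical columns `host`, `ip` and `cloud_provider` (the real columns of “hosts.csv” are only visible in the screenshots):

```python
import csv
import io
from typing import Dict, Iterable, TextIO

FIELDS = ["host", "ip", "cloud_provider"]  # hypothetical column names


def write_csv(rows: Iterable[Dict[str, str]], out: TextIO) -> int:
    """Write a header plus one line per row; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)  # raises ValueError if a row has an unknown column
        count += 1
    return count


buffer = io.StringIO()
written = write_csv(
    [{"host": "www.example.com", "ip": "203.0.113.10", "cloud_provider": "unknown"}],
    buffer,
)
print(written)                         # 1
print(buffer.getvalue().splitlines())  # ['host,ip,cloud_provider', 'www.example.com,203.0.113.10,unknown']
```

When writing to a real file, open it with `open("hosts.csv", "w", newline="")`, as the `csv` documentation recommends, so that line endings are handled by the writer.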

MyACP: Hosts

⚒ For this step, the following flow was applied:

[Figure: MyACP flow, hosts steps]

👨‍💻 The following scripts were created to perform the corresponding processing:

  • utils.sh: Utility script used by the other scripts; defines constants and shared functions.
  • hosts.sh
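The list of public sources used by MyACP is not reproduced in this post. As a hedged illustration of the certificate transparency idea from the glossary, the sketch below assumes crt.sh as the source (queried, for example, with `curl -s 'https://crt.sh/?q=%25.example.com&output=json'`) and extracts in-scope host names from the `name_value` field of each entry, which can hold several names separated by newlines, including wildcards. The response format is an assumption to verify, and parsing works on an already downloaded response, so no network access is needed.

```python
import json
from typing import Set


def names_from_crtsh(json_text: str, seed: str) -> Set[str]:
    """Extract unique in-scope host names from a crt.sh JSON response (assumed format)."""
    names = set()
    for entry in json.loads(json_text):
        # One certificate may cover several names, one per line
        for raw in entry.get("name_value", "").splitlines():
            name = raw.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # a wildcard only proves that the parent domain exists
            if name == seed or name.endswith("." + seed):
                names.add(name)
    return names


sample = json.dumps([
    {"common_name": "www.example.com", "name_value": "www.example.com\n*.example.com"},
    {"common_name": "mail.example.com", "name_value": "mail.example.com"},
    {"common_name": "other.org", "name_value": "other.org"},
])
print(sorted(names_from_crtsh(sample, "example.com")))
# ['example.com', 'mail.example.com', 'www.example.com']
```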

👨‍💻 The script “hosts.sh” was executed and its data were saved into a file named “hosts.csv”:

[Figure: MyACP hosts script output 1]

👀 Overview of the results, using Miller to display the CSV content:

[Figure: MyACP hosts script output 2]

👀 Overview of the cloud providers used by the identified hosts:

[Figure: MyACP hosts script output 3]

🎯 So, at this point, the hosts identified from the seed were gathered into the file “hosts.csv”. This file was used for the next step of MyACP.
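The post does not say how the cloud provider of each host was determined. One possible approach, shown here as a sketch, is to check each resolved IP address against the IP ranges that providers publish (AWS, for example, publishes an `ip-ranges.json` file). The provider names and ranges below are placeholders taken from documentation address blocks, not real provider data.

```python
import ipaddress
from typing import Dict, List

# Placeholder data: documentation address blocks (RFC 5737 / RFC 3849), NOT real
# provider ranges. In practice, load the ranges each provider publishes.
PROVIDER_RANGES: Dict[str, List[str]] = {
    "provider-a": ["203.0.113.0/24"],
    "provider-b": ["198.51.100.0/25", "2001:db8::/32"],
}

# Parse the CIDR strings once
NETWORKS = [
    (provider, ipaddress.ip_network(cidr))
    for provider, cidrs in PROVIDER_RANGES.items()
    for cidr in cidrs
]


def cloud_provider(ip: str) -> str:
    """Return the provider whose published range contains ip, else 'unknown'."""
    address = ipaddress.ip_address(ip)
    for provider, network in NETWORKS:
        # Membership is simply False when IP versions differ (IPv4 vs IPv6)
        if address in network:
            return provider
    return "unknown"


print(cloud_provider("203.0.113.42"))    # provider-a
print(cloud_provider("198.51.100.200"))  # unknown (outside the /25)
print(cloud_provider("2001:db8::1"))     # provider-b
```

A linear scan is fine for a handful of ranges; with the thousands of prefixes real providers publish, indexing the networks (for example by sorting them) would be faster.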

MyACP: Applications

💡 For the POC, the focus was on web applications and the following SQL/NoSQL database systems:

  • SQL Server, Oracle DB, MySQL, PostgreSQL.
  • Neo4j, Cassandra, MongoDB.
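The database systems above usually listen on well-known default ports. The sketch below turns an assumed port table into the comma-separated value expected by nmap's `-p` option. The table lists common defaults only: MyACP's actual port list is not published, and services can of course listen on other ports.

```python
from typing import Dict, List

# Common default ports (assumption: MyACP's real port list is not published)
DEFAULT_PORTS: Dict[str, List[int]] = {
    "web": [80, 443, 8080, 8443],
    "sqlserver": [1433],
    "oracle": [1521],
    "mysql": [3306],
    "postgresql": [5432],
    "neo4j": [7474, 7473, 7687],  # HTTP, HTTPS, Bolt
    "cassandra": [9042],          # CQL native protocol
    "mongodb": [27017],
}


def nmap_port_spec(ports: Dict[str, List[int]]) -> str:
    """Build a sorted, de-duplicated, comma-separated port list for nmap -p."""
    unique = {port for group in ports.values() for port in group}
    return ",".join(str(port) for port in sorted(unique))


spec = nmap_port_spec(DEFAULT_PORTS)
print(spec)
# 80,443,1433,1521,3306,5432,7473,7474,7687,8080,8443,9042,27017
```

The result could then be passed to nmap, for example `nmap -Pn -p "$PORTS" -iL hosts.txt -oG applications.gnmap`, where `hosts.txt` is a hypothetical file listing the hosts taken from “hosts.csv”.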

⚒ For this step, the following flow was applied against the identified hosts:

[Figure: MyACP flow, applications steps]

👨‍💻 The following script was created to perform the corresponding processing:

👨‍💻 The script “applications.sh” was executed and its data were saved into a file named “applications.csv”:

[Figure: MyACP applications script output 1]

👀 Overview of the results:

[Figure: MyACP applications script output 2]

🎯 So, at this point, the applications (more precisely, services) identified from the collection of hosts were gathered into the file “applications.csv”. This file was used for the next step of MyACP.
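If the port scan writes nmap's grepable output (`-oG`), the host:port tuples can be extracted with a few lines of Python. This is a simplified sketch, not the parser used by “applications.sh”: it keeps open TCP ports only and prefers the host name shown in parentheses over the IP address.

```python
from typing import List, Tuple


def open_ports_from_gnmap(text: str) -> List[Tuple[str, int, str]]:
    """Return (host, port, service) for every open TCP port in nmap -oG output."""
    results = []
    for line in text.splitlines():
        if not line.startswith("Host:") or "Ports:" not in line:
            continue  # skip comments and "Status: Up" lines
        parts = line.split()
        ip = parts[1]
        # Name in parentheses, e.g. "(www.example.com)"; may be empty "()"
        name = parts[2].strip("()") if len(parts) > 2 and parts[2].startswith("(") else ""
        ports_field = line.split("Ports:", 1)[1].split("\t", 1)[0]
        for entry in ports_field.split(","):
            # Each entry: port/state/protocol/owner/service/rpc_info/version/
            fields = entry.strip().split("/")
            if len(fields) >= 5 and fields[1] == "open" and fields[2] == "tcp":
                results.append((name or ip, int(fields[0]), fields[4]))
    return results


sample = (
    "# Nmap scan (sample)\n"
    "Host: 203.0.113.10 (www.example.com)\tStatus: Up\n"
    "Host: 203.0.113.10 (www.example.com)\tPorts: 80/open/tcp//http///, "
    "443/open/tcp//https///, 3306/filtered/tcp//mysql///\tIgnored State: closed (997)\n"
)
for host, port, service in open_ports_from_gnmap(sample):
    print(f"{host}:{port}", service)
# www.example.com:80 http
# www.example.com:443 https
```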

MyACP: Detection

✅ For this step, the POC was limited to identifying web applications (services), so as not to perform any illegal action or cause problems on the target side. No offensive HTTP request was sent.

⚒ For this step, the following flow was applied against the identified applications (services expressed as host:port tuples):

[Figure: MyACP flow, detection steps]

👨‍💻 The following script was created to perform the corresponding processing:

👨‍💻 The script “detection.sh” was executed and its data were saved into files named “detection.csv” and “detection-nuclei.json”:

[Figure: MyACP detection script outputs 1 and 4]

👀 Overview of the results:

[Figure: MyACP detection script outputs 2 and 3]

📚 The file “detection.csv” contained all hosts on which at least one active web application was identified. The file “detection-nuclei.json” contained the technical details about each identified application.

💬 The second file also contained the information of the first one, but two separate files were created to keep the processing consistent and to make the “classifier” step easier.
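Because the Nuclei output can grow large (87 MB in this POC), it is best read one JSON object per line rather than in one piece. The sketch below derives, from such a file, the hosts with at least one detection, which is roughly what “detection.csv” holds. It assumes JSON Lines output with `host` and `template-id` fields (check them against the Nuclei version you run); the sample records are made up.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Set


def hosts_with_detections(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each host to the set of Nuclei template IDs that matched on it."""
    detections: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:  # each line is parsed on its own; the file is never loaded whole
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        detections[record["host"]].add(record["template-id"])  # assumed field names
    return dict(detections)


sample = [
    '{"template-id": "tech-detect", "host": "https://www.example.com"}',
    '{"template-id": "example-cms-detect", "host": "https://www.example.com"}',
    '{"template-id": "tech-detect", "host": "http://api.example.com:8080"}',
]
result = hosts_with_detections(sample)
print(sorted(result))
# ['http://api.example.com:8080', 'https://www.example.com']
```

On a real file, passing the open file object works the same way, since iterating over it yields one line at a time: `with open("detection-nuclei.json", encoding="utf-8") as f: hosts = hosts_with_detections(f)`.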

🎯 So, at this point, the following data were gathered based on the seed:

[Figure: Summary of the data gathered based on the seed]

🤔 The difference between the number of entries found at each step was expected. Indeed, some identified subdomains were no longer bound to any IP address, and some live subdomains were used for non-web activities.

💬 JSON format was used for Nuclei because it is easier to process for large and detailed data sets. Indeed, the Nuclei output file was 87 MB in size:

[Figure: MyACP detection script output 5]

MyACP: Classifier

⚒ For this final step, the following approach was applied to process the gathered data:

[Figure: MyACP flow, classifier steps]

👨‍💻 The following script was created to perform the corresponding processing. Python was used to facilitate the processing as well as the creation of selection rules for data:

👀 Out-of-support software as well as unexpected exposed services:

[Figure: MyACP classifier script output 1]
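The classifier code itself is only available as a screenshot. As a hedged sketch of what selection rules for these findings could look like, the code below flags database ports exposed to the Internet, other unexpected ports, and software older than a minimum supported version. The expected ports, database ports, and minimum-version table are hypothetical inputs you would maintain yourself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical rule inputs, to be maintained for your own organization
EXPECTED_PORTS = {80, 443}
DATABASE_PORTS = {1433, 1521, 3306, 5432, 7474, 7687, 9042, 27017}
MIN_SUPPORTED: Dict[str, Tuple[int, ...]] = {
    "exampleserver": (2, 4),  # illustrative product and version only
}


@dataclass
class Service:
    host: str
    port: int
    product: Optional[str] = None
    version: Optional[str] = None


def parse_version(text: str) -> Tuple[int, ...]:
    """'2.2.15' -> (2, 2, 15); non-numeric parts are ignored."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def classify(service: Service) -> List[str]:
    """Return the findings raised by the rules for one exposed service."""
    findings = []
    if service.port in DATABASE_PORTS:
        findings.append("database exposed to the Internet")
    elif service.port not in EXPECTED_PORTS:
        findings.append("unexpected port")
    minimum = MIN_SUPPORTED.get((service.product or "").lower())
    if minimum and service.version and parse_version(service.version) < minimum:
        findings.append("out-of-support software")
    return findings


for svc in [
    Service("db.example.com", 3306),
    Service("www.example.com", 443, "ExampleServer", "2.2.15"),
    Service("app.example.com", 8443),
]:
    print(svc.host, classify(svc))
# db.example.com ['database exposed to the Internet']
# www.example.com ['out-of-support software']
# app.example.com ['unexpected port']
```

Keeping each rule as a small, independent check makes it easy to add new ones after reviewing a run.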

 

👀 Distribution of open TCP web ports as well as the CMS software used:

[Figure: MyACP classifier script output 2]

👀 Distribution of the attack surface identified by attack vector:

[Figure: MyACP classifier script output 3]
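Distributions like these come down to counting. A minimal sketch with `collections.Counter`, assuming each record has a `port` and a detected `tech` field, and using a hypothetical mapping from technology to attack vector (anything unmapped is counted as a generic web application):

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical mapping from detected technology to attack-vector category
VECTOR_BY_TECH = {
    "wordpress": "CMS",
    "drupal": "CMS",
    "mysql": "Database",
    "mongodb": "Database",
}


def distributions(records: Iterable[Dict[str, str]]) -> Dict[str, Counter]:
    """Count records by port, by CMS product, and by attack vector."""
    ports: Counter = Counter()
    cms: Counter = Counter()
    vectors: Counter = Counter()
    for record in records:
        ports[record["port"]] += 1
        tech = record.get("tech", "").lower()
        vector = VECTOR_BY_TECH.get(tech, "Web application")
        vectors[vector] += 1
        if vector == "CMS":
            cms[tech] += 1
    return {"ports": ports, "cms": cms, "vectors": vectors}


records = [
    {"port": "443", "tech": "WordPress"},
    {"port": "443", "tech": "nginx"},
    {"port": "8080", "tech": "Drupal"},
    {"port": "3306", "tech": "MySQL"},
]
stats = distributions(records)
print(stats["ports"].most_common(1))  # [('443', 2)]
print(dict(stats["cms"]))             # {'wordpress': 1, 'drupal': 1}
print(dict(stats["vectors"]))         # {'CMS': 2, 'Web application': 1, 'Database': 1}
```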

 

🔬 The “classifier” step is really the one in which “you are only limited by your imagination” in terms of analysis and use of the gathered data. For example, you can use the information to feed:

  • A vulnerability scanner to identify vulnerabilities, missing security patches, etc. on assets identified.
  • A configuration management database (CMDB) to update your inventory of known assets.
  • An intrusion test to evaluate the security posture of a specific type of identified assets.
  • A tool, like ELK, to compute statistics and charts from the data.
  • A security incident process to shut down unwanted services.
  • And so on…

Key takeaways

💡 Open tools and sources of information can be used to identify the attack surface exposed by your company.

⚒ The idea and materials proposed in this post can be used freely, as a bootstrap, to create your own ACP.

🏭 It is not “magic”: It requires constant work to achieve a useful level of information and identification of new exposed assets.

📅 It is important to run your ACP on a regular basis (at least once a week) to detect and be aware of new exposed assets. Ideally, run it from an external Internet connection to get a true external point of view and not be misled by any network ACL in place.

🔬 It is also important to review results of each run of your ACP to:

  • Enhance the detection logic.
  • Feed the next run with already identified assets.
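Reviewing each run is easier when it is compared with the previous one. A minimal sketch, assuming each run produces a CSV with hypothetical `host` and `port` columns: assets are compared as host:port strings, new ones are candidates for review, and vanished ones either confirm a decommissioning or reveal a detection gap.

```python
import csv
import io
from typing import Dict, Set, TextIO


def load_assets(stream: TextIO) -> Set[str]:
    """Read host:port assets from a CSV with hypothetical 'host' and 'port' columns."""
    return {f"{row['host']}:{row['port']}" for row in csv.DictReader(stream)}


def compare_runs(previous: Set[str], current: Set[str]) -> Dict[str, Set[str]]:
    """Split assets into new, gone, and unchanged between two runs."""
    return {
        "new": current - previous,   # newly exposed: review first
        "gone": previous - current,  # no longer seen: decommissioned or missed
        "unchanged": current & previous,
    }


last_week = load_assets(io.StringIO("host,port\nwww.example.com,443\nold.example.com,80\n"))
this_week = load_assets(io.StringIO("host,port\nwww.example.com,443\ndev.example.com,8080\n"))
diff = compare_runs(last_week, this_week)
print(diff["new"])   # {'dev.example.com:8080'}
print(diff["gone"])  # {'old.example.com:80'}
```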

Source code of scripts

💻 This section contains the source code of all scripts mentioned in this post.

utils.sh

[Figure: Source code of utils.sh]

hosts.sh

[Figure: Source code of hosts.sh]

applications.sh

[Figure: Source code of applications.sh]

detection.sh

[Figure: Source code of detection.sh]

classifier.py 

[Figure: Source code of classifier.py]

Additional resources

Author

Dominique Righetto
