"LOL you're not executing that": Detecting Malicious LOLBin Commands

LOLBins (living-off-the-land binaries) are executable files that are already present in the user environment, considered non-malicious, and able to be misused by an attacker for malicious purposes. These binaries are either pre-installed as part of the operating system (e.g., rundll32.exe) or installed by the user as part of legitimate software (e.g., PsExec). Attackers can repurpose or exploit them to perform malicious tasks such as payload delivery and remote code execution. In these cases, the command line used to launch the activity is often the only artifact available to us for detection. In this blog post, we discuss attacks that use LOLBins, why they are a sophisticated threat, and the strategy that SophosAI is employing to detect and respond to them.

Why should we care about LOLBin attacks?

LOLBin-based attacks, a subset of fileless malware attacks, are on the radar of almost every security vendor as an emerging threat. We’ve seen an increasing number of malware campaigns employ attack techniques involving LOLBins at some stage of the intrusion.

They are appealing to attackers for several reasons:

They have a small footprint on the user system. All the malicious artifacts used as part of the attack are in memory, making them hard to find and trace.
They are more likely to go undetected. Since the executable performing the malicious activity is also used for legitimate purposes, the attack is more likely to fly under the radar of anti-malware systems. It can be hard to distinguish malicious activity from legitimate activity.
They generally have broad permissions and authority to make system-wide changes. A lot of the LOLBins targeted by attackers are system utilities like PowerShell or WMI (Windows Management Instrumentation). These applications have broad capabilities to make changes in the endpoint, grant permissions, modify running processes, etc.
They have remote access and code execution capabilities. Some LOLBins allow attackers to supply their own code, which is then executed indirectly through the LOLBin itself (for example, the Windows command-line utility and PowerShell). This can provide further avenues for attackers to exploit vulnerabilities and perform malicious tasks.
They support obfuscation. Utilities like PowerShell provide several ways for an attacker to obfuscate input, such as by encoding the input in base64, building code to be executed on the fly from strings, etc. This can provide multiple ways for an attacker to achieve the same objective, making detection of malicious activity harder.
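To make the base64 point concrete, here is a minimal sketch of how an encoded PowerShell argument can be turned back into readable text. It relies on the fact that PowerShell's -EncodedCommand parameter expects a base64 string of UTF-16LE text. The helper name decode_encoded_command and the sample command are illustrative assumptions, not part of the system described in this post.

```python
import base64


def decode_encoded_command(b64_text: str) -> str:
    """Decode a PowerShell -EncodedCommand argument (base64 over UTF-16LE text)."""
    return base64.b64decode(b64_text).decode("utf-16-le")


# Build a harmless sample the same way PowerShell expects to receive it.
plain = "Write-Output 'hello'"
encoded = base64.b64encode(plain.encode("utf-16-le")).decode("ascii")

# The encoded form hides the readable command; decoding recovers it.
decoded = decode_encoded_command(encoded)
print(encoded)
print(decoded)
```

Decoding is only one layer: an attacker can combine it with string concatenation or variable indirection, which is why the same objective can appear in many different surface forms.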

If you want to read more about specific attacks involving LOLBins that were detected by SophosLabs researchers, see the blog posts here, here, and here, where the execution of such attacks is explained in detail.

How do we detect and stop these attacks?

Sophos has been working on strengthening our products’ defenses against fileless attacks in general in multiple ways, such as enabling AMSI protection for all customers running Intercept X and creating a heap-heap memory allocation barrier with Dynamic Shellcode Protection.

However, sophisticated attacks involving LOLBins in the wild often cannot be detected using a single catch-all solution; they require a layered approach. Attackers constantly find new ways to exploit LOLBins and evade defenses, for instance by bypassing AMSI detection entirely. As we discussed earlier, detecting LOLBin-based attacks is a hard problem for several reasons. Fortunately, machine learning (ML) can help reduce the complexity of the problem and improve our capability to detect more attacks.

Our URL model has demonstrated its capabilities in distinguishing between malicious and legitimate website URLs. A LOLBin model, supplied with the command line executed on a user endpoint, could similarly distinguish between malicious and legitimate commands. At SophosAI, we have designed a system, incorporating such an ML model, for detecting malicious command lines. The research for the ML model is ongoing, and the analysis of the performance of the ML model and the overall system in general will be included in a follow-up blog post.

Why such an elaborate system? Can’t we just train a model and call it a day?

Data and Ground Truth

We use command line execution data from customer telemetry in our ML system. However, one of the challenges is accurately labeling this data. If you read our blog post series about building ML models, you know that consistent and accurate ground truth labels are extremely important for the success of any ML deployment. Fortunately, we have multiple sources within the organization from which we can extract ground truth information for any given command line. Some of these are:

Case information from our Managed Threat Response team: When suspicious activity is detected in a customer endpoint that is part of the Sophos Managed Threat Response program, a case is created and then reviewed by a threat analyst. We use command lines and associated case information from these cases to infer whether a command line had malicious intent or not.
Detection information from SophosLabs’ AMSI detection engine: The AMSI detection engine built by SophosLabs collects telemetry information, which includes the command line that was executed and the detection name for the malicious activity that it encountered.
Root Cause Analysis reports: Root Cause Analysis reports are generated by the Sophos Data Recorder when malicious activity is detected on an endpoint. They include the malicious command lines that were executed as part of the attack.
Labels for embedded URLs in the command, if any, cross-referenced against internal and external sources.
Labels for associated files (Documents/PE files) that executed the command, if any, cross-referenced against internal and external sources.

Additionally, we have a continuous process where we provide human threat analysts from the Managed Threat Response team a curated set of commands on a periodic basis, which they then investigate and hand-label for us. We use this process to prioritize the labeling of those samples that cannot be labeled using other sources. During the model training process, we also use this process to validate the performance of the model.

Detection at scale

One of the biggest challenges we face when trying to train a machine learning model to detect malicious command lines is the sheer scale of the problem, in terms of how many command lines are executed on customer endpoints every single day. By comparison, attacks that employ LOLBins are few and far between, turning this into a “find a tiny needle in an enormous haystack” problem. In such cases, machine learning models tend to produce a lot of false positives. The scale of the problem also makes manual review and verification of model results difficult.
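The needle-in-a-haystack effect can be shown with a little arithmetic. The sketch below computes what fraction of alerts would be true attacks for a given base rate. All numbers are hypothetical and chosen only for illustration; they are not measurements from our telemetry or from any model.

```python
def precision(prevalence: float, tpr: float, fpr: float) -> float:
    """Fraction of flagged command lines that are truly malicious."""
    true_positives = prevalence * tpr
    false_positives = (1.0 - prevalence) * fpr
    return true_positives / (true_positives + false_positives)


# Hypothetical values: 1 in 100,000 command lines is malicious, and the
# model catches 99% of them while misfiring on 0.1% of benign commands.
p = precision(prevalence=1e-5, tpr=0.99, fpr=0.001)
print(f"precision = {p:.4f}")
```

Under these assumed numbers, only about 1 in 100 alerts would correspond to a real attack, even though the false positive rate looks small in isolation.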

However, command lines share some characteristics that produce many near duplicates:

They often have a file path or other artifact as one of the arguments that changes based on the user environment or machine, such as usernames or system GUIDs in file paths.
The order of arguments in the command changes, or a single argument has a slightly different value.
They can have randomly generated strings in embedded URLs or file paths.
They can be obfuscated on purpose by attackers (variable assignment, invocation of string expressions created on the fly, etc.).

For example, a single PowerShell command can have several near duplicates that perform the same task. A minor variation in the file path causes similar command lines to be treated as separate unique rows in the dataset we collect.

In order to reduce the impact of this problem, we use a combination of two strategies:

Normalization: During normalization, we run a series of regex-based substitutions on the command line that replace the parts of the command that commonly lead to near duplicates.
Clustering: We then group command lines based on content similarity, using MinHash-based clustering. This groups similar command lines into the same cluster. When compiling a dataset for training the ML model, we sample a few commands from each cluster, with the sample size proportional to the logarithm of the cluster size.
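The two strategies above can be sketched roughly as follows. This is a simplified illustration, not our production pipeline: the regex rules, the placeholder tokens, the 4-character shingles, the 64-hash signature with 16 LSH bands, and the 1 + floor(log2(size)) sampling rule are all assumptions made for the example.

```python
import hashlib
import math
import re
from collections import defaultdict

# Hypothetical normalization rules; the actual rule set is not shown in this post.
RULES = [
    (re.compile(r"\{?[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\}?"), "<GUID>"),
    (re.compile(r"([a-z]:\\users\\)[^\\\s]+"), r"\1<USER>"),
    (re.compile(r"\b[0-9a-f]{16,}\b"), "<HEX>"),
]


def normalize(cmd: str) -> str:
    """Lowercase the command and replace machine-specific parts with placeholders."""
    cmd = cmd.lower()
    for pattern, replacement in RULES:
        cmd = pattern.sub(replacement, cmd)
    return cmd


def shingles(text: str, k: int = 4) -> set:
    """Set of overlapping k-character substrings."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash(text: str, num_hashes: int = 64) -> tuple:
    """MinHash signature: the minimum salted hash of the shingle set, per salt."""
    parts = shingles(text)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "little")
            for s in parts))
    return tuple(signature)


def cluster(commands: list, bands: int = 16) -> list:
    """Group commands whose signatures agree on at least one LSH band."""
    parent = list(range(len(commands)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    buckets = {}
    for idx, cmd in enumerate(commands):
        sig = minhash(normalize(cmd))
        rows = len(sig) // bands
        for b in range(bands):
            key = (b, sig[b * rows:(b + 1) * rows])
            if key in buckets:
                parent[find(idx)] = find(buckets[key])  # merge the two clusters
            else:
                buckets[key] = idx
    groups = defaultdict(list)
    for idx, cmd in enumerate(commands):
        groups[find(idx)].append(cmd)
    return list(groups.values())


def sample_size(cluster_size: int) -> int:
    """Number of training samples to draw; grows with log2 of the cluster size."""
    return 1 + int(math.log2(cluster_size))


commands = [
    r"powershell.exe -File C:\Users\alice\AppData\Local\Temp\setup.ps1",
    r"powershell.exe -File C:\Users\bob\AppData\Local\Temp\setup.ps1",
    "certutil.exe -urlcache -split -f http://example.test/a.bin",
]
clusters = cluster(commands)
for group in clusters:
    print(len(group), "command(s), sample", sample_size(len(group)))
```

In this sketch the two PowerShell commands differ only in the username, so they normalize to the same string and land in the same cluster. Logarithmic sampling keeps very large clusters of routine admin commands from dominating the training set while still representing them.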

Identifying variations of previously seen commands

Another advantage that clustering gives us is the ability to do an approximate lookup – given a command, we can find the command lines that are most like it in endpoint telemetry. If the command line has already been seen, then we have more information about it in order to make a better decision. Since a lot of the command lines we see are frequently and repeatedly used for configuration changes and other admin tasks, we can infer the intent of a large chunk of input data without having to run it through the machine learning model. As a result, we can limit ML-based inference to a small fraction of data that has not been seen previously, and further drive down false positive rates.
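As a rough sketch of the approximate lookup idea, the example below compares a new command against a small store of previously seen commands and only falls through to the ML model when nothing is similar enough. For brevity it uses exact Jaccard similarity over character shingles rather than MinHash signatures; the KNOWN store, its labels, and the 0.8 threshold are hypothetical.

```python
def shingles(text: str, k: int = 4) -> set:
    """Set of overlapping k-character substrings."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


# Hypothetical store of previously seen (already lowercased) commands and labels.
KNOWN = {
    "ipconfig /all": "benign",
    r"reg add hklm\software\example /v enabled /d 1": "benign",
}


def lookup(cmd: str, known: dict, threshold: float = 0.8):
    """Return (label, similarity) of the closest known command, or (None, similarity)."""
    query = shingles(cmd.lower())
    best_label, best_sim = None, 0.0
    for seen, label in known.items():
        sim = jaccard(query, shingles(seen))
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim >= threshold:
        return best_label, best_sim
    return None, best_sim  # not seen before: hand off to the ML model


seen_label, seen_sim = lookup("IPCONFIG /all", KNOWN)
new_label, new_sim = lookup("certutil -urlcache -f http://example.test/x", KNOWN)
print(seen_label, seen_sim)
print(new_label, new_sim)
```

Only the second command, which has no close match in the store, would need to be scored by the model in this sketch.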

Clustering is also a useful tool for simplifying the workflow of the Security Operations Center (SOC). Analysts can look at a grouped set of commands together and make triage and escalation decisions for the whole cluster, as opposed to having to investigate and make decisions on individual commands. Awalin Sopan from the SophosAI team presented her work on a UI workflow that enables analysts to manage clustered alerts at CAMLIS 2021. Stay tuned for the presentation video and publication about this work, to be released soon!

Conclusion

In this blog post, we’ve talked about how LOLBin-based fileless malware attacks are an emerging threat that is both sophisticated and hard to detect. We introduced a multi-layered approach to detecting LOLBin attacks, operating on the command lines used to execute them. Our system is designed to complement and support rule-based strategies in helping the MTR SOC and customer analysts better detect these attacks. In addition to providing an extra data point in the form of a score from an ML model, the system we’ve designed also improves analyst workflow by grouping similar command lines together and reducing the scale and complexity of the problem at the SOC.