Enterprise networks are in constant danger of being breached by cyber-attackers, but deciding which security tools to deploy to mitigate this risk requires carefully designed evaluation of security products. One of the most important metrics for a protection product is how well it stops malware, specifically "zero-day" malware that has not been seen by the security community before. However, evaluating zero-day performance is difficult because of the large number of previously unseen samples needed to properly measure the true and false positive rates, and the challenges involved in accurately labeling these samples. This paper addresses these issues from a statistical and practical perspective. Our contributions include first showing that the number of benign files needed for proper evaluation is on the order of millions, and the number of malware samples needed is on the order of tens of thousands. We then propose and justify a time-delay method for easily collecting large numbers of previously unseen, but labeled, samples. This enables cheap and accurate evaluation of zero-day true and false positive rates. Finally, we propose a more fine-grained labeling of the malware/benignware in order to better model the heterogeneous distribution of files on various networks.
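The claimed sample sizes follow from standard confidence-interval arithmetic for estimating a rare proportion. The sketch below is a normal-approximation back-of-the-envelope calculation, not the paper's exact statistical argument; the target rates and relative-error tolerances are illustrative assumptions:

```python
import math

def samples_needed(p, rel_err, z=1.96):
    """Normal-approximation sample size to estimate a proportion p
    within a relative margin of error rel_err at ~95% confidence."""
    return math.ceil(z * z * (1.0 - p) / (rel_err * rel_err * p))

# Estimating a 0.1% false positive rate to within 5% relative error
# requires on the order of millions of benign files:
benign_n = samples_needed(p=0.001, rel_err=0.05)

# Estimating a 5% miss rate (i.e., a 95% detection rate) to within 5%
# relative error requires tens of thousands of malware samples:
malware_n = samples_needed(p=0.05, rel_err=0.05)

print(benign_n, malware_n)
```

The key intuition is that the rarer the event being measured (here, false positives at a 0.1% rate), the more samples are needed to pin down its frequency.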
As antivirus and network intrusion detection systems have increasingly proven insufficient to detect advanced threats, large security operations centers have moved to deploy endpoint-based sensors that provide deeper visibility into low-level events across their enterprises. Unfortunately, for many organizations in government and industry, the installation, maintenance, and resource requirements of these newer solutions pose barriers to adoption and are perceived as risks to organizations' missions. To mitigate this problem, we investigated the utility of agentless detection of malicious endpoint behavior, using only the standard built-in Windows audit logging facility as our signal. We found that Windows audit logs, while emitting manageably sized data streams on the endpoints, provide enough information to allow robust detection of malicious behavior. Audit logs provide an effective, low-cost alternative to deploying additional expensive agent-based breach detection systems in many government and industrial settings, and can be used to detect, in our tests, 83% of malware samples with a 0.1% false positive rate. They can also supplement existing host signature-based antivirus solutions, like Kaspersky, Symantec, and McAfee, detecting, in our testing environment, 78% of malware missed by those antivirus systems.
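Operating points like "83% detection at a 0.1% false positive rate" come from thresholding a classifier's score so that only a fixed fraction of benign samples fall above the threshold. The sketch below illustrates that procedure on simulated scores; the score distributions are invented for illustration and are not derived from audit-log data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical classifier scores (higher = more suspicious). In the
# paper's setting these would come from a model over Windows audit-log
# features; here we simply simulate two score populations.
benign_scores = rng.normal(0.0, 1.0, size=100_000)
malware_scores = rng.normal(4.0, 1.0, size=5_000)

# Choose the threshold as the (1 - target_fpr) quantile of the benign
# scores, so that ~0.1% of benign samples score above it.
target_fpr = 0.001
threshold = np.quantile(benign_scores, 1.0 - target_fpr)

fpr = float((benign_scores > threshold).mean())
detection_rate = float((malware_scores > threshold).mean())
print(f"threshold={threshold:.2f}  FPR={fpr:.4%}  detection={detection_rate:.1%}")
```

The reported detection rate is then simply the fraction of malware scores that clear the benign-calibrated threshold.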
Malware remains a serious problem for corporations, government agencies, and individuals, as attackers continue to use it as a tool to effect frequent and costly network intrusions. Today malware detection is still done mainly with heuristic and signature-based methods that struggle to keep up with malware evolution. Machine learning holds the promise of automating the work required to detect newly discovered malware families, and could potentially learn generalizations about malware and benign software (benignware) that support the detection of entirely new, unknown malware families. Unfortunately, few proposed machine learning based malware detection methods have achieved the low false positive rates and high scalability required to deliver deployable detectors.
In this paper we introduce an approach that addresses these issues, describing in reproducible detail the deep neural network based malware detection system that Invincea has developed. Our system achieves a usable detection rate at an extremely low false positive rate and scales to real-world training example volumes on commodity hardware. Specifically, we show that our system achieves a 95% detection rate at a 0.1% false positive rate (FPR), based on more than 400,000 software binaries sourced directly from our customers and internal malware databases. We achieve these results by learning directly on all binaries, without any filtering, unpacking, or manual separation of binary files into categories. Further, we confirm our false positive rates directly on a live stream of files coming in from Invincea's deployed endpoint solution, provide an estimate of how many new binary files we expect to see per day on an enterprise network, and describe how that relates to the false positive rate and translates into an intuitive threat score.
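The relationship between daily file volume, FPR, and a threat score can be illustrated with simple expected-value arithmetic. The file volume below is a made-up figure, and the `threat_score` helper is one hypothetical formulation, not the paper's actual definition:

```python
# Hypothetical numbers: suppose an enterprise sees ~5,000 previously
# unseen binaries per day (an assumed value, not a measurement).
new_files_per_day = 5_000
fpr = 0.001  # 0.1% false positive rate

# The expected number of false alarms per day is then:
expected_fps_per_day = new_files_per_day * fpr  # 5.0

def threat_score(model_fpr_at_score, new_files_per_day):
    """One intuitive reading of a score: the expected number of benign
    files per day that would score at least this high (lower = scarier)."""
    return model_fpr_at_score * new_files_per_day

# A score so high that only 0.01% of benign files reach it would be
# expected to produce ~0.5 false alarms per day at this file volume.
print(expected_fps_per_day, threat_score(0.0001, new_files_per_day))
```

Framing alerts as "expected false alarms per day" rather than raw probabilities is one way to make model output actionable for analysts.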
Our results demonstrate that it is now feasible to quickly train and deploy a low-resource, highly accurate machine learning classification model, with false positive rates that approach those of traditional labor-intensive signature-based methods, while also detecting previously unseen malware. Since machine learning models tend to improve with larger datasets, we foresee deep neural network classification models gaining in importance as part of a layered network defense strategy in coming years.

∗Authors contributed equally to the work.
This paper proposes a method for identifying and visualizing similarity relationships between malware samples based on their embedded graphical assets (such as desktop icons and button skins). We argue that analyzing such relationships has practical merit for a number of reasons. For example, we find that malware desktop icons are often used to trick users into running malware programs, so identifying groups of related malware samples based on these visual features can highlight themes in the social engineering tactics of today’s malware authors. Also, when malware samples share rare images, these image sharing relationships may indicate that the samples were generated or deployed by the same adversaries.
To explore and evaluate this malware comparison method, the paper makes two contributions. First, we provide a scalable and intuitive method for computing similarity measurements between malware samples based on the visual similarity of their sets of images. Second, we give a visualization method that combines a force-directed graph layout with a set visualization technique so as to highlight visual similarity relationships in malware corpora. We evaluate the accuracy of our image-set similarity comparison method against a hand-curated malware relationship ground truth dataset, finding that our method performs well. We also evaluate our overall concept through a small qualitative study conducted with three cyber security researchers. Feedback from the researchers confirmed our use cases and suggests that computer network defenders are interested in this capability.
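One simple way to compare malware samples by their embedded image sets, consistent in spirit with (but not necessarily identical to) the paper's similarity measure, is Jaccard similarity over sets of image fingerprints. The hash-like identifiers below are invented placeholders:

```python
def jaccard(a, b):
    """Jaccard similarity between two sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical fingerprint sets for the icons/button skins embedded in
# three malware samples (the actual hashing scheme is an assumption).
sample_a = {"icon:9f2c", "btn:77aa", "icon:c0de"}
sample_b = {"icon:9f2c", "btn:77aa"}
sample_c = {"icon:ffff"}

print(jaccard(sample_a, sample_b))  # shared images -> high similarity
print(jaccard(sample_a, sample_c))  # no overlap -> 0.0
```

Samples sharing rare fingerprints would then form tightly connected clusters in a force-directed layout over these pairwise similarities.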