With the growing prevalence of more sophisticated cyber-attacks, such as polymorphic malware, scripting, and other living-off-the-land attacks, it has become easier to bypass traditional anti-virus defenses based on file scanning. To protect against this evolution of malware, approaches orthogonal to file scanning, such as behavior analysis, need to become more central to cyber defenses. Behavior analysis and detection can be very powerful, because all malware must eventually exhibit malicious behavior in order to succeed. However, developing behavior detections, especially ML-based ones, is difficult for several reasons: the volume of data, the diversity of behavior across the exponentially many combinations of software running on different machines, the difficulty of collecting and properly labeling a representative dataset, and the difficulty of understanding the full context around each behavior trace.
Our group has taken an incremental approach to this problem in order to develop practical solutions that can be deployed to our customers. Our initial work focused on understanding how the massive volume of behavior data can be reduced to a small number of prominent, automatically discovered features, which helps prevent overfitting. We are currently building on that work by exploring how deep learning, such as learning on graphs, can be used to build better models. In parallel, we are also building simpler but more resilient context models, which can augment the currently deployed static model by feeding in additional context, such as file paths or process trees.
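As a rough illustration of the context-model idea, the sketch below derives a few context features from a file path and a process chain and appends them to a static model's score. It is a minimal sketch under stated assumptions, not our deployed implementation: the function names, the directory and process-name lists, and the choice of features are all hypothetical.

```python
from pathlib import PureWindowsPath
from typing import Dict, List

# Hypothetical lookup sets, chosen only for illustration.
SUSPICIOUS_DIRS = {"temp", "tmp", "appdata", "downloads"}
OFFICE_APPS = {"winword.exe", "excel.exe", "powerpnt.exe"}
SCRIPT_HOSTS = {"powershell.exe", "wscript.exe", "cscript.exe", "cmd.exe"}


def path_context_features(file_path: str) -> Dict[str, object]:
    """Derive simple context features from a Windows file path."""
    path = PureWindowsPath(file_path)
    parts = [part.lower() for part in path.parts]
    return {
        "num_components": len(parts),
        # Check only the directory components, not the file name itself.
        "in_suspicious_dir": any(p in SUSPICIOUS_DIRS for p in parts[:-1]),
        "extension": path.suffix.lower(),
    }


def process_chain_features(chain: List[str]) -> Dict[str, object]:
    """Derive features from a process chain ordered from root to leaf."""
    names = [name.lower() for name in chain]
    # Flag any parent/child pair where an office app launches a script host.
    office_spawns_script = any(
        parent in OFFICE_APPS and child in SCRIPT_HOSTS
        for parent, child in zip(names, names[1:])
    )
    return {
        "chain_length": len(names),
        "office_spawns_script": office_spawns_script,
    }


def augment_static_score(static_score: float, file_path: str,
                         chain: List[str]) -> List[float]:
    """Build a feature vector: the static score plus numeric context features."""
    path_feats = path_context_features(file_path)
    chain_feats = process_chain_features(chain)
    return [
        static_score,
        float(path_feats["in_suspicious_dir"]),
        float(chain_feats["office_spawns_script"]),
        float(chain_feats["chain_length"]),
    ]


vector = augment_static_score(
    0.42,  # made-up static model score
    r"C:\Users\bob\AppData\Local\Temp\a.exe",
    ["explorer.exe", "WINWORD.EXE", "powershell.exe"],
)
print(vector)
```

A downstream classifier could then be trained on such vectors, so that the same static score is treated differently depending on where the file lives and which process launched it.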
Check out our peer-reviewed publications below:
Malicious Behavior Detection using Windows Audit Logs: https://dl.acm.org/doi/abs/10.1145/2808769.2808773