Adarsh Kyadige

Research Manager

Adarsh oversees the Research wing of the Sophos AI team, where he has worked since 2018 at the intersection of Machine Learning and Security. He earned a Master's degree in Computer Science, with a specialization in Artificial Intelligence and Machine Learning, from UC San Diego. His interests and responsibilities include applying Deep Learning to Cybersecurity and orchestrating pipelines for large-scale data processing. In his leisure time, Adarsh can be found at the archery range, on the tennis courts, or in nature. His latest research can be found on Google Scholar.

@adarshkyadige

LOLBins (living off the land binaries) are executable files that are already present in the user environment, LOLBins (living off […]

Machine Learning has seen a huge boom in the past decade, with many industries now investing heavily in Machine Learning […]

In the wild, we often see that malware in user systems persists well hidden in obfuscated or randomized file locations. […]

Comparative Sophos X-Ops testing not only indicates which models fare best in cybersecurity, but where cybersecurity fares best in AI.

The conference on machine learning in cybersecurity is key to open exchange of research and knowledge.

Machine learning (ML) used for static portable executable (PE) malware detection typically employs per-file numerical feature vector representations as input, with one or more target labels during training. However, there is much orthogonal information that can be gleaned from the context in which the file was seen. In this paper, we propose utilizing a static source of contextual information, the path of the PE file, as an auxiliary input to the classifier. While file paths are not malicious or benign in and of themselves, they do provide valuable context for a malicious/benign determination. Unlike dynamic contextual information, file paths are available with little overhead and can be seamlessly integrated into a multi-view static ML detector, yielding higher detection rates at very high throughput with minimal infrastructural changes. Here we propose a multi-view neural network, which takes feature vectors from PE file content as well as the corresponding file paths as inputs and outputs a detection score. To ensure realistic evaluation, we use a dataset of approximately 10 million samples: files and file paths from user endpoints of an actual security vendor network. We then conduct an interpretability analysis via LIME modeling to ensure that our classifier has learned a sensible representation and to see which parts of the file path contributed most to changes in the classifier's score. We find that our model learns useful aspects of the file path for classification, while also learning artifacts from customers testing the vendor's product, e.g., by downloading a directory of malware samples each named as their hash. We prune these artifacts from our test dataset and demonstrate reductions in false negative rate of 32.3% at a 10^-3 false positive rate (FPR) and 33.1% at 10^-4 FPR, over a similar-topology, single-input model that uses PE file content only.

Blog Posts

“LOL you’re not executing that”: Detecting Malicious LOLBin Commands

ML Expectation vs. Reality, Part 1: Don’t build a house on sand!

The File path Model: Using Context To Help Convict Malware

Presentations

CAMLIS 2023 – Playing Defense: Benchmarking Cybersecurity Capabilities of Large Language Models

BSides LV 2022: Weeding Out Living-off-the-land Attacks at Scale

Learning from Context: A Multi-view Deep Learning Architecture for Malware Detection

Publications

Benchmarking the Security Capabilities of Large Language Models

Sophos AI team to present at CAMLIS

Learning from Context: Exploiting and Interpreting File Path Information for Better Malware Detection