Sophos AI
Pushing the boundaries of machine learning for information security
Sophos Artificial Intelligence was formed in 2017 to produce breakthrough technologies in data science and machine learning for information security. We're currently focused on machine learning, large-scale scientific computing architecture, human-AI interaction, and information visualization. Here we present our current projects, our team, our conference talks, and our publications.
Using Undocumented Hardware Performance Counters to Detect Spectre-Style Attacks
In this paper, we first introduce our version of Spectre variant 4, with evasive changes that can bypass any detection relying on conventional cache-miss, branch-miss, and branch-misprediction counters. We then show how our model, which uses selected undocumented counters, detects this modified variant, and how it also detects a novel Spectre implementation submitted to VirusTotal.
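To make "a model using select counters" concrete, here is a minimal sketch of one generic way such a detector could score a sampling window: normalize each counter to a rate per 1,000 retired instructions, then apply a logistic (weighted-sum plus sigmoid) model. This is not the paper's model. The event names (including `undocumented_event_x`), the weights, the bias, and the 0.5 threshold are all hypothetical placeholders; a real detector would learn its parameters from labeled benign and attack traces.

```python
import math
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw performance-counter deltas collected over one sampling window."""
    instructions: int          # instructions retired in the window
    events: dict[str, int]     # event name -> count (names are hypothetical)


# Hypothetical weights and bias, for illustration only.
WEIGHTS = {"cache_miss": 2.0, "branch_mispredict": 1.5, "undocumented_event_x": 4.0}
BIAS = -3.0


def features(sample: CounterSample) -> dict[str, float]:
    """Normalize each event count to a rate per 1,000 retired instructions."""
    per_k = max(sample.instructions, 1) / 1000.0
    return {name: sample.events.get(name, 0) / per_k for name in WEIGHTS}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def attack_probability(sample: CounterSample) -> float:
    """Logistic score: sigmoid of the bias plus a weighted sum of event rates."""
    z = BIAS + sum(WEIGHTS[name] * rate for name, rate in features(sample).items())
    return sigmoid(z)


def is_suspicious(sample: CounterSample, threshold: float = 0.5) -> bool:
    """Flag a window whose score meets the (hypothetical) threshold."""
    return attack_probability(sample) >= threshold
```

Normalizing by instructions retired keeps scores comparable across windows of different lengths; the detection power in practice comes from which counters are chosen, which this sketch does not attempt to reproduce.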
SOREL-20M: A Large Scale Benchmark Dataset for Malicious PE Detection
Analyzing Security ML Models with Imperfect Data in Production
Check out our paper to see how visualization met the operational needs of our industry research team, helping them detect and resolve issues that frequently arise in our security models in production. We describe the step-by-step design of the user interface, share the lessons we learned, and demonstrate how we used the system. We built multiple simple views rather than one complex view, supporting data scientists’ workflows while keeping the interface simple for high-level users. We focused on finding trends and anomalies in the data feeds relevant to the models. Combining several charts enabled the team to ask questions, verify their hypotheses, and generate insights.
CatBERT: Context-Aware Tiny BERT for Detecting Targeted Social Engineering Emails
Targeted phishing emails are a major cyber threat on the Internet today and are insufficiently addressed by current defenses. In this paper, we leverage industrial-scale datasets from the Sophos cloud email security service, which defends tens of millions of customer mailboxes, to propose a novel Transformer-based architecture for detecting targeted phishing emails. Our model uses both natural-language and email-header inputs and is more computationally efficient than competing Transformer approaches. We also show that it is less prone to adversarial attacks that deliberately replace keywords with typos or synonyms.
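To illustrate the kind of adversarial perturbation described above, here is a minimal sketch that rewrites chosen keywords in an email body either as a typo (two adjacent interior letters swapped) or as a synonym. It is not the paper's evaluation code: the `SYNONYMS` table, the keyword set, and the function names are hypothetical, and a realistic attack would draw on a much larger lexicon.

```python
import random
import re

# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "invoice": ["bill", "statement"],
    "urgent": ["pressing", "immediate"],
    "password": ["passcode", "login"],
    "payment": ["remittance", "transfer"],
}


def typo(word: str, rng: random.Random) -> str:
    """Swap two adjacent interior characters, keeping the first and last letters.

    The word may come back unchanged if the swapped characters are identical.
    """
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def perturb(text: str, keywords: set[str], mode: str, seed: int = 0) -> str:
    """Rewrite each keyword (matched case-insensitively) as a typo or a synonym."""
    rng = random.Random(seed)

    def repl(match: re.Match) -> str:
        word = match.group(0)
        lower = word.lower()
        if lower not in keywords:
            return word  # leave non-keywords untouched
        if mode == "typo":
            return typo(word, rng)
        if mode == "synonym" and lower in SYNONYMS:
            return rng.choice(SYNONYMS[lower])
        return word

    # Operate on alphabetic runs so punctuation and spacing are preserved.
    return re.sub(r"[A-Za-z]+", repl, text)


email = "URGENT: please review the attached invoice and confirm payment."
print(perturb(email, {"urgent", "invoice", "payment"}, mode="synonym"))
print(perturb(email, {"urgent", "invoice", "payment"}, mode="typo", seed=1))
```

A classifier that keys on exact tokens such as "invoice" can miss both rewritten versions, which is the failure mode the robustness claim above addresses.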