Analyzing Security ML Models with Imperfect Data in Production

Check out our paper to see how visualization met the operational needs of our industry research team, helping us detect and resolve frequently recurring issues in the security models we run in production. We describe the step-by-step design of the user interface, demonstrate how we used the system, and share the lessons we learned. Rather than building one complex view, we built multiple simple views that support data scientists' workflows while staying accessible to high-level users. The system focuses on surfacing trends and anomalies in the data feeds the models depend on. Combining several charts let the team ask questions, verify hypotheses, and generate insights.

Awalin Sopan
Konstantin Berlin

CatBERT: Context-Aware Tiny BERT for Detecting Targeted Social Engineering Emails

Targeted phishing emails are a major cyber threat on the Internet today, and current defenses address them insufficiently. In this paper, we use industrial-scale datasets from the Sophos cloud email security service, which defends tens of millions of customer mailboxes, to propose a novel Transformer-based architecture for detecting targeted phishing emails. Our model takes both natural-language and email header inputs and is more computationally efficient than competing Transformer approaches. We also show that it is less prone to adversarial attacks that deliberately replace keywords with typos or synonyms.

Younghoo Lee
Joshua Saxe
Richard Harang
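The CatBERT abstract mentions adversarial attacks that replace keywords with typos or synonyms. The minimal Python sketch below shows what such a perturbation might look like when testing a classifier's robustness. It is an illustration only: the synonym table, the function names `typo` and `perturb`, and the adjacent-character-swap typo model are hypothetical choices, not the attack generator used in the paper.

```python
import random

# Hypothetical synonym table, for illustration only (not from the paper).
SYNONYMS = {
    "urgent": "pressing",
    "invoice": "bill",
    "payment": "remittance",
    "password": "passcode",
}


def typo(word: str, rng: random.Random) -> str:
    """Swap two adjacent interior characters to simulate a typo.

    The first and last characters are kept; words shorter than 4 characters
    are returned unchanged.
    """
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)  # i + 1 never reaches the last index
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def perturb(text: str, keywords: set, mode: str = "typo", seed: int = 0) -> str:
    """Replace each keyword in `text` with a typo or a synonym.

    Tokenization is plain whitespace splitting, so a keyword with attached
    punctuation (e.g. "invoice,") is not matched in this simple sketch.
    """
    if mode not in ("typo", "synonym"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)  # seeded so perturbations are reproducible
    out = []
    for token in text.split():
        key = token.lower()
        if key in keywords:
            if mode == "synonym" and key in SYNONYMS:
                token = SYNONYMS[key]
            elif mode == "typo":
                token = typo(token, rng)
        out.append(token)
    return " ".join(out)
```

For example, `perturb("Please send the invoice today", {"invoice"}, mode="synonym")` returns `"Please send the bill today"`. A robustness check would then compare the model's scores on original and perturbed emails.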