Richard Harang, Author at Sophos AI

Targeted phishing emails are a major cyber threat on the Internet today and are insufficiently addressed by current defenses. In this paper, we leverage industrial-scale datasets from the Sophos cloud email security service, which defends tens of millions of customer mailboxes, to propose a novel Transformer-based architecture for detecting targeted phishing emails. Our model uses both natural language and email header inputs and is more computationally efficient than competing Transformer approaches. We also show that it is less prone to adversarial attacks that deliberately replace keywords with typos or synonyms.

Introduction: The machine learning-based detection technologies we build at Sophos AI rely on many information sources, including binary programs, system […]

Gift cards are a favorite way for scammers to squeeze money out of their victims. Unlike wire or bank transfers, […]

The Sophos AI team is excited to announce the release of SOREL-20M (Sophos-ReversingLabs – 20 million) – a production-scale dataset […]

So, you’ve followed the advice in Part 1 of this series. Now you’ve got a nice big data set and you’re pretty sure that […]

Introduction: When we move machine learning models from the lab to the real world, tracking and evaluating model performance becomes […]

Attention conservation notice: This is a slightly expanded version of a Twitter thread I posted back in January 2020. If […]

Author Richard Harang

SOREL-20M: A Large Scale Benchmark Dataset for Malicious PE Detection

CatBERT: Context-Aware Tiny BERT for Detecting Targeted Social Engineering Emails

A machine learning approach to inferring the maliciousness of unknown IP addresses, autonomous systems, and ISPs

How SophosAI Stops BEC gift card scams

Sophos-ReversingLabs (SOREL) 20 Million sample malware dataset

ML Expectation vs. Reality, Part 2: Doing the Actual Analysis

How much malware is out there, anyway?

Debugging Deep Learning Models

Malware Data Science: Attack Detection and Attribution

De-anonymizing programmers via code stylometry