Author Joshua Saxe
Lessons learned from building a 4,000+ member cybersecurity volunteer organization in four months
When I posted this tweet in March of this year, kicking off a process which would give birth to the […]
DEF CON 28 AI Village: Detecting hand-crafted social engineering emails with a bleeding-edge neural language model
Garbage in, garbage out: how purportedly great ML models can be screwed up by bad data
SeqDroid: Obfuscated Android Malware Detection Using Stacked Convolutional and Recurrent Neural Networks
A Deep Learning Approach to Fast, Format-Agnostic Detection of Malicious Web Content
Detecting Malicious URLs and Stopping the Attack Early
Any good attack-chain usually involves tricking users at some point, whether it’s getting them to run a malicious file because […]
The New Cat and Mouse Game: Attacking and Defending Machine Learning Based Software
eXpose: A Character-Level Convolutional Neural Network with Embeddings For Detecting Malicious URLs, File Paths and Registry Keys
For years, security machine learning research has promised to obviate the need for signature-based detection by automatically learning to detect indicators of attack. Unfortunately, this vision hasn’t come to fruition. In fact, developing and maintaining today’s security machine learning systems can require engineering resources comparable to those of signature-based detection systems. This is due in part to the need to develop and continuously tune the “features” these systems look at as attacks evolve. Deep learning, a subfield of machine learning, promises to change this by operating on raw input signals and automating feature design and extraction. In this paper we propose the eXpose neural network, which uses a deep learning approach we have developed. It takes generic, raw, short character strings as input, which is a common case for security inputs such as potentially malicious URLs, file paths, named pipes, named mutexes, and registry keys. It learns to simultaneously extract features and classify, using character-level embeddings and a convolutional neural network. In addition to completely automating the feature design and extraction process, eXpose outperforms baselines based on manual feature extraction on all of the intrusion detection problems we tested it on. It yields a 5%-10% detection rate gain at a 0.1% false positive rate compared to these baselines.
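The abstract names the general technique but not eXpose’s layer sizes. The following is therefore only a minimal forward-pass sketch of a character-level embedding plus convolutional classifier, written in plain Python with no deep learning library. It is not the paper’s architecture. All names and hyperparameters (`encode`, `CharCNN`, `MAX_LEN`, `EMBED_DIM`, `NUM_FILTERS`, `KERNEL_WIDTH`) are hypothetical. The weights are random and training is omitted, so the scores carry no meaning until the parameters are fit to labeled data.

```python
import math
import random
from typing import List

# Hypothetical hyperparameters for illustration only; not the paper's values.
MAX_LEN = 32       # strings are truncated or zero-padded to this length
VOCAB_SIZE = 128   # ASCII code points; anything else maps to index 0
EMBED_DIM = 8      # length of each character embedding vector
NUM_FILTERS = 4    # number of convolutional filters
KERNEL_WIDTH = 3   # each filter spans 3 consecutive characters


def encode(s: str) -> List[int]:
    """Map a raw string to MAX_LEN character indices (0 = padding/unknown)."""
    idx = [ord(c) if ord(c) < VOCAB_SIZE else 0 for c in s[:MAX_LEN]]
    return idx + [0] * (MAX_LEN - len(idx))


class CharCNN:
    """Untrained sketch: embeddings -> 1D convolution -> ReLU -> max-pool -> logistic output."""

    def __init__(self, seed: int = 0):
        rng = random.Random(seed)
        # Embedding table: one EMBED_DIM vector per character index.
        self.embed = [[rng.gauss(0, 0.1) for _ in range(EMBED_DIM)]
                      for _ in range(VOCAB_SIZE)]
        # Each filter holds KERNEL_WIDTH x EMBED_DIM weights, plus one bias.
        self.filters = [[[rng.gauss(0, 0.1) for _ in range(EMBED_DIM)]
                         for _ in range(KERNEL_WIDTH)]
                        for _ in range(NUM_FILTERS)]
        self.filter_bias = [0.0] * NUM_FILTERS
        # Final logistic-regression layer over the pooled filter outputs.
        self.out_w = [rng.gauss(0, 0.1) for _ in range(NUM_FILTERS)]
        self.out_b = 0.0

    def features(self, s: str) -> List[float]:
        """Learned-feature vector: each filter's strongest (max-pooled) ReLU activation."""
        x = [self.embed[i] for i in encode(s)]  # MAX_LEN rows of EMBED_DIM values
        pooled = []
        for f, b in zip(self.filters, self.filter_bias):
            best = -math.inf
            for start in range(MAX_LEN - KERNEL_WIDTH + 1):
                act = b + sum(f[k][d] * x[start + k][d]
                              for k in range(KERNEL_WIDTH)
                              for d in range(EMBED_DIM))
                best = max(best, max(0.0, act))  # ReLU, then max over positions
            pooled.append(best)
        return pooled

    def predict(self, s: str) -> float:
        """Return a score in (0, 1); meaningful only after training."""
        z = self.out_b + sum(w * h for w, h in zip(self.out_w, self.features(s)))
        return 1.0 / (1.0 + math.exp(-z))


model = CharCNN(seed=0)
print(model.predict("http://example.com/login.php"))
```

In a trained system, the embedding table, filters, and output weights would be learned jointly by gradient descent. This is what lets such a network extract features and classify at the same time, rather than relying on hand-designed features.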
Improving zero-day malware testing methodology using statistically significant time-lagged test samples
Enterprise networks are in constant danger of being breached by cyber-attackers, but deciding which security tools to deploy to mitigate this risk requires carefully designed evaluation of security products. One of the most important metrics for a protection product is how well it stops malware, especially “zero-day” malware that the security community has not seen before. However, evaluating zero-day performance is difficult for two reasons. Properly measuring true and false positive rates requires a large number of previously unseen samples, and accurately labeling those samples is challenging. This paper addresses these issues from a statistical and practical perspective. First, we show that proper evaluation requires benign files on the order of millions and malware samples on the order of tens of thousands. We then propose and justify a time-delay method for easily collecting a large number of previously unseen, but labeled, samples. This enables cheap and accurate evaluation of zero-day true and false positive rates. Finally, we propose finer-grained labeling of malware and benignware to better model the heterogeneous distribution of files on various networks.
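The orders of magnitude claimed above can be sanity-checked with a standard back-of-envelope sample-size calculation for a binomial proportion. This sketch uses the normal-approximation formula `n = z**2 * p * (1 - p) / E**2`, where `p` is the rate being measured and `E` is the desired confidence-interval half-width. This is not necessarily the derivation used in the paper. The target rates and precisions are illustrative assumptions: a 0.01% false positive rate measured to within ±0.001%, and a 99% detection rate measured to within ±0.1%, both at roughly 95% confidence. The helper name `samples_needed` is hypothetical. The normal approximation is also crude for very small rates, so treat the results as rough magnitudes only.

```python
import math


def samples_needed(rate: float, half_width: float, z: float = 1.96) -> int:
    """Return n such that a normal-approximation confidence interval on a
    binomial rate has the requested half-width (z = 1.96 is ~95% confidence)."""
    return math.ceil(z * z * rate * (1.0 - rate) / (half_width * half_width))


# Illustrative targets (assumptions, not values from the paper):
# a false positive rate of 1e-4 measured to within +/- 1e-5 -> benign files needed
benign_needed = samples_needed(rate=1e-4, half_width=1e-5)
# a detection rate of 0.99 measured to within +/- 0.001 -> malware samples needed
malware_needed = samples_needed(rate=0.99, half_width=0.001)

print(f"benign files needed:    {benign_needed:,}")   # roughly 3.8 million
print(f"malware samples needed: {malware_needed:,}")  # roughly 38 thousand
```

Measuring a tiny false positive rate precisely requires millions of benign files. A high detection rate can be pinned down with tens of thousands of malware samples, which is consistent in scale with the claims above.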