Loss is More! Improving Malware Detectors by Learning Additional Tasks

Malware detection is perhaps the most common use case of machine learning for information security (ML-Sec/AI-Sec). ML-Sec malware detectors consist of binary classifiers trained to associate malicious and benign files with the labels 1 and 0, respectively. When trained on a large labeled dataset (millions of labeled malicious and benign files in practice), these "artificially intelligent" detectors become very good at detecting malicious samples in the wild. From a human malware analyst's perspective, this type of "learning" seems terribly flawed: humans don't learn to distinguish malware from benignware by staring at a pile of "malicious" and "benign" labels; they learn from many auxiliary sources that describe characteristics of files. If other related information is available beyond a 1/0 label and the content of the file, perhaps it makes sense to teach machines to learn from it as well.

In fact, other related information is available! The malicious/benign labels on which malware detectors are trained are typically aggregated from threat intelligence feeds, often consisting of multiple sources of external or internal telemetry. These sources usually provide, at the very least, their own detection scores and a detection name associated with a particular malware family or signature, and in some cases even more information.

In this talk, we discuss how we trained a neural network to learn from multiple sources of information simultaneously, and how, in doing so, we arrived at a deep neural network (DNN) that not only performs multiple auxiliary tasks, but also yields a substantial bump in detection performance over a baseline DNN trained with only malicious/benign labels. In addition to presenting these results, we explain, through visualization-aided analysis, why our multi-task DNN learned to detect samples that the baseline missed and what it learned differently.
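To make the idea concrete, the multi-task setup described above can be sketched as a shared representation feeding several task-specific heads, with the total training loss being a weighted sum of the per-task losses. The sketch below is a minimal illustration in NumPy, not the authors' architecture: the layer sizes, the choice of a single auxiliary family-classification head, and the `aux_weight` blending factor are all hypothetical, and the forward pass uses random weights rather than trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: per-file feature vector, shared hidden layer,
# and number of malware families for the auxiliary task.
N_FEATURES, HIDDEN, N_FAMILIES = 256, 64, 10

# Shared "base" weights, updated by gradients from ALL tasks jointly.
W_shared = rng.normal(0, 0.1, (N_FEATURES, HIDDEN))
# Task-specific heads: malicious/benign detection and family classification.
w_detect = rng.normal(0, 0.1, (HIDDEN, 1))
W_family = rng.normal(0, 0.1, (HIDDEN, N_FAMILIES))

def forward(x):
    """Shared representation feeding two task heads."""
    h = np.maximum(x @ W_shared, 0.0)            # shared ReLU features
    p_mal = 1.0 / (1.0 + np.exp(-(h @ w_detect)))  # sigmoid detection score
    logits = h @ W_family
    p_fam = np.exp(logits - logits.max(axis=1, keepdims=True))
    p_fam /= p_fam.sum(axis=1, keepdims=True)    # softmax over families
    return p_mal, p_fam

def multitask_loss(p_mal, y_mal, p_fam, y_fam, aux_weight=0.5):
    """Binary cross-entropy for detection plus a weighted auxiliary
    cross-entropy for family labels harvested from telemetry."""
    eps = 1e-9
    bce = -np.mean(y_mal * np.log(p_mal + eps)
                   + (1.0 - y_mal) * np.log(1.0 - p_mal + eps))
    ce = -np.mean(np.log(p_fam[np.arange(len(y_fam)), y_fam] + eps))
    return bce + aux_weight * ce

# Toy batch: 8 files with binary labels plus family labels from feeds.
x = rng.normal(size=(8, N_FEATURES))
y_mal = rng.integers(0, 2, size=(8, 1)).astype(float)
y_fam = rng.integers(0, N_FAMILIES, size=8)
p_mal, p_fam = forward(x)
loss = multitask_loss(p_mal, y_mal, p_fam, y_fam)
```

Because both heads share `W_shared`, gradients from the auxiliary family loss shape the same representation used for detection, which is the mechanism by which extra labels can improve the primary malicious/benign task.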