Richard Harang and Felipe Ducau
Security is a constant cat-and-mouse game between those trying to keep abreast of and detect novel malware, and the authors attempting to evade detection. The introduction of the statistical methods of machine learning into this arms race allows us to examine an interesting question: how fast is malware being updated in response to the pressure exerted by security practitioners? The ability of machine learning models to detect malware is now well known; we introduce a novel technique that uses trained models to measure “concept drift” in malware samples over time as old campaigns are retired, new campaigns are introduced, and existing campaigns are modified. Through the use of both simple distance-based metrics and Fisher Information measures, we look at the evolution of the threat landscape over time, with some surprising findings. In parallel with this talk, we will also release the PyTorch-based tools we have developed to address this question, allowing attendees to investigate concept drift within their own data.