Security data science – Getting the fundamentals right

A data science team is now table stakes for most security operations. However, data science for security poses unique challenges that differ from those of both traditional data science and traditional security. Rather than clean data sets with reliable ground truth labels, obvious metrics, and clear featurization strategies, security data sets tend to be messy, ambiguous, and noisy; their metrics can be difficult to operationalize, and building good features requires significant expert knowledge.

In this self-contained and broadly accessible talk, which draws on real-world experience leading basic research at a global anti-malware/security company, we’ll cover everything *but* the modeling bit of security data science and give attendees a roadmap for maximizing their effectiveness when starting their own security data science teams and/or projects. Topics include how to collect, clean, and label security-relevant data; how to approach feature construction and extraction; how to organize and manage reproducible experiments; and how to manage evaluation, both for head-to-head comparison of candidate models and for mapping model metrics to business outcomes. Along the way, we’ll cover the major pitfalls of doing security data science with an experienced team, as well as the areas that ‘traditional’ data scientists often have trouble with.