Ajay is a Principal Software Engineer at Sophos. He has experience building fast data architecture pipelines and prediction-based solutions involving thousands of machine learning models. He earned his Master’s degree in Computer Science from Stony Brook University and is an alumnus of the Data Science Lab there.
Generating up-to-date, well-labeled datasets for machine learning (ML) security models is a unique engineering challenge: large data volumes, the complexity of labeling, and constant concept drift make it difficult to produce effective training datasets. Here we describe a simple, resilient cloud infrastructure for generating ML training and testing datasets that has increased the speed at which our team can research a multitude of security ML models and keep them in production.