Detecting malware samples with similar image sets

This paper proposes a method for identifying and visualizing similarity relationships between malware samples based on their embedded graphical assets (such as desktop icons and button skins). We argue that analyzing such relationships has practical merit for a number of reasons. For example, we find that malware desktop icons are often used to trick users into running malware programs, so identifying groups of related malware samples based on these visual features can highlight themes in the social engineering tactics of today’s malware authors. Also, when malware samples share rare images, these image sharing relationships may indicate that the samples were generated or deployed by the same adversaries.

To explore and evaluate this malware comparison method, the paper makes two contributions. First, we provide a scalable and intuitive method for computing similarity measurements between malware based on the visual similarity of their sets of images. Second, we give a visualization method that combines a force-directed graph layout with a set visualization technique so as to highlight visual similarity relationships in malware corpora. We evaluate the accuracy of our image set similarity comparison method against a hand curated malware relationship ground truth dataset, finding that our method performs well. We also evaluate our overall concept through a small qualitative study we conducted with three cyber security researchers. Feedback from the researchers confirmed our use cases and suggests that computer network defenders are interested in this capability.