Android App Feature Extraction: A review of approaches for malware and app similarity detection
Abstract
This paper reviews work published between 2002 and 2022 in the fields of Android malware, clone, and similarity detection. It examines the data sources, tools, and features used in existing research and identifies the need for a comprehensive, cross-domain dataset that would facilitate interdisciplinary collaboration and allow different research areas to exploit their synergies. Furthermore, it shows that many research papers publish neither their dataset nor a description of how it was created, which makes their results difficult to reproduce or compare. The paper therefore highlights the necessity for a dataset that is accessible, well-documented, and suitable for a range of applications. It provides guidelines for building such a dataset, along with a schematic method for creating it.