In recent years, considerable effort has been invested in detecting malicious applications for the Android operating system. Much of this work relies on dynamic analysis in sandboxes and on analysis of the binary code, processes that are complex, tedious and slow. Our research explores the potential of detecting malicious Android apps by analyzing different features of each APK and of the markets where it is published, without binary code analysis or dynamic sandboxing.
We begin with a study of which antivirus engines are most precise in deciding what is malware and what is not. Today, the antivirus industry struggles to detect malware in a dynamic mobile world where the very definition of malware is unclear. Our research introduces new concepts that show how some antivirus engines do not work properly in this new environment of markets and mobile apps. To support our claims, we developed a tool that enables a massive experiment involving 1.3 million applications extracted from the Google Play market. Using various features extracted from each APK (permissions, digital certificates, natural language processing, etc.), big data techniques (Amazon AWS, Storm and Spark) and machine learning algorithms, we show that it is possible to detect malware without analyzing binary code. By analyzing 1.3 million .dex files without decompilation, we also aim to demonstrate that opcode patterns can improve ongoing investigations.
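To make the idea of detection from APK metadata alone more concrete, the following is a minimal sketch, not the method used in this research. It turns the permissions requested by each app into a binary feature vector and trains a Bernoulli naive Bayes classifier on those vectors. The app records, permission sets and labels are hypothetical toy data, and naive Bayes is an assumed stand-in for the machine learning algorithms the study actually applies; at a scale of 1.3 million apps this step would run on a distributed platform such as Spark rather than in plain Python.

```python
import math
from collections import Counter


def build_vocabulary(apps):
    """Map every distinct permission seen in the training apps to a column index."""
    perms = sorted({p for app in apps for p in app["permissions"]})
    return {perm: i for i, perm in enumerate(perms)}


def to_vector(app, vocab):
    """Binary feature vector: 1 if the app requests the permission, else 0.

    Permissions not present in the training vocabulary are ignored.
    """
    vec = [0] * len(vocab)
    for perm in app["permissions"]:
        if perm in vocab:
            vec[vocab[perm]] = 1
    return vec


class BernoulliNB:
    """Bernoulli naive Bayes over binary features, with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        counts = Counter(y)
        self.log_prior = {c: math.log(counts[c] / len(y)) for c in self.classes}
        self.feature_prob = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            # P(feature j = 1 | class c), smoothed so no probability is 0 or 1
            self.feature_prob[c] = [
                (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(len(X[0]))
            ]
        return self

    def predict(self, x):
        """Return the class with the highest log-posterior for vector x."""
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for xj, p in zip(x, self.feature_prob[c]):
                score += math.log(p if xj else 1.0 - p)
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical toy data; not taken from the study.
train = [
    {"permissions": ["android.permission.INTERNET"], "label": "benign"},
    {"permissions": ["android.permission.INTERNET",
                     "android.permission.ACCESS_NETWORK_STATE"], "label": "benign"},
    {"permissions": ["android.permission.INTERNET", "android.permission.SEND_SMS",
                     "android.permission.READ_SMS"], "label": "malware"},
    {"permissions": ["android.permission.SEND_SMS",
                     "android.permission.RECEIVE_BOOT_COMPLETED"], "label": "malware"},
]

vocab = build_vocabulary(train)
X = [to_vector(app, vocab) for app in train]
y = [app["label"] for app in train]
model = BernoulliNB().fit(X, y)

sms_app = {"permissions": ["android.permission.SEND_SMS", "android.permission.READ_SMS"]}
net_app = {"permissions": ["android.permission.INTERNET"]}
print(model.predict(to_vector(sms_app, vocab)))  # malware
print(model.predict(to_vector(net_app, vocab)))  # benign
```

Other metadata features mentioned above, such as certificate attributes or terms extracted from the app description, could be appended to the same binary vector in the same way; how the study actually encodes them is not specified here.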
To our knowledge, no previous research has analyzed this many applications (1.3 million).