Anomaly-Based Malware Detection, Part 1

Kabir Dhruw
4 min read · Jun 19, 2022


With the rapid development of the Internet, malware has become one of the major cyber threats. Any software that performs malicious actions, such as information theft or espionage, can be referred to as malware. Kaspersky Labs (2017) defines malware as “a type of computer program designed to infect a legitimate user’s computer and inflict harm on it in multiple ways.”

As malware grows more diverse, antivirus scanners struggle to provide adequate protection, and millions of hosts are attacked as a result. According to Kaspersky Labs (2016), 6,563,145 different hosts were attacked and 4,000,000 unique malware objects were detected in 2015. Juniper Research (2016), in turn, predicted that the global cost of data breaches would rise to $2.1 trillion by 2019.

In addition, the skill level required to develop malware is falling, because attack tools are widely available on the Internet. Anti-detection techniques are easy to obtain, and malware can be bought on the black market. As a result, anyone can become an attacker regardless of skill level. Studies show that a growing share of attacks are launched by script kiddies or are fully automated (Aliyev 2010).

Detecting malware with standard signature-based methods is becoming increasingly difficult. Much modern malware wraps itself in multiple polymorphic layers to avoid detection. Other samples use side mechanisms to update themselves automatically to a newer version at short intervals, so that no antivirus signature stays valid for long.
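A minimal Python sketch shows why even a trivial encoding layer defeats hash-based signatures. It rests on three assumptions:

- a signature is simply the SHA-256 hash of a file's bytes;
- a harmless placeholder string stands in for a real payload;
- the one-byte XOR "packer" and all function names are hypothetical, and far simpler than real polymorphic engines.

Wrapping the same payload with a different key produces different bytes and therefore a different hash, even though the unpacked payload is identical.

```python
import hashlib


def xor_encode(data: bytes, key: int) -> bytes:
    """Hypothetical one-byte XOR 'packer'; applying it twice with the same key restores the input."""
    return bytes(b ^ key for b in data)


def sha256_hex(data: bytes) -> str:
    """The signature used in this sketch: the SHA-256 hash of the exact bytes."""
    return hashlib.sha256(data).hexdigest()


def unpack(variant: bytes) -> bytes:
    """Recover the payload: byte 0 holds the key, the rest is the encoded body."""
    return xor_encode(variant[1:], variant[0])


# Harmless placeholder standing in for a malicious payload.
payload = b"PLACEHOLDER_PAYLOAD: pretend this is malicious code"

# Two 'generations' of the same malware, each wrapped with a different key.
variant_a = bytes([0x21]) + xor_encode(payload, 0x21)
variant_b = bytes([0x5A]) + xor_encode(payload, 0x5A)

# The signature database only knows the first variant.
known_signatures = {sha256_hex(variant_a)}

print(sha256_hex(variant_a) in known_signatures)  # True: exact match
print(sha256_hex(variant_b) in known_signatures)  # False: new bytes, new hash
print(unpack(variant_a) == unpack(variant_b))     # True: same payload underneath
```

Real polymorphic engines also vary the decoding stub itself, but the effect on an exact-match signature is the same: every new variant needs its own entry in the database.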

Machine learning helps antivirus software detect new threats without relying on signatures. In the past, antivirus software relied largely on fingerprinting, which works by cross-referencing files against a huge database of known malware.
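Fingerprinting can be sketched in a few lines of Python. The assumptions:

- the fingerprint is a SHA-256 file hash;
- the "database" is a hypothetical in-memory set standing in for the large, vendor-maintained databases that real products query;
- the file contents are placeholders.

A miss is reported as "unknown" rather than "clean", because a signature checker can only recognise what it has already seen.

```python
import hashlib
import os
import tempfile

# Hypothetical local stand-in for the signature database. Real products
# query large, vendor-maintained (often cloud-hosted) databases instead.
KNOWN_MALWARE_SHA256 = set()


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Fingerprint a file by hashing it in chunks, so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_file(path: str) -> str:
    """'malware' on a database hit; otherwise 'unknown', since a miss does not prove the file is clean."""
    return "malware" if file_sha256(path) in KNOWN_MALWARE_SHA256 else "unknown"


with tempfile.TemporaryDirectory() as tmp:
    seen_path = os.path.join(tmp, "sample.bin")
    other_path = os.path.join(tmp, "notes.txt")
    with open(seen_path, "wb") as f:
        f.write(b"placeholder bytes of a previously analysed malware sample")
    with open(other_path, "wb") as f:
        f.write(b"an ordinary text file")

    # Once a sample is confirmed as malicious, its fingerprint is added to the database.
    KNOWN_MALWARE_SHA256.add(file_sha256(seen_path))

    verdict_seen = scan_file(seen_path)
    verdict_other = scan_file(other_path)

print(verdict_seen)   # malware
print(verdict_other)  # unknown
```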

The major flaw of fingerprinting is that signature checkers can only detect malware that has been seen before. That is a rather large blind spot, given that hundreds of thousands of new malware variants are created every day. In particular, signature checkers cannot detect polymorphic malware, which is able to change its signature. Nor can they detect brand-new malware for which no signature has been created yet. Heuristics-based detectors, in turn, are not always accurate enough for reliable detection and produce many false positives and false negatives (Baskaran and Ralescu 2016). Machine learning, on the other hand, can be trained to recognise the characteristics of benign and malicious files. This lets it identify malicious patterns and detect malware whether or not it has been seen before. The need for new detection methods is driven by the rapid spread of polymorphic viruses.
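The anomaly-based idea can be illustrated with a single feature: Shannon byte entropy. Packed or encrypted content tends towards 8 bits per byte, while ordinary text and code sit much lower.

The Python sketch below learns an entropy baseline from benign samples only and flags samples that lie far from it, measured as a z-score. It is a toy under stated assumptions:

- entropy as the only feature is an assumption for illustration;
- the 3-standard-deviation threshold is likewise assumed;
- the synthetic samples are made up.

Real detectors use many features and trained models.

```python
import math
import os
import statistics
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, at most 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


class EntropyAnomalyDetector:
    """Learns a baseline from benign samples only and flags large deviations from it."""

    def __init__(self, threshold: float = 3.0):
        self.threshold = threshold  # assumed cut-off, in standard deviations
        self.mean = 0.0
        self.stdev = 1.0

    def fit(self, benign_samples):
        scores = [byte_entropy(s) for s in benign_samples]
        self.mean = statistics.mean(scores)
        self.stdev = statistics.stdev(scores) or 1e-9  # guard against zero spread

    def is_anomalous(self, sample: bytes) -> bool:
        z = abs(byte_entropy(sample) - self.mean) / self.stdev
        return z > self.threshold


# Synthetic benign files: ordinary text, code and config (low entropy).
benign = [
    b"The quarterly report is attached; please review it before Friday. " * 40,
    b"def add(a, b):\n    return a + b\n" * 40,
    b"Meeting notes: budget approved, hiring plan pending, next sync Monday. " * 40,
    b"<html><body><p>Welcome to our homepage.</p></body></html>\n" * 40,
    b"user=alice; theme=dark; language=en; autosave=true\n" * 40,
]

detector = EntropyAnomalyDetector()
detector.fit(benign)

packed_like = os.urandom(4096)  # random bytes stand in for packed or encrypted content
print(detector.is_anomalous(benign[0]))    # False
print(detector.is_anomalous(packed_like))  # True
```

Because the baseline describes only benign files, anything that strays from it is flagged. That includes legitimately compressed archives or encrypted documents, which is exactly the kind of false positive that heuristic detection is known for. The weakness also cuts the other way: malware that keeps its entropy low would pass this single check.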

Methods

Malware detection techniques can broadly be divided into signature-based and behaviour-based methods. Before going into these methods, it is worth understanding the two basic approaches to malware analysis: static and dynamic. As the name implies, static analysis is performed “statically”, i.e. without executing the file. Dynamic analysis, in contrast, examines the file while it is running, for example inside a virtual machine or sandbox.
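As a small illustration of static analysis, the Python sketch below inspects raw bytes without ever executing them. It does three things:

- checks the leading magic bytes ("MZ" marks a Windows executable, "\x7fELF" a Linux ELF binary);
- counts printable strings roughly the way the Unix `strings` tool does;
- looks for a few names often associated with malicious code.

The indicator list and the sample bytes are made up for illustration. Legitimate programs call these Windows API names too, so a hit is a hint rather than a verdict.

```python
import re

# Runs of 4+ printable ASCII characters, roughly what the Unix `strings` tool reports.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

# Hypothetical indicator list for illustration; real analysers use far richer rules.
SUSPICIOUS_SUBSTRINGS = [b"VirtualAlloc", b"CreateRemoteThread", b"cmd.exe", b"http://"]


def file_type(data: bytes) -> str:
    """Guess the format from the magic bytes at the start of the file."""
    if data.startswith(b"MZ"):
        return "windows-executable"
    if data.startswith(b"\x7fELF"):
        return "elf-executable"
    return "other"


def static_features(data: bytes) -> dict:
    """Collect simple features from the bytes alone; nothing is executed."""
    return {
        "type": file_type(data),
        "size": len(data),
        "num_strings": len(PRINTABLE_RUN.findall(data)),
        "suspicious": [s.decode() for s in SUSPICIOUS_SUBSTRINGS if s in data],
    }


# Made-up bytes that merely resemble the start of a Windows executable.
sample = b"MZ\x90\x00" + b"\x00" * 16 + b"kernel32.dll\x00VirtualAlloc\x00CreateRemoteThread\x00"
features = static_features(sample)
print(features)
```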

  1. Signature-based malware detection: This method uses virus codes (signatures, often file hashes) to identify malware. Each known piece of malware carries a unique code that identifies it.
     When a file reaches the computer, the malware scanner computes its code and sends it to a cloud-based database that holds a vast collection of virus codes. If the code is found in the list, the database returns a verdict that the file is malware, and the anti-malware software blocks the file and deletes it. Whenever new malware is discovered, its code is added to the list.
  • Pros: This is the fastest and most accurate method for common malware that is widely spread across the Internet.
  • Cons: It is far less accurate for recent or rare malware, because its accuracy depends on the signature database it uses.

  2. Behaviour-based malware detection: As the name suggests, this method identifies malware by its behaviour. Malware typically behaves differently from legitimate software. Even before it fully executes, it may exhibit behaviours that reveal its identity to antivirus products. Behaviour-based detection monitors these behaviours to determine whether a piece of software is malicious.

  • Pros: This method works best for new or rare malware that has no signature yet.
  • Cons: It is poorly suited to stopping well-known malware, which should be removed immediately without first having to observe how it behaves.
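Behaviour-based detection can be sketched as a weighted rule score over events recorded while a program runs, which is where dynamic analysis comes in. In the minimal Python example below, everything specific is hypothetical:

- the action names;
- their weights;
- the alert threshold of 7;
- the trace itself.

Real products combine far more signals, often with trained models, but the principle of judging a program by what it does is the same.

```python
from dataclasses import dataclass

# Hypothetical behaviour rules and weights, chosen for illustration only.
SUSPICIOUS_BEHAVIOURS = {
    "disable_security_tool": 5,
    "modify_autorun_key": 3,    # persistence across reboots
    "mass_file_encryption": 5,  # ransomware-like activity
    "inject_into_process": 4,
    "contact_unknown_host": 2,
}
ALERT_THRESHOLD = 7  # assumed cut-off


@dataclass
class Event:
    process: str
    action: str


def behaviour_score(events, process: str) -> int:
    """Sum the weights of the distinct suspicious actions observed for one process."""
    seen = {e.action for e in events if e.process == process}
    return sum(SUSPICIOUS_BEHAVIOURS.get(action, 0) for action in seen)


def is_malicious(events, process: str) -> bool:
    return behaviour_score(events, process) >= ALERT_THRESHOLD


# A made-up trace, as a sandbox or endpoint monitor might record it.
trace = [
    Event("editor.exe", "open_file"),
    Event("editor.exe", "contact_unknown_host"),
    Event("invoice.exe", "modify_autorun_key"),
    Event("invoice.exe", "contact_unknown_host"),
    Event("invoice.exe", "mass_file_encryption"),
]

for proc in ("editor.exe", "invoice.exe"):
    print(proc, behaviour_score(trace, proc), is_malicious(trace, proc))
# editor.exe 2 False
# invoice.exe 10 True
```

Unlike a hash lookup, the score does not depend on the file's bytes, so a repacked variant that behaves the same way is still flagged. The price is that the program has to be observed first, which is why known malware is better stopped by its signature.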
