Heuristics in cybersecurity
It is well known that anti-malware software, which protects us from malicious files such as viruses, worms, Trojans, and other malware, works by scanning files against the signatures stored in its malware “identifier” database. A signature can be as simple as a string to search for or as complex as a small macro or subroutine that tells the scanning engine what to look for and where to find it. Signature scanning works very well for detecting threats that have already been identified, but how do anti-malware programs detect new, previously unseen threats? One of the methods used is heuristics.
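To make the simplest case concrete, here is a minimal sketch of string-based signature scanning. The signature names and byte patterns are invented for illustration and are not real malware signatures; a real engine would use a far richer signature format.

```python
from typing import Dict, List

# Hypothetical signature database: signature name -> byte pattern.
# These patterns are made up for this example.
SIGNATURES: Dict[str, bytes] = {
    "Example.TestString": b"EVIL_PAYLOAD_MARKER",
    "Example.Dropper": bytes.fromhex("deadbeef00ff"),
}


def scan_bytes(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all signatures whose pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]
```

A file containing none of the stored patterns comes back with an empty list, which is exactly why this approach alone cannot catch a threat nobody has seen before.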
The word “heuristics” is derived from the Ancient Greek word for “discover.” Heuristic analysis is an approach to discovering, learning, and solving problems that uses rules, estimates, and knowledge-based guesswork to find a satisfactory solution to a specific problem. In computer science, a heuristic algorithm is one that produces an acceptable solution to a problem in many practical scenarios, typically by reasoning from similarity to known cases, but for which there is no formal proof that the solution is correct or optimal. Its results are only as good as the similarity assumption it relies on.
While this method of problem solving may not be perfect, it is very effective when applied to computer processes where a quick response or timely warning is required on the basis of an informed guess. It works consistently and quickly, and it generally produces good results. For anti-malware, heuristics can also have a more specialized meaning: it refers to a set of broader rules, as opposed to a specific set of instructions, that a program uses to detect malicious behavior without having to uniquely identify the responsible program (as the classic signature-based “virus scanner” does).
The heuristic engine used by an anti-malware program may contain rules for finding programs that:
- try to copy themselves to other programs (in other words, a classic computer virus),
- try to write data directly to the disk,
- try to remove their traces from memory after they have finished executing,
- decrypt themselves after startup (a method often used by malware to avoid signature scanners),
- bind to a TCP/IP port and listen for instructions over a network connection (this is what a bot, sometimes also called a drone or zombie, does),
- manipulate (copy, delete, modify, rename, replace, etc.) files required by the operating system,
- have a set of activities that is similar to that of programs already known to be malicious.
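Rules like those listed above can be pictured as simple checks over a record of what a program did. The sketch below assumes a hypothetical `BehaviorReport` produced by some monitoring component; the field names and rule names are invented for illustration and do not correspond to any particular product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class BehaviorReport:
    """Hypothetical summary of what a monitored program did."""
    written_executables: Set[str] = field(default_factory=set)
    raw_disk_writes: int = 0
    self_decrypts: bool = False
    listening_ports: Set[int] = field(default_factory=set)
    modified_system_files: Set[str] = field(default_factory=set)


# Each rule is a named predicate over the report (names are illustrative).
RULES: Dict[str, Callable[[BehaviorReport], bool]] = {
    "copies_itself_into_programs": lambda r: len(r.written_executables) > 0,
    "writes_directly_to_disk": lambda r: r.raw_disk_writes > 0,
    "decrypts_itself": lambda r: r.self_decrypts,
    "listens_on_network_port": lambda r: len(r.listening_ports) > 0,
    "tampers_with_system_files": lambda r: len(r.modified_system_files) > 0,
}


def matched_rules(report: BehaviorReport) -> List[str]:
    """Return the names of all rules the report triggers, in rule order."""
    return [name for name, check in RULES.items() if check(report)]
```

Note that none of these checks needs to know which program it is looking at: the rules describe behavior, not identity.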
Additionally, the analysis may include techniques such as:
- File analysis – When analyzing files, the scanning software carefully examines the file to determine its purpose, structure and operation. For example, if a file is designed to delete certain files, it can be flagged as a possible virus.
- File emulation – Also known as dynamic scan or sandbox testing, file emulation tests a file in a controlled virtual environment to see what happens. If the file behaves like a virus, it’s probably a virus.
- Signature Correlation Detection – Designed to locate different varieties of a virus, signature correlation detection uses previous virus definitions to detect viruses from the same family.
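One way to picture the idea behind signature correlation is to measure how much a new sample overlaps with a known one. The sketch below uses Jaccard similarity over byte n-grams; this is an illustrative stand-in chosen for simplicity, not the method any specific product is known to use, and the `threshold` value is an arbitrary assumption.

```python
from typing import Set


def ngrams(data: bytes, n: int = 4) -> Set[bytes]:
    """Return the set of all n-byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the n-gram sets of a and b (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two inputs too short to have n-grams count as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def same_family(sample: bytes, known: bytes, threshold: float = 0.5) -> bool:
    """Guess whether sample is a variant of known (threshold is an assumption)."""
    return jaccard(sample, known) >= threshold
```

A variant that changes only a few bytes keeps most of its n-grams and still scores high, while an unrelated file shares almost none.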
Some heuristic rules may carry more weight than others (and therefore contribute a higher score), which means that a match on one specific rule can create a more critical incident than several matches on other rules. A match on one of the most heavily weighted rules usually causes the suspect file to be sandboxed for further analysis.
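Weighted scoring of this kind might look like the following sketch. The rule names, weights, and thresholds are all hypothetical values chosen to illustrate the idea; real engines tune them empirically.

```python
from typing import Dict, Iterable

# Hypothetical weights: higher means more suspicious.
RULE_WEIGHTS: Dict[str, int] = {
    "copies_itself_into_programs": 90,
    "tampers_with_system_files": 60,
    "decrypts_itself": 40,
    "listens_on_network_port": 30,
    "writes_directly_to_disk": 25,
}

SANDBOX_THRESHOLD = 80  # assumed cut-off for sending a file to the sandbox
WARN_THRESHOLD = 40     # assumed cut-off for raising a warning


def score(matches: Iterable[str]) -> int:
    """Sum the weights of the matched rules, counting each rule once."""
    return sum(RULE_WEIGHTS.get(name, 0) for name in set(matches))


def verdict(matches: Iterable[str]) -> str:
    """Map the total score to an action: 'sandbox', 'warn', or 'allow'."""
    total = score(matches)
    if total >= SANDBOX_THRESHOLD:
        return "sandbox"
    if total >= WARN_THRESHOLD:
        return "warn"
    return "allow"
```

With these example weights, a single match on the self-copying rule (90) is enough to sandbox the file, while two lighter matches together (30 + 25 = 55) only raise a warning.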
In general, whatever is not deterministic (where you know exactly what output to expect for a given input) is heuristic. The undoubted advantage of heuristic code analysis is that it can detect not only variants (modified forms) of existing malicious programs, but also new, previously unknown malicious programs, including zero-day threats. When combined with other ways of searching for malware, such as signature detection, behavioral monitoring, and reputation analysis, heuristics can provide impressive accuracy: it correctly detects a large portion of real malware and exhibits a relatively low rate of false positives. Keep in mind that “relatively” is the operative word: for an algorithm based on similarity, a low false-positive rate is a relative notion, not a guarantee.
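The two quantities mentioned here, the share of real malware detected and the false-positive rate, can be computed from simple counts. The sketch below uses the standard definitions; the function names are our own and any figures passed in are made up for illustration, not measurements of a real product.

```python
def detection_rate(true_positives: int, false_negatives: int) -> float:
    """Share of real malware samples that were correctly flagged."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Share of clean files that were wrongly flagged as malicious."""
    total = false_positives + true_negatives
    return false_positives / total if total else 0.0
```

For example, flagging 95 of 100 malicious samples gives a detection rate of 0.95, and wrongly flagging 2 of 1,000 clean files gives a false-positive rate of 0.002. Whether that second number counts as "low" depends on how many clean files are scanned every day.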