A machine learning approach to intrusion detection

By Gregory Shearer
Dec 18, 2019
No more needle in a haystack: combining machine learning with human analysts helps to maximize the efficiency of intrusion detection.

For human analysts, one of the most challenging aspects of network security management is examining the massive volume of alerts generated by large-scale intrusion detection systems (IDS). Although these IDS alarms may identify unauthorized network traffic, they are also known to frequently misclassify normal and background traffic as malicious.

Machine learning can assist by automating certain intrusion detection functions. But at present, machine learning cannot explain why some events require human intervention and others do not. Given these complexities, how can you be sure you’re focusing on the true cyber threats?

Take into account organizational policy and operational environments

The core function of an IDS is to look for events that are malicious or concerning—not just ones that are sort of strange, but those that you truly have to worry about and respond to. In large organizations, there’s likely to be a response team that produces reports when they notice a malicious event. But first they have to be notified of an alert, and that’s where an IDS comes in.

An organizational policy is typically defined by the operational environment that the organization works in. That policy will tend to affect which alerts you have an interest in—those that you think are going to be malicious and affect your organization’s ability to operate. Likewise, changes to organizational policy and operational environments may also influence which alarms are actively reported on versus those that are ignored.

Differentiate between signature-based and anomaly-based models

When monitoring computer network traffic for suspicious activity—such as cyberattacks and intrusions—an IDS uses two general classes of detection models. Signature-based models detect malicious behavior by searching for predefined features of an attack, otherwise known as attack signatures. Anomaly-based models detect irregular characteristics in data, which are inconsistent with previously observed patterns and may be indicative of malicious behavior.

Signature-based detection works well when signatures are relevant and up to date, false positives are few or easily sorted, and policy is applied consistently. Anomaly-based detection presents a different challenge: abnormal traffic is not necessarily malicious traffic, so false positive levels are typically very high.
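To make the distinction concrete, here is a minimal sketch of both model classes in Python. It is an illustration under stated assumptions, not a description of any real IDS: the two byte patterns in SIGNATURES, the function names, and the z-score threshold of 3.0 are all hypothetical choices made for this example.

```python
import statistics

# Hypothetical attack signatures: byte patterns assumed for illustration only.
SIGNATURES = {
    "path_traversal": b"../../",
    "sql_injection": b"' OR '1'='1",
}

def signature_match(payload: bytes) -> list[str]:
    """Signature-based: return the names of all known patterns found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

def is_anomalous(value: float, history: list[float], threshold: float = 3.0) -> bool:
    """Anomaly-based: flag a value whose z-score against past observations exceeds the threshold."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation; needs at least 2 points
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold

# Example: requests per minute previously observed from one host (made-up numbers).
baseline = [100, 110, 95, 105, 90]
print(signature_match(b"GET /../../etc/passwd"))  # matches the path traversal pattern
print(is_anomalous(500, baseline))                # far outside the baseline
print(is_anomalous(108, baseline))                # within normal variation
```

Note that the anomaly check would flag a legitimate nightly backup just as readily as an attack, since both look unusual against the baseline. The signature check, in turn, only finds what it has been told to look for.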

Beware false positives

False positives are the main problem facing customer IDS support. An IDS can produce thousands of alarms per day, many of which are likely to be either false positives or true alarms of low value. As a result, it can become difficult for administrators or automated network defense programs to separate the alarms that matter from those that do not, resulting in either overzealous action or missed true alarms. Alert prioritization that learns from past incidents is needed to improve efficiency when similar alerts are encountered again.
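One simple way to learn from past incidents is to rank new alerts by how often analysts escalated alerts of the same kind before. The sketch below assumes each alert carries a signature_id field and that past analyst decisions are available as (signature_id, was_escalated) pairs; both are hypothetical conventions for this example. A production system would likely use a richer model and more features than a per-signature frequency.

```python
from collections import defaultdict

def learn_escalation_rates(history):
    """Estimate, per signature, the fraction of past alerts that analysts escalated.

    history: iterable of (signature_id, was_escalated) pairs (assumed format).
    """
    counts = defaultdict(lambda: [0, 0])  # signature_id -> [escalated, total]
    for sig, escalated in history:
        counts[sig][0] += int(escalated)
        counts[sig][1] += 1
    # Laplace smoothing keeps rarely seen signatures away from exactly 0 or 1.
    return {sig: (e + 1) / (t + 2) for sig, (e, t) in counts.items()}

def prioritize(alerts, rates, default=0.5):
    """Sort alerts by learned escalation rate, highest first.

    Signatures with no history get the default score so they are not buried.
    """
    return sorted(alerts, key=lambda a: rates.get(a["signature_id"], default), reverse=True)

# Made-up analyst history: signature "A" was always escalated, "B" never.
past = [("A", True), ("A", True), ("B", False), ("B", False), ("B", False)]
rates = learn_escalation_rates(past)
queue = prioritize([{"signature_id": "B"}, {"signature_id": "C"}, {"signature_id": "A"}], rates)
print([a["signature_id"] for a in queue])
```

Giving unseen signatures a middle score is a deliberate choice here: a new kind of alert has no track record, so it lands between the signatures analysts usually escalate and the ones they usually dismiss.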

You may assume that an alert is always investigated when an IDS sounds an alarm. But that’s not the case. In a large organization, it’s too costly to investigate each alert. Instead, you have to look for the ones that are really interesting. Analysts will sift through and decide, based on their expertise and intuition, which alerts are truly meaningful. This human analysis is critically important in network security, and you don’t want to waste it on insignificant or false positive events.

Use human analysts and machine learning together to make IDS smarter and more effective

Humans and computers have very different—and complementary—strengths. Machine learning is fast and thorough, whereas humans possess a wider knowledge base and can understand the mission. We know that we’re trying to protect our organization and maintain business continuity—those ideas make sense to us. But a computer doesn’t understand value-based concepts; it just sees 1s and 0s, packets, and alerts.

While a machine can read very fast and reliably return results, you need to be able to direct it and tell it what to do. The responsibility of responding to indications of malicious behavior ultimately rests with a human network defender who must interpret the output of an IDS. Using machine-based intelligence for the low-hanging fruit helps streamline the process. That way, the analyst can spend more of their time looking for strange and interesting events that could ultimately signal true malicious intent.

Meet the author
Gregory Shearer, Senior Software Systems Engineer, Cybersecurity