Frequent Attack Dictionary Decision Tree Method for Advanced Signature-Based Intrusion Detection

Cyber-Physical Systems (CPS) are advanced intelligent systems that consist of networked or distributed computational elements, sensors and actuators that control physical entities and mechanisms. CPS have attracted much attention because of their wide range of applications. Early generations of CPS can be found in a variety of areas including aerospace, industrial infrastructure, health care, transportation, energy, Supervisory Control And Data Acquisition (SCADA) and autonomous automobile systems. Developing safe CPS requires a deep understanding of the potential impact of successful malicious cyber-attacks. CPS security concerns include attackers' efforts to intercept information captured by sensors and to manipulate the rules sent to actuators in order to disrupt, defeat and eventually cause the system to fail. Because CPS include physical actuators, the damage can be life-threatening, as in autonomous automobile systems.

With the increasing use of CPS in national infrastructure and other vital sectors, where they control significant industrial processes and apply rules based on what sensors perceive from the environment, securing CPS has become essential. One of the techniques used in these systems is signature-based intrusion detection. One of its main problems is the difficulty of maintaining the dictionary of attacks, because of the memory required to keep all the signatures. The second issue is the low speed of detection, since each packet must be checked against every signature in the dictionary one by one.

This thesis proposes and evaluates a method to increase the speed and performance of signature-based intrusion detection and, as a result, to increase CPU availability. As the first step, the least valuable information is found and removed from the attack dictionary. The dictionary is then divided into two sub-dictionaries based on the most numerous attacks, and finally it is classified in a decision tree. Removing the least valuable information and searching for the rules inside the decision tree reduces processing time. In addition, because the probability of each rule's occurrence is increased whenever an incoming packet matches a signature, accuracy improves compared with the traditional ID3 (Iterative Dichotomiser 3) method. The proposed method is simulated in Python on a data set gathered from real-world networks (KDD-99). The results demonstrate improved performance and resource availability.

Keywords: Cyber-Physical Systems, Intrusion Detection, Signature, Framework, Frequent Attack Dictionary, Entropy, Support Vector.

Chapter 1

  1. Introduction

 

The term Cyber-Physical Systems was introduced to describe a research model and mechanism that connects computing, communications and control. Although an exact definition is difficult because of the wide domain of these systems, CPS can generally be described as physical engineered systems whose operations are monitored, controlled and integrated by a computational and communication core. In other words, CPS are intelligent systems consisting of computing elements, network elements and distributed sensors and actuators, which control physical entities.

Security issues in CPS, including attackers' efforts to intercept and manipulate the data captured by the sensors and the instructions sent to the actuators, can disrupt and ultimately defeat the system. Securing CPS is becoming more important given their rising use in vital and critical sectors, infrastructure and sensitive industrial processes.

CPS applications are likely to have an impact comparable to the Information Technology revolution of the twentieth century. Examples of CPS include a wide range of large-scale engineered systems such as automated control systems in aviation, energy conservation, high-reliability medical systems in health care, distributed robots (as in telemedicine and defense systems), assisted living, traffic and safety control, SCADA, advanced automotive systems, transportation, automation and smart grid systems. In all these systems, it is very important to handle the complex interaction between the different physical and computational elements correctly.

In recent decades, with the wide growth of computers and the Internet, many security issues have been raised in CPS. The number of networks, their applications and the security threats against them is increasing every day. At the same time, it is technically impossible to build computers that are one hundred percent secure and free of weaknesses and failures. Therefore, research on intrusion detection systems (IDS) for CPS is being pursued with great interest. One intrusion detection technique that has attracted particular interest is signature-based intrusion detection. This technique is based on a set of rules and signatures, kept in a data set or dictionary, which define malicious packets and files. Each rule corresponds to one known threat.

One of the major benefits of signature-based intrusion detection is its accuracy and low false positive rate. Its disadvantage is that performance depends heavily on the rules, and choosing the wrong set of rules might lead to instability in process control. In that case, organizations might experience denial of service, information theft and disruption of decision-making, all of which affect CPS and may cause financial damage and even serious human injury. Another disadvantage of signature-based IDSs is that each received packet must be checked against all individual rules one by one. This increases the time required for detection and complicates maintenance when resources are limited.

This thesis proposes a method to increase the speed and performance of signature-based intrusion detection. First, the least valuable information is found and removed from the attack dictionary. The resulting enhanced-value dictionary is divided into two sub-dictionaries based on the most numerous attacks, and finally it is classified in a decision tree. Removing the least valuable information and searching for the rules inside the decision tree reduces processing time. When an incoming packet matches a signature, the probability of that rule's occurrence is increased. These improvements increase accuracy compared with the traditional ID3 (Iterative Dichotomiser 3) method. The proposed method is evaluated using Quality of Service parameters including accuracy, false positive rate and false negative rate.

1.1. Purpose and organization of the thesis

In this thesis, a method for signature-based intrusion detection is proposed. The proposed method is simulated using Python 3.5 and the KDD99 data set, and the simulation results are analyzed to evaluate its performance. The method was developed after exploring and evaluating existing intrusion detection techniques and frameworks and analyzing their shortcomings.

The thesis is organized into seven chapters. Chapter 2 introduces the main concepts that this thesis relies on, namely intrusion detection techniques and their comparison. Chapter 3 provides an overview of CPS and their security frameworks. Chapter 4 reviews the background of the chosen intrusion detection technique. Chapter 5 presents the main contributions of this work, namely the frequent attack dictionary decision tree for advanced signature-based intrusion detection and the design of the simulation experiment. Chapter 6 follows with the analysis of the simulation results and a discussion. Finally, Chapter 7 presents the conclusions and outlines possible future research.

Chapter 4

 

  4. Signature-Based Intrusion Detection

 

Signature-based IDSs are very effective against known attacks. Because these systems are fast and easy to install, they can start working immediately. A signature-based IDS analyzes each packet and compares its content with the dictionary of known attacks. Normal packets are sometimes mistaken for attacks (false positives), but this does not occur often. These systems generate easy-to-understand reports and label each packet either as normal or as one class of attack.
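The comparison described above can be illustrated with a minimal sketch. The signature layout, the feature names and the two toy rules are assumptions made for illustration only and are not taken from any real rule set; the point is simply that each packet is compared with the signatures one after another until one matches.

```python
# Hypothetical toy dictionary: each signature lists the feature values
# that a packet must have in order to be labeled with that attack class.
SIGNATURES = [
    {"label": "smurf", "match": {"protocol": "icmp", "service": "ecr_i"}},
    {"label": "neptune", "match": {"protocol": "tcp", "flag": "S0"}},
]


def classify(packet, signatures):
    """Compare the packet with every signature in turn.

    Returns the label of the first signature whose required feature
    values all appear in the packet, or "normal" if none matches.
    """
    for sig in signatures:
        if all(packet.get(name) == value for name, value in sig["match"].items()):
            return sig["label"]
    return "normal"


print(classify({"protocol": "icmp", "service": "ecr_i", "flag": "SF"}, SIGNATURES))
print(classify({"protocol": "tcp", "service": "http", "flag": "SF"}, SIGNATURES))
```

In the worst case (a normal packet) every signature is visited, which is the cost that the later chapters try to reduce.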

Although signature-based IDSs are efficient against known attacks, they are not able to find zero-day attacks. Hackers use zero-day attacks to compromise many systems before administrators can adapt their organizations' IDSs [22]. For this reason, a signature-based IDS should be updated continuously. Attack reports should be collected from all over the world, and as soon as a new attack is detected, security engineers should analyze it and develop a defense against it. The solution should then be distributed to all subsystems, and the IDS/IPS systems should be updated accordingly. However, the first subsystem that was attacked is already compromised and may have been damaged.

Figure 4-1 illustrates the detection process in signature-based techniques. This type of intrusion detection analyzes packet features and tries to find a match in stored dictionaries in which all the known attacks are recorded. Some articles call this technique misuse detection or pattern-based detection. A low false positive rate is the most important advantage of signature-based IDS.

This technique reacts to known malicious behavior. In other words, it treats a node as a good node if it does not exhibit any attack signature. The most significant issue is creating an efficient attack database. If a signature is too large, it consumes too much memory, and if it is not detailed enough, accuracy is reduced. Signature-based IDSs are more efficient and accurate at detecting outsider attacks than other IDS techniques.


Figure ‎4‑1. Signature-based intrusion detection process


4.1. Decision tree based intrusion detection

One disadvantage of signature-based IDS is that it drops many packets when it does not have enough time to check each packet against all the rules in the attack dictionary. To solve this problem, many researchers have used decision tree search methods. SQL injection attacks are examined in [31], where a decision tree is used to detect them by filtering incoming HTTP requests.

A decision tree is a classification algorithm in data mining, and its fundamental algorithm is called ID3 (Iterative Dichotomiser 3) [39]. This algorithm builds a tree from the given classified data, where each data item is identified by the values of its features. Classification in decision trees is done in reverse order, and the main challenge is choosing the key features for the nodes. Each node in the tree represents a feature from the signatures, and the process is complete when all the signatures are registered in the tree. The leaves of the tree mark the end of each connection and are labeled with a type of attack. These trees are capable of working with large amounts of data, which is an advantage for CPS because of their heavy traffic flow. In addition, the high performance of decision trees makes them a good option for real-time systems. Figure 4-2 shows a small part of a decision tree. In this figure, the destination node is decided based on the value of the source port.
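As a rough illustration of how ID3 chooses the key feature for a node, the sketch below computes Shannon entropy and information gain for categorical features and picks the feature with the highest gain. The four toy records and the feature names are assumptions for illustration; this is the textbook selection criterion, not the thesis implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """ID3 places the feature with the highest information gain at the node."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Toy records (hypothetical values).
rows = [
    {"protocol": "icmp", "flag": "SF"},
    {"protocol": "icmp", "flag": "SF"},
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "tcp", "flag": "S0"},
]
labels = ["attack", "attack", "normal", "normal"]

print(best_feature(rows, labels, ["protocol", "flag"]))
```

Here splitting on `protocol` separates the two classes completely, so it is preferred over `flag`. A full ID3 implementation would apply the same choice recursively to each resulting subset.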

Figure ‎4‑2. Sample of a small branch in decision tree

The accuracy of decision trees and their ease of construction are further benefits that make them a suitable choice for IDS in CPS. Researchers in [30] proposed a classification method using machine learning and simulated it on the KDD data set. Their algorithm has a similar function, but it tries to find the attack with the minimum number of comparisons. Although their results demonstrate a good performance improvement, the problem of having a huge amount of data in the dictionary remains unsolved.

4.2. Intrusion detection with filtering mechanism

Wireless sensor networks raise many security concerns. Sensor networks, which are a main part of critical infrastructures such as CPS, require strong security mechanisms. These systems are typically deployed in critical environments that are very vulnerable to attacks. Traditional security methods, including encryption, VPN, authentication and firewalls, are not adequate since they only examine external threats. Therefore, many organizations employ different IDSs to overcome this issue. Deciding which type of IDS is best for an organization's architecture, size and budget is an important step. It should be kept in mind that not every company can afford an expensive IDS.

In signature-based IDS the quality of security depends on the quality of the signatures in the dictionary. However, having a lot of information in the dictionary consumes plenty of resources. Furthermore, each event will be logged and each comparison will record some warning in the system. Recording too many warnings and log files is another problem in signature-based IDS which makes it hard to analyze all the information later.

Researchers in [29] have proposed a new type of signature which combines traditional signatures with contextual information from the network. They have also defined a hash function which drops unimportant and uncritical warnings. In their proposed method, each signature is an ordered pair (CI, Sig), where CI contains the contextual information of the network and Sig is the related signature. Their method achieved a warning filtering rate of 66.1%.
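The idea of pairing a signature with context can be illustrated with a small sketch. It is not the method of [29]: the context values, signature names, host table and the use of a Python set (whose membership test is hash-based) are all assumptions chosen only to show how a (CI, Sig) pair could be used to drop warnings that are irrelevant to the targeted host.

```python
# Hypothetical (CI, Sig) pairs: a warning is only worth keeping when the
# signature is relevant to the context of the host it was raised for.
RELEVANT_PAIRS = {
    ("linux", "ssh-bruteforce"),
    ("windows", "smb-exploit"),
}


def filter_alerts(alerts, host_context):
    """Keep only alerts whose (context, signature) pair is marked relevant."""
    kept = []
    for alert in alerts:
        ci = host_context.get(alert["dst"])      # contextual information
        if (ci, alert["sig"]) in RELEVANT_PAIRS:  # hash-based set lookup
            kept.append(alert)
    return kept


# Hypothetical hosts and alerts.
host_context = {"10.0.0.1": "linux", "10.0.0.2": "windows"}
alerts = [
    {"dst": "10.0.0.1", "sig": "smb-exploit"},   # irrelevant for a Linux host
    {"dst": "10.0.0.2", "sig": "smb-exploit"},   # relevant for a Windows host
]

print(filter_alerts(alerts, host_context))
```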

4.3. Learning based intrusion detection

As noted earlier, a signature-based technique can detect an attack if and only if a matching signature has been built, tested and deployed for it. In many signature-based IDSs, most of the signatures must be extracted manually, which is very time-consuming and error-prone. As a result, the quality of the system depends on the experts who register the signatures manually.

To solve this problem, researchers in [28] explored ideas from ecology and proposed a modified supervised learning classifier system algorithm (UCSm). This algorithm builds dynamically adjustable signatures to detect intrusions using a supervised classification system. The classification system is an online, parallel, rule-based system. The method was simulated on the KDD data set and reached an intrusion detection accuracy of 92.03%, an increase of 9% over its base method (UCS).

Other articles have used data mining to overcome this issue. A network-based data mining approach for building signatures is proposed and evaluated in [26]. This approach, called Signature Apriori, uses both network information and protocols, and learns signatures from an environment that is attacked with many different kinds of cyber-attacks. It compares normal and malicious packets and records the characteristics of the malicious ones.


Figure 4‑3. Signature Apriori system: a data mining network-based approach for intrusion detection.

Figure 4-3 shows the Signature Apriori system. This system learns signatures using four main entities, namely the packet sensor, the signature miner, the rule set and the associated signature miner. The packet sensor captures packets from the network and sends them to the signature miner. The signature miner recognizes and builds probable signatures based on Signature Apriori. The associated signature miner filters the signatures, and the final filtered signatures are recorded in the rule set [26].

4.4. Time-based intrusion detection

Analyzing the timing of events can be a great help for intrusion detection. Time analysis can be dynamic, static or hybrid. Dynamic analysis examines incoming packets immediately, at processing time. In contrast, static analysis examines the data over specific periods of time. A sudden change within a period can be a sign of a potential attack.
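A minimal sketch of the static (period-based) analysis described above is given below. The window length, the threshold factor and the rule "flag a window whose event count is more than `factor` times the previous window's count" are assumptions chosen for illustration; they are not taken from the cited literature.

```python
from collections import Counter


def window_counts(timestamps, window):
    """Count events per fixed-length time window (index = floor(t / window))."""
    return Counter(int(t // window) for t in timestamps)


def suspicious_windows(timestamps, window, factor):
    """Return the windows whose event count jumps sharply over the previous one."""
    counts = window_counts(timestamps, window)
    flagged = []
    for idx in sorted(counts):
        previous = counts.get(idx - 1, 0)
        if previous > 0 and counts[idx] > factor * previous:
            flagged.append(idx)
    return flagged


# Hypothetical event times in seconds: a burst starts after t = 20.
timestamps = [1, 5, 12, 15, 21, 22, 23, 24, 25, 26, 27, 28]
print(suspicious_windows(timestamps, window=10, factor=3))
```

With 10-second windows the counts are 2, 2 and 8, so only the third window (index 2) is flagged.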

4.5. Parallel intrusion detection

This section explains the difference between distributed and parallel IDS. A distributed IDS employs a distributed monitoring system at different points of the network. The main purpose of these systems is to improve detection quality. In contrast, a parallel IDS analyzes incoming traffic simultaneously (the incoming traffic is split between multiple systems). These systems focus on speed and on the parallelism of processes.

There are two general types of parallelism in IDS: data parallelism and function (performance) parallelism. In the former, the data is split between different systems so that each system has a different part of the data. In the latter, the same data is sent to different functions.
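The two types of parallelism can be contrasted with a short sketch. The two checks and the packet fields are hypothetical, and a thread pool is used only to show the structure; a real deployment would spread the work over separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def count_icmp(packets):
    """Hypothetical check 1: number of ICMP packets."""
    return sum(1 for p in packets if p["protocol"] == "icmp")


def count_large(packets):
    """Hypothetical check 2: number of packets larger than 1000 bytes."""
    return sum(1 for p in packets if p["bytes"] > 1000)


def data_parallel(packets, workers):
    """Data parallelism: each worker runs the same check on its own share."""
    chunks = [packets[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_icmp, chunks))


def function_parallel(packets):
    """Function parallelism: the same data is sent to different checks."""
    checks = [count_icmp, count_large]
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = [pool.submit(check, packets) for check in checks]
        return [f.result() for f in futures]


packets = [
    {"protocol": "icmp", "bytes": 20},
    {"protocol": "tcp", "bytes": 1500},
    {"protocol": "icmp", "bytes": 2000},
    {"protocol": "udp", "bytes": 100},
]
print(data_parallel(packets, workers=2))
print(function_parallel(packets))
```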

Some articles, such as [25], use parallelism to improve signature-based detection methods’ productivity. In [25] researchers utilize two Snort systems and divide the traffic between them to improve systems’ performance. As a result, the accuracy and efficiency are improved and fewer packets are dropped. Since this method demands a considerable number of strong and high-priced processors, it is not efficient in large-scale infrastructure.

 

 

Chapter 5

  5. Proposed Method

 

As the number of critical CPS in different areas increases, improving their security becomes more important. On one hand, the earlier chapters concluded that signature-based intrusion detection is one of the most precise methods, which makes it well suited to critical infrastructures such as CPS. Signature-based methods are recognized as an efficient technique and are used in many infrastructures and real-time systems such as CPS because of their low error rate. On the other hand, they suffer from two important restrictions:

• The fact that each incoming packet must be compared with each of the rules in the attack dictionary until one of the rules matches. This can decrease system performance and speed, and it consumes too many resources.

• The fact that this method will not detect new (zero-day) attacks.

Regarding the second restriction, many studies have proposed combining signature-based and anomaly-based detection techniques. This thesis focuses on the first restriction. The obvious problem arising from it is the difficulty of maintaining the dictionary of attacks, because too much memory is needed to keep all the signatures. The second issue is the low speed of detecting attacks, since packets need to be checked against each signature in the dictionary one by one. To mitigate this problem, unnecessary data in the dictionary, which makes little difference when comparing two incoming packets, should be removed. This turns the normal dictionary into a dictionary with enhanced merit and reduces resource consumption.

To increase speed and to reduce the need for large amounts of memory and other resources, the enhanced dictionary can also be divided into two sub-dictionaries based on the most numerous attacks and classified in a decision tree. When a packet finds no match in the dictionary of most numerous attacks, the connection is allowed, and if the second dictionary finds a match, the connection is terminated as soon as possible. To remove unnecessary data from the original dictionary, two different methods are used:

  • Support Vector
  • Entropy

Using these two methods, the least valuable columns of data were found and discarded from the database. Figure 5-1 shows the conceptual diagram of the proposed method and the steps that were taken. This chapter gives a comprehensive explanation of the proposed method: finding unnecessary data, building the frequent dictionary, building the decision tree, and the data set and technology employed for the simulation.
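The division into two sub-dictionaries can be sketched as follows. This is only an illustration under stated assumptions: the signature layout, the toy rules and the `top_n` parameter are hypothetical, and each sub-dictionary is searched linearly here, whereas the proposed method classifies the rules in a decision tree.

```python
from collections import Counter


def split_dictionary(signatures, top_n):
    """Split signatures into (frequent, rest) by how often their label occurs."""
    counts = Counter(sig["label"] for sig in signatures)
    frequent_labels = {label for label, _ in counts.most_common(top_n)}
    frequent = [s for s in signatures if s["label"] in frequent_labels]
    rest = [s for s in signatures if s["label"] not in frequent_labels]
    return frequent, rest


def lookup(packet, frequent, rest):
    """Search the frequent sub-dictionary first, then the remaining rules."""
    for sub_dictionary in (frequent, rest):
        for sig in sub_dictionary:
            if all(packet.get(k) == v for k, v in sig["match"].items()):
                return sig["label"]
    return "normal"


# Hypothetical toy dictionary: "smurf" is the most numerous attack.
signatures = [
    {"label": "smurf", "match": {"protocol": "icmp", "service": "ecr_i", "src_bytes": "1032"}},
    {"label": "satan", "match": {"protocol": "icmp", "service": "ecr_i", "src_bytes": "20"}},
    {"label": "smurf", "match": {"protocol": "icmp", "service": "ecr_i", "src_bytes": "520"}},
]
frequent, rest = split_dictionary(signatures, top_n=1)
print(lookup({"protocol": "icmp", "service": "ecr_i", "src_bytes": "520"}, frequent, rest))
```

Packets that belong to the most numerous attack class are matched after scanning only the smaller frequent sub-dictionary.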


Figure ‎5‑1. Conceptual diagram of proposed method.


5.1. Data used for the simulation

This simulation uses an offline data set called KDD99, which is based on the DARPA data set. DARPA provided KDD99 from audit data collected from intrusions against a real-world network environment. Dating from 1999, it is one of the most commonly used data sets in intrusion detection simulations and evaluations. The data set records seven weeks of network traffic and contains about 4 GB of data representing roughly five million connections. Each connection is labeled either as one specific attack group or as normal. Each connection vector has 41 features [32], all of which are listed in Table 5-1.
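A minimal sketch of reading one connection record is shown below. It assumes the comma-separated layout of the publicly distributed KDD Cup 1999 files, with 41 feature fields followed by a label that ends in a period; the function name and the synthetic record are hypothetical and are not part of the thesis code.

```python
NUM_FEATURES = 41  # number of features per connection, as stated in the text


def parse_kdd_record(line):
    """Split one record into its 41 feature values and its label.

    Assumes a comma-separated line whose last field is the label,
    written with a trailing period (for example "normal.").
    """
    fields = line.strip().split(",")
    if len(fields) != NUM_FEATURES + 1:
        raise ValueError(f"expected {NUM_FEATURES + 1} fields, got {len(fields)}")
    features = fields[:NUM_FEATURES]
    label = fields[NUM_FEATURES].rstrip(".")
    return features, label


# Synthetic record built only to have the right shape (not real data).
synthetic = ",".join(["0", "tcp", "http", "SF"] + ["0"] * 37 + ["normal."])
features, label = parse_kdd_record(synthetic)
print(len(features), label)
```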

Table ‎5‑1. Features of the DARPA data set


5.2. Enhanced valued dictionary

Figure 5-2 shows some samples of signature-based intrusion detection rules in the data set. Each sample in the data set represents a network connection, and each column shows one feature of that connection. Identifying the most valuable data is critical in the intrusion detection process. Removing data of little value makes the detection process faster and easier.

S1 satan. 0 icmp ecr_i SF 20 0 0 1 1 0.00 0.00 0.00 0.00 1.00 0.00 0.00 255 1 0.00 0.02 0.02 0.00 0.00 0.00 0.00 0.00
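One simple way to look for columns of little value is to measure the Shannon entropy of each column and discard those whose values barely vary. The sketch below illustrates this under stated assumptions: the threshold, the helper names and the three toy rows are hypothetical, and the criterion shown is a simplified stand-in, not necessarily the exact entropy measure used in this work.

```python
import math
from collections import Counter


def column_entropy(values):
    """Shannon entropy (in bits) of the values found in one column."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def low_value_columns(rows, threshold):
    """Indices of columns whose entropy is at or below the threshold."""
    columns = list(zip(*rows))
    return [i for i, col in enumerate(columns) if column_entropy(col) <= threshold]


def drop_columns(rows, indices):
    """Return a copy of the rows without the given column indices."""
    drop = set(indices)
    return [[v for i, v in enumerate(row) if i not in drop] for row in rows]


# Toy rows (hypothetical): the third column is constant, so it cannot
# help to tell two connections apart.
rows = [
    ["icmp", "ecr_i", "0"],
    ["tcp", "http", "0"],
    ["icmp", "eco_i", "0"],
]
useless = low_value_columns(rows, threshold=0.0)
print(useless)
print(drop_columns(rows, useless))
```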