Cyber-Physical Systems (CPS) are advanced intelligent systems that consist of networked or distributed computational elements, sensors and actuators that control physical entities and mechanisms. CPS have attracted much attention because of their wide range of applications. The precursor generation of CPS can be found in a variety of areas including aerospace, industrial infrastructures, health care, transportation, energy, Supervisory Control And Data Acquisition (SCADA) and autonomous automobile systems. Developing safe CPS requires a deep understanding of the potential impact of successful malicious cyber-attacks. CPS security concerns include attackers’ efforts to intercept information captured by sensors and to manipulate the rules sent to actuators in order to disrupt, defeat and eventually cause the system to fail. Because CPS include physical actuators, the damage can be life-threatening, as in autonomous automobile systems.
With the increasing use of CPS in national infrastructure and other vital sectors, where they control significant industrial processes and apply rules based on what sensors perceive from the environment, securing CPS has become essential. One of the techniques used in these systems is signature-based intrusion detection. One of its main problems is the difficulty of maintaining the dictionary of attacks, because of the memory required to keep all the signatures. The second issue is low detection speed, since each packet must be checked against every signature in the dictionary one by one.
This thesis proposes and evaluates a method to increase the speed and performance of signature-based intrusion detection and, as a result, increase CPU availability. As the first step, the least valuable information is found and removed from the attack dictionary. Then the dictionary is divided into two sub-dictionaries based on the most numerous attacks and, finally, it is classified in a decision tree. Removing the least valuable information and searching for the rules inside the decision tree reduces processing time. In addition, because the probability of each rule’s occurrence is increased whenever an incoming packet matches a signature, accuracy improves compared to the traditional ID3 (Iterative Dichotomiser 3) method. The proposed method is simulated in Python using data sets gathered from real-world networks (KDD-99). The results demonstrate the performance enhancement and the improvement in resource availability achieved by the proposed method.
Keywords: Cyber-Physical Systems, Intrusion Detection, Signature, Framework, Frequent Attack Dictionary, Entropy, Support Vector.
The term Cyber-Physical Systems is introduced as a research model and mechanism that connects computing, communications and control. Although the exact definition is difficult due to the wide domain of these systems, CPS can be generally depicted as physical engineered systems that monitor, control and integrate their operations into a computational and communicational core. In other words, CPS are important intelligent systems consisting of computing elements, network elements and distributed sensors and actuators, which control the physical entities.
Security issues in CPS, including attackers’ efforts to intercept and manipulate the data captured from the sensors and the instructions sent to the actuators, can disrupt and ultimately defeat the system. The importance of securing CPS is growing with the rising use of CPS in vital and critical sectors, infrastructures and sensitive industrial processes.
CPS applications will likely affect the Information Technology revolution of the twentieth century. Examples of CPS include a wide range of large-scale engineered systems such as automated control systems in aviation, energy conservation, highly reliable medical systems in health care, distributed robotics (such as telemedicine and defense systems), assisted living, traffic and safety control, SCADA, advanced automotive systems, transportation, automation and smart grid systems. In all these systems, handling the complex interaction between the different physical and computational elements correctly is very important.
In recent decades, with the wide growth of computers and the Internet, many security issues have been raised in CPS. The number of networks, their applications and the security threats against them is increasing every day. At the same time, building one hundred percent secure personal computers, free of technical weaknesses and failures, is impossible. Therefore, research on intrusion detection systems (IDS) for CPS is being pursued with great interest. One of the intrusion detection techniques that has attracted considerable interest is signature-based intrusion detection. This technique is based on a set of rules and signatures, stored in a data set or dictionary, which define malicious packets and files. Each rule corresponds to one known threat.
One of the major benefits of signature-based intrusion detection is its accuracy and low false positive rate. Its disadvantage, on the other hand, is that performance depends heavily on the rules, and choosing the wrong set of rules might lead to instability in the process control. In that case, many organizations might experience loss of service availability, information theft and disruption of decision-making, all of which affect CPS and may cause financial damage and even serious human injuries. Another disadvantage of signature-based IDSs is that each received packet must be checked against every individual rule one by one. This increases the time required for detection and, given limited resources, complicates maintenance.
This thesis proposes a method to increase the speed and performance of signature-based intrusion detection. First, the least valuable information is found and removed from the attack dictionary. The resulting dictionary is then divided into two sub-dictionaries based on the most numerous attacks and, finally, it is classified in a decision tree. Removing the least valuable information and searching for the rules inside the decision tree reduces processing time. When an incoming packet is matched with a signature, the probability of each rule’s occurrence is increased. These improvements increase accuracy compared to the traditional ID3 (Iterative Dichotomiser 3) method. The proposed method is evaluated using Quality of Service parameters including accuracy, false positive rate and false negative rate.
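The step of dividing the dictionary by the most numerous attacks can be illustrated with a minimal sketch. It is not the implementation evaluated in this thesis: the function name `split_by_frequency`, the parameter `top_k` and the toy records (which only borrow KDD-99 style field values and labels) are assumptions made for illustration.

```python
from collections import Counter

def split_by_frequency(signatures, top_k=2):
    """Partition signatures into (frequent, rare) sub-dictionaries.

    A signature is 'frequent' when its attack label is among the
    top_k most numerous labels in the dictionary (assumed criterion).
    """
    counts = Counter(sig["label"] for sig in signatures)
    frequent_labels = {label for label, _ in counts.most_common(top_k)}
    frequent = [s for s in signatures if s["label"] in frequent_labels]
    rare = [s for s in signatures if s["label"] not in frequent_labels]
    return frequent, rare

# Toy dictionary; the records are illustrative, not taken from the data set.
signatures = [
    {"protocol": "icmp", "service": "ecr_i", "label": "smurf"},
    {"protocol": "icmp", "service": "ecr_i", "label": "smurf"},
    {"protocol": "tcp", "service": "private", "label": "neptune"},
    {"protocol": "tcp", "service": "private", "label": "neptune"},
    {"protocol": "tcp", "service": "http", "label": "back"},
]

frequent, rare = split_by_frequency(signatures, top_k=2)
print(len(frequent), len(rare))  # 4 1
```

Under this assumption, a detector would consult the frequent sub-dictionary first, so that the most common attacks are found after fewer comparisons.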
In this thesis, a method for signature-based intrusion detection is proposed. The proposed method is simulated using Python 3.5 and the KDD99 data set, and the simulation results are analyzed to evaluate the performance of the method. The development of the proposed method is based on exploration and evaluation of existing intrusion detection techniques and frameworks and on analysis of their shortcomings.
The thesis is organized into seven chapters. Chapter 2 introduces the main concepts that this thesis relies on, namely intrusion detection techniques and their comparison. Chapter 3 provides an overview of CPS and their security frameworks. Chapter 4 reviews the background of the chosen intrusion detection technique. Chapter 5 presents the main contributions of this work, namely the frequent attack dictionary decision tree for advanced signature-based intrusion detection and the design of the simulation experiment. This chapter is followed by analysis of the simulation results and discussion in chapter 6. Finally, chapter 7 presents the conclusions and outlines possible future research.
Signature-based IDSs are very effective for known attacks. Since these systems are fast and easy to install, they can start working immediately. A signature-based IDS analyzes each packet and compares its content with the dictionary of known attacks. Sometimes normal packets are mistaken for attacks (false positives), but this does not occur often. These systems generate easy-to-understand reports and label each packet either as normal or as one class of attack.
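The comparison of a packet with the dictionary of known attacks can be sketched as a linear scan. This is a simplified illustration only: the representation of a signature as a set of required field values, and the names `match_packet` and `dictionary`, are assumptions and do not describe any particular IDS product.

```python
def match_packet(packet, dictionary):
    """Return the label of the first signature whose fields all match.

    Every signature is checked in turn, which is the one-by-one
    comparison that makes detection slow for large dictionaries.
    """
    for signature in dictionary:
        fields = signature["fields"]
        if all(packet.get(name) == value for name, value in fields.items()):
            return signature["label"]
    return "normal"

# Illustrative signatures using KDD-99 style field values.
dictionary = [
    {"fields": {"protocol": "icmp", "service": "ecr_i"}, "label": "smurf"},
    {"fields": {"protocol": "tcp", "flag": "S0"}, "label": "neptune"},
]

suspicious = {"protocol": "tcp", "service": "private", "flag": "S0"}
benign = {"protocol": "tcp", "service": "http", "flag": "SF"}

print(match_packet(suspicious, dictionary))  # neptune
print(match_packet(benign, dictionary))      # normal
```

In the worst case the loop visits every signature, so the cost per packet grows linearly with the size of the dictionary.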
Although signature-based IDSs are efficient for known attacks, their problem is that they are not able to detect zero-day attacks. Hackers use zero-day attacks to compromise many systems before administrators adapt their organizations’ IDSs [22]. For this reason, a signature-based IDS should be updated continuously. Attack reports should be collected from all over the world as soon as a new attack is detected. In addition, security engineers should analyze the attack and develop a solution for defending against it. The solution should be distributed to all the subsystems, and IDS/IPS systems should be updated accordingly. However, the first subsystem that was attacked is already compromised and may have been damaged.
Figure 4-1 illustrates the detection process in signature-based techniques. This type of intrusion detection analyzes packets’ features and tries to find a match in stored dictionaries in which all the known attacks are recorded. Some articles call this technique misuse detection or pattern-based detection. A low false positive rate is the most important advantage of signature-based IDS.
This technique reacts to known malicious behavior. In other words, it defines a node as a good node if the node does not exhibit any attack signatures. The most significant issue here is creating an efficient attack database. If a signature is too large it consumes too much memory, and if it is not detailed enough it reduces accuracy. Signature-based IDSs are more efficient and accurate than other IDS techniques at detecting outsider attacks.
Figure 4‑1. Signature-based intrusion detection process
One of the disadvantages of signature-based IDS is that it drops many packets, because it does not have enough time to check each packet against all the rules in the attack dictionary. To solve this problem many researchers have used decision tree search methods. SQL injection attacks are examined in [31], where a decision tree is used for their detection and incoming HTTP requests are filtered using the tree.
A decision tree is a classification algorithm in data mining and its fundamental algorithm is called ID3 (Iterative Dichotomiser 3) [39]. This algorithm builds a tree from the given classified data, in which each data item is identified by the values of its features. Classification in decision trees is done in a reverse order and the main challenge is to define the key features for the nodes. Each node in the tree represents a feature from the signatures, and the process finishes when all the signatures are registered in the tree. The leaves of the tree mark the end of each connection and are labeled with a type of attack. These trees are capable of working with large amounts of data, which is an advantage for CPS because CPS carry heavy traffic flows. Besides, the high performance of decision trees makes them a good option for real-time systems. Figure 4-2 shows a small part of a decision tree. In this figure, the destination node is decided based on the value of the source port.
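ID3 chooses the feature for each node by information gain, that is, the reduction in the entropy of the class labels obtained by splitting on that feature. The following minimal sketch computes this quantity on a toy data set; the feature names `protocol` and `src_port` and the records themselves are assumptions chosen for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, feature, target="label"):
    """Entropy of the labels minus the weighted entropy after the split."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

def best_feature(rows, features, target="label"):
    """Feature that ID3 would place at the current node."""
    return max(features, key=lambda f: information_gain(rows, f, target))

# Toy records: 'protocol' separates the classes, 'src_port' does not.
rows = [
    {"protocol": "icmp", "src_port": "low", "label": "smurf"},
    {"protocol": "icmp", "src_port": "high", "label": "smurf"},
    {"protocol": "tcp", "src_port": "low", "label": "neptune"},
    {"protocol": "tcp", "src_port": "high", "label": "neptune"},
]

print(best_feature(rows, ["protocol", "src_port"]))  # protocol
```

Here splitting on `protocol` yields a gain of one bit, while splitting on `src_port` yields none, so `protocol` would become the root of this small tree.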
Figure 4‑2. Sample of a small branch in decision tree
The accuracy of decision trees and their ease of construction are further benefits that make them a suitable choice for IDS in CPS. Researchers in [30] proposed a classification method using machine learning and simulated the method on the KDD data set. This algorithm has a similar function but tries to find the attack with the minimum number of comparisons. Although their results demonstrate a good performance improvement, the problem of having a huge amount of data in the dictionary remains unsolved.
Nowadays, wireless sensor networks raise many security concerns. Sensor networks, which are the main part of critical infrastructures such as CPS, require strong security mechanisms. These systems are typically developed in a critical environment which is very vulnerable to attacks. Traditional security methods including encryption, VPN, authentication and firewalls are not adequate, since they only examine external threats. Therefore, many organizations employ different IDSs to overcome this issue. Deciding which type of IDS is best for an organization’s architecture, size and finances is an important step. It should be considered that not every company has the resources to afford an expensive IDS.
In signature-based IDS the quality of security depends on the quality of the signatures in the dictionary. However, keeping a lot of information in the dictionary consumes plenty of resources. Furthermore, each event is logged and each comparison may record a warning in the system. The resulting volume of warnings and log files is another problem in signature-based IDS, because it makes it hard to analyze all the information later.
Researchers in [29] have proposed a new type of signature which combines traditional signatures with contextual information from the network. They have also defined a hash function which drops unimportant and uncritical warnings. In their proposed method, each signature is an ordered pair (CI, Sig), in which CI contains the contextual information of the network and Sig is the related signature. With this method they were able to filter out 66.1% of the warnings.
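The idea of pairing a signature with contextual information can be sketched as follows. This is only an illustration of the (CI, Sig) pair and does not reproduce the hash function or the filtering procedure of [29]; the class `ContextualSignature`, the function `relevant_alerts` and the example context keys are hypothetical.

```python
from typing import NamedTuple

class ContextualSignature(NamedTuple):
    ci: frozenset  # contextual information, e.g. {("os", "windows")}
    sig: str       # payload pattern of the traditional signature

def relevant_alerts(payload, host_context, signatures):
    """Raise an alert only if the pattern matches AND the context applies.

    A pattern aimed at a platform the target host does not run is
    treated as an uncritical warning and dropped (assumed policy).
    """
    context = set(host_context.items())
    return [s.sig for s in signatures
            if s.sig in payload and s.ci <= context]

signatures = [
    ContextualSignature(frozenset({("os", "windows")}), "cmd.exe"),
    ContextualSignature(frozenset({("os", "linux")}), "/etc/passwd"),
]
linux_host = {"os": "linux", "service": "http"}

print(relevant_alerts("GET /scripts/cmd.exe", linux_host, signatures))   # []
print(relevant_alerts("GET /../../etc/passwd", linux_host, signatures))  # ['/etc/passwd']
```

The first request matches a pattern but targets a platform that the host does not run, so no warning is recorded for it.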
As noted earlier, a signature-based technique can detect an attack if and only if a matching signature has been built, tested and deployed for it. In many signature-based IDSs, most of the signatures must be extracted manually, which is very time-consuming and error-prone. Therefore, the quality of the system becomes dependent on the experts who have registered the signatures manually.
To solve this problem, by exploring ideas from ecology, researchers in [28] have proposed a modified supervised learning classifier system algorithm (UCSm). This algorithm builds dynamically adjustable signatures to detect intrusions using a supervised classification system. The classification system is an online, parallel, rule-based system. The method was simulated using the KDD data set and achieved an intrusion detection accuracy rate of 92.03, an improvement of 9% over its base method (UCS).
Some other articles have used data mining to overcome this issue. A network-based data mining approach for building signatures is proposed and evaluated in [26]. This approach, called Signature Apriori, employs both network information and protocols and learns signatures from an environment that is attacked using many different cyber-attack methods. It compares normal and malicious packets and registers the characteristics of the malicious ones.
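The idea of comparing normal and malicious packets and registering what characterizes the malicious ones can be sketched in a simplified form. The code below is not the Signature Apriori algorithm of [26]; it only counts single feature values, and the function `candidate_signatures`, the parameter `min_support` and the toy packets are assumptions made for illustration.

```python
from collections import Counter

def candidate_signatures(malicious, normal, min_support=1.0):
    """Feature values frequent in malicious packets and absent from normal ones.

    min_support is the fraction of malicious packets that must contain
    a (feature, value) pair for it to be kept as a candidate.
    """
    mal_counts = Counter(item for pkt in malicious for item in pkt.items())
    normal_items = {item for pkt in normal for item in pkt.items()}
    threshold = min_support * len(malicious)
    return sorted(item for item, n in mal_counts.items()
                  if n >= threshold and item not in normal_items)

# Toy traffic with KDD-99 style field values.
malicious = [
    {"protocol": "tcp", "flag": "S0", "service": "private"},
    {"protocol": "tcp", "flag": "S0", "service": "telnet"},
]
normal = [
    {"protocol": "tcp", "flag": "SF", "service": "http"},
]

print(candidate_signatures(malicious, normal))  # [('flag', 'S0')]
```

The pair `("protocol", "tcp")` is equally frequent in the malicious packets but also appears in normal traffic, so it is not registered as a characteristic of the attack.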