CHAPTER ONE
1.0 INTRODUCTION
1.1 Background of Study

Cloud computing has become one of the prevalent methods by which organizations render services. This is mainly because it provides a medium for the big steps needed in the development and deployment of an increasing number of distributed applications (Marinescu, 2012). The main objective of cloud computing is that customers use and pay for only what they need. However, as more and more information belonging to individuals and companies is placed in cloud data centers, questions arise regarding the safety and security of the cloud environment. Cloud computing can be easily targeted by attackers (Modi et al., 2013). There are a number of security, privacy and trust issues associated with cloud computing (Sun et al., 2011). These issues have a great impact on the integrity of clients' data stored in the cloud. For this reason, even with the flexibility and efficiency the cloud provides, most clients are reluctant to store confidential information such as Personally Identifiable Information (PII) in the cloud.

When providing services on the Internet using a pool of shared resources, security is a major concern, and policies must exist in cloud computing to address important issues such as reliability, security, anonymity and liability. Three types of intrusion can occur in a network of computing machines: scanning, Denial of Service (DoS) and penetration (Rup et al., 2015). The cloud incessantly faces security threats such as Structured Query Language (SQL) injection, Cross Site Scripting (XSS), DoS and Distributed Denial of Service (DDoS) attacks, and hacking in general.

The common network attacks that affect cloud security at the network layer include: Address Resolution Protocol (ARP) spoofing, IP spoofing, port scanning, man-in-the-middle attack, Routing Information Protocol (RIP) attack, Denial of Service (DoS) and Distributed Denial of Service (DDoS) (Modi et al., 2013). Providers must therefore protect their systems against both insider and outsider attacks. Traditional network security mechanisms such as firewalls can stop many outsider attacks, but attacks from within the network, as well as complicated outsider attacks such as DoS and DDoS, cannot easily be controlled by such mechanisms (Modi et al., 2012).

DDoS attacks have for the last two decades been among the greatest threats facing the Internet infrastructure, and mitigating them is a particularly challenging task. It is known that ordinary signature-based detection techniques are inefficient against DDoS attacks, as this type of attack can mask itself among legitimate traffic (Madeleine, 2017).

To overcome such problems, an intrusion detection system (IDS) comes into play. The IDS is the most commonly used mechanism to detect various attacks on the cloud. It plays a very important role in cloud security: instead of detecting only known attacks, it can detect many known and unknown attacks (Quick, 2013). IDSs are designed to preserve the confidentiality, integrity, and availability of the network (Bace and Mell, 2001). An IDS can be software, hardware or a combination of both. It captures data from the network under examination and notifies the network manager by mailing or logging the intrusion event (Oktay and Sahingoz, 2013).

This study presents a detection system for HTTP DDoS attacks in a cloud environment based on the Least Absolute Shrinkage and Selection Operator (LASSO) and Random Forest. The proposed detection system consists of two main steps: feature selection and classification. An embedded algorithm is used to reduce overfitting. The network traffic data, after feature selection, is then classified into normal and HTTP DDoS traffic. A test procedure is used to select the appropriate classifier for HTTP DDoS detection based on accuracy, FPR, TPR, and F-measure metrics. The results obtained from the experiments show that Random Forest ensemble classifiers exhibit high detection performance for HTTP DDoS attacks.
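The two steps above (LASSO-based feature selection followed by Random Forest classification, evaluated on a held-out split) can be sketched with scikit-learn. The synthetic dataset, the `alpha=0.01` penalty, the 100 trees and the 70/30 split are all illustrative assumptions, not the system's actual configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labelled traffic records (0 = normal, 1 = HTTP DDoS).
X, y = make_classification(n_samples=600, n_features=20,
                           n_informative=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Step 1: embedded (LASSO-based) feature selection.
selector = SelectFromModel(Lasso(alpha=0.01))
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# Step 2: Random Forest classification on the reduced feature set.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train_sel, y_train)
y_pred = clf.predict(X_test_sel)

accuracy = accuracy_score(y_test, y_pred)   # overall accuracy
f_measure = f1_score(y_test, y_pred)        # F-measure on the attack class
```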
1.2 Problem statement

The HTTP-DDoS attack is heavily used against cloud computing web services, and very little work has been done to ensure security related to these protocols (Adrien and Martine, 2017). It generally targets the victim's communication bandwidth, computational resources, memory buffers, network protocols or the victim's application processing logic. Additionally, such attacks do not generate significant traffic, which makes them hard to detect (Csubak, 2016). Machine learning is the most common approach previous researchers have used to address DDoS attack detection. However, achieving high detection accuracy with a low false positive rate remains an issue that still needs to be addressed. Hence, a Random Forest based HTTP-DDoS attack detection system for the cloud computing environment was designed.
1.3 Aim and objectives

The aim of this research is to propose a system for the detection of HTTP DDoS attacks in a cloud computing environment, based on the Random Forest algorithm for classification and LASSO for feature selection. The objectives of this study are to:
Design a Random Forest framework for the detection of HTTP-DDoS attacks in a cloud computing environment.
Formulate a Random Forest based model for the detection of HTTP-DDoS attacks in a cloud computing environment.
Evaluate the performance of the designed model.
1.4 Scope and Limitations of the Study

This study focuses on comparing several machine learning algorithms through testing and performance evaluation against a set of metrics, in order to arrive at a model suitable for deployment as an intrusion detection system in a cloud environment. The study is limited to HTTP-DDoS attacks in the cloud environment, as very little work has been done in this area.
1.5 Significance of the study

Nowadays, cloud computing is the first choice of many IT organizations because of its scalable and flexible nature. However, availability and security are major concerns for its success because of its open and distributed architecture, which is open to intruders. While cloud computing has received mixed reviews from its customers, some experts describe it as the reinvention of the distributed mainframe model (Schneier & Ranum, 2011). It could be the most significant shift in IT infrastructure in recent times; it appears promising, but a great deal of work is still warranted in the area of security to close the gaps. A Random Forest based model will increase security in cloud computing by identifying and classifying traffic as either normal or containing a threat in minimal time, and as such enhance cloud adoption by reducing upfront investment costs, minimizing maintenance work in IT infrastructure and enhancing on-demand capabilities.
Hence, we believe this research work will benefit researchers, cloud providers and their customers with the initiative to proactively protect themselves from known or even unknown security issues.
1.6 Definition of terms

The HTTP DDoS attack is an attack method used by hackers against web servers and applications. It consists of seemingly legitimate session-based sets of HTTP GET or POST requests sent to a target web server (Radware, 2018).
Intrusion Detection System is a system that monitors network traffic for suspicious activity and issues alerts when such activity is discovered.
Cloud Computing is an information technology paradigm that enables ubiquitous access to shared pools of configurable system resources and higher-level services that can be rapidly provisioned with minimal management effort, often over the Internet.

CHAPTER TWO
2.0 LITERATURE REVIEW
2.1 Introduction

This chapter reviews documented literature on various works related to the research topic of this study. It starts with a discussion of the relevant concepts needed to find answers to the research problem. Extensible Markup Language (XML) (or JSON) and Hypertext Transfer Protocol (HTTP) are heavily used in cloud computing web services, and very little work has been done to ensure security related to these protocols (Adrien and Martine, 2017), as, most of the time, for example with XML (XML encryption, digital signatures, user tokens), the request is implicitly assumed to be legitimate. This puts XML-DoS and HTTP-DoS among the most destructive DoS and DDoS attacks in cloud computing (Adrien and Martine, 2017).
The review is therefore aimed at gaining insight into the different network-based intrusion detection systems used against HTTP-DDoS attacks in cloud environments. It was carried out through a literature search of both print and electronic materials on topics related to the similarities and differences between cloud intrusion detection systems that use machine learning approaches.
2.2 Cloud Computing

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. Cloud computing is Internet-based computing where the services are fully served by the provider: users need only personal devices and Internet access to exploit cloud resources. Computing services, such as data, storage, software, computing, and applications, can be delivered to local devices through the Internet. NIST (2011) proposed three service models and four deployment models.
2.2.1 Cloud Service Models

The cloud service models are listed below:
Software as a Service (SaaS)
This is the capability provided to the consumer to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
Platform as a Service (PaaS)
This is the capability provided to the consumer to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.
Infrastructure as a Service (IaaS).
This is the capability provided to the consumer to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (e.g., host firewalls).
2.2.2 Cloud Deployment Models

Private cloud
This cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.
Community cloud
This cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises.
Public cloud
This cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider.
Hybrid cloud
This cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

2.3 Denial-of-Service (DoS) Attacks

On Thursday 6th August 2009, Twitter, one of the most popular and most followed social networking and blogging sites, went down for several hours and its services were unavailable to users. The administrators and owner of the site apologized to users, saying the site had gone down for technical reasons and would be fixed as soon as possible; in fact it was under a denial of service attack, and they tried to restore the site quickly in order not to lose their clients' trust (money.cnn.com). A denial of service attack is any deliberate and malicious attempt by which an adversary interrupts a particular network or online service of a server, making it unavailable to its authorized users. It is also described as an act or series of activities with the ability and capability to deny or stop part of an information system from functioning normally. This kind of attack normally targets the computer system resources (such as memory and CPU) and network infrastructure (bandwidth) of the victim's network link, and can affect both network and computing resources. The consequences of a denial of service attack range from an insignificant rise in service response time to total service unavailability, and can carry financial losses when an organization depends fully on the availability of its service (Arockiam et al., 2010).
2.4 Distributed Denial-of-Service (DDoS) Attacks

Recently, interest in cloud computing has greatly increased in both academic research and industry. DDoS is one of the security threats that challenge the availability of cloud resources. The first DDoS attack happened in 1999 (Nazario, 2008). Many popular websites, such as Yahoo, were affected by DDoS in early 2000. In 2001, Register.com was affected by DDoS; it was the first DDoS attack to use DNS servers as reflectors (Dittrich et al., 2004). In a cloud environment, when the workload on a service increases, the cloud starts providing computational power to withstand the additional load. This means the cloud system works against the attacker, but to some extent it also supports the attacker, by enabling him to do the most possible damage to the availability of a service starting from a single attack entry point. A cloud service coexists with other services provided on the same hardware servers, which may suffer from the workload caused by flooding; thus, if a service runs on the same server as another, flooded service, this can affect its own availability. Another effect of flooding is to raise the bills for cloud usage drastically, the problem being that there is no upper limit to the usage. One of the potential attacks on a cloud environment is the neighbor attack, i.e. a virtual machine can attack its neighbor on the same physical infrastructure and thus prevent it from providing its services. These attacks can affect cloud performance, cause financial losses and have harmful effects on other servers in the same cloud infrastructure. A DDoS attack occurs when a huge number of Internet packets is loaded into the buffer of a targeted system (i.e. the victim), overflowing the bandwidth or resources of that system. Because the volume of traffic sent to the targeted system is larger than it can handle or transmit, system performance degrades drastically, services become unavailable, or the entire system shuts down, leading to denial of service for the authorized users of the targeted system (Morales and Dobbins, 2011). Figure 2.1 presents an illustration of DDoS attacks.

Figure 2.1: Illustration of a DDoS Attack (Miao et al., 2015)

As mentioned earlier, the cloud computing market continues to grow, and the cloud platform is becoming an attractive target for attackers seeking to disrupt services, steal data, and compromise resources to launch further attacks. Miao et al. (2015) present a large-scale characterization of inbound attacks towards the cloud and outbound attacks from the cloud, using three months of NetFlow data from a cloud provider in 2013. Despite the promising business model and the hype surrounding cloud computing, security is the major concern for a business moving its applications to the cloud. When a DDoS attack is launched from a botnet with many zombies, web servers can be flooded with packets quickly, and memory can be exhausted quickly in an individual private cloud. We can therefore say that the main competition between DDoS attacks and defenses is over resources. The increase of DDoS attacks in volume, frequency, and complexity, combined with the constant alertness required to mitigate web application threats, has caused many website owners to turn to Cloud-based Security Providers (CBSPs) to protect their infrastructure (Thomas et al., 2015). One recent analysis considers DDoS attacks among the top nine threats to cloud-based environments, and concludes that cloud services are very tempting to DDoS attackers, who now focus mainly on private data centers. It is safe to assume that, as more cloud services come into use, DDoS attacks on them will become more commonplace. Figure 2.2 presents possible scenarios of DDoS attack types in a private cloud.

Figure 2.2: Possible Scenarios of DDoS Attack Types in a Private Cloud (Qiao and Richard, 2015)

2.5 XML-DDoS and HTTP-DDoS

These attacks belong to the resource exhaustion attack category. Extensible Markup Language (XML) (or JSON) and Hypertext Transfer Protocol (HTTP) are heavily used in cloud computing web services, and very little work has been done to ensure security related to these protocols as, most of the time, for example with XML (XML encryption, digital signatures, user tokens, etc.), the request is implicitly assumed to be legitimate. This puts XML-DoS and HTTP-DoS among the most destructive DoS and DDoS attacks in cloud computing (Adrien and Martine, 2017).
2.5.1 HTTP-DDoS

An HTTP flood is a layer 7 attack that targets web applications and servers. During this attack, an attacker exploits the HTTP GET (Figure 2.3) or POST (Figure 2.4) requests sent when an HTTP client, like a web browser, "talks" to an application or server. The attacker employs a botnet to send the victim's server a large volume of GET (images or scripts) or POST (files or forms) requests with the intent of overwhelming its capabilities. The victim's web server becomes inundated attempting to answer each request from the botnet, which forces it to allocate its maximum resources to handle the traffic. This prevents legitimate requests from reaching the server, causing a denial of service. An HTTP-DDoS attack consists of sending many arbitrary HTTP requests, repeating requests, or recursively attacking a web service (Vissers et al., 2014). A high rate of legitimate or invalid HTTP packets is sent to the server with the goal of overwhelming the web service's resources; processing all the requests, with the cost associated with each one (which may be quite significant for certain web services), eventually triggers the DDoS.
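A minimal rate-based illustration of why a GET/POST flood stands out: the sketch below flags clients whose request count within a fixed time window exceeds a threshold. The function name, window length and threshold are hypothetical choices for illustration, not a production detector.

```python
from collections import Counter

def flag_http_flood(requests, window_s=10.0, threshold=100):
    """Flag client IPs whose request count within a fixed time window
    exceeds a threshold. `requests` is an iterable of (timestamp, ip)."""
    flagged = set()
    counts = Counter()
    window_start = {}
    for ts, ip in sorted(requests):
        start = window_start.setdefault(ip, ts)
        if ts - start >= window_s:          # window expired: start a new one
            window_start[ip] = ts
            counts[ip] = 0
        counts[ip] += 1
        if counts[ip] > threshold:
            flagged.add(ip)
    return flagged

# One bot hammering the server vs. one user browsing normally.
bot = [(i * 0.01, "10.0.0.9") for i in range(500)]     # 500 requests in 5 s
user = [(float(i), "192.168.1.5") for i in range(20)]  # 1 request per second
flagged = flag_http_flood(bot + user)
```

Real HTTP-DDoS traffic is harder than this, since each bot can stay under any per-IP threshold, which is why the classifier-based approaches in this study are used instead.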

Figure 2.3: HTTP GET Attack (www.verisign.com, 2017)

Figure 2.4: HTTP POST Attack (www.verisign.com, 2017)

2.5.2 Impact of HTTP DDoS on Cloud Environment

Cloud computing services are often delivered through the HTTP protocol. This means that the HTTP protocol's attacks, vulnerabilities, misconfigurations, and bugs have a direct impact on the user services deployed in the cloud. HTTP DDoS attacks are classified among the major threats to web service availability. Hence, they are a major threat to the availability of cloud services.

In the cloud computing context, there are two established ways to achieve a DoS: direct, which consists of predetermining the target service's host, and indirect, which consists of denying other services hosted on the same host or network as the target (Mohamed, Karim and Mustapha, 2018). The resource auto-scaling characteristic of the cloud enables, on the one hand, providers to supply clients with a large pool of resources, the clients then being charged on a pay-per-use model. On the other hand, it enables attackers to deny many cloud services with a single attack. The detection of HTTP DDoS attacks in the cloud requires deep monitoring of the network traffic and strong modeling of the cloud users' behaviors.

2.6 Machine Learning algorithms

Machine learning uses two types of techniques: supervised learning, which trains a model on known input and output data so that it can predict future outputs, and unsupervised learning, which finds hidden patterns or intrinsic structures in input data. (Deepak, 2018)

Figure 2.5: Machine Learning Techniques Categorization (Deepak, 2018)

2.6.1 Naïve Bayes Classifier

This is a supervised classification method developed using Bayes' theorem of conditional probability, with the "naive" assumption that every pair of features is mutually independent; in simpler words, the presence of a feature is not affected by the presence of another in any way. Despite this over-simplified assumption, NB classifiers perform quite well in many practical situations, such as text classification and spam detection, and only a small amount of training data is needed to estimate the necessary parameters (Kajaree and Rabi, 2017).
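The idea can be sketched with scikit-learn's `GaussianNB`, which models each feature independently per class; the two traffic features and their values below are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Invented features per flow: [requests per second, mean payload bytes].
X = np.array([[5, 300], [8, 280], [6, 310],      # normal traffic
              [400, 40], [350, 35], [420, 50]])  # flood-like traffic
y = np.array([0, 0, 0, 1, 1, 1])

nb = GaussianNB().fit(X, y)   # fits per-class, per-feature Gaussians
pred = nb.predict(np.array([[7, 290], [380, 45]]))
```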
2.6.2 Support Vector Machine

Support vector machines (SVM) are supervised learning models with associated learning algorithms that analyze data for classification, i.e. determining which class, data set or category new observations belong to (Sunpreet and Sonika, 2016). The SVM training algorithm builds a model in which new examples are assigned to one category or the other. In this model, examples are represented so that the categories are divided by a clear gap that is as wide as possible. The main objective of an SVM is to find the hyperplane for which the margin of separation is maximized; when this condition is met, the decision plane that differentiates the two classes is called the optimal hyperplane. Support vectors play an important role in the operation of this class of learning machine: they are the elements of the training data set that would change the position of the dividing hyperplane if they were removed. For a maximum-margin hyperplane trained with samples from two classes, the samples lying on the margin are called support vectors; they are the data points that lie closest to the decision surface.
SVM has the advantage of offering good performance on the training dataset and good efficiency for classification of future data. On the other hand, the performance of SVM degrades when it is adapted for multi-class classification (Cheng et al., 2012).
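A small scikit-learn sketch of a linear SVM on invented two-class points; `svm.support_vectors_` exposes the training points that define the margin, matching the description above.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2],      # class 0
              [8, 8], [9, 8], [8, 9]])     # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# Linear-kernel SVM: fits the maximum-margin hyperplane; the training
# points lying on the margin are the support vectors.
svm = SVC(kernel="linear").fit(X, y)
pred = svm.predict(np.array([[0, 1], [9, 9]]))
n_support_vectors = len(svm.support_vectors_)
```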
2.6.3 J48
Classification is the process of building a model of classes from a set of records that contain class labels. The decision tree algorithm finds out how the attribute vector behaves for a number of instances, and on the basis of the training instances the classes for newly generated instances are found (Kortin, 2012). The algorithm generates rules for the prediction of the target variable, and with the help of the tree classification algorithm the critical distribution of the data is easily understandable (Nadali et al., 2011).
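Scikit-learn's `DecisionTreeClassifier` implements CART rather than Weka's J48 (C4.5), but it illustrates the same idea of deriving a discriminating rule from labelled records; the single request-rate attribute below is an invented example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Single invented attribute: requests per second, with class labels.
X = np.array([[5], [8], [6], [400], [350], [420]])
y = np.array([0, 0, 0, 1, 1, 1])

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
pred = tree.predict(np.array([[7], [390]]))
depth = tree.get_depth()   # one threshold rule separates these records
```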
2.6.4 IBK
The IBk algorithm is a k-nearest-neighbor classifier that takes the similarity of two points to be the distance between them under some appropriate metric. The number of nearest neighbors can be specified explicitly in the object editor or determined automatically using leave-one-out cross-validation, subject to an upper limit given by the specified value. The distance function is a parameter of the search method; the default is the Euclidean distance, as for IB1, and other options include the Chebyshev, Manhattan, and Minkowski distances (Dietterich, 1998).
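The k-nearest-neighbor idea behind IBk can be sketched with scikit-learn, trying the distance metrics named above; the toy data and the explicit k=3 are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 1], [2, 2], [1, 2], [8, 8], [9, 9], [8, 9]])
y = np.array([0, 0, 0, 1, 1, 1])

# Each metric changes how "nearest" is measured, not the voting scheme.
preds = {}
for metric in ("euclidean", "manhattan", "chebyshev", "minkowski"):
    knn = KNeighborsClassifier(n_neighbors=3, metric=metric).fit(X, y)
    preds[metric] = knn.predict(np.array([[2, 1], [9, 8]])).tolist()
```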
2.6.5 Multi-Layer Perceptron

The multilayer perceptron is the best known and most frequently used type of neural network. On most occasions, signals are transmitted within the network in one direction: from input to output. There is no loop; the output of each neuron does not affect the neuron itself (Marius et al., 2009).
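A minimal feed-forward (no-loop) network via scikit-learn's `MLPClassifier`; the toy data, hidden-layer size and solver choice are assumptions for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[1, 1], [2, 1], [1, 2], [8, 8], [9, 8], [8, 9]])
y = np.array([0, 0, 0, 1, 1, 1])

# Feed-forward only: signals travel input -> hidden layer -> output,
# with no loops back into any neuron.
mlp = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs",
                    max_iter=2000, random_state=0).fit(X, y)
train_accuracy = mlp.score(X, y)
pred = mlp.predict(np.array([[0, 0], [9, 9]]))
```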
2.6.6 kStar

In this kind of classification problem, "each new instance is compared with existing ones using a distance metric, and the closest existing instance is used to assign the class to the new one" (Witten et al., 2011). The principal difference between K* and other instance-based algorithms is its use of the entropy concept to define its distance metric, which is calculated by means of the complexity of transforming one instance into another, taking into account the probability of this transformation occurring in a "random walk away" manner. Classification with K* is made by summing the probabilities from the new instance to all the members of a category; this is done for each category, and finally the category with the highest probability is selected (Cleary and Trigg, 1995).

2.6.7 PART

PART is a partial decision tree algorithm, a development of the C4.5 and RIPPER algorithms. The main specialty of the PART algorithm is that it does not need to perform global optimization, as C4.5 and RIPPER do, to produce the appropriate rules (Frank and Witten, 1998).

2.6.8 Decision Table

A decision table is a useful tool when the rules for handling a data record are more complex than a single simple discriminating test. The usual practice is to record and analyze this type of situation by means of a flow chart, which is then used for writing a program made up of several branches. Such programs, even though written in a high-level language, are often not readily comprehensible in the absence of the accompanying flowchart or without constructing one (King, 2018).

2.6.9 Random Forest

Random forest classifiers were developed by Leo Breiman and Adele Cutler. They combine tree classifiers to predict new unlabeled data; the predictor depends on the number of trees in the forest, and the attributes are selected randomly. Each set of trees represents a single forest, and each forest yields a prediction class for new unlabeled data (Apale et al., 2015). In this algorithm, a random selection of features is made for each individual tree. A random forest classifier is an ensemble learning algorithm used for classification and prediction of outputs based on an individual number of trees (Araar and Bouslama, 2014). Using random forest classifiers, many classification trees are generated, and each individual tree is constructed from a different part of the general dataset. To classify a new unlabeled object, it is run down each tree, and each tree votes for a decision; the class chosen as the winner is the one with the highest number of votes. Figure 2.6 shows the decision forest architecture and how the number of votes is calculated.

Figure 2.6: Decision Forest Architecture (Mouhammd et al., 2016)

The accuracy rate and error rate for Random Forest (RF) classifiers can be measured by splitting the whole dataset, e.g. 30% for testing and 70% for training. After training, the random forest model is tested on the 30% split to calculate the error rate, and the accuracy rate is measured by comparing correctly classified instances with incorrectly classified instances. Out-of-bag (OOB) estimation is another way of calculating the error rate (Bret, 2017); in this technique there is no need to split the dataset, because the calculation occurs during the training phase. The following parameters need to be adjusted correctly to achieve the highest accuracy rate with the minimum error rate:
Number of trees.
Number of descriptors randomly sampled as split candidates at each node (mtry).

Figure 2.7: Random Forest Derived from a Decision Tree (Bret, 2017)

After analysis and study of many cases, around 500 trees are typically sufficient; a greater number of trees will not achieve a higher accuracy rate and will only waste training time and resources (Hasan et al., 2014). Tuning the random forest parameters is therefore a vital research area that needs attention.
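The two evaluation routes described above (a 70/30 hold-out split versus the out-of-bag estimate) and the two tuning parameters can be sketched as follows; the synthetic dataset and the seed are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=12,
                           n_informative=5, random_state=7)
# Route 1: explicit 70%/30% train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7)

rf = RandomForestClassifier(
    n_estimators=500,       # parameter 1: number of trees
    max_features="sqrt",    # parameter 2: descriptors tried per split (mtry)
    oob_score=True,         # route 2: out-of-bag estimate, no split needed
    random_state=7,
).fit(X_train, y_train)

holdout_accuracy = rf.score(X_test, y_test)
oob_accuracy = rf.oob_score_
```

The two accuracy estimates are usually close, which is why OOB estimation is attractive when data is scarce.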
2.7 Feature Selection

This is one of the important techniques used to improve the quality of a given dataset in order to obtain better data mining results; it involves the removal of unwanted, redundant, missing, and noisy features. Feature selection speeds up the data mining algorithm, improves accuracy and leads to better models (Liu et al., 2010). There are three methods of feature selection: wrapper, filter and embedded. The wrapper method uses the proposed learning algorithm itself to evaluate the effectiveness of the features, while the filter method evaluates features based on the general characteristics of the data. Embedded methods are a combination of the wrapper and filter methods. The Least Absolute Shrinkage and Selection Operator (LASSO) is an embedded method best known for its powerful feature selection ability (Valeria and Eduard, 2017). As a result, the proposed model uses a LASSO based feature selection approach for more accurate results.

2.7.1 LASSO -Least Absolute Shrinkage and Selection Operator

The Least Absolute Shrinkage and Selection Operator was first formulated by Robert Tibshirani in 1996. It is a powerful method that performs two main tasks: regularization and feature selection. The LASSO method puts a constraint on the sum of the absolute values of the model parameters: the sum has to be less than a fixed value (upper bound). To do so, the method applies a shrinking (regularization) process in which it penalizes the coefficients of the regression variables, shrinking some of them to zero. During the feature selection process, the variables that still have a non-zero coefficient after the shrinking process are selected to be part of the model. The goal of this process is to minimize prediction error and overfitting.
LASSO is a widely known model (Tibshirani, 1996) that essentially consists of a simple linear model combined with an l1-penalty term added to the objective function. Let us assume our data set is represented as D = {(x_i, y_i)}, with i ∈ {1, ..., N} samples, x_i representing the features describing the i-th sample, and y_i being the class label. Equation 2.1 below shows the objective function that is minimized under the LASSO approach for a classification problem:
min_β  Σ_{i=1}^{N} (y_i − F_sig(x_i β))² + λ Σ_{j=1}^{p} |β_j|        (2.1)
where the function F_sig represents the sigmoid function and is defined as follows:
F_sig(z) = 1 / (1 + e^(−z))        (2.2)

When we minimize this optimization problem, some coefficients are shrunk to zero, i.e. β_j = 0 for some values of j (depending on the value of the parameter λ). In this way the features with coefficients equal to zero are excluded from the model. For this reason LASSO is a powerful method for feature selection, while other methods (e.g. ridge regression) are not (Valeria and Eduard, 2017).
According to Valeria and Eduard (2017), LASSO helps to increase model interpretability by eliminating irrelevant variables that are not associated with the response variable; in this way overfitting is also reduced.
2.8 Related Work on Detection of HTTP DDoS Attacks

A detection system for HTTP DDoS attacks in a Cloud environment was proposed by Mohamed, Karim and Mustapha (2018), based on information-theoretic entropy and a machine learning classifier. The proposed detection system consists of three main steps: entropy estimation, preprocessing, and classification. The authors used a time-based sliding window algorithm to estimate the entropy of the network header features of the incoming traffic and then classified the data into normal and HTTP DDoS traffic. Accuracy, FPR, AUC, and running time metrics were used to evaluate the proposed detection system. They achieved an accuracy rate of 99.54% with 0.4 FPR.

Choi et al. (2014) presented a method of DDoS attack detection using HTTP packet patterns and a rule engine in a Cloud Computing environment. The method combines HTTP GET flooding detection with MapReduce processing for fast attack detection in a Cloud Computing environment, and can ensure the availability of the target system through accurate and reliable detection of HTTP GET flooding. The method was compared with the Snort IDS in terms of processing time and reliability as congestion increases in the Cloud infrastructure.

Xiao et al. (2017) proposed a Protocol-Free Detection (PFD) scheme against Cloud-oriented Reflection DoS (RDoS) attacks. They focus on analyzing the network flow of Cloud services by studying the basic traffic correlation near the victim Cloud under RDoS attack. In their work, packets are sampled at the upstream router, the correlation of flows is tested using the flow correlation coefficient (FCC), and the detection result is given by considering the current FCC value together with historical information. In the Cloud environment, PFD is designed to be inserted in a protected virtual LAN. However, a protected VLAN requires the deployment of other security techniques, which consume Cloud resources. Also, deploying the PFD inside the Cloud instances makes it vulnerable to HTTP DDoS attacks.

Zecheng et al. (2017) proposed a DDoS detection system based on machine learning techniques. The system is designed to be implemented on the Cloud provider’s side in order to detect, early, DDoS attacks sourced from virtual machines of the Cloud. The system leverages statistical information from both the Cloud server’s hypervisor and the virtual machines in order to prevent malicious network packets from being sent out to the outside network. Nine machine learning algorithms were evaluated and the most appropriate was selected based on detection performance. They achieved an accuracy rate of 99.73%.

Similarly, Sreeram and Vuppala (2017) proposed a bio-inspired anomaly-based Application Layer DDoS attack (App-DDoS) detection system in order to achieve fast and early detection. The system uses a bio-inspired bat algorithm to detect HTTP DDoS attacks. The authors evaluated their system using the CAIDA dataset and achieved a satisfactory detection rate of 94.80% for HTTP flooding attacks.

Mouhammd et al. (2016) collected a new dataset that includes modern types of attack which, they claim, had not been used in previous research. The dataset contains 27 features and five classes, and the network simulator NS2 was used in the work. Three machine learning algorithms (Multilayer Perceptron (MLP), Random Forest, and Naïve Bayes) were applied to the collected dataset to classify the DDoS attack types, namely Smurf, UDP-Flood, HTTP-Flood and SIDDOS. The MLP classifier achieved the highest accuracy rate (98.63%).
A bio-inspired anomaly-based HTTP-Flood attack detection system was devised by Indraneel and Venkata (2017). In their work they adopted the bat algorithm. First, they defined feature metrics to identify whether the behavior of a request stream is attack or normal; second, they customized the bat algorithm for training and testing. The devised bat algorithm improved detection accuracy with minimal processing complexity. The experiment was carried out on the benchmark CAIDA dataset and achieved an accuracy of 98.4%.

Thomas et al. (2014) presented a system for defending against two types of Application Layer DDoS attacks in Cloud environments, in particular XML-DDoS and SOAP-DDoS. The proposed defense system is specific to threats involved in web service deployment; it does not replace the lower-layer DDoS defense systems that target network and transport attacks. The authors propose an intelligent, fast, and adaptive system for detecting XML and HTTP application layer attacks. The intelligent system works by extracting several features and using them to construct a model of typical requests; outlier detection can then be used to detect malicious requests. Furthermore, the intelligent defense system is capable of detecting spoofing and regular flooding attacks. The system is designed to be inserted in a Cloud environment, where it can transparently protect the Cloud broker and even Cloud providers.

A detection method that analyzes specific spectral features of traffic over small time horizons, without packet inspection, was proposed by Aiello et al. (2014). Real traffic traces mixed with several low-rate HTTP DDoS attacks were collected locally from their institute's LAN and used to evaluate the method, with satisfactory results.
A refinement of traditional IDS to make it more efficient in a Cloud environment was proposed by Vieira et al. (2010). To test their system, they used three sets of data: the first represents legitimate actions; in the second, they altered the services and their usage frequency to simulate anomalies; and the last set simulates policy violations. To evaluate the event auditor that monitors the requests received and the responses sent on a node, they chose to examine the communication elements, since log data present little variation, making attacks difficult to detect. A feed-forward neural network was used for the behavior-based technique, and the simulation included five legitimate users and five intruders over ten days of usage. Although the results yielded a high number of false negatives and false positives, performance improved when the training period of the neural network was prolonged. They conclude that their system could allow real-time analysis, provided the number of rules per action remains low.

A new dataset that includes modern types of attack not used in previous research was collected by Irfan, Amit, and Vibhakar (2017). The dataset contains 27 features and five classes, recorded for different types of attack targeting the application and network layers. Four machine learning algorithms (Naïve Bayes, Decision Trees, MLP, and SVM) were applied to the collected dataset to classify the DDoS attack types, namely Smurf, UDP-Flood, HTTP-Flood and SIDDOS. The MLP classifier achieved the highest accuracy rate with 98.91%. For future work they recommend examining different feature selection techniques and including more types of modern attacks at other OSI layers, such as the transport layer.

Chitrakar and Chuanhe (2012) proposed an approach that combines k-Medoids clustering with SVM. In the first step, the k-Medoids clustering technique is used to group instances of similar behavior. In the second step, an SVM classifier classifies the resulting clusters into normal and attack classes. This approach shows good performance for small datasets, but the detection rate falls for larger datasets.

Kausar et al. (2012) presented an SVM-based IDS mechanism with Principal Component Analysis (PCA) feature subsets. The evaluation dataset is transformed into another space of feature vectors using PCA. These feature vectors are then arranged in descending order of their eigenvalues and divided into feature subsets, which are used as input to the SVM classifier. Using only a few features from the dataset reduces the processing overhead of the classifier. SVM can be used efficiently for intrusion detection in the Cloud if the given sample data is limited in size and dimensionality. However, Cheng et al. (2012) note that the performance of SVM degrades when it is adapted for multi-class classification.

A DIDS to counter DDoS attacks was proposed by Lo, Huang, and Ku (2008). In this approach, IDSs are deployed in each Cloud region and send alert messages to one another. By judging the accuracy of these alerts, an agent that finds an intrusion adds a new rule to the block table. The system implements four components: intrusion detection, alert clustering and threshold checking, intrusion response and blocking, and cooperative operation. If an intrusion is detected by an agent in a region, it drops the packet and sends an alert message about the attack to the other regions. The alert clustering module collects alerts coming from other regions; the severity of the collected alerts is calculated and a decision is made as to whether the alert is true or false.

Modi et al. (2012) proposed and implemented a network intrusion detection system (NIDS) which uses Snort to detect known attacks and a Bayesian classifier to detect unknown attacks. The NIDS instances deployed on all servers work in a collaborative manner by writing alerts into a knowledge base, thus making the detection of unknown attacks easier. In the given technique, signature-based detection is followed by anomaly-based detection, since the latter detects only unknown attacks; the detection rate is increased by sending alerts to the other NIDS instances deployed in the cloud environment. A Cloud Intrusion Detection Dataset (CIDD), the first for cloud systems, consisting of both knowledge-based and behavior-based audit data collected from both UNIX and Windows users, was proposed by Hisham and Fabrizio (2012). However, the datasets are not sufficient for intrusion detection in the cloud.

Ektefa et al. (2010) compared C4.5 and SVM to evaluate the performance and false alarm rate (FAR) of both algorithms. Of the two, C4.5 performed better. Since the performance of a classifier is often evaluated by a single error rate, such evaluation does not suit complex, real-world multiclass problems. Based on the values obtained, the accuracy of C4.5 was 93.23%.

A hybrid PSO algorithm that can deal with nominal attributes directly, without converting nominal attribute values, was proposed by Holden and Freitas (2008) to overcome a drawback of the PSO/ACO algorithm. The proposed method efficiently produces a simple rule set and increases accuracy. Similarly, hybridization of SVM with PSO (PSO-SVM) to optimize the performance of SVM was proposed by Ardjani and Sadouni (2010); 10-fold cross-validation was used to estimate the accuracy. The hybrid utilizes the advantage of minimum structural risk together with global optimization. The result shows better accuracy but a high execution time: the accuracy of SVM plus PSO is 91.57%.
The denial of capability attack is one of the major causes of DDoS attacks. DDoS attacks can be prevented through a denial-of-capability approach using the sink tree model (Zhang et al., 2010), which represents the quota assigned to each domain on the network. Distributed Denial of Service attacks affect not only the specified target machine but can also compromise the whole network. Based on this perspective, a proactive algorithm was proposed by Zhang et al. (2011): the network is divided into a set of clusters, and packets need permission to enter, exit, or pass through other clusters.

Panda, Abraham, and Patra (2011) used a two-class classification method, labeling traffic as either normal or attack. The combination of J48 and RBF showed higher error and RMSE rates; in comparison, the Nested Dichotomies and random forest method showed a 0.06% error rate with a 99% detection rate. Monowar, Bhattacharyya, and Kalita (2012) presented a tree-based clustering technique to find clusters in an intrusion detection dataset without using any labeled data. The dataset can be labeled using a cluster labeling technique based on the TreeCLUS algorithm, which works fast for numeric and mixed categories of network data.

Hanna et al. (2016) presented the performance of machine learning techniques used for attack identification in a cloud computing environment. From the available machine learning algorithms, they selected Naive Bayes (John and Langley, 1995), multilayer perceptron (Lopez and Onate, 2006), support vector machine (Platt, 1999), decision tree (C4.5) (Quinlan, 1993) and Partial Tree (PART) (Frank and Witten, 1998) for classifying their data. A statistical ranking approach was used for the final selection of a learning technique for the task. Performance was evaluated through different evaluation measures, including 10-fold cross-validation, true positive rate, false positive rate, precision, recall, F-measure and the area under the receiver operating characteristic curve. The decision tree (C4.5) had the highest accuracy, at 94%.

A filtering tree, which works like a service, was developed by Karnwal, Sivakumar, and Aghila (2012). The XML consumer request is converted into tree form, and a virtual Cloud defender is used to defend against these types of attacks. The Cloud defender consists of five steps: sensor filtering (checking the number of messages from a user), hop count filtering (the number of nodes crossed from source to destination, which cannot be forged by the attacker), IP frequency divergence (the same range of IP addresses is suspect), puzzle solving (a puzzle is sent to the user: if it is not resolved, the packet is suspect) and double signature. The first four filters detect HTTP-DDoS attacks, while the fifth detects XML-DDoS attacks.

Sarmila and Kavin (2014) introduced a heuristic clustering algorithm to cluster data and detect DDoS attacks in the DARPA 2000 dataset, obtaining better results in terms of detection rate and false positive rate than the K-Means and K-Medoids algorithms. A hybrid learning approach combining k-Medoids clustering and naive Bayes classification was proposed by Chitrakar and Huang (2012). The hybrid model grouped the whole data into clusters more accurately than K-means, resulting in better classification; the approach was tested on the Kyoto 2006+ dataset. Ankita and Fenil (2015) proposed an approach for detecting HTTP-based DDoS attacks. It entails a five-step filter tree approach to cloud defense: filtering of sensors and hop counts, diverging IP frequencies, double signatures, and puzzle solving. The approach helped in determining anomalies with varying hop counts and treating the sources of such anomalies as attack sources.
Sharmila and Roshan (2018) proposed a system that effectively detects DDoS attacks using the clustering technique of data mining followed by classification. This method uses a Heuristics Clustering Algorithm (HCA) to cluster the available data and Naïve Bayes (NB) classification to classify the data and detect the attacks created in the system based on certain network attributes of the data packet. They point out that the clustering algorithm is based on an unsupervised learning technique and is sometimes unable to detect some attack instances and a few normal instances; therefore, classification techniques are used alongside clustering to overcome this problem and enhance the accuracy. They performed a series of experiments using two datasets, the CAIDA UCSD DDoS Attack 2007 dataset and DARPA 2000. The efficiency of the proposed system was tested in terms of accuracy, detection rate and false positive rate, and Naive Bayes classification was found to perform better on all the parameters.

The methodology of applying MADM in the cloud was proposed by Abdulaziz and Shahrulniza (2017). Experiments were conducted using a real private testbed. The results showed high performance of MADM in detecting HTTP-flooding attacks in the cloud environment, based on the confusion matrices and AUC results. It was concluded that MADM performance using 4 thresholds is higher than using 3 thresholds, with 86.77% detection accuracy.
2.9 Summary

This chapter discussed several machine learning algorithms, their advantages as well as their weaknesses. Among the algorithms explored, Random Forest appeared to have the most suitable characteristics for this research work. Unlike other algorithms, Random Forest helps to save data preparation time, as it does not require any input preparation and is able to handle numerical data and categorical features without scaling or transformation. The chapter also discussed the several techniques proposed in the existing literature for curbing HTTP-DDoS attacks in cloud computing, as well as other intrusion attacks. Similarly, cloud computing and machine learning were discussed in line with the proposed model (Random Forest).
Based on the reviews carried out, it was observed that the existing approaches still suffer from low true positive rate (TPR), high false positive rate, and low accuracy and F-measure in the detection of DDoS attacks; because of these problems, the stability and robustness of the approaches are not guaranteed.

CHAPTER THREE
3.0 METHODOLOGY
3.1 Research Processes

This chapter outlines the processes involved in achieving the aim of this study. To give the reader a clear understanding of the study, the chapter begins with a presentation of the research processes employed from the beginning of the study to its end. It is important to note that the research methodology employed for this research is data analysis, with validation through experimentation.

Figure 3.1 Research Processes flow chart
3.2.1 Identification of problem

This study follows the conventional research process. In order to gain a better understanding of the problem, several works on HTTP DDoS attack detection systems in cloud environments were reviewed in chapter two. Machine learning approaches and applications, and LASSO feature selection, were also reviewed. The identification of the problem was achieved by evaluating existing information about the various approaches adopted and identifying the weaknesses associated with those methods, in order to formulate a more specific research hypothesis. Low detection accuracy and high false positive rate remain issues that need to be addressed.
3.2.2 Study of Existing Approaches for Cloud Based HTTP-DDoS attack Detection

The study of various existing approaches used for detecting HTTP-DDoS attacks was carried out to understand how the existing techniques work and how the approaches were used to detect DDoS attacks. Among the approaches existing in the literature is machine learning. Various machine learning algorithms were reviewed with a view to identifying techniques that perform better in terms of detection accuracy, which, as mentioned earlier, remains an issue in this area. The result of the study of existing machine learning techniques for HTTP-DDoS attack detection is presented in chapter 2.
3.2.3 Identification of strength and weakness of the existing detection models

During the review process, different HTTP-DDoS detection techniques were studied. After this, the weaknesses and strengths (in terms of detection accuracy, false positive rate, dataset used for experimentation, and so on) of the reviewed machine learning based detection techniques for HTTP-DDoS attacks were identified. This gave room for the selection of Random Forest based techniques considered suitable for detection of HTTP-DDoS attacks in a cloud computing environment.
3.2.4. Dataset Description

The dataset used for this study was obtained from Mouhammd et al. (2016). The dataset comprises four different DDoS attack types, of which HTTP-DDoS is one. The dataset contains 27 features and five classes; the five classes represent the four attack types and normal traffic. For the purpose of this study, 7256 instances of HTTP-DDoS attacks and 10256 instances of normal traffic were extracted from the dataset. Table 1 shows the total number of instances of HTTP-DDoS attack and normal traffic, while Table 2 shows the features of the dataset.
Table 1 Dataset for this study
Class Type Number of Records
Normal 10256 packets
HTTP-DDoS 7256 packets

Table 2 Extracted dataset features

Variable No Features Type
1 SRC ADD Continuous
2 DES ADD Continuous
3 PKT ID Continuous
4 FROM NODE Continuous
5 TO NODE Continuous
6 PKT TYPE Continuous
7 PKT SIZE Continuous
8 FLAGS Continuous
9 FID Symbolic
10 SEQ NUMBER Continuous
11 NUMBER OF PKT Continuous
12 NUMBER OF BYTE Continuous
13 NODE NAME FROM Continuous
14 NODE NAME TO Symbolic
15 PKT IN Symbolic
16 PKTOUT Continuous
17 PKTR Continuous
18 PKT DELAY NODE Continuous
19 PKTRATE Continuous
20 BYTE RATE Continuous
21 PKT AVG SIZE Continuous
22 UTILIZATION Continuous
23 PKT DELAY Continuous
24 PKT SEND TIME Continuous
25 PKT RESEVED TIME Continuous
26 FIRST PKT SENT Continuous
27 LAST PKT RESEVED Continuous
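The extraction of the two classes described in Section 3.2.4 can be sketched with pandas. This is an illustration only: the column label "PKT_CLASS" and the class values "HTTP-FLOOD" and "Normal" are assumptions, since the actual labels depend on the dataset file.

```python
# Illustrative sketch of the class-extraction step. The column name
# "PKT_CLASS" and the class strings are hypothetical stand-ins.
import pandas as pd

def extract_binary_subset(df: pd.DataFrame, class_col: str = "PKT_CLASS") -> pd.DataFrame:
    """Keep only the HTTP-flood attack and normal records of the dataset."""
    mask = df[class_col].isin(["HTTP-FLOOD", "Normal"])
    return df[mask].reset_index(drop=True)
```

Applied to the full dataset, a step like this would yield the 7256 attack and 10256 normal instances used in this study.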

3.3 The Proposed Detection System

The proposed HTTP DDoS detection model for cloud computing consists of two major steps: feature selection using LASSO and classification using a Random Forest classifier. The model first takes the dataset as input; the LASSO algorithm is then used to select relevant features and reduce redundancy; the selected features are then fed to the Random Forest, and the results obtained are evaluated using six performance metrics: precision, FP rate, TP rate, accuracy, recall, and F-measure. Figure 3.2 and Figure 3.3 present the pseudocode and flowchart of the proposed model.
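The two-step structure described above can be sketched as a scikit-learn pipeline. This is an analogous illustration, not the thesis implementation (which used Matlab R2018a for LASSO and WEKA for Random Forest); here an L1-penalised (LASSO-style) logistic model drives feature selection, and the data are synthetic, with 27 features mirroring the dataset's feature count.

```python
# Illustrative LASSO-selection + Random Forest pipeline on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, n_features=27, n_informative=8,
                           random_state=42)

pipeline = Pipeline([
    # Step 1: keep only features with non-zero L1-penalised coefficients.
    ("lasso_select", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=0.5))),
    # Step 2: classify the selected features with a Random Forest.
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
])
scores = cross_val_score(pipeline, X, y, cv=10)  # 10-fold cross-validation
```

The same evaluation protocol (10-fold cross-validation) is used in the experiments of chapter four.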
3.3.1 Feature Selection Phase

LASSO was used as the feature selection algorithm. The whole dataset was fed into Matlab R2018a, and 24 of the 27 features were selected as the most relevant. L1 (LASSO) regularization for generalized models can be understood as adding a penalty against complexity, reducing the degree of overfitting (the variance of a model) by adding more bias. In L1 the penalty term is:
L1: λ Σ_{i=1}^{k} |w_i| = λ ‖w‖₁    (3.1)
where
w is the k-dimensional weight vector, and
λ is a free parameter used to fine-tune the regularization strength.
Sparsity can be induced through this L1 vector norm, which can be considered an intrinsic form of feature selection carried out as part of the model training step.
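Numerically, the penalty of equation (3.1) is just λ times the sum of absolute weights (the l1-norm of w); the values below are illustrative only.

```python
# Computing the L1 penalty term of equation (3.1) for an example weight
# vector; two of the five weights are already zero (sparse solution).
import numpy as np

w = np.array([0.0, 1.5, -2.0, 0.0, 0.5])  # k-dimensional weight vector
lam = 0.1                                  # regularization strength (lambda)
penalty = lam * np.sum(np.abs(w))          # lambda * ||w||_1
print(penalty)  # prints 0.4
```

In this example only the three features with non-zero weights would survive the selection.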
3.3.2 Classification Phase

Random Forest was then adopted as the classifier, and the Waikato Environment for Knowledge Analysis (WEKA) tool was used as the interface.

Figure 3.2: Pseudocode of the Proposed Model

Figure 3.3 Flowchart of the Proposed Model

3.4 Random Forest Based HTTP-DDoS Detection System Framework

This research's detection system for the cloud environment is based on a Random Forest approach. In the designed framework, network traffic is classified as either attack or normal. Normal traffic is that which is anticipated between the client and the server, and attack traffic is that which is contrary to the anticipated traffic. The framework is designed to enable real-time detection with high detection accuracy, low false positive and false negative rates, and low detection time. The detection system operates in a cooperative way with the classification algorithm to detect HTTP-DDoS attacks on the fly. In this way, any abnormality or process that can affect network performance, availability and/or security is analyzed and managed first, while the random forest algorithm classifies the traffic as either normal or containing the HTTP-DDoS attack type. The designed HTTP-DDoS attack detection system is presented in Figure 3.4. The sub-sections below describe how each component of the designed detection framework works.

Figure 3.4 Random Forest based HTTP-DDoS attack detection system framework

3.4.1 HTTP-DDoS Detector Engine

As shown in Figure 3.4, the HTTP-DDoS detector engine is the principal component of the designed detection system. It has three important functions: monitoring the traffic that comes from the cloud user through the cloud provider network, extracting features, and classifying the traffic.
Traffic Monitor
The role of the traffic monitor is to incorporate network sniffing and packet capturing in a network to ensure availability and swift operation. The traffic monitor generally reviews each incoming and outgoing packet for any abnormality or process that can affect network performance, availability and/or security before forwarding it to the feature extractor.
Traffic Feature Extractor
This component transforms the input data into the set of features found in the network packets, based on the stored feature set, so as to build the derived values needed to carry out the desired task.
Random Forest based Classifier
The random forest classifier analyzes and classifies the traffic received from the traffic feature extractor to detect intrusion before granting access to the cloud information, or forwards it to the user blacklist database. This decision is taken based on the trust value of the cloud application and the threshold value. If the traffic has no features of an HTTP-DDoS attack, access to the cloud services is granted; otherwise, the traffic signature is stored in a database (the user blacklist) for future pattern matching.
User blacklist
The user blacklist database stores the data that have been classified as malicious by the random forest based model. Incoming traffic is subsequently matched against the entries in the blacklist database. In doing so, known attacks are dropped, while unknown attacks are filtered by the random forest based model.
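The decision flow of the classifier and blacklist components can be sketched as follows; the function names, the signature representation, and the label strings are hypothetical stand-ins for the framework components, not this study's code.

```python
# Illustrative decision flow: known attacks are dropped via the blacklist,
# unknown traffic is classified, and new attack signatures are remembered.
def handle_packet(features, signature, blacklist, classify):
    """Return 'drop' for blacklisted or attack traffic, 'allow' otherwise."""
    if signature in blacklist:              # known attack: drop immediately
        return "drop"
    if classify(features) == "HTTP-DDoS":   # unknown traffic: classify it
        blacklist.add(signature)            # remember the attack signature
        return "drop"
    return "allow"                          # normal traffic: grant access
```

Once a signature has been blacklisted, later packets carrying it are dropped without invoking the classifier again, which is the point of the pattern-matching step.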
3.5 Formulated Random Forest Based Model

A random forest is a classifier based on a family of classifiers g(M|θ1), …, g(M|θk), each a classification tree with parameters θk randomly chosen from a model random vector Θ.
Assume we have a training dataset
D = {(M1, N1), …, (Mn, Nn)}    (3.2)
drawn randomly from a possibly unknown distribution, (Mi, Ni) ~ (M, N).
Given a set of possible features
F = {f1(M), …, fk(M)}    (3.3)
the goal is to build a model which classifies an instance as either an attack or normal data from the dataset of (3.2).
For each instance of the dataset D, features f are chosen so as to reduce or minimize redundancy in the dataset; this redundancy is often measured by the Gini criterion. Using the Gini criterion, we define
h = attack and n = normal data.
If each Ck(D) is a decision tree, then the ensemble is a random forest. We define the parameters of the decision tree for classifier Ck(D) to be
θk = (θk1, θk2, …, θkp)    (3.4)
Thus decision tree k leads to a classifier
Ck(D) = C(D|θk)    (3.5)
For the final classification {Ck(D|h, n)}, each instance in the dataset is classified as either containing an attack or being normal. Specifically, given data D = {(hi, ni)}, i = 1, …, n, we train an ensemble of classifiers Ck(D). Each classifier Ck(D) is a predictor of either attack (h = 1) or normal data (n = −1), i.e. of the label Y = ±1 associated with the input dataset D.
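The final classification over the ensemble amounts to majority voting across the trees. A minimal sketch of this formulation, using scikit-learn decision trees and the ±1 labels above (an illustration of the technique, not this study's implementation):

```python
# A random forest as majority voting over bootstrap-trained decision trees,
# with labels +1 = attack and -1 = normal as in the formulation above.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest_predict(X_train, y_train, X_test, n_trees=25, seed=0):
    rng = np.random.default_rng(seed)
    votes = np.zeros((n_trees, len(X_test)))
    n = len(X_train)
    for k in range(n_trees):
        idx = rng.integers(0, n, size=n)          # bootstrap sample of D
        tree = DecisionTreeClassifier(max_features="sqrt",  # random theta_k
                                      random_state=k)
        tree.fit(X_train[idx], y_train[idx])
        votes[k] = tree.predict(X_test)           # each C_k votes +1 or -1
    return np.sign(votes.sum(axis=0))             # majority vote (odd n_trees)
```

Using an odd number of trees guarantees the vote sum is never zero, so every instance receives a definite ±1 label.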

3.6 Validation and Testing via Experimentation

The performance of the proposed HTTP-DDoS detection system depends largely on the effectiveness of the formulated model. The formulated Random Forest based model was evaluated based on certain metrics. Similar to previous studies by Sharmila and Roshan (2018), Indraneel and Venkata (2017), Irfan, Amit and Vibhakar (2017) and Mouhammd et al. (2016), the metrics below were used for the performance evaluation of the proposed Random Forest based model. The proposed model was implemented and tested on Windows 8 with the following specification:
Processor: Intel Pentium (R) Core ™ i7-5500U CPU @ 2.40GHz 2.30GHz
Installed Memory (RAM): 16.00 GB
System Type: 64-bit Operating System

Figure 3.5 Experimental Process Flow

Figure 3.5 presents the experimental process flow of the detection model. The model starts by taking the extracted dataset as input variables after feature selection with LASSO, then converts it into CSV format. Random Forest then classifies the data as either normal or containing an attack. The result obtained is evaluated using several performance measures, such as accuracy, true positive (TP) rate, false positive (FP) rate, precision and F-measure, as described below.

3.7 Performance Metrics

The performance of the proposed system was evaluated using the following performance metrics: accuracy, FP rate, TP rate, precision, recall, and F-measure.
3.7.1 Accuracy

The accuracy of an algorithm is calculated as the percentage of the dataset correctly classified by the algorithm. Because accuracy does not consider positives and negatives independently, other measures of performance besides accuracy were also used.
A = (TP + TN) / (TP + TN + FP + FN) × 100%    (3.6)
where
TP = True Positive
FP = False Positive
TN = True Negative
FN = False Negative
Positive and negative represent the classifier's prediction, while true and false signify whether that prediction matches the actual class.

3.7.2 Precision

Precision=TP/(TP+FP) (3.7)

It indicates the proportion of instances classified as positive that are truly positive. A high precision shows high relevance in detecting positives.

3.7.3 Recall

Recall=TP/(TP+FN) (3.8)

It indicates how well the system can detect positives.

3.7.4 F-Measure

F-Measure=2* (precision*Recall)/(Precision+Recall) (3.9)
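The four metrics of equations (3.6) to (3.9) can be computed together from the raw confusion counts; a minimal helper follows, with illustrative counts (not this study's results) in the usage below.

```python
# Helper computing the metrics of equations (3.6)-(3.9) from raw confusion
# counts (true/false positives and negatives).
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100           # (3.6), percent
    precision = tp / (tp + fp)                                 # (3.7)
    recall = tp / (tp + fn)                                    # (3.8)
    f_measure = 2 * precision * recall / (precision + recall)  # (3.9)
    return accuracy, precision, recall, f_measure
```

For example, metrics(tp=90, tn=95, fp=5, fn=10) gives an accuracy of 92.5%, a recall of 0.9, and an F-measure of about 0.923.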

CHAPTER FOUR
4.0 RESULTS AND DISCUSSIONS
4.1 Introduction

In chapter 3, the experimentation process flow was presented. This experimentation played a vital role in the validation and evaluation of the Random Forest based model, and also gave room for testing its performance. The experiment was carried out using the Waikato Environment for Knowledge Analysis (WEKA), and 10-fold cross-validation was used. The comparison of performance is discussed here. After the preliminary experimentation with the Random Forest based model, this study experimented with eleven additional machine learning algorithms: J48, Naïve Bayes, IBK, KStar, SMO, SimpleLogistic, MultilayerPerceptron, Decision Table, PART, NaiveBayesSimple and BayesNet.
4.2. Results

This section presents the results and discusses the performance of this study's formulated Random Forest based model and the eleven machine learning algorithms experimented with. A comparison of the performance evaluation of these experiments is also presented. In order to compare the Random Forest based model of this study with other machine learning classification algorithms, the performance metrics described in chapter 3 were used; this comparison is presented in sub-section 4.2.1. Finally, a comparison of this study's Random Forest based model with existing detection models from previous researchers is presented in sub-section 4.2.2.
4.2.1 Results of Comparison of this study model with other Machine Learning Algorithms

The summary of the results and the comparison of the experiments with different machine learning classification algorithms and this study's Random Forest based model are presented in Table 4.1 below. As can be seen from Table 4.1, the Random Forest based model of this study has the highest accuracy, 99.9371%, with the lowest FP rate, 0.001, which is considerably good for any detection system. The implication of this is that more than 99 out of 100 attacks will be detected by this study's detection system. Naïve Bayes, however, has the lowest accuracy, 93.524%, and the highest FP rate, which is not good for any detection system.

Table 4.1 Results of performance evaluation of the different machine learning algorithms and the Random Forest based model of this study
Models TP Rate FP Rate Precision Recall F-Measure Accuracy (%)
Random Forest (this study) 0.999 0.001 0.999 0.999 0.999 99.9371
J48 0.994 0.006 0.994 0.994 0.994 99.3713
Naïve Bayes 0.935 0.056 0.942 0.935 0.935 93.524
IBk 0.999 0.001 0.999 0.999 0.999 99.9057
KStar 0.991 0.008 0.991 0.991 0.991 99.0883
SMO 0.984 0.015 0.984 0.984 0.984 98.3967
SimpleLogistic 0.994 0.006 0.994 0.994 0.994 99.4027
MultilayerPerceptron 0.995 0.005 0.995 0.995 0.995 99.497
Decision Table 0.995 0.005 0.995 0.995 0.995 99.5285
PART 0.997 0.003 0.997 0.997 0.997 99.7485
NaiveBayesSimple 0.946 0.045 0.952 0.946 0.946 94.6226
BayesNet 0.995 0.005 0.995 0.995 0.995 99.4656
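For reference, the metrics reported above all derive from the entries of a confusion matrix. The sketch below shows the standard formulas with hypothetical counts, not the study's actual confusion matrix:

```python
# How TP rate, FP rate, precision, F-measure and accuracy are
# computed from confusion-matrix counts (hypothetical numbers).
def metrics(tp, fp, fn, tn):
    tpr = tp / (tp + fn)                      # true positive rate (= recall)
    fpr = fp / (fp + tn)                      # false positive rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    accuracy = 100 * (tp + tn) / (tp + fp + fn + tn)
    return tpr, fpr, precision, f_measure, accuracy

# Hypothetical counts chosen for illustration only.
tpr, fpr, precision, f1, acc = metrics(tp=999, fp=1, fn=1, tn=999)
print(f"TPR={tpr:.3f} FPR={fpr:.3f} Precision={precision:.3f} "
      f"F-measure={f1:.3f} Accuracy={acc:.2f}%")
# TPR=0.999 FPR=0.001 Precision=0.999 F-measure=0.999 Accuracy=99.90%
```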

4.2.2 Accuracy Comparison

As illustrated in Figure 4.1 below, nine of the twelve classifiers achieved accuracy of 99% or more, with the Random Forest based model obtaining the highest at 99.94%. Naïve Bayes achieved the lowest at 93.524%.

Figure 4.1 Accuracy results of the different Models

4.2.3 True Positive Rate (TPR) Comparison

The True Positive Rate metric indicates the proportion of attacks that are correctly identified. Figure 4.2 shows the true positive rate of the different models. The Random Forest based model achieved the highest true positive rate, 0.999, equalled only by IBk.

Figure 4.2 TPR results of the different classifiers

4.2.4 False Positive Rate of the classifiers (FPR)

Figure 4.3 shows the false positive rate, i.e. the proportion of misclassified data. The Random Forest based model has a negligible false positive rate of 0.001 in comparison with the other models.

Figure 4.3 FPR results of the different classifiers
4.2.5 Time Taken Performance Comparison

The time taken, shown in Figure 4.4 below, is the duration each model required to detect HTTP-DDoS attacks when applied to the dataset. The Random Forest based model performed best, within a short time range.

Figure 4.4 Time taken results of the different models

4.2.6 Recall Performance of the Classifiers

Figure 4.5 below shows the recall of the different models. The Random Forest based model achieved the highest recall of 0.999 (see Table 4.1).

Figure 4.5 Recall results of the different models

4.2.7 F-measure Performance of the Models

As shown in Figure 4.6 below, the Random Forest based model has the highest F-measure of 0.999, which signifies the best overall performance when compared with the other algorithms.

Figure 4.6 F-measure results of the different models

4.3 Comparison of this study with existing research works

Table 4.2 displays a comparative analysis of this study against other related studies in terms of the machine learning model used, attack type, F-measure, true positive rate (TPR), false positive rate (FPR), precision, recall and accuracy.

Table 4.2 Comparison of this study with existing research works

SN Author(s) & Year Machine Learning Attack Type F-measure TPR FPR Precision Recall Accuracy (%)

1 Mohamed, Karim & Mustapha (2018) RF HTTP-DDoS NA NA 0.04 NA NA 97.5
2 Mouhammd et al. (2016) MLP DDoS NA NA NA 0.48 0.93 98.63
3 Irfan, Amit & Vibhakar (2017) MLP HTTP-DDoS NA NA NA 0.92 0.96 98.91
4 Indraneel & Venkata (2017) SVM & BA HTTP-DDoS 0.9457 0.96 NA 0.945 0.94 94.8
5 Sharmila & Roshan (2018) HCA and NB DDoS NA NA 0.54 NA NA 99.45
6 Proposed model RF HTTP-DDoS 0.999 0.999 0.001 0.999 0.999 99.94

*NA = not available
However, considering the accuracy rate alone is not sufficient, especially when the data are imbalanced (Irfan, Amit and Vibhakar, 2017). In our case, the number of instances in the normal class was much higher than in the attack class. Therefore, the precision, F-measure, false positive rate, true positive rate and recall were also calculated for each model as shown above. From the comparison in Table 4.2, this research work performed better on all the parameters. We also expanded our experiment and used more machine learning algorithms than the existing models. Figures 4.7 and 4.8 below show the comparison of the accuracy of this study with other related studies, and of accuracy, precision and recall, respectively.

Figure 4.7 Comparison of Accuracy of this Study with other Related Studies

Figure 4.8: Comparison of Accuracy, Precision and Recall of this Study with other Related Studies
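The point about imbalanced data can be illustrated concretely: a degenerate classifier that always predicts the majority class scores high accuracy while detecting no attacks at all. The labels below are synthetic, chosen only to demonstrate the effect:

```python
# Why accuracy alone misleads on imbalanced data: on a 95:5
# Normal/attack split, a classifier that always answers "Normal"
# still reaches 95% accuracy with zero attack recall.
from sklearn.metrics import accuracy_score, recall_score

y_true = ["Normal"] * 95 + ["HTTP-DDoS"] * 5   # imbalanced ground truth
y_pred = ["Normal"] * 100                      # majority-class predictor

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred, pos_label="HTTP-DDoS", zero_division=0)
print(acc, rec)  # high accuracy, zero attack recall
```

This is why precision, recall and F-measure are reported alongside accuracy throughout this chapter.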

4.4 Discussion

From the results and analysis above, we can infer that the Random Forest based model outperformed Mouhammd et al. (2016) and Irfan, Amit and Vibhakar (2017), even though the same dataset was used in carrying out the study. The model also achieved higher accuracy, 99.94%, than Indraneel and Venkata (2017) and Sharmila and Roshan (2018). Mouhammd et al. (2016) considered three machine learning algorithms, Irfan, Amit and Vibhakar (2017) used four, Indraneel and Venkata (2017) used two and Sharmila and Roshan (2018) also used two, while this research work used twelve. Overall, the Random Forest based model for detecting HTTP-DDoS attacks in a cloud computing environment performed better.

CHAPTER FIVE
5.0 CONCLUSION AND RECOMMENDATIONS
5.1 Conclusion

Although cloud computing is an emerging technology that introduces a number of benefits to users, it unfortunately faces many security challenges, including DoS, DDoS, SQL injection, Cross Site Scripting (XSS) and hacking in general. XML and HTTP flooding heavily target cloud computing web services, and very little work has been done to secure these protocols (Adrien and Martine, 2017). In this research study, we used the dataset of Mouhammd et al. (2016), which includes modern types of attack. The dataset contains 27 features and four classes, recorded for different types of attack targeting the application and network layers. The use of machine learning techniques in cloud computing is an effective tool to help secure the data. Twelve machine learning algorithms (Random Forest, J48, Naïve Bayes, IBk, KStar, SMO, SimpleLogistic, MultilayerPerceptron, Decision Table, PART, NaiveBayesSimple and BayesNet) were selected based on the literature and applied to the extracted dataset to classify the data as either Normal or HTTP-DDoS. The Random Forest model achieved the highest accuracy rate of 99.94%, outperforming some of the most recent existing models: Mohamed, Karim and Mustapha (2018) with 97.5%, Indraneel and Venkata (2017) with 94.8%, Irfan, Amit and Vibhakar (2017) with 98.91% and Mouhammd et al. (2016) with 98.63%.

5.2 Recommendations

Based on the findings of the study, the following recommendations are made for future work:
Hybridize two or more machine learning based models for better performance, especially supervised learning performance.
Investigate more DDoS attacks affecting the cloud environment and integrate their features into the existing dataset.
5.3 Contribution to Knowledge

From the results obtained, the Random Forest based model is effective and efficient in detecting HTTP-DDoS attacks. It also provides a model that reduces the success rate of HTTP-DDoS attacks, thereby improving detection accuracy.
In terms of feature selection, this research proposed the use of the Least Absolute Shrinkage and Selection Operator (LASSO) on the dataset, which improved the performance of the classification algorithms used.
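As a rough illustration of LASSO-based feature selection, the sketch below (not the study's actual preprocessing pipeline) fits an L1-penalised model and keeps the features with non-zero coefficients; the synthetic data and the alpha value are assumptions.

```python
# Sketch of LASSO-style feature selection: the L1 penalty drives
# coefficients of uninformative features to exactly zero, so the
# surviving (non-zero) features form the selected subset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso

# Synthetic stand-in for the 27-feature DDoS dataset (assumption).
X, y = make_classification(n_samples=500, n_features=27,
                           n_informative=8, random_state=0)

lasso = Lasso(alpha=0.01).fit(X, y)        # alpha chosen for illustration
selected = np.flatnonzero(lasso.coef_)     # indices of retained features
print(f"kept {selected.size} of {X.shape[1]} features:", selected)
```

The reduced feature set would then be fed to the classifiers, which is the role LASSO played in this study's workflow.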

References

Abdulaziz, A. & Shahrulniza, M. (2017). Cloud-based DDoS HTTP attack detection using covariance matrix approach. Hindawi Journal of Computer Networks and Communications, 2017, 8 pages.
Adrien, B. & Martine, B. (2017). A survey of denial-of-service and distributed denial of service attacks and defenses in cloud computing. Future Internet, 9(3), 43.

Ankita, P. & Fenil, K. (2015). Survey on DDoS attack detection and prevention in cloud. International Journal of Engineering Technology, Management, and Applied Sciences, 3, 43-47.

Apale, S., Kamble, R., Ghodekar, M., Nemade, H. & Waghmode, R. (2015). Defense mechanism for DDoS attack: approaches, methods and techniques. Journal of Network and Computer Applications, 57, 71–84.

Araar, A. & Bouslama, R. (2014). A comparative study of classification models for detection of intrusions in IP networks. Journal of Theoretical & Applied Information Technology, 64(1).

Ardjani, F., Sadouni, K. & Mohmed, B. (2010). Optimization of SVM multiclass by particle swarm (PSO-SVM). Journal of Modern Education and Computer Science, 2, 32–38.

Bandara, K.R., Abeysinghe, T.S., Hijaz, A.J.M., Darshana, D.G.T., Azeez, H., Kaluarachchi, S.J., Sulochana, K.V.D.L. & Dhishan, D. (2016). Preventing attacks using data mining algorithms. International Journal of Scientific and Research Publications, 6(10). ISSN 2250-3153.

Chitrakar, R. & Chuanhe, H. (2012). Anomaly based intrusion detection using hybrid learning approach of combining k-Medoids clustering and Naive Bayes classification. In Proceedings of the 8th IEEE International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM).

Chunyi, P., Minkyong, K., Zhe, Z. & Hui, L. (2012). VDN: Virtual machine image distribution network for cloud computing. In Proceedings of INFOCOM, IEEE, Orlando, FL, USA, 181–189.

Csubak, D., Szucs, K., Voros, P. & Kiss, A. (2016). Big data testbed for network attack detection. Acta Polytechnica Hungarica, 13, 47-57.

Deka, R. K., Bhattacharyya, D. K. & Kalita, J. (2015). Network defense: approaches, methods and techniques. Journal of Network and Computer Applications, 57, 71–84.

Frank, E. & Witten, I.H. (1998). Generating accurate rule sets without global optimization. In Fifteenth International Conference on Machine Learning, 144–151.

Hanna, M. S., Ibrahim, E., Bader, A. & Alyoubi, A. A. (2016). Application of intelligent data mining approach in securing the cloud computing. International Journal of Advanced Computer Science and Applications (IJACSA), 7(9).

Hasan, M. A. M., Nasser, M., Pal, B. & Ahmad, S. (2014). Support vector machine and random forest modeling for intrusion detection. International Journal of Computer Engineering and Applications, 5(2).

Hesham, K. & Fabrizio B. (2012). A cloud intrusion detection dataset for cloud computing and Masquerade attacks. Ninth International Conference on Information Technology- New Generations, 397-402.

Holden, A. N, & Freitas, A. (2008). A hybrid PSO/ACO algorithm for discovering classification rules in data Mining. Journal of Artificial Evolution Application, 2, 1–11.

Indraneel, S., Venkata, P. & Kumar, V. (2017). HTTP flood attack detection in application layer using machine learning metrics and bio-inspired bat algorithm. Journal of Applied Computing and Informatics.

Irfan, S., Amit, M. & Vibhakar, M. (2017). Machine learning techniques used for the detection and analysis of modern types of DDoS attacks. International Research Journal of Engineering and Technology (IRJET), 4.

John, G.H. & Langley, P. (1995). Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence.

Karnwal, T., Sivakumar, T. & Aghila, G. (2012). A comber approach to protect cloud computing against XML DDoS and HTTP DDoS attack. In Proceedings of the 2012 IEEE Students’ Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 1–5.

Lo, C.C., Huang, C.C. & Ku, J. (2008). Cooperative intrusion detection system framework for cloud computing networks. In First IEEE International Conference on Ubi-Media Computing, 280–284.

Lopez, R. & Onate, E. (2006). A variational formulation for the multilayer perceptron. In International Conference on Artificial Neural Networks (ICANN), 4131, 159-168.

Modi, C. N., Patel, D. R., Patel, A. & Muttukrishnan, R. (2012). Bayesian classifier and Snort based network intrusion detection system in cloud computing. In The Third IEEE International Conference on Computing, Communication & Networking Technologies (ICCCNT), Coimbatore, India, 1-7.

Mohammadreza, E., Sara, M., Fatimah, S. & Lilly, S. A. (2010). Intrusion detection using data mining techniques. In International Conference on Information Retrieval and Knowledge Management, 200–204.

Monowar, H., Bhuyan, Bhattacharyya, D.K, & Kalita, J. K. (2012). An effective unsupervised network anomaly detection method. In International conference on advances in computing, communications and informatics, 1,533–539

Mouhammd, A., Ghazi, A., Ahmad, B.A. & Hassanat, M. A. (2016). Detecting distributed denial of service attacks using data mining techniques. International Journal of Advanced Computer Science and Applications, 7(1).

Mrutyunjaya, P., Ajith, A. & Manas, R. P. (2011). A hybrid intelligent approach for network intrusion detection. In International Conference on Communication Technology and System Design, Procedia Engineering, 1–9.

Nadiammai, G., & Hemalatha, M. (2014). Effective approach toward intrusion detection system using data mining Techniques. Egyptian Informatics Journal, 15(1), 37–50.

Noreen, K., Brahim B. S., Suziah, B.T., Sulaiman, I.A, & Muhammad, H. (2012). An Approach towards Intrusion Detection using PCA Feature Subsets and SVM. International Conference on Computer & Information Science, 569-574.

Prasad, K. M., Mohan, A. R. & Rao, K. V. (2014). DoS and DDoS attacks: defense, detection and traceback mechanisms. Global Journal of Computer Science and Technology, 14(7).

Qiao, Y. & Richard, F. Y. (2015). Distributed denial of service attacks in software-defined networking with cloud computing. IEEE Communications Magazine, 53(4), 52–59.

Rashmi, D. & Kailas D. (2015). Mitigating DDoS attack in cloud environment with packet
filtering using Iptables. In International Journal of Computer Engineering and Applications, 7(2).

Rui M., Rahul, P., Minlan Yu, & Navendu J. (2015). The dark menace: Characterizing network-based attacks in the cloud. In Proceedings of the ACM Conference on Internet Measurement Conference, 169–182.

Sahardi, R.M. & Vahid, G. (2013). New Approach to Mitigate XML-DOS and HTTP-DOS
Attacks for Cloud Computing. International Journal of Computer Applications, 72, 27-31.

Sarmila, K. & Kavin, G. (2014). A Clustering Algorithm for Detecting DDoS Attacks in Networks. International Journal of Recent Engineering Science, 1, 2349-7157.

Schneier, B. & Ranum, R (2011). Face-off: Assessing cloud computing risks. Available: http://searchcloudsecurity.techtarget.com/video/Face-off-Assessingcloud-computing-risks.

Sharmila, B. & Roshan, C. (2018). DDoS Attack Detection Using Heuristics Clustering Algorithm and Naïve Bayes Classification. Journal of Information Security, 9, 33-44

Thomas, V., Tom, V. Goethem, W. J. & Nick N. (2015). Maneuvering around clouds: Bypassing cloud-based security providers. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 1530–1541.

Vieira, K., Schulter, A. & Westphall, C. (2010). Intrusion Detection for Grid and Cloud Computing. International Conference on Computer & Information Science, 12, 38–43.

Vissers, T., Somasundaram, T.S., Pieters, L., Govindarajan, K. & Hellinckx, P. (2014). DDoS defense system for web Services in a cloud environment. International Research Journal of Engineering and Technology, 37, 37–45.

Zhang, F. Marina, P., Philippas, T. (2011). CluB: A cluster based framework for mitigating distributed denial of service attacks. In ACM symposium on applied computing, 26, 20–27.

Zhang, F., Marina P., Philippas T. & Wei, W. (2010). Mitigating denial of capability attacks using Sink tree based quota Allocation. In ACM symposium on applied computing, 25,13–18.

APPENDIX A

Data during preprocessing in excel environment


Data During Preprocessing in Notepad++ Environment

Dataset view at the Edit Ribbon in WEKA Environment with its Features

Dataset at the Edit Ribbon in WEKA Environment showing a normal class
Dataset at the Edit Ribbon in WEKA Environment showing attack class (HTTP-FLOOD)

Dataset at the Edit Ribbon in WEKA Environment showing both normal and attack class
