Use of Confusion Matrix in Cyber Security

After cleaning, pre-processing and wrangling the data, the first thing we do is feed it to a model and get predictions, often as probabilities. But hold on! How do we measure the effectiveness of that model? The better the effectiveness, the better the performance, and that is exactly what we want. This is where the confusion matrix comes into the limelight: it is a performance measurement for machine learning classification.

What is a Confusion Matrix?

A Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.

For a binary classification problem, we would have a 2 x 2 matrix with 4 values, laid out as follows:

                        Actual Positive    Actual Negative
  Predicted Positive    TP                 FP
  Predicted Negative    FN                 TN

Let’s decipher the matrix:

  • The target variable has two values: Positive or Negative
  • The columns represent the actual values of the target variable
  • The rows represent the predicted values of the target variable

But wait — what’s TP, FP, FN and TN here? That’s the crucial part of a confusion matrix. Let’s understand each term below.

Understanding True Positive, True Negative, False Positive and False Negative in a Confusion Matrix

True Positive (TP)

  • The predicted value matches the actual value
  • The actual value was positive and the model predicted a positive value

True Negative (TN)

  • The predicted value matches the actual value
  • The actual value was negative and the model predicted a negative value

False Positive (FP) — Type 1 error

  • The predicted value does not match the actual value
  • The actual value was negative but the model predicted a positive value
  • Also known as the Type 1 error

False Negative (FN) — Type 2 error

  • The predicted value does not match the actual value
  • The actual value was positive but the model predicted a negative value
  • Also known as the Type 2 error
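
The four definitions above can be sketched in a few lines of Python. This is only an illustration: the function name confusion_counts, the 1/0 label encoding and the sample lists are assumptions made for this example, not part of any particular library.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary problem.

    `positive` is the label treated as the positive class (assumed to be 1 here).
    """
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # actual positive, predicted positive
        elif a != positive and p != positive:
            tn += 1  # actual negative, predicted negative
        elif a != positive and p == positive:
            fp += 1  # actual negative, predicted positive (Type 1 error)
        else:
            fn += 1  # actual positive, predicted negative (Type 2 error)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Made-up labels, purely for illustration.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```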

Let me give you an example to better understand this. Suppose we had a classification dataset with 1000 data points. We fit a classifier on it and get the following confusion matrix:

                        Actual Positive    Actual Negative
  Predicted Positive    560                60
  Predicted Negative    50                 330

The different values of the Confusion matrix would be as follows:

  • True Positive (TP) = 560; meaning 560 positive class data points were correctly classified by the model
  • True Negative (TN) = 330; meaning 330 negative class data points were correctly classified by the model
  • False Positive (FP) = 60; meaning 60 negative class data points were incorrectly classified as belonging to the positive class by the model
  • False Negative (FN) = 50; meaning 50 positive class data points were incorrectly classified as belonging to the negative class by the model

This turned out to be a pretty decent classifier for our dataset: 890 of the 1000 data points (560 true positives plus 330 true negatives) were classified correctly, against 110 errors.
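
To put numbers on "pretty decent", the four counts from this example can be turned into the usual summary metrics. The article itself does not define these metrics; the sketch below uses the standard textbook formulas for accuracy, precision, recall and F1.

```python
# Counts taken from the example above.
tp, tn, fp, fn = 560, 330, 60, 50

total = tp + tn + fp + fn
accuracy = (tp + tn) / total   # share of all predictions that are correct
precision = tp / (tp + fp)     # share of predicted positives that are truly positive
recall = tp / (tp + fn)        # share of actual positives that the model found
f1 = 2 * precision * recall / (precision + recall)

print(f"total={total}")
print(f"accuracy={accuracy:.3f}")    # accuracy=0.890
print(f"precision={precision:.3f}")  # precision=0.903
print(f"recall={recall:.3f}")        # recall=0.918
print(f"f1={f1:.3f}")                # f1=0.911
```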

What is Cyber Security?

Cybercrime is increasing day by day. Here are some examples of cybercrimes that took place in 2021, which give an idea of how important cybersecurity is:

  • Australian broadcaster Channel Nine was hit by a cyberattack on 28th March 2021, which rendered the channel unable to air its Sunday news bulletin and several other shows.
  • In March 2021, the London-based Harris Federation suffered a ransomware attack and was forced to “temporarily” disable the devices and email systems of all the 50 secondary and primary academies it manages. This resulted in over 37,000 students being unable to access their coursework and correspondence.
  • A cybercriminal attempted to poison the water supply in Florida and managed to increase the amount of sodium hydroxide to a potentially dangerous level.
  • Acer suffered a ransomware attack and was asked to pay a ransom of $50 million, the largest known ransom demand at the time.

Going through these incidents from 2021 makes it clear how important cybersecurity is.

Now we will look at a case study to summarize what we have learnt.

Cyber Attack Detection Using Support Vector Machine (SVM)

KDD Cup '99 Data Set Description

In the KDD Cup '99, the criterion used to evaluate participant entries is the Cost Per Test (CPT), computed using the confusion matrix and a given cost matrix. A Confusion Matrix (CM) is a square matrix in which each column corresponds to a predicted class, while each row corresponds to an actual class. Note that this orientation is the transpose of the layout described earlier in this article, where the columns held the actual values. For i ≠ j, the entry at row i and column j, CM(i, j), is the number of instances that belong to class i but were incorrectly identified as members of class j. The entries on the main diagonal, CM(i, i), are the numbers of correctly detected instances. The cost matrix is defined in the same way: entry C(i, j) is the cost penalty for misclassifying an instance of class i as class j.
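
A minimal sketch of how such a cost could be computed, assuming CPT is the cost-weighted sum of all confusion matrix entries divided by the number of test instances. The function name cost_per_test, the 3-class matrices and every number in them are invented for illustration; they are not the actual KDD Cup '99 classes or cost values.

```python
def cost_per_test(cm, cost):
    """Average misclassification cost per test instance.

    cm[i][j]   = number of class-i instances predicted as class j
    cost[i][j] = penalty for predicting class j when the truth is class i
    """
    total_instances = sum(sum(row) for row in cm)
    total_cost = sum(
        cm[i][j] * cost[i][j]
        for i in range(len(cm))
        for j in range(len(cm[i]))
    )
    return total_cost / total_instances


# Hypothetical 3-class example (rows = actual class, columns = predicted class).
cm = [
    [90, 5, 5],
    [10, 80, 10],
    [2, 8, 40],
]
# Hypothetical penalties; the diagonal is 0 because correct predictions cost nothing.
cost = [
    [0, 1, 2],
    [1, 0, 2],
    [3, 2, 0],
]

cpt = cost_per_test(cm, cost)
print(f"{cpt:.3f}")  # 0.268
```

A lower CPT is better, and a classifier that makes no errors scores 0 because only off-diagonal entries carry a penalty.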

  • True Positive (TP): The number of instances detected as attack that are actually attacks.
  • True Negative (TN): The number of instances detected as normal that are actually normal.
  • False Positive (FP): The number of instances detected as attack that are actually normal (false alarms).
  • False Negative (FN): The number of instances detected as normal that are actually attacks.
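
Applied to intrusion detection, these four counts are often summarized as a detection rate and a false alarm rate. The sketch below assumes the string labels "attack" and "normal" and uses a made-up handful of records; the function name and the two rate formulas are standard conventions chosen for this example, not something taken from the KDD Cup '99 evaluation.

```python
def detection_counts(actual, predicted):
    """Count TP, TN, FP and FN, treating "attack" as the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == "attack" and p == "attack":
            counts["TP"] += 1  # attack detected as attack
        elif a == "normal" and p == "normal":
            counts["TN"] += 1  # normal detected as normal
        elif a == "normal" and p == "attack":
            counts["FP"] += 1  # false alarm
        else:
            counts["FN"] += 1  # missed attack
    return counts


# Made-up records, purely for illustration.
actual = ["attack", "normal", "attack", "normal", "normal", "attack"]
predicted = ["attack", "normal", "normal", "attack", "normal", "attack"]

c = detection_counts(actual, predicted)
detection_rate = c["TP"] / (c["TP"] + c["FN"])    # share of attacks that were caught
false_alarm_rate = c["FP"] / (c["FP"] + c["TN"])  # share of normal traffic flagged

print(c)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
print(f"detection rate={detection_rate:.2f}, false alarm rate={false_alarm_rate:.2f}")
```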

A confusion matrix contains information about the actual and predicted classifications made by a classifier. The performance of a cyber attack detection system is commonly evaluated using the data in this matrix.

Thank You!!