Confusion Matrix and Cyber Crime

Sweta Sardar
6 min read · Jun 6, 2021

Is the confusion matrix making you confused?

So let’s take a deep dive and learn about it in detail.

Introduction to Confusion Matrix

  • A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. It shows what kinds of errors are made in predictions.

The confusion matrix is widely used to check the performance of classification-based ML models.

  • As algorithms increasingly make decisions about human affairs, it is important that these algorithms and the data they rely on be fair and unbiased. One of the diagnostics for algorithmic bias is the Confusion Matrix.

While everyone who works with data knows what a Confusion Matrix is, it is a more subtle matter to gain an intuition for how it behaves under different kinds of distributions of predictions and outcomes and the range of possible decision thresholds.

Let’s start with an example confusion matrix for a binary classifier (though it can easily be extended to the case of more than two classes):

What can we learn from this matrix?

  • There are two possible predicted classes: “yes” and “no”. If we were predicting the presence of a disease, for example, “yes” would mean they have the disease, and “no” would mean they don’t have the disease.
  • The classifier made a total of 165 predictions (e.g., 165 patients were being tested for the presence of that disease).
  • Out of those 165 cases, the classifier predicted “yes” 110 times, and “no” 55 times.
  • In reality, 105 patients in the sample have the disease, and 60 patients do not.
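The totals in the list above can be checked with a small sketch. The text gives only the row and column totals, so the four cell counts used here (50, 10, 5 and 100) are an assumption: they are one assignment that is consistent with those totals, not values stated in this article.

```python
# Rows are actual classes, columns are predicted classes.
# NOTE: these four cell counts are ASSUMED for illustration. They were chosen
# to match the totals quoted in the text (165, 110, 55, 105, 60).
#            predicted no, predicted yes
confusion = [
    [50, 10],    # actual no
    [5, 100],    # actual yes
]

total = sum(sum(row) for row in confusion)
predicted_no = confusion[0][0] + confusion[1][0]
predicted_yes = confusion[0][1] + confusion[1][1]
actual_no = sum(confusion[0])
actual_yes = sum(confusion[1])

print("total predictions:", total)        # 165
print("predicted yes/no:", predicted_yes, predicted_no)  # 110 55
print("actual yes/no:", actual_yes, actual_no)           # 105 60
```

Summing a column gives how often the classifier predicted that class, and summing a row gives how often that class actually occurred.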

Let’s now define the most basic terms, which are whole numbers (not rates):

  • true positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.
  • true negatives (TN): We predicted no, and they don’t have the disease.
  • false positives (FP): We predicted yes, but they don’t actually have the disease. (Also known as a “Type I error.”)
  • false negatives (FN): We predicted no, but they actually do have the disease. (Also known as a “Type II error.”)
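As a rough sketch, the four counts defined above can be computed by comparing the true labels with the predicted ones. The function name `count_outcomes` and the six made-up patient labels are hypothetical and used only for illustration.

```python
def count_outcomes(actual, predicted):
    """Return (TP, TN, FP, FN) for two equal-length lists of 'yes'/'no' labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == "yes" and a == "yes":
            tp += 1   # predicted yes, really yes
        elif p == "no" and a == "no":
            tn += 1   # predicted no, really no
        elif p == "yes" and a == "no":
            fp += 1   # Type I error
        else:
            fn += 1   # Type II error (assumes labels are only 'yes' or 'no')
    return tp, tn, fp, fn


# Made-up labels for six patients, for illustration only.
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]

tp, tn, fp, fn = count_outcomes(actual, predicted)
print(tp, tn, fp, fn)  # 2 2 1 1
```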

Which error is more dangerous depends on the application. In many cases it is the false negative (FN), where the machine predicted “no” but the true answer was “yes”. For example, the machine predicted that a student would fail, but the student actually passed (taking “pass” as the positive class).

This kind of error is a serious problem in the cybersecurity world, where many tools are based on machine learning or AI: a false negative means a real attack is reported as harmless and goes undetected, which can have dangerous consequences.

Therefore the role of the confusion matrix is important in the field of machine learning.

Precision, Recall, Accuracy and F-Measure in Confusion Matrix

Precision:

Precision tells us how many of the cases predicted as positive are actually positive. Put simply, it is the number of correct positive predictions out of all the positive predictions made by the model. It indicates how far a positive prediction from the model can be trusted. It is useful when false positives are a greater concern than false negatives. The formula for calculating precision is:

Precision: TP/(TP+FP)

Recall:

Recall describes how many of the actual positive cases the model predicted correctly. It is useful when false negatives are a greater concern than false positives. The formula for calculating recall is:

Recall: TP/(TP+FN)

Accuracy:

Accuracy is one of the most important metrics for classification problems. It tells us how often the model predicts the correct output, and is measured as the ratio of the number of correct predictions to the total number of predictions made by the classifier. The formula is:

Accuracy: (TP+TN)/(TP+TN+FP+FN)

F-Measure:

When one model has low precision and high recall and another has the reverse, it becomes hard to compare them. To solve this issue we can use the F-score.

“F-score is a harmonic mean of Precision and Recall”.

By calculating the F-score, we can evaluate recall and precision at the same time. For a given sum of precision and recall, the F-score is highest when the two are equal. It can be calculated using the formula below:

F-measure: (2*Precision*Recall)/(Precision+Recall)
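The four formulas in this section can be put together in a short sketch. The counts used here (TP=100, TN=50, FP=10, FN=5) are an assumption chosen to agree with the 165-patient example totals; they are not stated in the text. The sketch also assumes every denominator is non-zero.

```python
def precision(tp, fp):
    # Share of positive predictions that were correct.
    return tp / (tp + fp)


def recall(tp, fn):
    # Share of actual positives that the model found.
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    # Share of all predictions that were correct.
    return (tp + tn) / (tp + tn + fp + fn)


def f_measure(p, r):
    # Harmonic mean of precision and recall.
    return (2 * r * p) / (r + p)


# ASSUMED counts, consistent with the totals of the 165-patient example.
tp, tn, fp, fn = 100, 50, 10, 5

p = precision(tp, fp)
r = recall(tp, fn)
acc = accuracy(tp, tn, fp, fn)
f = f_measure(p, r)

print(f"precision={p:.3f} recall={r:.3f} accuracy={acc:.3f} f_measure={f:.3f}")
# precision=0.909 recall=0.952 accuracy=0.909 f_measure=0.930
```

With these counts the F-measure (0.930) falls between precision and recall, as expected of a harmonic mean.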

What is Cyber Crime?

Cybercrime, also called computer crime, is the use of a computer as an instrument to further illegal ends, such as committing fraud, trafficking in child pornography and intellectual property, stealing identities, or violating privacy. It has now become one of the biggest problems in the world, causing serious financial damage to countries and people every day.

Here are some of the most commonly occurring cybercrimes:

  • Fraud committed by manipulating computer networks
  • Unauthorized access to or modification of data or application
  • Intellectual property theft that includes software piracy
  • Writing or spreading computer viruses or malware
  • Digitally distributing child pornography

Cybercriminals typically rely on other actors to complete the crime. This is true whether it’s the creator of malware using the dark web to sell code, the distributor of illegal pharmaceuticals using cryptocurrency brokers to hold virtual money in escrow, or state threat actors relying on technology subcontractors to steal intellectual property (IP).

Let’s look at the example of logic bombs:

A logic bomb, also known as “slag code”, is a malicious piece of code which is intentionally inserted into software to execute a malicious task when triggered by a specific event. It’s not a virus, although it usually behaves in a similar manner. It is stealthily inserted into the program, where it lies dormant until specified conditions are met. Malicious software such as viruses and worms often contains logic bombs that deliver their payload at a predefined time or when some other condition is met. The payload of a logic bomb is unknown to the user of the software, and the task that it executes is unwanted. Code that is scheduled to execute at a particular time is known as a “time-bomb”. For example, the infamous “Friday the 13th” virus attacked host systems only on specific dates; it “exploded” (duplicated itself) every Friday that happened to be the thirteenth of a month, thus causing system slowdowns.

Logic bombs are usually employed by disgruntled employees working in the IT sector. You may have heard of “disgruntled employee syndrome”: angry employees who’ve been fired use logic bombs to delete the databases of their employers. Most logic bombs stay only in the network they were deployed in, so in most cases they’re an inside job. This makes them easier to design and execute than a virus, because a logic bomb doesn’t need to replicate, which is the more complex part. To keep your network protected from logic bombs, you need constant monitoring of the data and efficient anti-virus software on each of the computers in the network.

A binary classification model can be used to identify what is happening in the network, i.e., whether there is an attack or not. One tool used to evaluate such a model is the confusion matrix.
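As a minimal sketch of that idea, suppose a hypothetical detector gives each network connection a score between 0 and 1, and connections at or above a threshold are flagged as attacks. The function name, scores, labels and thresholds below are invented for illustration and do not come from any real tool or dataset.

```python
def confusion_counts(is_attack, scores, threshold):
    """Flag a connection as an attack when its score >= threshold,
    then return (TP, TN, FP, FN) against the true labels."""
    tp = tn = fp = fn = 0
    for truth, score in zip(is_attack, scores):
        flagged = score >= threshold
        if flagged and truth:
            tp += 1   # attack caught
        elif not flagged and not truth:
            tn += 1   # normal traffic left alone
        elif flagged and not truth:
            fp += 1   # false alarm
        else:
            fn += 1   # missed attack
    return tp, tn, fp, fn


# Invented data: True means the connection really was an attack.
is_attack = [True, True, True, False, False, False, False, False]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

for threshold in (0.8, 0.5, 0.35):
    tp, tn, fp, fn = confusion_counts(is_attack, scores, threshold)
    print(f"threshold={threshold}: TP={tp} TN={tn} FP={fp} FN={fn}")
# threshold=0.8: TP=1 TN=5 FP=0 FN=2
# threshold=0.5: TP=2 TN=4 FP=1 FN=1
# threshold=0.35: TP=3 TN=4 FP=1 FN=0
```

In this toy data, lowering the threshold reduces the missed attacks (FN falls from 2 to 0) at the cost of one false alarm, which is the trade-off the confusion matrix makes visible.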

Thank you for reading my article. I hope you found something new today.

🔹 !! HAPPY LEARNING !! 🔹

🔹 Keep sharing keep learning 🔹

🔹 Thank You 🔹
