Machine Learning In Fraud Detection: An In-Depth Analysis

Solving financial fraud detection with machine learning methods

November 28, 2022

Harnessing the power of machine learning for effective fraud detection: a study of algorithms and techniques

For decades, financial organizations relied on rule-based monitoring systems for fraud detection. These legacy solutions, implemented in SQL or C/C++, were engineered to translate the expertise of domain specialists into complex SQL queries. However, these systems were often incoherent and brittle: any attempt to modify them, such as updating a threshold, could break the entire codebase. This rigidity hindered banks’ ability to combat fraud effectively, as criminals continually devised new methods to evade the checks encoded in these rule-based platforms.

In response to these challenges, many financial firms have abandoned their legacy tools in favor of modern Machine Learning (ML) solutions. Machine learning algorithms can swiftly process millions of data objects and identify suspicious patterns by linking instances from seemingly unrelated datasets. These algorithms are among the few remaining tools capable of helping banks and healthtech organizations keep pace with increasingly sophisticated fraud schemes. However, choosing the right machine learning algorithm to identify illicit transactions can be difficult even for those with a data science background. This article describes a few popular choices to aid in this decision-making process.

Fraud detection using machine learning algorithms

In the realm of fraud detection, both supervised and unsupervised machine learning techniques have been employed by financial institutions to identify anomalies in data. This section delves into supervised machine learning fraud detection algorithms, including Random Forest, K-Nearest Neighbors (KNN), Logistic Regression, and Support Vector Machine (SVM). Each of these methods has unique strengths and weaknesses, and their effectiveness can vary depending on the specific characteristics of the data and the nature of the fraud detection problem at hand.

Random Forest

The method builds a set of randomized decision trees and averages their predictions to produce an output. Because the individual trees are trained differently and produce different values, averaging them helps keep the algorithm from overfitting to training datasets (something standard decision tree algorithms tend to do) and makes it more robust to noise.
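
To make the "randomized trees plus averaging" idea concrete, here is a minimal sketch. It is a toy stand-in rather than a full Random Forest: each "tree" is a single-split stump, and the data, feature meanings, and function names (fit_stump, fit_forest, fraud_score) are our own assumptions for illustration.

```python
import random

def fit_stump(rows, labels, feature):
    """Fit a one-split tree: flag fraud when rows[feature] exceeds a threshold."""
    best_errors, best_threshold = None, None
    for threshold in sorted({row[feature] for row in rows}):
        errors = sum((row[feature] > threshold) != bool(label)
                     for row, label in zip(rows, labels))
        if best_errors is None or errors < best_errors:
            best_errors, best_threshold = errors, threshold
    return feature, best_threshold

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Randomization 1: bootstrap sample (draw rows with replacement).
        sample = [rng.randrange(len(rows)) for _ in rows]
        # Randomization 2: each stump only sees one randomly chosen feature.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], feature))
    return forest

def fraud_score(forest, row):
    """Average the 0/1 votes of all stumps into a score between 0 and 1."""
    votes = [row[feature] > threshold for feature, threshold in forest]
    return sum(votes) / len(votes)

# Hypothetical toy data: (amount, transactions in the last hour); 1 = fraud.
rows = [(20, 1), (35, 2), (15, 1), (50, 2),
        (900, 9), (1200, 8), (750, 7), (980, 10)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(fraud_score(forest, (10, 1)), fraud_score(forest, (1500, 12)))
```

No single stump sees all of the data or all of the features, so each one is a weak and slightly different model; the averaged score is what smooths out their individual quirks.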

Numerous comparative studies have reported RF’s effectiveness in fraud detection relative to other machine learning models. The results of this research show that an RF-based model outperforms a support vector machine and even a neural network in terms of AP, AUC, and PrecisionRank metrics (all of the models made predictions on actual transaction data from a Belgian payment provider).

K-nearest neighbors (KNN)

The algorithm predicts which class an unseen instance belongs to based on the K (a predefined number) most similar data objects. Similarity is typically defined by Euclidean distance, but Chebyshev or Hamming distance can be applied when it suits the data better.

So, after being given an unseen observation, KNN runs through the entire annotated dataset and computes the similarity between the new data object and every data object in it. When the nearest neighbors fall into different categories, the algorithm picks the class with the most votes. If K=10, for example, and the object has 7 nearest neighbors in category a and 3 in category b, it will be assigned to a.
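
The voting procedure described above can be sketched in a few lines. This is a brute-force illustration under our own assumptions: the two-feature toy points, the labels, and the names knn_predict and chebyshev are hypothetical, and a production system would normally use an indexed neighbor search instead of sorting the whole dataset.

```python
import math
from collections import Counter

def chebyshev(p, q):
    """Chebyshev distance: the largest difference along any single feature."""
    return max(abs(a - b) for a, b in zip(p, q))

def knn_predict(train, query, k=3, distance=math.dist):
    """Label `query` by majority vote among its k nearest labeled points.

    `train` is a list of (features, label) pairs; math.dist is Euclidean.
    """
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical, already-scaled transaction features.
train = [
    ((1.0, 1.0), "legit"), ((1.2, 0.8), "legit"), ((0.9, 1.1), "legit"),
    ((5.0, 5.2), "fraud"), ((5.1, 4.9), "fraud"),
]

print(knn_predict(train, (1.1, 1.0), k=3))
print(knn_predict(train, (4.8, 5.0), k=3, distance=chebyshev))
```

Because every prediction scans the full dataset, the cost grows with the number of stored transactions, which is worth keeping in mind for high-volume payment streams.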

Though quite noise-sensitive, KNN performs well on real financial transaction data. Over the years, studies have reported that KNN can have a lower error rate than Decision Tree and Logistic Regression models, and can beat Support Vector Machines in fraud detection rate (sensitivity) and Random Forests in balanced classification rate.

Logistic Regression

An easily explainable model that enables us to predict the probability of a categorical response based on one or more predictor variables. LR is quick to implement, which might make it seem like an attractive option. However, empirical evidence shows that it performs poorly on non-linear data and that it tends to overfit training datasets.

This paper, for instance, describes how neural nets have a clear edge over LR-based models in solving credit card fraud detection problems. Similarly, an IEEE publication reports that LR models cannot provide predictions as accurate as those produced by a deep learning model and a Gradient Boosted Tree (for this experiment, the researchers had all three models making predictions on a dataset containing about 80 million transactions with 69 attributes).
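
To show what "predicting the probability of a categorical response" means in practice, here is a minimal one-feature sketch trained with plain gradient descent. The amounts, the learning rate, and the function names are assumptions made for illustration, not values from any of the studies mentioned above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(fraud | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

# Hypothetical transaction amounts (in thousands) and fraud labels.
amounts = [0.1, 0.4, 0.5, 0.9, 2.5, 3.0, 3.4, 4.0]
is_fraud = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(amounts, is_fraud)
print(f"p(fraud | 0.2) = {sigmoid(w * 0.2 + b):.3f}")
print(f"p(fraud | 3.5) = {sigmoid(w * 3.5 + b):.3f}")
```

The model is explainable because it has a single weight per feature, but the same property limits it: the decision boundary is one threshold on a weighted sum, which is why non-linear fraud patterns are hard for it to capture.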

Support Vector Machine

SVMs, advanced yet simple to implement, derive optimal hyperplanes that maximize the margin between classes. They use kernel functions to project input data into high-dimensional feature spaces, where it is easier to separate instances linearly. This makes SVMs particularly effective for non-linear classification problems such as financial fraud detection.
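
The kernel idea can be illustrated without a full SVM solver. The sketch below assumes two-dimensional inputs and a degree-2 polynomial kernel. It checks that the kernel equals a dot product in a higher-dimensional feature space, and that data separated by a circle in the input space becomes separable by a plane in that feature space. The toy points and the hand-picked hyperplane are hypothetical; a real SVM would learn the hyperplane from data.

```python
import math

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, y):
    """k(x, y) = (x . y)^2, computed without ever building phi(x)."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def decision(x):
    """Hand-picked hyperplane in feature space: w = (1, 0, 1), b = -1."""
    return dot((1.0, 0.0, 1.0), phi(x)) - 1.0

# Hypothetical toy data: one class inside the unit circle, the other outside.
inside = [(0.1, 0.2), (-0.3, 0.4), (0.5, -0.5)]
outside = [(1.5, 0.0), (-1.0, 1.2), (0.9, -1.4)]

a, b = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(a, b), dot(phi(a), phi(b)))
print([decision(x) < 0 for x in inside], [decision(x) > 0 for x in outside])
```

No straight line separates the two groups in the original two dimensions, yet a flat plane does the job after the mapping, and the kernel lets the algorithm work in that space without computing the mapping explicitly.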

In this study, the performance of an SVM in investigating a time-varying fraud problem is compared to that of a neural net. The researchers write that though the two models show similar results (in terms of accuracy) during training, the neural net tends to overfit training datasets more, making the SVM a superior solution for detecting fraud in the long run.

Besides, a class-weighted SVM-based fraud detection model is more suitable for working with real-world credit card transaction data (which is imbalanced by nature) and shows higher accuracy in the fraud detection problem than Naive Bayes, Decision Tree, and Back Propagation Neural Network classifiers.
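
Class weighting can be sketched as follows. We assume the common inverse-frequency ("balanced") heuristic and a hinge loss with labels of -1 (legitimate) and +1 (fraud); the cited study may weight classes differently, and the function names and the 990-to-10 split are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

def weighted_hinge_loss(scores, labels, weights):
    """Mean hinge loss with each sample scaled by the weight of its class."""
    total = sum(weights[y] * max(0.0, 1.0 - y * s)
                for s, y in zip(scores, labels))
    return total / len(labels)

# Hypothetical imbalanced data: 990 legitimate (-1) and 10 fraudulent (+1).
labels = [-1] * 990 + [1] * 10
weights = balanced_class_weights(labels)

# A lazy classifier that scores every transaction as confidently legitimate.
always_legit = [-1.0] * len(labels)
unweighted = weighted_hinge_loss(always_legit, labels, {-1: 1.0, 1: 1.0})
weighted = weighted_hinge_loss(always_legit, labels, weights)
print(weights, unweighted, weighted)
```

Without weights, ignoring every fraud case costs the lazy classifier very little, while the weighted loss makes those ten missed frauds dominate the objective.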

It should be noted that while SVMs work great in complicated domains with distinct margins of separation, their performance on large data sets is generally average. Noise in the data can hamper an SVM’s accuracy tremendously, so when classes overlap heavily (and we need to count independent evidence), other supervised algorithms would probably make a better choice.

Long short-term memory (deep learning fraud prevention and detection)

LSTM is a type of Recurrent Neural Network architecture that was designed specifically to learn long-range dependencies; it tackles the vanishing error problem (which RNNs are particularly prone to due to using the same processing units on every layer) by applying constant error carousels to enforce a constant error flow within cells. The model’s fundamental property is that it has multiplicative gates determining when to grant access to cells and which parts of input to ignore.
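
A single step of an LSTM cell can be written out to show where the gates and the additive cell update sit. This is a scalar sketch under our own assumptions: it uses the now-common variant that also has a forget gate, and the parameter names and values are hypothetical, chosen so that the gates stay almost fully closed to new input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell with input, forget and output gates."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # additive update: the path that carries error back
    h = o * math.tanh(c)     # the output gate decides how much of the cell is exposed
    return h, c

# Hypothetical parameters: forget gate held near 1, input gate held near 0.
params = {
    "wi": 0.0, "ui": 0.0, "bi": -20.0,
    "wf": 0.0, "uf": 0.0, "bf": 20.0,
    "wo": 1.0, "uo": 0.5, "bo": 0.0,
    "wg": 1.0, "ug": 0.5, "bg": 0.0,
}

h, c = 0.0, 1.0
for x in [0.3, -1.2, 0.8] * 20:  # 60 time steps of arbitrary input
    h, c = lstm_step(x, h, c, params)
print(c)
```

With the gates set this way, the cell state stays essentially unchanged over 60 steps of varying input, which is the behavior that lets an LSTM carry information (and error signals) across long sequences.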

LSTMs are challenging to integrate into real-world fraud detection applications at this point, so they have not yet become a mainstream tool for financial fraud detection among banks. However, there are already scientific papers that formulate credit card fraud detection as a sequence classification task for which LSTMs, due to their unique properties, are well suited. This publication, for example, suggests that when compared to a random forest classifier, an LSTM can increase fraud detection accuracy even on offline transactions (the situations where cardholders are present physically at the bank).

Also, according to Nitin Sharma, PayPal has achieved remarkable results in classifying clients’ behavior using the architecture. Rather than concentrating on the transactions alone (which gives quite a limited amount of information), the payment provider has decided to study, through LSTMs, the long sequences of event-based user behavior to see the bigger picture.

Instead of manually engineering features and hardcoding timelines, the company uses raw event data and applies LSTMs to learn temporal representations. This enables PayPal to model the problem at the event level and analyze the actions that might lead to a fraudulent transaction (they look into clues such as whether the user has changed their home, shipping, or billing address, replaced their contact details, etc.)

This switch from hand-coded features to raw event data and LSTMs has given PayPal a more granular perspective on the fraud detection problem, Nitin says, increasing the performance of their anomaly and fraud detection algorithms by 7-10%.

Fraud detection algorithms: unsupervised methods

Unsupervised machine learning methods such as K-means and Self-Organizing Maps (SOM) have been utilized to identify unusual patterns in data. K-means, one of the oldest and most well-known unsupervised techniques, partitions instances of unlabeled data into clusters. SOM, an unsupervised neural network method, is used for clustering high-dimensional data and reducing it to one- or two-dimensional surfaces.

K-means

One of the oldest and most well-known unsupervised techniques, K-means is still widely used. The method partitions instances of unlabeled data into a number (K) of clusters so as to minimize the squared distance between the data objects and the centroid of each group.

The basic flow of the algorithm goes like this: we pick K (the number of clusters the algorithm will be trying to produce), and the model chooses, at random, K points to be the initial centers of these clusters.

Then, each centroid claims all the data points closest to it, and once this first clustering is obtained, the algorithm recomputes the centroids by averaging the points in each cluster. It keeps looping through these two actions until convergence is reached.
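
The loop described above (assign points, recompute centroids, repeat) can be sketched like this. The toy points and the names kmeans and mean are our own assumptions, and the initial centers are simply K randomly sampled data points.

```python
import math
import random

def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # K random data points as initial centers
    clusters = []
    for _ in range(max_iter):
        # Step 1: each centroid claims the points closest to it.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Step 2: recompute each centroid as the average of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [mean(c) if c else centroids[j]
                         for j, c in enumerate(clusters)]
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical, already-scaled transaction features forming two groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]

centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

On well-separated data like this, the result does not depend on the random start. On real transaction data it can, which is why K-means is often run several times with different seeds.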

The model’s weak points are that it is very sensitive to the initial center points and vulnerable to outliers. Also, the knowledge of someone with deep financial expertise would be needed to pick the optimal value for K.

That being said, several studies describe the successful application of k-means to the anomaly identification task. Here, researchers generated an extensive dataset consisting of credit card numbers, merchant category IDs, transaction dates, countries, and amounts and had the model try to divide the data points into four clusters: low, high, risky, and high risk.

The results of the data analysis were encouraging, the researchers say: fraudulent activities were spotted most of the time, and the false positive rate was low.

There are also more complex approaches involving the model, such as this one. The framework proposed in the paper combines K-means with a Hidden Markov Model to tackle criminal activity detection. The former is applied to historical data from a financial services provider to categorize customers based on how much money they usually spend (the categories are: low, medium, and high transactions), and then the latter model outputs the probabilities of transactions being fraudulent.

Self-Organizing Map (SOM)

This unsupervised neural network method is used for the clustering of high-dimensional data. It tries to project data down (the data does not need to be linear) to one- or two-dimensional surfaces while capturing as much information about the dataset’s inner structure as possible.

Here is how it works: we sample an input vector at random, find the neuron in the network whose weights are most similar to the input feature values (the best matching unit), and then calculate the neighborhood of that neuron.

After we have found the best matching unit, we update the weights of that neuron and of the neurons closest to it to make them more like the input vector (the closer a neuron is to the best matching unit, the more its weights are modified; the farther away, the less). We repeat this, sampling a new input vector each time, until we have gone through the entire dataset.
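
The update rule above can be sketched for a small one-dimensional map. Everything specific here is an assumption made for illustration: the Gaussian neighborhood function, the linearly decaying learning rate and radius, the map size of four neurons, and the toy data.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to the input x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))

def som_update(weights, x, lr, radius):
    """Pull the best matching unit and its grid neighbors toward x (in place)."""
    bmu = best_matching_unit(weights, x)
    for j, w in enumerate(weights):
        # Gaussian neighborhood on the 1-D grid: 1 at the BMU, smaller farther away.
        influence = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
        weights[j] = tuple(wi + lr * influence * (xi - wi)
                           for wi, xi in zip(w, x))
    return bmu

def train_som(data, n_neurons=4, epochs=50, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [tuple(rng.random() for _ in range(dim)) for _ in range(n_neurons)]
    for epoch in range(epochs):
        frac = 1 - epoch / epochs               # decays from 1 toward 0
        lr = 0.5 * frac + 0.01                  # learning rate shrinks over time
        radius = (n_neurons / 2) * frac + 0.3   # so does the neighborhood
        for x in rng.sample(data, len(data)):   # one random-order pass over the data
            som_update(weights, x, lr, radius)
    return weights

# Hypothetical inputs scaled to the range [0, 1].
data = [(0.1, 0.1), (0.15, 0.05), (0.05, 0.2),
        (0.9, 0.85), (0.95, 0.9), (0.8, 0.95)]

weights = train_som(data)
print([best_matching_unit(weights, x) for x in data])
```

After training, each input can be assigned to the neuron it matches best, which is how the map acts as a set of cluster prototypes.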

The neurons representing input instances act similarly to centroids in K-Means, which is why some call SOM a constrained K-means.

Due to its inherent capability to reduce dimensionality, the algorithm is uniquely poised to deal with high-dimensional inputs such as transaction data. When applied to the detection of abnormal transactional activities, the model first groups data into categories of “fraudulent” and “legitimate” through self-organization (which is the iterative updating of neurons’ weights to capture the best possible input representations) and then, after being given a new instance, assigns it to one of the groups based on how similar the input is to genuine or fraudulent transactions.

An interesting SOM-based method for identifying payment fraud is proposed here. The researchers visualize multidimensional data (the matrices that store records that reflect sequential activities of users’ multiple payment methods) through a self-organizing map and then apply a threshold-type system for fraud detection. The method shows clear benefits of SOM-produced visualization for transaction classification.

K-means and SOM have proven effective in identifying fraudulent activities in financial data. K-means has been successfully applied to anomaly identification tasks, and SOM has shown clear benefits in visualizing transaction classifications in fraud scenarios. However, the choice of method depends on the specific characteristics of the data and the nature of the fraud detection problem at hand.

To conclude

Various machine learning methods, both supervised and unsupervised, have been proposed for fraud detection and prevention. The supervised approaches rely on explicit transaction labels, i.e., during training the model must repeatedly be shown labeled examples of genuine and fraudulent transactions so that it can tell them apart later.

In contrast, unsupervised models capture the normal data distribution from unlabeled data sets during training. Then, when given a new data instance, they try to determine whether the sample is legitimate or abnormal (suspicious) based on the patterns and structures they have derived.

In this article, we have reviewed 7 ML models. Still, there is no telling which method will best suit your processes and data in a particular setting without research and experimentation. We would have to assess what data and features you have readily available to figure out which model can help you detect fraud efficiently.

Looking to implement a machine learning system to reduce losses caused by credit card misuse, identity theft, unauthorized access to your payment systems, and other types of fraud? Contact us right away to get even more in-depth insights and establish the foundation for further cooperation.