The limitations of fraud detection today, and its future with Bolt

This article can also be found on Medium

Part 1: The problem of fraud is greater (and different) than you think

Today’s online payments experience is powered by dozens of unique tools, from payment gateways to fraud detection services and checkout tools. In the enterprise ecommerce space, this approach is called “layering”: businesses layer on suites of different tools to create “robust” payments and fraud detection stacks [Gartner].

The results, however, are abysmal, especially when it comes to fighting fraud. In spite of the proliferation of tools for preventing and managing fraud, the fraud problem continues to grow [Experian]. In this post, the “fraud problem” refers not just to fraud loss, but also to the cost of “false positives”, or customers falsely rejected for fraud concerns, as well as the labor and time overhead of having staff and software allocated to reviewing orders.

1.1 An overview of credit card fraud: the basics

Credit card fraud is a massive problem for ecommerce retailers. To better understand it, let’s do a quick review of how credit card fraud happens. If you’re an ecommerce business, you’re likely all too familiar with these problems, so feel free to skip ahead.

When a stolen credit card is used, the original cardholder will call their bank to reverse the transaction(s). This initiates a chargeback. Merchants incur the liability for chargebacks, having to return the money, lose out on shipped goods, and pay extra fees to their payment processors. Having a chargeback rate over 1% causes extra fees, assessments, and eventually termination by the card networks [VISA].
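
As a rough illustration of the 1% threshold, the sketch below computes a chargeback rate as chargebacks divided by transactions. This is a simplifying assumption: card networks define the ratio and the measurement window in their own ways, and the monthly figures used here are hypothetical.

```python
CHARGEBACK_THRESHOLD = 0.01  # the 1% level mentioned above


def chargeback_rate(chargebacks: int, transactions: int) -> float:
    """Share of transactions that ended in a chargeback (simplified definition)."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return chargebacks / transactions


def exceeds_threshold(chargebacks: int, transactions: int,
                      threshold: float = CHARGEBACK_THRESHOLD) -> bool:
    """True when the simplified chargeback rate is above the threshold."""
    return chargeback_rate(chargebacks, transactions) > threshold


# Hypothetical month: 4,000 orders and 52 chargebacks.
rate = chargeback_rate(52, 4_000)
print(f"chargeback rate: {rate:.2%}, over threshold: {exceeds_threshold(52, 4_000)}")
```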

Fraud has many origins: black markets of stolen cards, card numbers obtained during a major hack (e.g. Target or Sony), sophisticated credit card rings, petty thieves, and more.

1.2 The true cost of fraud and false positives

In 2017, global ecommerce sales totaled an estimated $2.3 trillion, a 24.8% increase over the previous year [eMarketer]. Of this, the Global Fraud Index estimates that there was $57.8 billion in potential fraud across the eight industries it studied.

However, the true cost of fraud extends far beyond the direct value of lost merchandise. This estimate doesn’t include labor of humans doing manual order review, opportunity costs from false positives, or the massive overhead of implementing fraud-fighting best practices.

False positives are perhaps the most punitive. They are not only costly in terms of lost direct sales, but also have significant 2nd and 3rd order effects:

  • Merchants lose out on valuable Customer Lifetime Value, whereby the customer has the potential to make multiple orders after the first successful order
  • Merchants lose out on valuable referrals by that customer, who now is not a brand ambassador
  • The customer is often extremely frustrated and goes to a competitor. All this value is not only lost, but also handed to a competitor.
  • The customer becomes a detractor, and may talk or post negatively about their experience on the merchant’s site.

Overall, factoring in the above costs, every $1 of direct fraud costs organizations between $2.48 and $2.82 in total loss. This means the full cost of fraud is roughly 2.5 times the direct fraud loss itself [LexisNexis]. Merchants also spend 3%-5% of their overall revenue combating fraud operationally [BigCommerce].
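
To make the multiplier concrete, here is a small worked example. Only the $2.48 and $2.82 per-dollar figures come from the text above; the $100,000 direct loss is a hypothetical number chosen for illustration.

```python
def total_fraud_cost(direct_loss: float, cost_per_dollar: float) -> float:
    """Total cost implied by a per-dollar cost multiplier."""
    return direct_loss * cost_per_dollar


direct_loss = 100_000.0  # hypothetical annual direct fraud loss, in dollars

low_total = total_fraud_cost(direct_loss, 2.48)
high_total = total_fraud_cost(direct_loss, 2.82)

# The part of the total that is NOT the direct loss itself
# (manual review, false positives, tooling, and so on).
low_overhead = low_total - direct_loss
high_overhead = high_total - direct_loss

print(f"total cost: ${low_total:,.0f} to ${high_total:,.0f}")
print(f"of which indirect: ${low_overhead:,.0f} to ${high_overhead:,.0f}")
```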

1.3 Current fraud solutions have limited data visibility, and therefore produce significant false positives

This fragmentation of toolsets not only complicates the products that are built; it also limits data visibility. Each tool has a limited purview into data and thus a limited ability to make accurate decisions. For example, a siloed fraud detection company misses out on valuable payments and checkout data, and sees only a subset of the data that exists.

Working from limited information, providers are forced to be extra conservative in their decision-making for fear of letting fraud through. After all, their only job is to prevent fraud from happening. So, a business will sign up with a major fraud detection vendor, see fraud rates decline, but understand very little about its newfound, difficult-to-measure false positive problem.

On top of this, payment processors and ecommerce companies implement several checks, including AVS (Address Verification System) and CVV (Card Verification Value), which additionally block good customers from checking out. Address verification, for example, often blocks people who frequently travel, don’t have permanent homes, or are students.

To draw users’ attention away from these blind spots, commercial fraud solutions herald access to consumer data that they harvest from social sites, the “deep web,” and a coterie of third-party data services. They value the “amount of transactions [they] see across their entire network.” While the value of such data is plausible, most of it is demonstrably counterproductive. For any type of machine learning, depth and clarity of data will always outweigh breadth of data, which industry leaders misguidedly pride themselves on.

Simply put, the situation is bleak. Fraud companies lack access to critical data about customers, and have little interest in helping businesses (their customers) outside of their own sphere of influence. In fact, the fraud-prevention industry, with few exceptions, is typically predatory in how it treats businesses, and its incentives could not be more misaligned. Vendors rely on fear tactics to inflate the amount of fraud that’s actually happening, scare merchants about the threats of fraud as they scale, and sell safety and security instead of focusing on approving good customers. The space is overdue for change.

Part 2: Building a better fraud detection engine

This is a very high-level overview given that a lot of our work here is Bolt’s “secret sauce.”

2.1 How today’s industry-leaders in fraud detection work

In today’s market, most fraud detection tools do merely that: detect fraud. They play no other role in the user journey. Fraud tools today are typically integrated in two ways:

  1. A JavaScript tracker. This provides a sparse amount of on-page context. It tracks basic things like time per page, total # of pages visited, session IDs, etc. It can understand very little about what the user is actually doing on the page.
  2. An API. This loses all behavioral context, but can capture information about the user/order/etc.

Here’s an example of features passed to a current industry-leading fraud provider’s API:

Order: IP address, session ID, discount codes, time of order, payment gateway, payment method, currency, AVS response code, CVV response code, price
Buyer: Name, Email, Phone, Company name, Shipping address
Card: Name, BIN, last 4 of card, expiration date, billing address
User Account: Name, email, phone, sign up date, account #, last order, total orders created

In order to be able to parse the data, providers have formatting requirements for emails/phones/addresses/etc.
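
To give a sense of the work involved, here is a minimal sketch of normalizing buyer fields before sending them to a fraud API. The `Buyer` class, the field names, and the formatting rules (lowercased emails, phone numbers in a `+<country code><digits>` form with a US default) are assumptions for illustration; each provider documents its own requirements.

```python
import re
from dataclasses import dataclass


@dataclass
class Buyer:
    name: str
    email: str
    phone: str


def normalize_email(raw: str) -> str:
    """Trim surrounding whitespace and lowercase the address."""
    return raw.strip().lower()


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Keep digits only; assume a 10-digit number is missing its country code."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits


def build_buyer_payload(buyer: Buyer) -> dict:
    """Hypothetical payload shape; real providers define their own schemas."""
    return {
        "name": buyer.name.strip(),
        "email": normalize_email(buyer.email),
        "phone": normalize_phone(buyer.phone),
    }


payload = build_buyer_payload(
    Buyer(name=" Jane Doe ", email=" Jane.Doe@Example.COM ", phone="(415) 555-0100")
)
print(payload)
```

Every field in the lists above needs similar treatment, which is part of why these integrations take so much engineering time.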

The problem with such a hefty integration is twofold:

  1. This takes a lot of time and engineering resources to integrate.
  2. Even after all this work, the result is suboptimal since it’s a small subset of the total data available.

2.2 More Data Depth with Bolt

At Bolt, our difference is fundamental. As the first full-stack solution on the market to handle checkout, payments, and fraud, we have unique visibility into the full suite of checkout and payment information to use for fraud detection. This access to data translates into savings for merchants in time, labor-hours, and revenue.

Fraud modeling, like any other statistical modeling or machine learning, relies on having clear, consistent, normalized access to data. Very simply, without access to a particular feature (variable) that might be a strong signal, a machine-learning model is at a disadvantage. Bolt, as a payments processor and checkout flow, sees everything a merchant sees and more.

Consider the types of insights Bolt might pick up that a typical fraud tool could not:

  • Because we have a normalized checkout experience, we can ingest behavioral patterns as features. Mouse movements, keystrokes, capitalization, clipboard usage, and more make sense, because we understand all the elements of the DOM.
  • Because we’re the payment processor, we get extra data about the payment itself.
  • Input data is valuable, but understanding outcomes in detail is perhaps even more important. Bad outcomes data has crippled the fraud detection industry’s performance. We’re the payment processor/gateway and receive chargebacks directly and in full detail.
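
As an illustration of the first point, the sketch below turns a stream of checkout events into a few behavioral features. The event format (`type`, `t_ms`, `field`) and the feature names are hypothetical and are not Bolt’s actual schema; they only show the general idea of summarizing keystrokes, clipboard usage, and mouse activity.

```python
from statistics import mean


def behavioral_features(events: list) -> dict:
    """Summarize raw checkout events into model-ready features."""
    key_times = [e["t_ms"] for e in events if e["type"] == "keydown"]
    # Time between consecutive keystrokes, in milliseconds.
    gaps = [later - earlier for earlier, later in zip(key_times, key_times[1:])]
    pastes = [e for e in events if e["type"] == "paste"]
    return {
        "keystrokes": len(key_times),
        "mean_key_gap_ms": mean(gaps) if gaps else 0.0,
        "paste_events": len(pastes),
        "pasted_into_card_field": any(e.get("field") == "card_number" for e in pastes),
        "mouse_moves": sum(1 for e in events if e["type"] == "mousemove"),
    }


# Hypothetical session: a few typed characters, then a pasted card number.
events = [
    {"type": "mousemove", "t_ms": 0},
    {"type": "keydown", "t_ms": 0, "field": "email"},
    {"type": "keydown", "t_ms": 100, "field": "email"},
    {"type": "keydown", "t_ms": 300, "field": "email"},
    {"type": "paste", "t_ms": 900, "field": "card_number"},
]
features = behavioral_features(events)
print(features)
```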

We also get data from across the Bolt network. Do we have more breadth of visibility than other multi-thousand person companies? Not right now. However, we have an order of magnitude greater depth of visibility, and this makes all the difference.

2.3 How research confirms Bolt’s advantage

Research confirms the sorry state of the current fraud prevention ecosystem, and the competitive advantage of Bolt’s technology. A large-scale survey of traditional fraud detection methods found them to be an unequivocal failure in a setting where ecommerce margins apply [Elsevier]. While no fraud protection whatsoever is clearly untenable, neither is unstructured machine learning. With the best-performing algorithms, researchers successfully identified 99% of fraudulent transactions (in a sample of 50 million transactions, they pinpointed 495 out of 500 fraudulent transactions), but they also incorrectly flagged 500,000 legitimate transactions from good customers as fraudulent. Simply put, there were too many false positives to make the approach useful. Numbers like these are unworkable for anyone trying to run an ecommerce store, where the average profit margin is as low as 5%, or 0.5%-3.5% for ecommerce-only operations [Investopedia]. Bolt throws out these methods in favor of an approach that heavily weights behavioral analysis and complements it with a layer of human review.
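
The figures quoted above can be turned into standard classification metrics. The short calculation below uses only the numbers from the paragraph (50 million transactions, 500 fraudulent, 495 caught, 500,000 legitimate orders flagged); the variable names are ours.

```python
total_transactions = 50_000_000
actual_fraud = 500
true_positives = 495       # fraudulent transactions correctly flagged
false_positives = 500_000  # legitimate transactions wrongly flagged

legitimate = total_transactions - actual_fraud

# Recall: share of real fraud that was caught.
recall = true_positives / actual_fraud
# Precision: share of flagged transactions that were actually fraud.
precision = true_positives / (true_positives + false_positives)
# False positive rate: share of legitimate transactions that were flagged.
false_positive_rate = false_positives / legitimate
# Good orders rejected for every fraudulent order caught.
good_blocked_per_fraud_caught = false_positives / true_positives

print(f"recall:              {recall:.2%}")
print(f"precision:           {precision:.4%}")
print(f"false positive rate: {false_positive_rate:.2%}")
print(f"good orders blocked per fraud caught: {good_blocked_per_fraud_caught:,.0f}")
```

A false positive rate of about 1% sounds small, but because fraud is so rare in this sample, fewer than 1 in 1,000 flagged orders is actually fraudulent.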

2.4 Leveraging machine-learning

We built our fraud engine from the ground up to avoid most of modern fraud tools’ pain points. We used the following approach in developing our training pipeline:

  • Extensive transactional data is gathered as transactions are being authorized. This extra data is an important part of our models. Note that new data (i.e. features) is being evaluated and added continuously. The data is split between training and validation (testing) sets. Intuition is extremely important for figuring out which features are valuable, and it largely comes from the ample industry experience on our team.
  • After gathering, data is prepared for model consumption. Our models are then trained; the combination of models, input data, and training parameters is our ‘secret sauce’. We currently train several different base models that we use in different scenarios. We also use techniques such as boosting, bagging, and cross validation to continuously improve performance.
  • Models are pushed to production in “monitor” mode, meaning they run against live transactions, but their decisions do not affect any outcome. This allows us to test the model performance.
  • To maximize approvals and minimize false positives (good orders marked as fraud), our model only performs auto approvals, and riskier transactions are left for professional review (described below). If our models have high confidence of a transaction being fraud, we warn reviewers.
  • After the monitoring period, models that perform better are switched to “active” and are allowed to affect the outcome of transactions. Our models not only make decisions, but also output the reasons for those decisions in human-readable form that our merchants are able to see.
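
The “monitor” versus “active” distinction and the auto-approve-only policy described above can be sketched as follows. This is not Bolt’s implementation: the thresholds, names, and reason strings are hypothetical, and the only behavior taken from the text is that a model in monitor mode never affects outcomes, that the model only auto-approves, and that high-confidence fraud produces a warning for reviewers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    MONITOR = "monitor"  # runs on live traffic, decisions are only recorded
    ACTIVE = "active"    # decisions affect the transaction outcome


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    MANUAL_REVIEW = "manual_review"


@dataclass
class Routed:
    decision: Decision
    warn_reviewer: bool
    reason: str  # human-readable explanation


def route(fraud_probability: float,
          approve_below: float = 0.05,  # hypothetical threshold
          warn_above: float = 0.90) -> Routed:  # hypothetical threshold
    """The model never auto-declines: risky orders go to human review."""
    if fraud_probability < approve_below:
        return Routed(Decision.AUTO_APPROVE, False, "low model risk score")
    if fraud_probability >= warn_above:
        return Routed(Decision.MANUAL_REVIEW, True, "model is highly confident of fraud")
    return Routed(Decision.MANUAL_REVIEW, False, "model is uncertain")


def decide(fraud_probability: float, mode: Mode, log: list) -> Optional[Routed]:
    """Always record the model's decision; only return it when the model is active."""
    routed = route(fraud_probability)
    log.append((mode.value, fraud_probability, routed.decision.value))
    if mode is Mode.MONITOR:
        return None  # shadow run: the outcome is not affected
    return routed


decision_log: list = []
print(decide(0.01, Mode.MONITOR, decision_log))  # None, but logged
print(decide(0.01, Mode.ACTIVE, decision_log))
print(decide(0.95, Mode.ACTIVE, decision_log))
```

Logging in both modes is what makes it possible to compare a candidate model against the current one before promoting it.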

Our codebase is Python, both for training and serving model decisions. Future blog posts will contain code snippets and greater detail.

2.5 Incorporating professional human review

In addition to review by the fraud engine, Bolt caps its process with a select manual review of the highest-risk orders. Unlike most fraud vendors, we begin our review with the assumption that an order is good, and try to find definitive evidence of fraud. When in doubt, we will release an order. We use our custom-built OMS (Order Management System) to process orders by priority based on a number of dynamic points. A critical difference in our manual order review process: our analysts’ decisions often train our risk models, not the other way around.
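
Processing orders by priority is commonly done with a priority queue. The sketch below uses Python’s `heapq` for that purpose. The priority formula (risk score times order value) is purely an assumption for illustration; the “dynamic points” the OMS actually uses are not described here.

```python
import heapq
import itertools
from typing import Optional


class ReviewQueue:
    """Hands analysts the highest-priority order first."""

    def __init__(self) -> None:
        self._heap: list = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def add(self, order_id: str, risk_score: float, order_value: float) -> None:
        # Hypothetical priority: riskier and larger orders are reviewed sooner.
        priority = risk_score * order_value
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), order_id))

    def next_order(self) -> Optional[str]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = ReviewQueue()
queue.add("order-A", risk_score=0.2, order_value=100.0)
queue.add("order-B", risk_score=0.9, order_value=50.0)
queue.add("order-C", risk_score=0.5, order_value=300.0)

review_order = [queue.next_order() for _ in range(3)]
print(review_order)
```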

2.6 A holistic approach to fraud detection

On top of our key advantages from a data and machine learning perspective, Bolt has several other fundamental advantages given the way we handle fraud. Consider the following comparison of Bolt and the major fraud solutions in the market:

Fraud Vendor Comparison Chart

2.7 Case studies

The proof of Bolt’s success rests in its case studies. Let’s take Invicta Watches as an example. Before Bolt, Invicta Watches was using a standard tool suite: Authorize.net for payments, Magento for checkout, and Signifyd plus an in-house review system for fraud. After switching to Bolt, Invicta saw a 153% lift in checkout conversion rate, a 16.7% lift in order approval rate, and a decrease to $0 in spend on fraudulent chargebacks. See the full case study here.

Invicta Bolt Case Study Results

Here are more case studies with similar results, and dozens more are available if you contact our sales team.

2.8 The Bolt Difference

We’ve reviewed how Bolt’s visibility into checkout gives it superior data access and allows it to defeat false positives. What this represents, fundamentally, is a frame shift from fraud vendors being at odds with their customers, to being aligned with them:

  • Payment companies are incentivized to maximize volume without much regard for helping with fraud detection, since they take no liability.
  • Fraud vendors are incentivized to block as many fraudulent orders as possible, otherwise they will be seen as bad at their job. This maximizes false positives and costs the business revenue.

Bolt is aligned both as the payments company and the fraud detection company with full fraud liability coverage. As a result, Bolt is incentivized to minimize false positives and bring the maximum revenue back to the merchant.

Our competitive advantage in data visibility at the checkout stage lets us make surprising fraud decisions based on behavioral data like mouse movements, keystrokes, capitalization, and clipboard usage. Let’s look at some examples where major fraud detection providers falsely recommended declines:

Note: we’ve used dummy data here to obscure personal data, but the examples are real.

Example 1

Here’s a result from Sift Science recommending that the merchant decline an order. Notice that the “Sift Score” of 100 is a “definite reject” recommendation. However, Bolt approved this order, and no chargeback was ever issued.

Sift Science Rejected Order

Example 2

Let’s look at another example where Sift Science wrongly recommended a decline. In this case, the adverse score stems from the association of a gift card recipient’s salutation name with multiple emails. “Victoria” is a common name, so it’s no surprise it was linked to multiple emails (emails anonymized for privacy reasons). This inaccurate result is a troubling artifact of a poor integration between Sift Science and the merchant. The more third-party tools that are involved in a merchant’s ecommerce stack, the greater the likelihood of such errors.

Sift Science Rejected Order 2

Sift Score of 100 likely tied to the connection of the name “Victoria” with multiple user emails, itself an artifact of a poor integration. Such issues could render Sift Scores inaccurate.

Example 3

Here is an order for which the popular fraud detection tool Signifyd recommended a decline, precisely because it lacked the personally identifying information (device, email history) that Signifyd relies on for fraud decisioning. After running an analysis with Bolt’s fraud engine, the order was approved by Bolt. No chargeback was issued on this order:

Signifyd Rejected Order

Lack of user information (address, device, email) played a role in the low Signifyd score. Scores like these are a recommended “decline”. This same transaction was approved by Bolt with no issues, and no chargeback. It is a perfect example of a fraud “false positive”.

These three cases form a snapshot of a recurring problem: false positives. To put it simply: fraud tools on the market today ultimately cause more revenue loss than the chargeback costs they mitigate.

2.9 Why we do what we do

Bolt was built by ecommerce pioneers. Our team includes the former Chief Risk Officer at CashStar alongside key engineers who built payments and fraud systems at Google, Facebook, and Twitter. Our DNA includes retail, ecommerce, consumer finance, and payments, in addition to computer and hard sciences. Cofounders Ryan Breslow and Eric Feldman dropped out of Stanford’s Computer Science program; Ryan had previously spent years running a software development agency, working with ecommerce merchants, and learning their pain points. As a team, we’ve brought the best of what we’ve learned to Bolt, and we won’t stop until our vision becomes a reality. While payments and fraud infrastructure on the internet is fragmented and broken, as a full-stack solution for payments, checkout, and fraud, Bolt brings much-needed clarity and a real solution into the ecosystem.

If building the future of checkout and fraud excites you, we’re hiring across all roles – see our jobs page.