Distil's Edward Roberts, Director of Product Marketing, and Anna Westelius, Senior Director of Security, explain GiftGhostBot, a sophisticated bot attack on gift card balances, which affected nearly 1,000 customer websites around the world.
The following is a full transcript and video of GiftGhostBot Explained.
Edward: You've probably heard that during the last U.S. presidential election, bots were affecting political campaigns via Twitter and Facebook. But bots are not just in that arena; they're all over the web. In fact, over half the traffic on the internet is some sort of bot. Generally, a bot is any script or tool that is built to automatically interact with a website, an app, or a system, trying to mimic a real human. A classic example of a bot that you might not have thought about as a bot is Google.
Google’s Googlebot is a bot that goes around the web, constantly around the clock, scraping everyone's website so that people can find the information they’re searching for. It’s creating a good service for people. Googlebot would be classified as a good bot. You would want Google to crawl your site because you want to be searchable on the web.
Unfortunately, in that mix of traffic, there are also bad bots on your site, and they're doing all manner of things. The Open Web Application Security Project (OWASP) created the Automated Threats Handbook, categorizing and itemizing the 20 automated threats on the web.
Let’s start with scraping. If you have a price or if you have content, you've probably got somebody scraping that price or that content for some reason that you don't want. For example, a competitor continuously scrapes your site, bringing the price back and making sure that their price is never higher than yours. There are many stories of Amazon using scraping as a tactic to pull other online retailers’ prices and make sure they're always beating them.
Another bot example is seen via spamming. You've probably seen comments sections on websites before. Those comments can be spammed by bots, inserting a link with malware or some other malicious end.
Looking at the top-right, you can see account credentials. If your site has a login (e.g., username and password), then malicious users will definitely take advantage of it. Every time there's a data breach where you hear that millions of usernames and passwords were exposed, those credentials are then made available on the dark web. Since many normal users reuse their login credentials (generally their email address and an easy-to-remember password) on other sites, virtually every other website on the web is going to be hit by bots trying to log in using those hijacked usernames and passwords. And since humans are inefficient and inaccurate at those repetitive tasks, malicious users would rather use a bot.
Up next, cardholder data, which is related to any credit card and payment information. In this threat, any checkout area or other process related to a credit card transaction is vulnerable to fraud or abuse. If you can log in and steal somebody’s account, then you’ve got access to their credit card information inside of that account. Then, you can buy things, transfer things out, and do any other manner of malicious activity in the commandeered account. In a way, getting account access is the holy grail, since you could change the password and lock the original account owner out entirely.
So as you can imagine, protecting those areas is very important for businesses, and these are the issues that we deal with at Distil all the time. We have companies that come to us and say, "We have somebody scraping our prices and they're doing it around the clock. It's bringing our site down. It's slowing the server responses, and we've got people logging into our account pages. We're getting spikes that are bringing us down. We don't have the bandwidth to be able to handle 10 million requests that are hitting our login page. Our fraud has gone through the roof because people are logging into our accounts, and then committing fraud within them."
Those are only a few reasons why our customers come to us, but the basic, common issue comes down to the fact that there's some piece of automation–what we call a bad bot–doing the negative activity on their site.
Taking a look at the bottom of the diagram, you can see the kinds of effects the bots have. For example, denial of service, which is when you have too many bots, causing your website to slow down. It could lead to a brownout, or even a blackout.
Another problem deals with incorrect data and analytics. If half of the traffic on your website is bots, not actual humans, and you're making decisions based on the information in your Google Analytics or any other analytics tool, then you're making those decisions based on flawed information.
Now, I’m going to tell you a bit more about a specific attack we saw at Distil. The technology actually detects those simple bots and prevents them. But the problem is when they get more and more sophisticated, until they become what we call "advanced persistent bots." Advanced persistent bots are going to go after you around the clock, 24/7. They're not going away. They're so advanced that they try to mimic humans. They're trying to evade whatever technology is in place. That's where our team of analysts comes in. The team works with our clients and our customers to solve the problem with them and make changes to beat the advanced behavior.
Anna Westelius is one of those analysts who looks at our customers' data, getting into the fight every day, and protecting many companies' sites with them. She’s changing the technology, changing the way we defend things, and really looking at the logs and the information of bad bot traffic in order to prevent the bots from getting successfully through whatever defenses are there.
Moderator: Edward, a question just came in from the audience. What is an example of a sophisticated bot?
Edward: So let’s say you have a business with products and prices on its product pages. A shopper goes through your site, filling out delivery information and applying discounts as they add things into their shopping cart. Well, some of the most sophisticated bots can perform those same steps, adding items into a cart, applying discounts, and then scraping the price within the final shopping cart. They can automate these steps and perform them much faster than a real human user. Once they know what the steps need to be, they can just make those things happen. Another level of sophistication is that they can change how they look. For example, they may be saying that they're coming from a Chrome browser when they're not.
They may be using popular automation tools like Selenium and PhantomJS in order to access the website. But those could also be used by bots, and that's behavior that we would detect, and that's a more sophisticated level. There's sophistication in what they actually do on the website, and there's sophistication in the way that they try and hide themselves, and make themselves appear more human.
There's also sophistication with how they move. For example, some bots don't move with a normal mouse movement, but they will move in a direct straight line, and that's a giveaway. So now bots are trying to make themselves move a mouse in a way that looks human, and then pause before they click the next page. So there are many levels of sophistication and ways they're trying to hide themselves. It's not one thing that they do.
So, now we're going to tell you about this attack that we coined "GiftGhostBot," and it's sort of the great gift card heist that we saw in early 2017. We were protecting many companies’ websites, and one of our customers called us and said, "We just wanted to say thank you," and we replied, "Well, what do you mean, thank you?" And then they said, "We've been absolutely inundated on the gift card part of our website, and we're doing everything we can to make sure that it doesn't bring the site down, but right now, you're cleaning it all out, and you're helping, and we haven't gone down. So we want to say thank you."
So we responded, "Well, that’s what we’re meant to do here. Why would you say thank you?" And they said, "Well, we've gone and checked all our competitors, and none of them are up."
It turns out, all of their competitors–who we were not in front of and protecting–had gift card account pages where their users were unable to check their gift card balance because that function was being attacked. The competitors all basically said, "This is so bad that we've taken this function out of the website." And they put up a message of something along the lines of, "Sorry, this access is unavailable right now."
Now, it wasn't a competitive attack. It was somebody who was trying to access the gift card checking functionality. And so it was happening across many, many domains and many companies. So with that setup, I'll hand you over to Anna to hear about what happened next and what Anna's team went through.
Anna: Thank you, Edward, and no pressure at all saying this is the exciting part, but I do believe that it's true. I'm going to talk you through the attack as it happened, with a little bit of background: the detection that was used, how we saw the attack, and how it evolved over time.
So to start, you might be wondering, why gift cards? As Edward said before, bots are not really a security issue. They are a way for people to abuse resources and websites that were built to provide functionality to users. The attackers aren't breaking or inherently exploiting anything, but they’re doing something on a much larger scale to gain information or things that they shouldn't necessarily have access to. And gift cards are basically the same as cash, right?
Think about any gift cards that you get from Macy’s or any other large retailers, or even toy stores where you can buy gift cards for your kids. They can use that as cash, almost literally. These gift cards are more or less anonymous, basically untraceable. They can be used and transferred between users, and some entities actually accept them as cash for merchandise that isn't related to the stores that they're originally tied to. Actually, just a couple of months ago, my motorcycle dealership accepted a bunch of Toys “R” Us gift cards as cash money to buy a motorcycle, which is a very expensive thing to buy.
But they were fine with that because the gift cards are like cash. And because of the digitalization of informal currencies such as gift cards, they are made easily available online. So if you're an attacker or a hacker and you want to make some easy money, you can go in and more or less get free money online if you just know how to abuse these resources. Companies make these services readily available online, and they're not treated as normal money. A gift card isn't treated with the same scrutiny as your bank accounts online, right?
It doesn't have to stand up to the same level of security, and these companies aren't payment providers, so they don't enforce the same level of security towards these things. That's really why someone would want to start attacking gift cards or getting gift cards. So the goal for these attackers was to go in, get as much money as they could possibly get, and get out. Like Edward said, we were only alerted by customers who said, "Thanks, guys. All of our competitors are down. Thank you so much for protecting us."
And that happened pretty early on in the timeline that would eventually end up being a 20-day attack that we saw distributed over more than 1,000 different websites. So these guys were building these little scripts, distributing them all over the world, and attacking more than 1,000 websites at the same time to get these gift cards. They knew what they wanted. They wanted this cash. And if you could just go in and check the gift card balance on the website by entering the gift card number, and you know that there's cash on it, you can steal that number, and reuse it somewhere else.
As this attack progressed, we saw it distributed worldwide. The detection portion was mainly due to the fact that it was simultaneous. We saw them attacking so many different websites at the same time, but within a specific industry. They were specifically targeting fashion and clothing retail online stores, because people generally buy really expensive gift cards.
If you go into a large retail site that sells really expensive shoes or dresses, people get really expensive gift cards. If you're a hacker, you don't want to go somewhere where you get a $10 gift card; you want to go in and find $500 or $1,000 gift cards. So they were targeting the fashion and clothing industry, and we saw that happening at the same time. And that kind of set the alarms off a little bit for us internally.
We knew it was the same attack, not only from the timing perspective but also because they showed the same type of behavior where they were rotating through a large number of IP addresses. The IP address is the address that identifies you online, but IP addresses are readily available for purchase. So you can subscribe, more or less, to a large number of IP addresses today and make a small number of requests per IP to avoid detection.
Malicious users can buy 200 IP addresses and then make five searches per IP so that it just looks like 200 users are doing 5 searches. That's not suspicious, right? And so what we could see was that these attackers were rotating through IP addresses. They were switching between different hosting providers and ISPs, or the organizations providing all of the IP addresses. They were distributing themselves all over the world.
They also cloaked themselves to appear as though they were using different browsers for each request. So, when they send the request to the web server, they say, "Hey, I'm Chrome," or, "Hey, I'm Firefox," and they're rotating through these to seem like they're different users and not one and the same.
Now, I'm going to talk a little bit about the analysis portion of this. What makes this attack very interesting at a high level, just from a summary perspective, is that it was evolving. Meaning, as we started detecting this attack, we started blocking each bot as we identified it. We identified them, and we started putting in rules, making sure that they couldn't access the resources that they were trying to get to. But they evolve. They change. The attackers are sitting on the other end, seeing that they can no longer get these gift cards, so they change their scripts. They pay for more expensive services. They change the way that they behave so that they will seem more human. And they were doing this in a very controlled way, which I will talk about in a little more detail. What was also interesting, from a high-level perspective, is that they were obviously using multiple channels to execute their attacks.
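The rotation described above is why a per-IP request limit on its own tends to miss this kind of attack. The following is a minimal Python sketch of that point, not Distil's detection logic: the log format, the `/giftcard/balance` path, the addresses, and the limit of 10 are all assumptions made up for illustration. It replays the 200 IPs times 5 searches scenario and compares a per-IP check with an aggregate view of the targeted endpoint.

```python
from collections import Counter

# Hypothetical request log entries: (ip, user_agent, path). All values are illustrative.
def build_sample_log(num_ips=200, requests_per_ip=5):
    agents = ["Chrome", "Firefox", "Safari"]
    log = []
    for i in range(num_ips):
        ip = f"10.0.{i // 256}.{i % 256}"
        for j in range(requests_per_ip):
            # Each request claims a different browser, mimicking user-agent rotation.
            log.append((ip, agents[(i + j) % len(agents)], "/giftcard/balance"))
    return log

def ips_over_limit(log, per_ip_limit):
    """Per-IP rate limit: return the IPs whose request count exceeds the limit."""
    counts = Counter(ip for ip, _, _ in log)
    return {ip for ip, n in counts.items() if n > per_ip_limit}

def endpoint_summary(log, path):
    """Aggregate view: total requests, distinct IPs, and distinct agents for one path."""
    hits = [(ip, ua) for ip, ua, p in log if p == path]
    return {
        "requests": len(hits),
        "distinct_ips": len({ip for ip, _ in hits}),
        "distinct_agents": len({ua for _, ua in hits}),
    }

log = build_sample_log()
flagged = ips_over_limit(log, per_ip_limit=10)
summary = endpoint_summary(log, "/giftcard/balance")
print(len(flagged), summary)
```

No single IP crosses the per-IP limit, yet the balance checker still receives 1,000 requests. Looking at the endpoint as a whole, rather than at each client in isolation, is what makes the pattern visible.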
One channel they have is to subscribe to botnets, which basically involves paying people to distribute your bots for you. Usually, if you buy the bots, you aren't the same person that owns these networks that you can use later. You rent the space, and then they distribute your scripts. What we saw during this attack was that one of the channels they used seemed not to allow for change, so when they started to change behavior, they couldn't do that through one of the channels they were using.
So now, let’s take a step back and review the timeline again, but a little bit more from an analysis perspective. It started on February 26th, peaked around March 8th, continued at a very high frequency until March 13th, and then was mitigated until it eventually died down on March 19th. But if you look at the data itself, you can see the bots change their behavior as we're battling these attacks and starting to introduce a little bit of blocking and friction.
Moderator: Another question from the audience. What responses are made when an attack is detected and still in progress?
Anna: Many of the identifiers that we use are proprietary to what we do. It's part of the fingerprinting process that we do that allows us to identify the tools and scripts that they're using. But as we're working with our customers, when we isolate a behavior or a feature of an attacker, we validate that it does not affect any normal users. This is to make sure that there will not be any user impact from these types of attacks. But as you may understand, too, in these attack scenarios, companies need to be able to adapt to a perhaps more aggressive security posture.
There has to be a little bit more acceptance for false positives if you are under such a big attack. And so we're working very closely with our customers to communicate that and get them to understand where there might be user impact, but we work very hard to make sure there isn't. And for each of these rules that we isolate, we usually go in and suggest it to them. We talk to their incident response team and we say, "Hey, this is what we're seeing. What do you want us to do with this?" And with large-scale attacks like this, most people just want us to drop the requests, which means that you don't even serve content back to the user.
You just completely remove the requests because it puts such a high strain on their servers. It's a huge cost, not only because you are potentially losing from a PR perspective and from a user safety perspective from these actual gift cards, but you're also paying for all of the requests that these bots are generating to your website. A large load like this is expensive. A lot of our customers just want to drop the requests during an attack like this. They want to remove the load from the web servers so that it doesn't become an additional cost for them.
But when we isolate behavior that might not be as clearly and definitely bot behavior, we suggest things like introducing a CAPTCHA or other means. We've partnered with a very successful and specific CAPTCHA company who developed CAPTCHAs that are very difficult for programs to solve, but much easier for humans to do. And so we make sure to present CAPTCHAs and other things that we refer to as “friction,” any smaller things that make it more difficult for the bots to access the information.
When we deploy such a rule, we immediately analyze the output and the impact of that rule, and we start to ask questions. How many people are attempting to solve the CAPTCHA? Is there a large number of users who do solve the CAPTCHA? If that’s the case, then we should not activate this rule at a more aggressive setting. We shouldn't be dropping these requests because there is a chance that there is a user behind that. In addition, a lot of companies do different things. They make the connections slower for the bots, which isn't usually something that users would notice, but it makes it much more costly and difficult for the bots to get there.
What some people do, although in this case it wouldn't really be applicable, is show different information on their websites when a bot is detected. So people who have a problem with price scrapers sometimes make the decision to show false prices to identified bots. And we always recommend having a broad and dynamic strategy.
And then for suspicious bots, perhaps you introduce a CAPTCHA, and you make sure to always change that if and when the bots are evolving and trying to do different things, because it is an ever-evolving bot landscape, and it's necessary to battle the problem by changing the responses that we give.
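The review step described above (watch how many CAPTCHAs get solved before making a rule more aggressive) can be sketched roughly as follows. This is an illustrative Python sketch only: the function name, the 2% solve-rate ceiling, and the minimum sample of 500 are hypothetical values chosen for the example, not figures from the talk.

```python
def recommend_action(captchas_served, captchas_solved,
                     solve_rate_ceiling=0.02, min_sample=500):
    """Suggest whether a CAPTCHA rule could be tightened into dropping requests.

    All thresholds are assumptions for illustration. A high solve rate
    suggests real users are behind the matched traffic, so the rule stays
    at the CAPTCHA level instead of dropping requests.
    """
    if captchas_served < min_sample:
        # Too little data to judge the rule's impact yet.
        return "keep_captcha"
    solve_rate = captchas_solved / captchas_served
    if solve_rate > solve_rate_ceiling:
        # Many solves: likely humans in the mix, so do not escalate.
        return "keep_captcha"
    return "consider_drop"

print(recommend_action(10_000, 20))     # very few solves
print(recommend_action(10_000, 1_500))  # many solves
```

The design choice here is to fail safe: when the data is thin or the solve rate is noticeable, the sketch keeps the softer response, which matches the concern about a real user being behind the request.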
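One way to picture the tiered responses discussed in this answer (drop clear bots, add friction for suspicious traffic, leave everyone else alone) is a small policy function. The sketch below is an assumption-laden illustration in Python: the `bot_score` input and every threshold are hypothetical, and a real system would derive its confidence from fingerprinting and behavioral signals that are not shown here.

```python
from enum import Enum

class Response(Enum):
    ALLOW = "allow"
    CAPTCHA = "captcha"
    DROP = "drop"

def choose_response(bot_score, under_attack=False):
    """Map a hypothetical bot-likelihood score in [0, 1] to a response tier.

    Thresholds are illustrative. Under attack they are lowered, reflecting a
    more aggressive posture that accepts more false positives.
    """
    drop_at, captcha_at = (0.8, 0.4) if under_attack else (0.95, 0.6)
    if bot_score >= drop_at:
        return Response.DROP       # do not serve any content back
    if bot_score >= captcha_at:
        return Response.CAPTCHA    # add friction, then watch the solve rate
    return Response.ALLOW

print(choose_response(0.85))
print(choose_response(0.85, under_attack=True))
```

Keeping the thresholds as parameters rather than constants fits the advice to keep the strategy dynamic and change responses as the bots evolve.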
Moderator: How difficult is it to recover the money lost from a stolen gift card?
Anna: That would be more or less impossible. What we can do is help our customers correlate data in their logs to say, "Hey, these are the identifiers that we've seen connected to these cards. Cancel them." Or, "Reissue them." But on such a large scale as this, it's very difficult for them to know how successful these attacks were, because half of these requests might not be functioning gift cards. They're testing all these different numbers, but we can't see whether or not they're actually successful in retrieving that information back because of the way these things are designed.
In general, this problem that we're solving is a pretty new problem. People generally think a lot about security and security vulnerabilities, but they don't think about how these services and resources that they provide on their websites can be abused even if it's through the way that it was intended.
Companies may think, "Okay, we built this, it works, and there are no known security vulnerabilities. No one should be able to inject code into this." But they may not think about it as, "Hey, what if someone spins up an automated program and does this 400,000 times to just check and see if they can get some cash back?"
I think there are a lot of different design portions to this, where you can make it more and more difficult for these bots, instead of just having the resource readily available in the very–I would say from a hacker's perspective–inviting way. It's almost literally, "Come take our money." And there would have been certain steps that they could have taken to reduce that threat.
Moderator: Can you use biometrics, such as facial recognition, to authenticate a request? And are there any added risks to using this approach?
Anna: I’ll try not to go too deep into how that technology works. The problem is that no matter what you do, like if you're a human taking a picture of yourself on your webcam, that is still an image that is then sent to the server and authenticated in a way that you can automate. The actual picture-taking process isn't difficult to reproduce in an automated sense. So I wouldn't say that it necessarily makes it more secure. But adding any type of friction, anything that makes it more difficult for the bots to access the information, makes other people more likely targets.
Let’s say you have a security system in your house. A burglar most likely will go to your neighbors’ house if they don't have a security system instead of your house because it's easier. So if you introduce things like biometrics or two-factor authentication, you're less likely to be targeted because you're not as easy to abuse or attack.
Moderator: Why do some companies decide to store less customer data?
Anna: There are a lot of new companies out there that talk about things like password-less logins and other things like it, and those are really good ways to go about it. If you don't store the passwords, you're not a target because you don't have that information. It's all about looking at it holistically and trying to identify any portion of it that would remove you as a target. And on that note, I wanted to do a little bit of a comparison here.
In this case, the attackers are people on the other end who are trying to make money. Instead of simple and advanced, you can look at it as cheap and expensive. So for these attackers, with every piece of friction that you introduce, you're introducing more difficult things to solve. It becomes more and more expensive for the attackers to continue.
Sometimes we get the question, "Yeah, but this thing you're going to do, is this going to block all the bots?" And we say, "Well, we hope so, but we might not be able to with these first three things that we want to introduce." Look at it as a cost analysis for the attackers. Anything that we do to make it more expensive for them to continue attacking, lowers their incentive to continue attacking you.
Moderator: Another question for you, Anna. Is it better to find the single silver bullet defense against bots, or is it better to have smaller points of friction and protective measures in place?
Anna: It's about introducing smaller portions wherever you're able to. And some companies have a continuous, really advanced bot problem. The money that people make from abusing their websites is enormous, and so they need to have all these really, really good expensive systems in place. We have some customers who use four different vendors that combine to try and solve their problems from various perspectives because it's just so expensive. We work with airlines where it's up to $40,000 a day in revenue for these attackers.
It's loss of revenue for the airline. So it's easier for them, internally, to kind of back up the claim that they need all these services. But, like you said, you do an evaluation of your own business and your risk, and what that would cost you. And then you take all the steps necessary to minimize that risk. Don't unnecessarily store data you don't need. Don't unnecessarily make resources available for abuse. Every time we introduce new functionalities in the digital space, we should be thinking, "What would happen if someone did this 400 times?"
Or, "What is the risk from not only a security flaw perspective but from an abuse perspective?" And so now we've looked at my overly simple overview graphics. This is what the attack actually looked like for one of our customers. This just shows the involvement of attackers over time. This green line here shows how many IP addresses were active on the website at the same time. So even prior to the attack, there are a lot of users. It's about 15,000 IPs per hour throughout the attack, and they are increasing, at one point reaching 26,000 IPs per hour for a single domain.
Automating things that can execute on these things, things that can respond with, "Yes, I can render images," is much more expensive for an attacker to build than just starting a little terminal and writing a script that doesn't do this, right? And so as we move in here, we introduce more targeted rule sets, but, eventually, we get to a point where this is peaking, right? They're under attack, and we need to figure out something that has a lot of impact. And this is where the customer made the decision.
One, where they suddenly could change all of their behavior, and one that restarted. So it stops here, which triggers a restart. And then it starts again a day later, but with the same behavior that is no longer allowed and unchanging. So this is most likely someone who has sent the script to their distributor, the shady guy who owns the botnet somewhere, and it just continues. It doesn't evolve. It doesn't change. So they probably couldn't make changes to their scripts after they were distributed, which is often the case if you're using these types of services.
And so what we could see in this traffic, and I'll dive into that evolvement a little bit later, is that they moved into more legitimate networks. They started looking more like normal users. This graph does not represent the amount that we blocked, but a large amount of this throughout the time was being blocked. This is more representing the change in behaviors that we were seeing. When we introduce that hard set rule, the number of IP addresses spikes immediately. So what they do is that they go, "Oh no, they did something. We don't know what. Let's try and become as distributed as possible, so we seem as if we're more users, so maybe they don't block us."
But another thing we did was also look at the distribution of ISPs, the internet providers. Initially, you have quite a lot of distribution into things that we wouldn't necessarily see on these websites.
There are hundreds of different ones here, which then normalize into what would be considered normal providers. So we use these visualization tools to keep the same timeline and ask: can we split it in a way that helps us isolate what the bad behavior is? Well, we could split this graph to look at the different browsers that they're using. Maybe they're using an outdated version of something, or the screen resolution or anything that we can use to our benefit to block. Being able to look at the same timeline, but in different viewpoints, allows us to isolate the attackers and write rules to block them.
Being able to mitigate this bot traffic requires you to react, more or less, on the first request, the first time you see them. In this case, if they're making a request to this gift card balance checker, we have to be able to identify them on that one request. Because if they have 200,000 IP addresses, they only need to make one request per IP.
We have to find the features that help identify them all on the first request. Part of our system does a lot of behavioral modeling and things like that, but in these cases where you have one attacker going after one resource, you need to be able to find those little differences in how they interact with that one request, compared with what normal users would generally show.
Now, let’s talk about the evolvement of these profiles from a very high level. For the first portion of the attack, we stopped three distinct profiles: one was running an old Linux system, one was running on a Mac, and one was running on a Windows machine. They showed very similar behaviors. They were rotating through different browser versions, etc. But the behavior was very distinctive.
They began to evolve into actual smartphones as we started blocking them, and pretending to be regular users on real devices is considerably more expensive for these attackers. They moved from a shady hosting provider somewhere in the United States to Comcast, and then Verizon, and then to AT&T. And access to those networks, which most of the time you have to get through botnets that have been installed on normal home users' computers, is very expensive.
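The idea of viewing the same timeline split along different dimensions can be sketched with a small grouping helper. This Python example is only an illustration under stated assumptions: the record fields (`hour`, `ip`, `isp`, `browser`) and the sample values are hypothetical, and it counts distinct IPs per hour the way the graph described earlier does, but split by whichever field you choose.

```python
from collections import defaultdict

def distinct_ips_per_hour(records, split_by):
    """Return {hour: {value of split_by: number of distinct IPs}}."""
    buckets = defaultdict(lambda: defaultdict(set))
    for rec in records:
        buckets[rec["hour"]][rec[split_by]].add(rec["ip"])
    return {
        hour: {value: len(ips) for value, ips in groups.items()}
        for hour, groups in sorted(buckets.items())
    }

# Hypothetical log records; every field name and value is made up.
records = [
    {"hour": 0, "ip": "a", "isp": "HostingCo", "browser": "Chrome 41"},
    {"hour": 0, "ip": "b", "isp": "HostingCo", "browser": "Firefox 38"},
    {"hour": 0, "ip": "a", "isp": "HostingCo", "browser": "Chrome 41"},
    {"hour": 1, "ip": "c", "isp": "HomeISP", "browser": "Chrome 56"},
    {"hour": 1, "ip": "d", "isp": "HostingCo", "browser": "Chrome 41"},
]

by_isp = distinct_ips_per_hour(records, "isp")
by_browser = distinct_ips_per_hour(records, "browser")
print(by_isp)
print(by_browser)
```

Because the split field is a parameter, the same timeline can be re-cut by ISP, browser version, screen resolution, or any other recorded attribute until one view separates the attack traffic from normal users.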
They were pouring money into this attack because the margins were just so wide. I’d also like to highlight the fact that they continuously added money to their attack to make sure that they would not be blocked, even though in the end we ended up blocking them anyway. At a high level, for one of the retailers we saw a lot of different devices, user agents, thousands of IP addresses, and millions of requests. So you can see that it’s so difficult to provide a silver bullet solution to these problems because when they are this distributed, it's more or less impossible for any normal security vendor to do something about this.
You can see that this is an important part of the threat model as you introduce new functionality. When you put a service on your website, no matter how great it would be if all the users could do this, you need to always keep in mind that that can be abused. And if you have a problem, you need to get a vendor that can address this problem specifically. It's not a security issue, it's an abuse issue.
They won't spend any time or money to originate from anywhere else because that would be easily detected. If you have or use gift cards, then make sure you check your balances often, and take screenshots as proof. Because if you are the victim of one of these attacks, the company doesn't always know that this has happened.
Don't leave unused money on cards. It's not good practice. And for retailers or other e-commerce businesses, introduce any type of friction. A normal CAPTCHA is good for most bots, but not all bots. If your competitor does not have anything in place, then that is a step in the right direction. Try and control how much usage is allowed for these types of resources, but evaluate what your business can afford to do. Always treat it as a balance of usability versus security.
Most importantly, remember that this isn’t a security vulnerability. A lot of companies deal with security issues. They look for exploitable resources and other ways they can break the website. This is not breaking the website; it's using the website as it was designed. But we need to control and include that in the way that we think about our resources. There needs to be awareness, and I think that as you have these conversations in your businesses or as part of a decision-making process, there always needs to be that evaluation every time.
What do we gain from presenting this and what do we lose in case someone tries to abuse this? I put together a little summary. It is one of the largest server attacks targeting websites at the same time. The total loss and impact is more or less unknown because of the untraceable nature of gift cards. A lot of companies actually put this into their financial models as some kind of leakage. Gift cards are considered something where people buy them and then they don't use them.
It's actually a large portion of the financial planning that they don't know if these are going to be used or not, so that makes it very difficult to know what the actual financial impact was. It's also really, really bad PR for all these retailers that were affected. People couldn't check their balances, and a lot of people lost money. And some people do move away to other retailers. They don't want to come back if they have been a victim of an attack. If you are a retailer in e-commerce that does not provide a level of security for your users' resources, your customers will leave you and you will lose market share and potential revenue from sales.
Also, I would say that this attack was successful due to design flaws. It's like these resources were readily available without any type of limitations to many of these websites. Again, the guys who called us and said, "Thank you," told us that all of their competitors were down, and they had made these resources and built these websites where there's no thought whatsoever into how to protect their resources.
Moderator: One more question from the audience. Are there any general traits to look for in an attack?
Anna: Yes, there are three, at a high level. One is that a lot of these things actually self-identify as bots, so go through your logs and see who's telling you that they aren't human, and whether they're following the guidelines that you put in.
Another is site traversal. When you look at your website, how does a user go through it? Where do they start and how do they get to the thing that's important? Do they go directly to it? Then, perhaps, they aren't a normal user, right? If they go directly to the thing that they need, and there's no way for a normal user to get to that point without going to anything before then, that's a very good indicator it's bad.
And finally, the frequency of requests. That's something that is pretty easy to identify, too, without going into too many details of the deep level of security analysis that we do for our customers, but those three things, I would say, will get you quite a long way.
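The three traits named in this answer (self-identification, site traversal, and request frequency) can be checked against ordinary access logs. The Python sketch below is a rough illustration only: the record fields, the substring markers, the target path, and the per-minute limit are all assumptions, and it expects the requests to be in time order.

```python
from collections import defaultdict

# Illustrative substrings that self-identifying bots often carry in their user agent.
BOT_MARKERS = ("bot", "crawler", "spider")

def trait_report(requests, target_path, max_per_minute):
    """Return {client: sorted list of traits observed} for clients showing any trait.

    requests: time-ordered dicts with hypothetical keys
    client, user_agent, path, and minute.
    """
    seen_paths = defaultdict(set)
    per_minute = defaultdict(int)
    report = defaultdict(set)
    for r in requests:
        client = r["client"]
        # Trait 1: the client tells you it is not human.
        if any(marker in r["user_agent"].lower() for marker in BOT_MARKERS):
            report[client].add("self_identified")
        # Trait 2: the first page the client ever requests is the target resource.
        if r["path"] == target_path and not seen_paths[client]:
            report[client].add("direct_to_target")
        seen_paths[client].add(r["path"])
        # Trait 3: more requests in one minute than the chosen limit.
        per_minute[(client, r["minute"])] += 1
        if per_minute[(client, r["minute"])] > max_per_minute:
            report[client].add("high_frequency")
    return {client: sorted(traits) for client, traits in report.items()}

requests = [
    {"client": "u1", "user_agent": "Mozilla/5.0 Chrome", "path": "/", "minute": 0},
    {"client": "u1", "user_agent": "Mozilla/5.0 Chrome", "path": "/giftcard/balance", "minute": 0},
    {"client": "b1", "user_agent": "ExampleBot/1.0", "path": "/giftcard/balance", "minute": 0},
    {"client": "b2", "user_agent": "Mozilla/5.0 Firefox", "path": "/giftcard/balance", "minute": 0},
    {"client": "b2", "user_agent": "Mozilla/5.0 Firefox", "path": "/giftcard/balance", "minute": 0},
    {"client": "b2", "user_agent": "Mozilla/5.0 Firefox", "path": "/giftcard/balance", "minute": 0},
]

result = trait_report(requests, "/giftcard/balance", max_per_minute=2)
print(result)
```

In this sample the ordinary visitor `u1` reaches the balance page by way of the home page and shows no traits, while `b1` and `b2` are flagged. Keying on a client identifier rather than an IP is itself an assumption; as discussed earlier, attackers who rotate IPs will defeat a purely per-IP view.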