4.8 million records from a Hong Kong toy company were compromised.
I suspect we’re all getting a little bit too conditioned to data breaches lately. They’re in the mainstream news on what seems like a daily basis to the point where this is the new normal. Certainly the Ashley Madison debacle took that to a whole new level, but when it comes to our identities being leaked all over the place, it’s just another day on the Web.
When it’s hundreds of thousands of children including their names, genders and birthdates, that’s off the charts. When it includes their parents as well—along with their home addresses—and you can link the two and emphatically say “Here is 9-year-old Mary, I know where she lives and I have other personally identifiable information about her parents (including their password and security question),” I start to run out of superlatives to even describe how bad that is.
This is the background on how this little device and other online assets created by VTech requested deeply personal info from parents about their families which they then lost in a massive data breach.
Breach source, verification, and (attempted) disclosure
Let me set some context first because this is clearly a very serious incident, and it all began when I was contacted by Lorenzo Bicchierai earlier this week. Lorenzo writes for Motherboard and has often approached me for comments on security incidents in the past. This time, he wanted some help verifying a data breach that had allegedly come from VTech and contained millions of customer records. Someone had gotten in touch with him (I assume as they thought it might make a good story) and he was doing his journo due diligence thing.
Lorenzo passed on the data and I checked it out. I found 4.8 million unique customer e-mail addresses in one of the files and it “smelled” good; that is, it didn’t have the typical hallmarks that often accompany a fabricated breach. However, it wasn’t quite clear where the data had come from. It’s not like you can just go to vtech.com and find a login box that tells you whether or not an account exists (incidentally, I did later discover an API that confirms the presence of an e-mail address at login time). I needed further verification, so I invoked the help of some Have I been pwned? (HIBP) subscribers.
In order to verify the data via HIBP, I had to call on some supporters. One of the features I added to HIBP very early on was the ability to subscribe to notifications:
This is a free service that sends you an e-mail if your account pops up in a data breach. To date, 290,000 people have signed up and verified their e-mail addresses (they need to receive an e-mail at that address and click a unique link). Now I’d always intended for this to be a feature that simply notifies people of breaches as appropriate, but I’ve realized lately that it also means I have an excellent source of individuals supporting the project who can help me verify future data breaches as well.
I took the e-mail addresses from the alleged VTech breach and found 18 recent HIBP subscribers who had a comprehensive set of data in the dump. I e-mailed them asking for support, essentially saying that I’d been passed a data breach that included their details and if they were willing to assist, I’d send them some non-sensitive data attributes to verify. This was usually their month of birth, the city they live in and the name of their ISP based on their IP address. All of these attributes were in the data breach and if the HIBP subscriber could confirm them and acknowledge they had a VTech account, I’d be confident it was legitimate.
I received six responses within 24 hours, every one of them confirming their data:
Yes. That’s accurate. I did register at vtech so I could download addons for a toy laptop.
Yes. That’s my data.
no doubt about that, I registered a vtech account within the last few months.
Yes, That looks like legitimate information. The service would be VTech’s Learning Lodge.
Yes, that looks like me. I lived near [redacted city] at the time and my daughter had one of their pads. I believe we logged in so that we could download apps from their app store and possibly for firmware updates etc.
Yes that is correct. It’s an old address, I was with [redacted ISP] at the time so can verify this info ! I would have used the VTECH website for my daughter around that time too !
Yes I did access the VTech learning lodge in 2014 after purchasing a “Cora Cub” for my child. In order to personalize it’s voice activated feature, you had to join the learning lodge.
I was with the broadband provider [redacted ISP] at that time. I have since changed services, unfortunately to TALKTALK!
Can’t help but feel sorry for the last person!
This was more than enough to now have complete confidence in the legitimacy of the data. But before loading it into HIBP, it was essential that VTech be aware of the incident, too, so I pushed Lorenzo on what steps he’d taken. He’s detailed his attempts to get in touch with VTech in the article he published Friday titled “One of the Largest Hacks Yet Exposes Data on Hundreds of Thousands of Kids.”
For many days, he simply couldn’t get anyone to talk with him despite the fact they did actually respond (and redirect him) multiple times. As we discussed this incident in the days following his initial contact, at multiple points we talked about means of getting in touch with them and he reached out via various channels time and time again.
It was reminiscent of my trials with 000webhost last month, and frankly, I’m both staggered and appalled by the negligence these organizations are showing. Data breaches like this can be enormously damaging for both the customers and the online business alike, but while I’m enormously sympathetic to the former, when the latter actively ignores multiple attempts at private disclosure even when they know it relates to a serious security incident, it’s hard to feel too sorry for them.
But to their credit, VTech did eventually respond to Lorenzo and acknowledged that prior to his contact they were not aware of a data breach but have since identified an incident on November 14. This roughly corresponds with the dates in the files, although as I’ll show shortly, there are records allegedly created a couple of days after this. In their response, VTech explains the following:
Upon discovering the breach, we immediately made modifications to the security settings on the site to defend against any further attacks.
Unfortunately, this is insufficient and I’ll explain why shortly. They go on to reassure Lorenzo that financial data is just fine:
It is important to note that no payment card or banking information was obtained. Our database does not contain any credit card information and VTech does not process or store any customer credit card data on the Learning Lodge website. To complete the payment or check-out process of any downloads made on the Learning Lodge website, our customers are directed to a secure, third party payment gateway.
Frankly, I couldn’t care less about credit cards, and as I’ve explained before, these statements are designed to appease the likes of PCI and are of little consequence to consumers when genuinely sensitive things—irreplaceable things—are lost by a company that suffers a data breach. Let’s take a look at just what they lost.
Understanding the data breach
Here’s what was originally provided to Lorenzo:
The file that immediately jumps out is the big guy at the top—parent.csv. This file has 4,862,625 rows and column headings as follows:
One of the first things I look at in a breach like this is how many unique e-mail addresses there are, as it helps establish whether there are duplicate records. In this case there were 4,833,678 occurrences matching the pattern I extract e-mail addresses on, slightly fewer than the total rows, which is normal due to duplicates, missing addresses or strings that don’t conform to what an e-mail address looks like (I have a pretty liberal regular expression pattern I use).
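To make that counting step concrete, here’s a minimal Python sketch. The regular expression and the sample rows are assumptions made up for illustration; they are not the pattern or the data actually used. It only shows how a liberal pattern plus a set gives you both the number of matches and the number of unique addresses.

```python
import re

# A deliberately liberal pattern (an assumption, not the author's actual regex):
# anything@anything.anything, with no whitespace, commas, semicolons or brackets.
EMAIL_RE = re.compile(r"[^\s,;<>@]+@[^\s,;<>@]+\.[^\s,;<>@]+")


def count_emails(lines):
    """Return (total_matches, unique_addresses) for an iterable of text lines."""
    total = 0
    unique = set()
    for line in lines:
        for match in EMAIL_RE.findall(line):
            total += 1
            unique.add(match.lower())  # compare addresses case-insensitively
    return total, len(unique)


# Made-up rows standing in for lines of a CSV export.
sample_rows = [
    "1,Alice@example.com,Springfield",
    "2,bob@example.org,Shelbyville",
    "3,alice@example.com,Springfield",  # duplicate of row 1 (different case)
    "4,,Capital City",                  # missing address
    "5,not-an-email,Ogdenville",        # does not match the pattern
]

total, unique = count_emails(sample_rows)
print(total, unique)
```

A gap between the row count and the match count, and between matches and unique addresses, is exactly the kind of thing this surfaces.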
The next thing I checked was the passwords, and, while the column heading implies they’re encrypted, they’re not. The easiest way to check what’s going on with password storage is just to Google a few of the values stored in the database. For example, let’s take the very first one in the dump: 835af17f41292ba8ea3270f6859757ab
And here it is:
Their password is “welcome81.” It’s that simple. It’s just a straight MD5 hash, with not even an attempt at salting or using a decent hashing algorithm. The vast majority of these passwords would be cracked in next to no time; it’s about the next worst thing you can do after having no cryptographic protection at all. Speaking of which…
All secret questions and answers are in plain text. The questions are typical (albeit poor) examples such as your favorite color, where you were born, and your first school. In fact, you can see them in context in this screen from a video I’ll show a bit later:
This aligns with the columns from the parent.csv file I referenced earlier and gives me a high degree of confidence it’s at least one of the locations where parents would have entered data.
Normally this would be the end of the story when it comes to processing a data breach. I’d make the data searchable on HIBP, notify impacted subscribers, and that would be it. But it’s a different story this time and it’s because of those member CSV files. Let’s take a closer look.
This will all make more sense if you watch the VTech Kid Connect: Getting Started Tutorial first. Just take a few minutes to understand the workflow when you first set up one of their tablets:
As you watch the video, you’ll see the Learning Lodge appear at the 1 minute mark. You would have seen this mentioned earlier on in the feedback I got from impacted customers, and you can also see a browser-based version of it.
At about the 1 minute 40 mark you’ll see where a child can be set up with a Kid Connect ID:
It then goes on to show how to create a parent account per the earlier image, then a little bit later, shows how to create an avatar by taking a photo of your child. This is all consistent with the data that then appears in the master_account.sql file as follows:
This is a self-referencing table, that is, it has keys that refer back to itself. What it then means is that you get one record like this for the parent including their e-mail address with an encoded @ symbol:
You can see how the parent_id of the child – 2700413 – relates back to the ID of the parent. This is not just a relational parent/child structure, it’s a literal real-world biological implementation of that. The parent’s record can then be pulled from the parent CSV file I mentioned earlier on and contains all the values such as address, password, and security questions.
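For anyone who hasn’t worked with a self-referencing table, here’s a minimal sketch using Python’s built-in sqlite3 module. The table name, the username and email columns, the child’s ID and all the values other than the parent ID 2700413 are assumptions for illustration; only id and parent_id correspond to what’s described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: parent_id points back at id in the same table.
conn.execute(
    """
    CREATE TABLE master_account (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES master_account(id),
        username  TEXT,
        email     TEXT
    )
    """
)

# One parent row (no parent_id) and one child row that refers back to it.
conn.execute(
    "INSERT INTO master_account VALUES (?, ?, ?, ?)",
    (2700413, None, "maryparent", "parent@example.com"),
)
conn.execute(
    "INSERT INTO master_account VALUES (?, ?, ?, ?)",
    (2700414, 2700413, "littlemary", None),  # child ID is made up
)

# Join the table to itself: each child row is matched to its parent row.
rows = conn.execute(
    """
    SELECT child.username, parent.email
    FROM master_account AS child
    JOIN master_account AS parent ON child.parent_id = parent.id
    """
).fetchall()

print(rows)
```

The point is how little work it takes: one join, and every child is sitting next to a parent’s e-mail address.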
So that establishes parents and kids, but the latter doesn’t have much data here, only a parent-created username which (fortunately) doesn’t tell you much about them. However, those member CSV files are altogether more worrying; here’s what’s in them:
There are 227,622 records in those five CSV files, and yes, those columns are exactly what they look like – names, birth dates, and genders, among other things. But where did they come from? I mean they’re not in the earlier Learning Lodge video, what other assets might VTech have that could collect this sort of data? The answer lies in the registration_url column, and it includes these values:
That’s ordered from the most popular down with 77k instances of www.planetvtech.com down to only 41 examples of es.vsmilelink.com. Each of these is hosted on the same IP address and has the same fundamental implementation; a single Flash file embedded in an ASP.NET page. Clearly they cater for different demographics in terms of kids’ interests and language preferences. Here’s how the most popular one looks (incidentally, all entries in the data breach for this domain were in a file called memberuk.csv, which appears to tie back to the United Kingdom):
No, I didn’t make up the cyber spies bit! Each of the implementations above provides the ability for parents to register on the site:
After registering, there’s a confirmation e-mail and then the ability to start adding kids:
Now we see name, birth date, and gender, the sensitive personally identifiable fields consistent with what was disclosed in the leak. It goes on to request an optional “Citizen ID Registration,” which I don’t have, but the pattern matches the product_code column in the member files from the data dump (60 percent of the records have a null value in this column). A bit of Googling suggests that this is available for purchasers of VTech products:
This is an important thing to note; products such as the Cyber Rocket are physical products (like the InnoTab at the start of this blog post), which then encourage the creation of digital accounts. This creates a relationship between physical and digital assets by virtue of the Citizen ID.
Further verification that the kids’ data in the form above is what goes into the member tables comes by looking at the ID the new record is assigned. Here’s the response after logging in:
I’m returned an internal identifier of 130,460. The last ID in the dumped member data is 130,446, and it was created on November 16, about ten days before the time of writing. The sequence checks out, and, remember, that table has the registration_url column stating that most of the records came from www.planetvtech.com. The point is that it establishes a high degree of confidence as to where the data in the breach may have originally been entered.
Now here’s where I need to be intentionally vague because despite their assurances that their system is now secure, they still have gaping holes that allow every kid to be matched with every parent. The details of this have been passed on to VTech, and I’ll say this much here: there’s no simple fix. The flaws are fundamental, and the recommendation I’ve passed on is to take it offline ASAP until they can fix it properly. You just can’t take chances with other people’s data in this way, especially not when they’re kids.
The average age of kids when their account was created is just five years old. They have the sorts of login names you’d expect a parent to give their children; affectionate “pet names” in many cases. The kids are almost precisely split between girls and boys, and not only has their data already been leaked in this breach, it remains at serious risk due to the implementation of the site.
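As a rough sketch of how a figure like that average can be derived, here’s the arithmetic in Python. The three records are invented for illustration and the pairing of a birth date with an account creation date is an assumption about how you’d lay the data out; none of it comes from the breach.

```python
from datetime import date


def age_on(birth_date, on_date):
    """Age in whole years on a given date."""
    years = on_date.year - birth_date.year
    # Knock a year off if the birthday hasn't happened yet in that year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# Made-up (birth date, account creation date) pairs.
records = [
    (date(2008, 5, 17), date(2014, 1, 3)),
    (date(2010, 11, 30), date(2014, 11, 29)),  # the day before a birthday
    (date(2007, 2, 1), date(2014, 2, 1)),      # created on a birthday
]

ages = [age_on(born, created) for born, created in records]
average_age = sum(ages) / len(ages)
print(ages, average_age)
```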
Major security failings on VTech’s behalf
Let me caveat what I’m about to detail by saying that everything I’m about to share is publicly observable when the systems are used in their intended way. This is all discoverable by using their websites precisely as they were intended to be used, which on the one hand means that it’s easily obtainable information by anyone, yet on the other, means that they could also have readily identified a whole raft of flaws themselves if only they’d looked.
For example, there is no SSL anywhere. All communications are over unencrypted connections, including when passwords, parents’ details, and sensitive information about kids are transmitted. These days, we’re well beyond the point of arguing this is OK. It’s not. Those passwords will match many of the parents’ other accounts, and they deserve to be properly protected in transit.
Of course once the passwords hit the database we know they’re protected with nothing more than a straight MD5 hash which is so close to useless for anything but very strong passwords (which people rarely create), they may as well have not even bothered. The kids’ passwords are just plain text, but you could almost argue this is ok insofar as it’s not exactly going to open their eBay account or get attackers into their Gmail. I’m sure some people will vehemently disagree with me on that and insist they should be strongly hashed as well; my point is simply that while they would be easy to protect properly, they don’t present the same risk profile as the passwords of their parents.
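To illustrate why a straight MD5 hash offers so little, here’s a minimal Python sketch using only the standard library. The word list, the iteration count and the choice of PBKDF2-HMAC-SHA256 are my assumptions for the sake of an example; it’s one reasonable salted, deliberately slow scheme, not a statement of what VTech should have used or what anyone else does.

```python
import hashlib
import hmac
import os


def md5_hex(password):
    """Unsalted MD5: the same password always produces the same hash."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Because the hash is deterministic, a precomputed table reverses it instantly.
wordlist = ["password", "123456", "welcome81", "letmein"]  # tiny made-up list
lookup = {md5_hex(word): word for word in wordlist}

leaked_hash = md5_hex("welcome81")  # stands in for a value from a dump
cracked = lookup.get(leaked_hash)
print(cracked)

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware


def hash_password(password, salt=None):
    """Salted, slow hash: returns (salt, digest) to be stored together."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt_a, digest_a = hash_password("welcome81")
salt_b, digest_b = hash_password("welcome81")
print(digest_a != digest_b)  # same password, different salts, different hashes
print(verify_password("welcome81", salt_a, digest_a))
```

With a per-password salt, a single precomputed table no longer works, and the iteration count makes each individual guess expensive.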
Lack of cryptographic protection for sensitive data is yet another example of where it’s all gone wrong. Those security question and answer pairs are irrevocable pieces of personal information used to establish identity in all sorts of different places. By comparison, have a look at how Patreon handled personal data in my recent post on extortion; addresses, birth dates, and other personal info were all encrypted so that when the worst case scenario did eventuate, it wasn’t as bad as it could have been.
Then there are some genuinely alarming practices. For example, here’s the response after I got the password wrong when logging in to the LittleMary account I created earlier on planetvtech.com:
Why they’re returning a SQL statement is absolutely beyond me. Lorenzo was told by the person that provided him with the data that the initial point of compromise was due to a SQL injection attack, and, even without seeing the behavior above, I would have agreed that was the most likely attack vector. On seeing the haphazard way that internal database objects and queries are returned to the user, I’ve no doubt in my mind that SQL injection flaws would be rampant.
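For readers who haven’t seen the mechanics of SQL injection, here’s a minimal sketch using Python’s built-in sqlite3 module. The member table, its columns and both functions are hypothetical and say nothing about how VTech’s code is actually written; it simply contrasts building a query by string concatenation with passing the input as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (login TEXT, password_hash TEXT)")
conn.execute("INSERT INTO member VALUES ('LittleMary', 'not-a-real-hash')")


def find_member_unsafe(login):
    # Vulnerable: user input is pasted straight into the SQL text,
    # so a quote in the input can change the meaning of the query.
    query = "SELECT login FROM member WHERE login = '" + login + "'"
    return conn.execute(query).fetchall()


def find_member_safe(login):
    # Parameterized: the driver treats the input purely as a value.
    return conn.execute(
        "SELECT login FROM member WHERE login = ?", (login,)
    ).fetchall()


payload = "nobody' OR '1'='1"

unsafe_rows = find_member_unsafe(payload)  # the OR clause matches every row
safe_rows = find_member_safe(payload)      # no login literally equals the payload

print(unsafe_rows)
print(safe_rows)
```

Echoing the generated SQL back to the client, as in the response above, hands an attacker the table and column names they’d otherwise have to guess.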
The other rampant practice that’s increasingly frowned upon in the security space is the extensive use of Flash. It appears consistently throughout their online assets, and, while it would have made sense many years ago, the continuous stream of security vulnerabilities it’s presented coupled with the lack of mobile support and ready availability of alternative rich UIs via HTML 5 make it an increasingly rare thing to see such a dependency. There’s a sense of “systems from a bygone era” throughout their assets, not just with the Flash dependency, but things like the site still reporting ASP.NET 2.0 which was superseded almost six years ago now (.NET 3 or 3.5 will still report as 2.0 based on the X-AspNet-Version header) and the extensive use of WCF and SOAP services. That’s not to say all of this is security gone bad, more that you get the distinct sense VTech’s assets were created a long time ago and then just… left there.
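Version details like this are visible to anyone who looks at the response headers. Here’s a minimal Python sketch that picks out the usual version-disclosing headers from a set of response headers. The list of header names is my own selection and the sample values are illustrative, not captured from VTech’s servers.

```python
# Header names that commonly disclose server-side stack details (my selection).
VERSION_HEADERS = (
    "Server",
    "X-Powered-By",
    "X-AspNet-Version",
    "X-AspNetMvc-Version",
)


def disclosed_versions(headers):
    """Return the version-disclosing headers present in a header mapping.

    Header names are matched case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        name: lowered[name.lower()]
        for name in VERSION_HEADERS
        if name.lower() in lowered
    }


# Illustrative sample only.
sample_headers = {
    "Content-Type": "text/html; charset=utf-8",
    "x-aspnet-version": "2.0.50727",
    "X-Powered-By": "ASP.NET",
}

found = disclosed_versions(sample_headers)
print(found)
```

The function accepts anything with an items() method, so the headers object from a standard library HTTP response should work as well as a plain dictionary.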