从 000webhost 泄露的数据来学写 SQL

2,359 阅读2分钟
原文链接: medium.com

Learn SQL by analyzing dump data from 000webhost

Recent data breach on free web hosting company 000webhost has leaked about 13 million user data (customer name, email and plain text passwords).

A hacker used an exploit in old PHP version (LOL!) to upload some files and gain access to system.

I found a .csv dump for about ~1 million users and import it into local mysql database into the table webhost.

So let’s get started with the tasks.

Task 1: 5 most worst passwords

Worst passwords are ones that are repeated multiple times on different account.

Attackers are usually brute-forcing with common used passwords such as 123456.

SQL:

SELECT password, count(*) as cnt FROM webhost GROUP BY password ORDER BY cnt DESC LIMIT 5;

Explanation:

First we select column field “password” and using GROUP BY we can add to the group of password ones that are the same. Our “cnt” is count for each of that group and we are ordering descending (from highest to lower value). Last is the LIMIT telling that we want only 5 records.

The winners:

  1. abc123 = 3059
  2. 123456a = 1615
  3. 123abc = 1326
  4. a123456 = 1162
  5. 123qwe = 1085

Task 2: 5 Common usernames

SQL:

SELECT name, count(*) as cnt FROM webhost GROUP BY name ORDER BY cnt DESC LIMIT 5;

Explanation:

Same as the password, now only using “name” column field.

The winners:

  1. Alex = 3503
  2. Daniel = 2331
  3. David = 2051
  4. John = 2031
  5. Carlos = 1874

Task 3: 5 most used email providers

We can find also, how many users are using same email provider (ex. gmail, yahoo, hotmail etc..)

SQL:

SELECT SUBSTRING_INDEX(email, ‘@’, -1) as sub_email, count(*) as cnt FROM webhost GROUP BY sub_email ORDER BY cnt DESC LIMIT 5;

Explanation:

We use same technic as for 2 task above, but we have problem with the email field that are represent as “username@email_provider.domain”, so we need to extract only “email_provider.domain”. To achieve this, we need to use builtin function SUBSTRING_INDEX. The parametaras are our string we want to cut, then delimiter (this is our “@” ) and from which position (1 from left start to rigth end, -1 from rigth start to left end). We represent this as “sub_email”.

The winners:

  1. gmail.com = 291888
  2. hotmail.com = 199766
  3. yahoo.com = 150888
  4. yahoo.co.id = 20877
  5. mail.ru = 18235

Task 4: Searching domains

I live in Croatia, so my domain is .hr and I want to find out how many email accounts has email provider with domain .hr

SQL:

SELECT distinct(SUBSTRING_INDEX(email, ‘@’, -1)) as sub_email FROM webhost WHERE SUBSTRING_INDEX(email, ‘.’, -1) = ‘hr’;

Explanation:

We don’t need to select all email address, only provider and domain names so we will substring it as explained task before. For WHERE clause, we will cut the string to get only domain name (ex. [email protected] will return com) and match it to our wanted domain that is hr

Result:

77 Croatian email providers

The End

So far, this dump on about ~ 1 million users tells us that using password abc123 is very dumb, Alex is common used name/username and gmail is very used email provider.

This analysis was very fun and if you think same, share it :)