web scraping

Bypassing the "testcookie" anti-webscraping protection

A few days ago, I noticed that ApkTrack (an Android app I maintain) could no longer query one of the websites it usually obtains data from.
The app works mostly through web scraping, and once in a while a target website sets up new countermeasures to keep bots away from its content (even innocuous bots such as this app). In this post, we'll see how the protection I encountered this weekend was bypassed.