Captcha-protect your website using Apache’s mod_rewrite to expel Google, Facebook and Co.

Problem:

Your website has three sources of traffic:

  • SOMEONE: people browsing the web you don’t know
  • FRIENDS: your friends
  • MACHINES: search engines, Facebook (when a link is posted, the content behind the link is fetched by Facebook), etc.

You want your FRIENDS to have full access to your website, whereas MACHINES should not. You don’t particularly care about the SOMEONEs, so they are allowed access as well.

Requirement:

For any defined part of your website a Captcha has to be entered, in order to prevent MACHINES from accessing this data. Your FRIENDS clicking on a hyperlink on Facebook should not be asked to enter a Captcha, to avoid annoyance. SOMEONE else has to enter the Captcha to distinguish them from MACHINES.

Solution:

Prerequisites:
  • the captchas are created using reCAPTCHA
  • apache webserver with mod_rewrite

Locate the config file (e.g. /etc/apache2/sites-available/somedomain.com) and add the following part to your virtual host:
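The original snippet is not reproduced here; a minimal sketch of such a rewrite block, assuming the directory names, cookie name and /howdy path described below, might look like this:

```apache
<VirtualHost *:80>
    ServerName yourwebsite.com
    DocumentRoot /var/www/yourwebsite

    RewriteEngine On
    # Only guard the private folders...
    RewriteCond %{REQUEST_URI} (somePrivateStuff|noMachinesShouldSeeThat)
    # ...and only if the noauth cookie has not been set yet...
    RewriteCond %{HTTP_COOKIE} !noauth
    # ...and we are not already on the captcha page (avoids a redirect loop).
    RewriteCond %{REQUEST_URI} !^/howdy
    # Redirect to the captcha page, remembering the originally requested path.
    RewriteRule ^/?(.*)$ /howdy?target=/$1 [R,L]
</VirtualHost>
```

The third RewriteCond is what keeps the captcha page itself reachable; without it, the redirect would trigger again on /howdy and loop forever.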

Now the following happens during every request:

When the request URI contains somePrivateStuff or noMachinesShouldSeeThat (the folders you do not want to be accessible by Google, Facebook, etc.), there is no cookie named noauth (strictly speaking, the string of key-value cookie pairs does not match “noauth”), and the request does not already point to yourwebsite.com/howdy, then the request is redirected to yourwebsite.com/howdy?target=/somePrivateStuff, i.e. the visitor is presented a captcha challenge to keep out MACHINES.

Take a look at /howdy/index.php. Depending on whether the “noauth=IF-ONLY-MACHINES-KNEW-THIS” cookie has already been set (note that the cookie is called noauth to stress the point that this is no real authentication and provides no real security!), on the answered captcha challenge and on the referrer of the request, the cookie may be set and the user forwarded to the requested resource.
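The original file is not reproduced here. A sketch of the logic just described might look like the following — the reCAPTCHA verification call, the key placeholders and the Facebook referrer check are assumptions based on the text above, not the original code:

```php
<?php
// /howdy/index.php -- hedged sketch, not the original file.
// Cookie name and value are taken from the description above.
$cookieName  = 'noauth';
$cookieValue = 'IF-ONLY-MACHINES-KNEW-THIS';

// Only allow local targets, so this cannot be abused as an open redirect.
$target = isset($_GET['target']) ? $_GET['target'] : '/';
if (strpos($target, '/') !== 0) {
    $target = '/';
}

// FRIENDS arriving via a Facebook link are let through without a captcha.
$referrer     = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$fromFacebook = (strpos($referrer, 'facebook.com') !== false);

// SOMEONE who submitted the form: verify the answer with the reCAPTCHA API.
$captchaSolved = false;
if (isset($_POST['g-recaptcha-response'])) {
    $resp = file_get_contents(
        'https://www.google.com/recaptcha/api/siteverify'
        . '?secret=YOUR_SECRET_KEY'   // assumption: your private key
        . '&response=' . urlencode($_POST['g-recaptcha-response'])
    );
    $json = json_decode($resp, true);
    $captchaSolved = !empty($json['success']);
}

if ($fromFacebook || $captchaSolved) {
    // Set the (deliberately insecure!) cookie and forward to the target.
    setcookie($cookieName, $cookieValue, time() + 3600, '/');
    header('Location: ' . $target);
    exit;
}

// Otherwise show the captcha form -- this is where MACHINES stop.
?>
<form method="post">
    <div class="g-recaptcha" data-sitekey="YOUR_SITE_KEY"></div>
    <input type="submit" value="I am human">
</form>
```

Note that the referrer check is what spares your FRIENDS the captcha: Facebook’s own crawler does not send a facebook.com referrer, but humans clicking a link there do.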

Now your website is at least safe from machines, as they cannot pass the captcha, without annoying your friends, who will not even notice this simple way of protecting your website. Copy-paste a hyperlink pointing to a protected directory into Facebook: Facebook connects to that link to create a preview of the content, and you will notice that the Facebook server is redirected to /howdy! So even though you share the information, your data remains in your possession.

Test:

Assume that http://manuelbaumann.com/gallery is a directory I don’t want to be accessible by non-humans:

Googling the protected page yields the expected result: the crawler was presented the captcha. Yet clicking on the link forwards you to the correct resource.

Pasting a link to my website on Facebook has the same effect, yet every friend following the link will be presented the information immediately.


Important notes:

  1. Note that this is security through obscurity
  2. I just figured out that recaptcha can be found at http://www.google.com/recaptcha. Google could actually bypass a captcha easily, as they obviously “know” all the captcha challenges.
  3. There is no evidence that this kind of information protection works under all conditions. See the Disclaimer.
