PHP and HTTP HEAD Requests


The HTTP protocol used by Web servers supports a few different commands. The typical GET command returns a whole Web page, while the HEAD command just returns its headers. Smart clients use this to determine whether a document has been modified before retrieving the whole thing.

I never really considered this, but it turns out that a HEAD request for a dynamic page using PHP will execute the entire script, just like a GET request, by default. This has a couple of implications: first, a script that counts page views may be incorrectly counting HEAD requests. Second, a HEAD request puts the same load on the server as a full GET request, despite not sending the full output of the script.

I found out about this because the W3C Link Checker issued over ten thousand HEAD requests to various pages on SlashNot in a single day. These requests came rapid-fire and nearly crashed the server a couple of times. The link checker racked up a total of 61,000 requests over the last week. It still seems to be running a link check every 35 minutes or so, although now with far fewer requests. I don’t know if this is a well-meaning recursive link check gone horribly wrong or someone using it against us on purpose, but we certainly didn’t request any link-checking.

Lessons learned: (1) Check $_SERVER['REQUEST_METHOD'] in PHP programs and respond appropriately to HEAD requests so they can’t overload the server. (2) Block the W3C-checklink user agent or use robots.txt in case the link checker falls madly in love with our site again.
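
Here is a rough sketch of what lesson (1) might look like. It isn't the code running on SlashNot, just one way to do it. The function names is_head_request and handle_request are made up for illustration, as are the two callbacks standing in for the page-view counter and the expensive page rendering. It also assumes PHP 7 or later for the type hints and the ?? operator.

```php
<?php
// Hypothetical helper: true when the request method is HEAD.
// Pass in $_SERVER; taking it as a parameter keeps the function easy to test.
function is_head_request(array $server): bool
{
    // REQUEST_METHOD is normally set by the web server. Fall back to GET
    // when it is missing (for example, when the script runs from the CLI).
    return strtoupper($server['REQUEST_METHOD'] ?? 'GET') === 'HEAD';
}

// Hypothetical page handler. $countView and $renderPage are placeholders
// for whatever the real script does to count a view and build the page.
function handle_request(array $server, callable $countView, callable $renderPage): string
{
    if (is_head_request($server)) {
        // A HEAD response has no body, so skip the counter and the
        // expensive rendering entirely.
        return '';
    }

    $countView();          // only real page views get counted
    return $renderPage();  // full output for GET and other methods
}

// Possible use at the top of a page script (the callbacks are assumptions):
//   echo handle_request($_SERVER, 'count_page_view', 'render_page');
```

A HEAD response is supposed to carry the same headers a GET would, so in a real script any header() calls should probably still run before the early return; only the counter and the body generation are skipped.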

One response to “PHP and HTTP HEAD Requests”

  1. RedAndy says:

    Thanks, that is exactly what I was looking for to finish off my little caching project (:

(c) 2001-2007 Michael Moncur. All rights reserved, but feel free to quote me.