Rowan's World, et Cetera

Cached Redirects Considered Harmful (and how browsers can fix them)

by Rowan on 9 October, 2011

There are a lot of URLs out there on the Web, and a good proportion of those URLs are either alternative names for something, or old locations that have been superseded. So “redirects” from one URL to another are a common feature of the web, and have been for many years. But recently, the way these redirects behave has been changing, because performance-conscious browser developers have started caching redirects, rather than re-requesting them from the server every time.

In theory, this makes perfect sense, but in practice, it causes web developers like me a lot of pain, because nothing “permanent” is actually that permanent. I’m not saying no browser should ever cache a redirect, but I do have a few suggestions of ways they could be a little more helpful about it.

The Technology

Let’s be clear what we’re talking about – the HTTP/1.1 specification defines a class of status codes grouped as “Redirection” statuses, in the range 3xx. The main candidate for browser caching is status code 301, which the specification labels “Moved Permanently”.1 The specification explicitly states that “this response is cacheable unless indicated otherwise” – that is, unless the response also includes headers specifically relating to cache control, a User Agent can assume that it’s OK to cache this response and not re-request the original URL. It doesn’t go into any more detail, and neither this nor the next sentence (about “clients with link editing capabilities”) is couched in the standard RFC form of “User Agents MAY …” (let alone “SHOULD”), but the clear message is “this old URI is irrelevant, just use the new one”.
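To make this concrete, here is roughly what the two cases look like on the wire (the example URL is invented for illustration). A bare 301 response, which a caching browser may treat as valid forever:

```http
HTTP/1.1 301 Moved Permanently
Location: http://example.com/widget/
```

and the same redirect “indicated otherwise”, with an explicit cache header limiting how long the cached entry may be reused (here, one hour):

```http
HTTP/1.1 301 Moved Permanently
Location: http://example.com/widget/
Cache-Control: max-age=3600
```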

So, newer versions of Firefox, Chrome, and Internet Explorer have all started “obeying” this part of the standard, and opting to cache all HTTP 301 responses if not told otherwise.

The Problem

So, what’s the big problem? Well, the fact that a “resource has been assigned a new permanent URI” doesn’t mean that there won’t be some point in time where someone wants to use the old URI for something else. Worse, people have an unhelpful habit of changing their decisions later.

Consider this scenario:

  1. A company has a product called a “Thingummy™”, with a description at http://example.com/thingummy/
  2. They decide that the name is too unwieldy, so re-brand it as “Widget™”. Knowing that cool URIs don’t change, the developers permanently redirect the old URL to the new page about the same product, at http://example.com/widget/. Old links remain valid, and direct customers to the information they were looking for.
  3. A couple of years down the line, the company decides to boost sales by releasing a “classic” version of the Widget™ under the old Thingummy™ brand. They put up a new page at http://example.com/thingummy/. Sadly, some customers continue being redirected to http://example.com/widget/ by their browsers’ cache.
  4. With Thingummies™ massively out-selling Widgets™, the company comes full circle, and abandons the new brand. The developers put in a redirect that permanently points http://example.com/widget/ back to http://example.com/thingummy/. At this point, all hell breaks loose. Well, maybe not, but customers trying to access the product page find mysterious messages on their screen about “infinite redirects”. Management are not impressed.
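To spell out why step 4 goes wrong: the browser still holds the cached redirect from step 2, so once it sees the new redirect it can bounce between the two URLs without ever asking the server again. Something like:

```
GET /thingummy/  → cached 301 → /widget/
GET /widget/     → fresh  301 → /thingummy/   (now cached too)
GET /thingummy/  → cached 301 → /widget/
GET /widget/     → cached 301 → /thingummy/
… until the browser gives up with a “too many redirects” error.
```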

How Permanent is “Permanently”?

The biggest problem in all this is that developers seem to have taken the word “permanent” rather too literally. The Random House dictionary2 lists 4 definitions of permanent; the first is:

existing perpetually; everlasting, especially without significant change

In theory, this seems a reasonable definition, but in reality nothing lasts forever. Nothing physical, nothing electronic, and certainly not the structure of a website. So clearly, when the HTTP specification says “a new permanent URI”, it doesn’t actually expect you to guarantee its perpetual existence. Let’s try the second definition:

intended to exist or function for a long, indefinite period without regard to unforeseeable conditions

Ah, now that’s more like it – by that definition, a 301 status indicates a redirect which will remain valid for “a long, indefinite period”, unless there are “unforeseeable conditions”.

I did a quick test earlier, using Firefox 7 to visit a URL which returned a 301 status and no cache instructions, then inspecting the resulting cache entry using the built-in “about:cache” page. Among the details is this – “expires: No expiration time”. That’s pretty permanent, by the first definition.

Effectively, having once seen that 301 response, the browser is never going to request the original URL again – simply because the web developer didn’t mention an expiry date, because they didn’t foresee any conditions in which they would want to change the decision. (And I thought 24-hour TTLs on DNS entries were annoying…)

→ Suggestion 1: Cache entries for an unadorned 301 response should have a reasonable default lifetime, not last forever.

→ Filed as Mozilla Bugzilla Bug 696595

Chasing your own Tail

If you’re foolish enough to have created an immortal redirect, or unlucky enough to have inherited one, you might find yourself wanting to put a new redirect pointing back the other way, as in my example above. But if you do, any browser which saw (and cached) the old redirect will simply see the new one as well, and follow both back and forth, back and forth, back and forth … until eventually it decides there’s an infinite loop and throws the user an error.

So instead, you have to resort to all sorts of confusing workarounds to keep everything working.

But the browser knows something is wrong, and it’s not the user’s fault – it’s bad data in the browser cache. (Depending how you look at it, that’s the web developer’s fault, but a browser that doesn’t cut developers some slack won’t get very far rendering real-world HTML…)

→ Suggestion 2: When an infinite redirect is detected, try skipping the cache.

→ Filed as Mozilla Bugzilla Bug 696646

Escaping the Trap

The first hurdle for developers is that it’s not obvious what the hell is going on, even when they’re just testing out a few different URL schemes in their development copy. I think whatever unacceptable language this developer used on MozillaZine probably sums up the feeling most of us had when first encountering this invisible magic.

But even once you’ve figured it out, it’s not that obvious what to do about it – do you do a deep clean of the browser’s cache every time something’s not quite right? I see a sledgehammer approaching a nut. And if you’ve released your mistake to a non-technical user, perhaps on a preview version of the site, you’re going to have to talk them through this process as well.

Caching is a pain sometimes, but it’s been around for a long time, and we have UIs for dealing with it. The most common of these is the “hard refresh” – hold down Ctrl, or Shift, and hit the reload button or keyboard shortcut, and the cache is bypassed completely and the content is reloaded. Brilliant. Oh, but it only reloads the page you’re looking at, not the URI you originally requested, so it’s useless for cached redirects.

→ Suggestion 3: Make a “hard refresh” request the original URI navigated to, not the one currently being viewed.

→ Filed as Mozilla Bugzilla RFE 696650

Doing the Right Thing

Apparently, what we’re all doing wrong, as developers, is not sending appropriate cache headers along with the 301 status code. If you’re writing, say, a PHP script, and using header('Location: foo'), you should probably be doing it in some kind of wrapper function, so you can make up your own default expiry, and make sure you send a whole bunch of control headers whenever you redirect.
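A minimal sketch of the kind of wrapper I mean (the function name and the one-day default lifetime are my own inventions, not recommendations from any spec):

```php
<?php
// Hypothetical wrapper: issue a 301 redirect, but always attach an
// explicit cache lifetime so the browser's cache entry eventually expires.
function redirect_permanent($url, $max_age = 86400)
{
    // Tell caches (browser and intermediaries) how long this redirect
    // may be reused before re-checking with the server.
    header('Cache-Control: max-age=' . (int) $max_age);
    // Third argument sets the HTTP status code for the redirect.
    header('Location: ' . $url, true, 301);
    exit;
}

// Usage: redirect_permanent('http://example.com/widget/');
```

The point of routing every redirect through one function is that when you change your mind about the default lifetime, you change it in exactly one place.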

But a lot of redirects are not written in a rich server-side scripting language; they use tools built into the web server, like Apache’s incredibly powerful mod_rewrite. I just checked, and there is no option in the latest version of mod_rewrite that lets you control caching, or send arbitrary HTTP headers, when it generates a 301 response. I’m sure there are ways of stringing together a whole bunch of Apache directives to achieve the desired effect, but it would take me a while to come up with the right combination, and it would look a mess.
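For the record, the sort of directive-stringing I have in mind looks something like this – an untested sketch combining mod_rewrite with mod_headers, using an environment variable to tag just the redirect responses (the variable name and lifetime are made up):

```apache
RewriteEngine On
# Tag requests that hit this redirect with an environment variable...
RewriteRule ^thingummy/$ /widget/ [R=301,L,E=LIMITCACHE:1]
# ...then have mod_headers attach cache instructions to just those responses.
Header always set Cache-Control "max-age=86400" env=LIMITCACHE
```

Whether the environment variable survives into the header-merging phase can depend on the Apache version, which rather proves my point about it being fiddly.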

→ Suggestion 4: Web server redirect functions, such as mod_rewrite, should build in control over caching headers.

→ Posted to Apache HTTPD dev list

Inconsistent Behaviour

Finally, let’s assume we’re using a tool for our redirects that lets us control the cache headers, and we don’t have any old immortal redirects to avoid infinite loops with. All is fine, and the browsers will do the right thing and save us some network traffic, right?

Well, according to this test result table from someone who is in favour of redirect caching, probably not. Like everything out there in web land, you have to deal with every browser handling things just a little bit differently. Combinations of status codes and cache headers which ought to work might or might not actually behave reliably across browsers.

Even client-side caching of standard resources is, as one comment I just came upon puts it, “a bit of a black art”.

→ Suggestion 5: Somebody3 needs to work out what the most common scenarios for redirect caching actually are, and how to achieve them.

Conclusion

My gut instinct the first, second, and twelfth times I ran up against redirect caching was that it was stupid, and the browser developers should take it out immediately. But I do see that it makes sense, and could save a lot of unnecessary network traffic and server load. So instead, all I ask is that people don’t brush the problems off with “oh, well, you should have known what permanent means”, and look at how to make redirect caching work in the real world.


Update 2011-10-23: Bugs / RFEs filed against Firefox and Apache. I am less certain of how to test and report other browsers’ behaviour.

Footnotes

  1. On a pedantic note, while that is the heading in the RFC, and the suggested human-readable “Reason Phrase”, it is not actually the definition of the status, as is sometimes implied when discussing it.
  2. By which I mean dictionary.com, because I couldn’t be bothered to go downstairs and find the OED.
  3. Sorry, not me – I’m one of the people who needs the crib sheet, not one who can write it!

13 thoughts on “Cached Redirects Considered Harmful (and how browsers can fix them)”

  1. Why don’t you just use 302 redirects rather? They give you exactly what you want and avoid issues like the problem scenario you have described.

  2. Lars Janssen says:

    Interesting reading and a great caveat if nothing else.

    Suggestion 1:

    Whether it’s “permanent” or “indefinite”, I guess it lasts until the browser cache is cleared. That could be the lifetime of the device (computer, phone) in some cases.

    What is “reasonable” though? DNS records also vary wildly, and they all have a TTL so are more analogous to using HTTP redirects with an explicit expires.

    Not sure.

    Suggestion 2:

    Sounds like a good idea.

    I guess there is a possibility of side effects, e.g. if an application were to do a form submission (not necessarily POST method), then do the redirect, and the user somehow gets back to that URL.

    The current browser behaviour effectively forces the user to abandon the, possibly broken, website, but is unlikely to have side effects.

    Not sure.

    Suggestion 3:

    There could be a whole chain of redirects. I guess it’s possible to go back to the start of the chain and hard refresh them all.

    This could result in the user ‘hard refreshing’ a page and then ending up on a completely different site.

    Not sure.

    Suggestion 4:

    Sounds like a good plan. I think developer awareness and control of this situation, and a reduced chance of accidentally making things permanent, can only be a good thing.

    Agreed.

    Suggestion 5:

    Not me either. ;)

  3. Rowan says:

    @Robert MacLean:

    Are you suggesting that nobody should use 301 redirects ever? I don’t think anyone can truly say that they know for certain that the new URI they are directing to is going to “exist perpetually”. What if control of the entire domain passes to a different organisation?

    There are plenty of valid reasons to use 301 redirects, including the fact that indexers such as Google will handle them differently. Even on a “semantic” level, saying that the redirect I’m setting up might conceivably change in the distant future is not the same as saying that it is “temporary”, so a 302 status does not meet my requirements at all.

    Any solution that says that web admins need to change their code also ignores the long-established fact that browsers have to work with the web as it already exists. There must be millions of redirects out there affected by this change that can’t or won’t be “fixed”, or for which fixing would now already be too late as users already have them cached.

  4. Rowan says:

    @Lars

    Suggestion 1:

    Even if we accept that the standard says that a 301 response without caching headers can be considered eternal (and there is plenty of room for debate on that point); and even if changing code to produce redirects with cache control was trivial (which it is not, for instance, with mod_rewrite); even then, the fact is that this is not the current behaviour of websites in the Real World.

    Changing the behaviour of code which previously wasn’t cached at all to being cached for the lifetime of a user’s browser profile is a pretty major change. Again, I can’t actually think of a scenario where an eternal redirect would even be desirable, but it should definitely be something that a developer opts into, not out of. I can’t think of any other example of a cache that’s quite so immortal.

    My suggestion is just to pick a lifetime somewhere below infinity for these “legacy” redirects which don’t specify an expiry. Even expiring after a year would give developers a cut-off date after which they could rely on “bad” redirects having expired.

    Suggestion 2:

    Interesting point, but whatever side effects this would have would also happen in browsers which don’t cache redirects. Any site specifically relying on redirect caching behaviour would break for IE<9, Firefox<5, etc. I’m therefore confident no such site exists (or is worth worrying about).

    Suggestion 3:

    Yes, it’s definitely a change in behaviour. Personally, I think that if you’re specifically opting for a hard refresh, going back and requesting the URI you originally navigated to is probably a good thing. It would certainly need careful implementation, though, in case it created any new side effects. Of all of my suggestions, I consider this the least likely to be implemented, which is a shame, because it would make testing Apache configurations a lot less painful!

  5. “Are you suggesting that nobody should use 301 redirects ever?” – I did not suggest or imply that. I suggested you want the functionality of a 302 on a 301.

    “There are plenty of valid reasons to use 301 redirects, including the fact that indexers such as Google will handle them differently.” – This is not something you pointed out in the article. Your article merely looked at the problems it caused you and the suggestions you have for it. If there are good reasons for a 301 you should state them and provide a balanced view.

    “Any solution that says that web admins need to change their code” – Maybe they should’ve thought about the implications beforehand, as you have clearly and correctly done and made the better choice for them. Trying to fix ignorance is not a technical issue.

  6. Rowan says:

    Hi Robert,

    Perhaps I worded that rather more confrontationally than was necessary. What I was trying to say was that the points I’ve raised don’t just apply to 301 redirects under a particular scenario that I have encountered, they apply to *any* use of HTTP status 301. It is not the case that developers only use status 301 because they are not aware of status 302, so clearly they are choosing it for some reason, based on their understanding of the differences between the two.

    I considered the advantages of status 301 to be somewhat beside the point – either it has a use, and the issues need exploring, or it has none, can safely be removed from the HTTP spec, and the whole issue is moot.

    > “Maybe they should’ve thought about the implications beforehand”

    Again, this assumes that web developers have had it “wrong” for years, and Firefox is now “right”.

    The HTTP standard has included status code 301 since at least 1996 (http://www.w3.org/Protocols/HTTP/1.0/spec.html#Code301). [Interestingly, the statement that “this response is cacheable unless indicated otherwise” is not in that spec, so was presumably added when drafting HTTP/1.1.] As far as I know, until 2010, no major browser cached 301 redirect information in the way we’re discussing.

    During that 14 year period, developers therefore didn’t think about caching implications because there were none. Even if the spec had unambiguously stated that an unqualified 301 response was liable to eternal caching (which it does not), common practice does not include sending cache-control headers with every redirect, because doing so would have made no difference in practice.

    > “Trying to fix ignorance is not a technical issue.”

    No, but compatibility with existing implementations is. If a browser came out that refused to render any web page that did not strictly adhere to its declared doctype, it would be considered broken, even though it was adhering to well-established specifications. Fixing every 301 redirect in the world is not a trivial task, however desirable, so dealing with the ones that are there is unavoidable.

    [Sorry if this is a bit long and rambling, I’m not very good at being concise!]

  7. Dominick says:

    There is no logical reason – EVER – to make it so that the owner of a domain no longer has control of an asset, even if by accident or misinterpretation.

    Why is there no override? What sense can it possibly make to provide no override whatsoever?

    Amazing… and disappointing…

  8. harry says:

    > “There is no logical reason – EVER – to make it so that the owner of a domain no longer has control of an asset”
    What a load of rubbish – what about URL shorteners? These are exactly the reason that 301 was created, and show developers using it properly.

    • Shayne O says:

      Erm, URL shorteners usually use 302 redirects, because 301 redirects essentially break the model by cutting the shortener service out of the picture, and redirect services actually make their money by monetizing the redirect clicks.

      • Rowan says:

        I just did a quick test on a couple of random URLs from my history, and bit.ly sent a 301 (but with “Cache-Control: private; max-age=90”, so not an immortal one as discussed on this page), while tinyurl.com actually rendered an HTML page with a meta refresh header (and a whole load of JS).

        Both allow “monetizing” (by which I assume you mean “counting and charging for”) clicks to within a reasonable tolerance. Note that with the correct caching headers, a 301 and 302 will behave exactly the same in terms of how often a browser re-requests the resource.

  9. Rowan says:

    I don’t follow. Why would it be OK for the owner of tinyurl.com (the domain of the shortened URL) to lose control of that redirect (the asset in question)? The owner of the target domain doesn’t have any more control of the incoming redirect than any other inbound link.

    And given that the mnemonic in the spec is “Moved Permanently”, I don’t think URL shortening was “the reason that 301 was created”.

    I believe some URL shorteners also allow editing of the redirect, so would need to be careful of cache control!

  10. Mike says:

    I’m with Rowan and Dominick on this. I’m not following harry’s argument. Dominick’s assertion makes perfect sense.

  11. Mike says:

    For those of you who use mod_rewrite I’ve found a great workaround! (works for me).

    http://mark.koli.ch/2010/12/set-cache-control-and-expires-headers-on-a-redirect-with-mod-rewrite.html

    Yeah! Of course this doesn’t fix existing problems, but at least moving forward I’ve found a new best practice that I’ll be following.
