{"id":200,"date":"2011-10-09T21:57:18","date_gmt":"2011-10-09T20:57:18","guid":{"rendered":"https:\/\/rwec.co.uk\/blog\/?p=200"},"modified":"2011-10-23T19:01:51","modified_gmt":"2011-10-23T18:01:51","slug":"cached-redirects-considered-harmful","status":"publish","type":"post","link":"https:\/\/rwec.co.uk\/blog\/2011\/10\/cached-redirects-considered-harmful\/","title":{"rendered":"Cached Redirects Considered Harmful (and how browsers can fix them)"},"content":{"rendered":"<p>There are a lot of URLs out there on the Web; and a pretty big number of those URLs are either alternative names for something, or old locations that have been superseded. So &#8220;redirects&#8221; from one URL to another are a common feature of the web, and have been for many years. But recently, the way these redirects behave has been changing, because performance-conscious browser developers have started caching redirects, rather than re-requesting them from the server every time.<\/p>\n<p>In theory, this makes perfect sense, but in practice, it causes web developers like me a lot of pain, because nothing &#8220;permanent&#8221; is actually <em>that<\/em>\u00a0permanent. I&#8217;m not saying no browser should ever cache a redirect, but I do have a few suggestions of ways they could be a little more helpful about it.<br \/>\n<!--more--><\/p>\n<h2>The Technology<\/h2>\n<p>Let&#8217;s be clear what we&#8217;re talking about &#8211; <a href=\"http:\/\/www.w3.org\/Protocols\/rfc2616\/rfc2616-sec10.html#sec10.3\">the HTTP\/1.1 specification defines a class of status codes grouped as &#8220;Redirection&#8221; statuses<\/a>, in the range 3xx. The main candidate for browser caching is status code 301, which the specification labels &#8220;Moved Permanently&#8221;. 
(( On a pedantic note, while that is the heading in the RFC, and the suggested human-readable <a href=\"http:\/\/www.w3.org\/Protocols\/rfc2616\/rfc2616-sec6.html#sec6.1.1\">&#8220;Reason Phrase&#8221;<\/a>, it is not actually the definition of the status, as is sometimes implied when discussing it. ))\u00a0The specification explicitly states that &#8220;this response is cacheable unless indicated otherwise&#8221; &#8211; that is, unless the response also includes headers specifically relating to cache control, a User Agent can assume that it&#8217;s OK to cache this response and not re-request the original URL. It doesn&#8217;t go into any more detail, and neither this nor the next sentence (about\u00a0&#8220;clients with link editing capabilities&#8221;)\u00a0is couched in the standard RFC form of &#8220;User Agents MAY &#8230;&#8221; (let alone &#8220;SHOULD&#8221;), but the clear message is &#8220;this old URI is irrelevant, just use the new one&#8221;.<\/p>\n<p>So, newer versions of Firefox, Chrome, and <a href=\"http:\/\/blogs.msdn.com\/b\/ie\/archive\/2010\/07\/14\/caching-improvements-in-internet-explorer-9.aspx\">Internet Explorer<\/a> have all started &#8220;obeying&#8221; this part of the standard, and opting to cache all HTTP 301 responses if not told otherwise.<\/p>\n<h2>The Problem<\/h2>\n<p>So, what&#8217;s the big problem? Well, the fact that a &#8220;resource has been assigned a new permanent URI&#8221; doesn&#8217;t mean that there won&#8217;t be some point in time when someone wants to use the old URI for something else. Worse, people have an unhelpful habit of changing their decisions later.<\/p>\n<p>Consider this scenario:<\/p>\n<ol>\n<li>A company has a product called a &#8220;Thingummy\u2122&#8221;, with a description at http:\/\/example.com\/thingummy\/<\/li>\n<li>They decide that the name is too unwieldy, so re-brand it as &#8220;Widget\u2122&#8221;. 
Knowing that <a href=\"http:\/\/www.w3.org\/Provider\/Style\/URI\">cool URIs don&#8217;t change<\/a>, the developers permanently redirect the old URL to the new page about the same product, at http:\/\/example.com\/widget\/ Old links remain valid, and direct customers to the information they were looking for.<\/li>\n<li>A couple of years down the line, the company decides to boost sales by releasing a &#8220;classic&#8221; version of the Widget\u2122 under the old Thingummy\u2122 brand. They put up a new page at\u00a0http:\/\/example.com\/thingummy\/\u00a0Sadly, some customers continue being redirected to\u00a0http:\/\/example.com\/widget\/ by their browsers&#8217; cache.<\/li>\n<li>With Thingummies\u2122 massively out-selling Widgets\u2122, the company comes full circle, and abandons the new brand. The developers put in a redirect that permanently points\u00a0http:\/\/example.com\/widget\/ back to http:\/\/example.com\/thingummy\/. At this point, <strong>all hell breaks loose<\/strong>. Well, maybe not, but customers trying to access the product page find mysterious messages on their screen about &#8220;infinite redirects&#8221;. Management are not impressed.<\/li>\n<\/ol>\n<div>\n<h2>How Permanent is &#8220;Permanently&#8221;?<\/h2>\n<p>The biggest problem in all this is that developers seem to have taken the word &#8220;permanent&#8221; rather too literally. The Random House dictionary (( by which I mean <a href=\"http:\/\/dictionary.reference.com\/browse\/permanent\">dictionary.com<\/a>, because I couldn&#8217;t be bothered to go downstairs and find the OED )) lists 4 definitions of permanent; the first is:<\/p>\n<blockquote><p>existing\u00a0perpetually;\u00a0everlasting,\u00a0especially\u00a0without significant\u00a0change<\/p><\/blockquote>\n<p>In theory, this seems a reasonable definition, but in reality <strong>nothing lasts forever<\/strong>. Nothing physical, nothing electronic, and certainly not the structure of a website. 
So clearly, when the HTTP specification says &#8220;a new permanent URI&#8221;, it doesn&#8217;t actually expect you to guarantee its perpetual existence. Let&#8217;s try the second definition:<\/p>\n<blockquote><p>intended\u00a0to\u00a0exist\u00a0or\u00a0function\u00a0for\u00a0a\u00a0long,\u00a0indefinite\u00a0period without\u00a0regard\u00a0to\u00a0unforeseeable\u00a0conditions<\/p><\/blockquote>\n<p>Ah, now that&#8217;s more like it &#8211; by that definition, a 301 status indicates a redirect which will remain valid for &#8220;a long, indefinite period&#8221;, unless there are &#8220;unforeseeable\u00a0conditions&#8221;.<\/p>\n<p>I did a quick test earlier, using Firefox 7 to visit a URL which returned a 301 status and no cache instructions, then inspecting the resulting cache entry using the built-in &#8220;about:cache&#8221; page. Among the details is this &#8211; &#8220;expires: No expiration time&#8221;. That&#8217;s pretty permanent, by the first definition.<\/p>\n<p>Effectively, having once seen that 301 response, the browser is not going to request the original URL <strong>ever<\/strong>, simply because the web developer didn&#8217;t mention an expiry date, because they didn&#8217;t <em>foresee<\/em>\u00a0any conditions in which they would want to change the decision. 
(And I thought 24-hour <abbr title=\"Time To Live\">TTL<\/abbr>s on <abbr title=\"Domain Name System\">DNS<\/abbr> entries were annoying&#8230;)<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #008000;\"><strong>&rarr; Suggestion 1: Cache entries for an unadorned 301 response should have a reasonable default lifetime, not last forever.<\/strong><\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">&rarr; Filed as <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=696595\">Mozilla Bugzilla Bug 696595<\/a><\/span><\/p>\n<h2>Chasing your own Tail<\/h2>\n<p>If you&#8217;re foolish enough to have created an immortal redirect, or unlucky enough to have inherited one, you might find yourself wanting to put a new redirect pointing back the other way, as in my example above. But if you do, any browser which saw (and cached) the old redirect will simply see the new one <em>as well<\/em>, and follow both back and forth, back and forth, back and forth &#8230; until eventually it decides there&#8217;s an infinite loop and throws the user an error.<\/p>\n<p>So instead, you have to resort to all sorts of <a href=\"http:\/\/getluky.net\/2010\/12\/14\/301-redirects-cannot-be-undon\/\">confusing workarounds to keep everything working<\/a>.<\/p>\n<p>But the browser <em>knows<\/em>\u00a0something is wrong, and it&#8217;s not the user&#8217;s fault; it&#8217;s bad data in the browser cache (depending on how you look at it, that&#8217;s the web developer&#8217;s fault, but a browser that doesn&#8217;t cut developers some slack won&#8217;t get very far rendering real-world HTML&#8230;)<\/p>\n<p style=\"padding-left: 30px;\"><strong><span style=\"color: #008000;\">&rarr; Suggestion\u00a02: When an infinite redirect is detected, try skipping the cache.<\/span><\/strong><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">&rarr; Filed as <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=696646\">Mozilla Bugzilla Bug 
696646<\/a><\/span><\/p>\n<\/div>\n<h2>Escaping the Trap<\/h2>\n<p>The first hurdle for developers is that it&#8217;s not obvious\u00a0what the hell is going on, even when they&#8217;re just testing out a few different URL schemes in their development copy. I think\u00a0<a href=\"http:\/\/forums.mozillazine.org\/viewtopic.php?f=38&amp;t=621842\">whatever unacceptable language this developer used on MozillaZine<\/a>\u00a0probably sums up the feeling most of us had when first encountering this invisible magic.<\/p>\n<p>But even once you&#8217;ve figured it out, it&#8217;s not that obvious what to do about it &#8211; <a href=\"http:\/\/www.sadev.co.za\/content\/redirected-down-one-way-clearing-internet-explorer-host-redirect-cache\">do you do a deep clean of the browser&#8217;s cache every time something&#8217;s not quite right?<\/a>\u00a0I see a sledgehammer approaching a nut. And if you&#8217;ve released your mistake to a non-technical user, perhaps on a preview version of the site, you&#8217;re going to have to talk them through this process as well.<\/p>\n<p>Caching is a pain sometimes, but it&#8217;s been around for a long time, and we have UIs for dealing with it. The most common of these is the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Wikipedia:Bypass_your_cache\">&#8220;hard refresh&#8221;<\/a> &#8211; hold down Ctrl, or Shift, and hit the reload button or keyboard shortcut, and the cache is by-passed completely and content is reloaded. Brilliant. 
Oh, but it only reloads the <em>page you&#8217;re looking at<\/em>, not the URI you originally requested, so it&#8217;s useless for cached redirects.<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #008000;\"><strong>&rarr; Suggestion\u00a03: Make a &#8220;hard refresh&#8221; request the original URI navigated to, not the one currently being viewed.<\/strong><\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">&rarr; Filed as <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=696650\">Mozilla Bugzilla <abbr title=\"Request For Enhancement\">RFE<\/abbr> 696650<\/a><\/span><\/p>\n<h2>Doing the Right Thing<\/h2>\n<p>Apparently, what we&#8217;re all doing wrong, as developers, is not sending\u00a0appropriate cache headers along with the 301 status code. If you&#8217;re writing, say, a PHP script, and using <tt>header('Location: foo')<\/tt>, you should probably be doing it in some kind of wrapper function, so you can make up your own default expiry, and make sure you send a whole bunch of control headers whenever you redirect.<\/p>\n<p>But a lot of redirects are not written in a rich server-side scripting language, they use specific tools built into the web server, like Apache&#8217;s incredibly powerful mod_rewrite. I just checked, and <a href=\"http:\/\/httpd.apache.org\/docs\/trunk\/mod\/mod_rewrite.html\">there is no function in the latest version of mod_rewrite<\/a> that lets you control caching, or send arbitrary HTTP headers, when it generates a 301 response. 
I&#8217;m sure there are ways of stringing together a whole bunch of Apache directives to achieve the desired effect, but it would take me a while to come up with the right combination, and it would look a mess.<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #008000;\"><strong>&rarr; Suggestion\u00a04: Web server redirect functions, such as mod_rewrite, should build in control over caching headers.<\/strong><\/span><\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #800000;\">&rarr; Posted to <a href=\"http:\/\/httpd.apache.org\/lists.html#http-dev\">Apache HTTPD dev list<\/a><\/span><\/p>\n<h2>Inconsistent Behaviour<\/h2>\n<p>Finally, let&#8217;s assume we&#8217;re using a tool for our redirects that lets us control the cache headers, and we don&#8217;t have any old immortal redirects to avoid infinite loops with. All is fine, and the browsers will do the right thing and save us some network traffic, right?<\/p>\n<p>Well, according to\u00a0<a href=\"http:\/\/www.stevesouders.com\/blog\/2010\/07\/23\/redirect-caching-deep-dive\/\">this test result table from someone who is<em> in favour<\/em> of redirect caching<\/a>, probably not. Like everything out there in web land, you have to deal with every browser handling things just a little bit differently. 
Combinations which <em>should<\/em> work might or might not actually be reliable across browsers.<\/p>\n<p>Even client-side caching of standard resources is, as <a href=\"http:\/\/blogs.atlassian.com\/developer\/2007\/07\/when_caching_is_not_caching.html#comment-109589\">one comment I just came upon puts it<\/a>, &#8220;a bit of a black art&#8221;.<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"color: #008000;\"><strong>&rarr; Suggestion\u00a05: Somebody (( Sorry, not me &#8211; I&#8217;m one of the people who needs the crib sheet, not one who can write it!\u00a0)) needs to work out what the most common scenarios for redirect caching actually are, and how to achieve them.<\/strong><\/span><\/p>\n<h2>Conclusion<\/h2>\n<p>My gut instinct the first, second, and\u00a0twelfth\u00a0times I ran up against redirect caching was that it was stupid, and the browser developers should take it out immediately. But I do see that it makes sense, and could save a lot of unnecessary network traffic and server load. So instead, all I ask is that people don&#8217;t brush the problems off with &#8220;oh, well, you should have known what <em>permanent<\/em>\u00a0means&#8221;, and look at how to make redirect caching work in the real world.<\/p>\n<hr \/>\n<p>Update 2011-10-23: Bugs \/ <abbr title=\"Request For Enhancement\">RFE<\/abbr>s filed against Firefox and Apache. I am less certain of how to test and report other browsers&#8217; behaviour.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There are a lot of URLs out there on the Web; and a pretty big number of those URLs are either alternative names for something, or old locations that have been superseded. So &#8220;redirects&#8221; from one URL to another are a common feature of the web, and have been for many years. 
But recently, the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[150,151,148,149,145,90,146,147,57],"class_list":["post-200","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-cached-redirects","tag-browsers","tag-cache","tag-caching","tag-http","tag-programming","tag-redirect","tag-redirects","tag-web","post-preview"],"_links":{"self":[{"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/posts\/200","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/comments?post=200"}],"version-history":[{"count":16,"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/posts\/200\/revisions"}],"predecessor-version":[{"id":216,"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/posts\/200\/revisions\/216"}],"wp:attachment":[{"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/media?parent=200"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/categories?post=200"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rwec.co.uk\/blog\/wp-json\/wp\/v2\/tags?post=200"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}