<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>rwec.co.uk &#187; web</title>
	<atom:link href="http://rwec.co.uk/blog/tag/web/feed/" rel="self" type="application/rss+xml" />
	<link>http://rwec.co.uk/blog</link>
	<description>Rowan&#039;s World, Et Cetera</description>
	<lastBuildDate>Sun, 27 Nov 2011 19:13:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Cached Redirects Considered Harmful (and how browsers can fix them)</title>
		<link>http://rwec.co.uk/blog/2011/10/cached-redirects-considered-harmful/</link>
		<comments>http://rwec.co.uk/blog/2011/10/cached-redirects-considered-harmful/#comments</comments>
		<pubDate>Sun, 09 Oct 2011 20:57:18 +0000</pubDate>
		<dc:creator>Rowan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA["cached redirects"]]></category>
		<category><![CDATA[browsers]]></category>
		<category><![CDATA[cache]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[http]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[redirect]]></category>
		<category><![CDATA[redirects]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://rwec.co.uk/blog/?p=200</guid>
		<description><![CDATA[There are a lot of URLs out there on the Web; and a pretty big number of those URLs are either alternative names for something, or old locations that have been superseded. So "redirects" from one URL to another are a common feature of the web, and have been for many years. But recently, the [...]]]></description>
			<content:encoded><![CDATA[<p>There are a lot of URLs out there on the Web; and a pretty big number of those URLs are either alternative names for something, or old locations that have been superseded. So "redirects" from one URL to another are a common feature of the web, and have been for many years. But recently, the way these redirects behave has been changing, because performance-conscious browser developers have started caching redirects, rather than re-requesting them from the server every time.</p>
<p>In theory, this makes perfect sense, but in practice, it causes web developers like me a lot of pain, because nothing "permanent" is actually <em>that</em> permanent. I'm not saying no browser should ever cache a redirect, but I do have a few suggestions of ways they could be a little more helpful about it.<br />
<span id="more-200"></span></p>
<h2>The Technology</h2>
<p>Let's be clear what we're talking about - <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3">the HTTP/1.1 specification defines a class of status codes grouped as "Redirection" statuses</a>, in the range 3xx. The main candidate for browser caching is status code 301, which the specification labels "Moved Permanently". The specification explicitly states that "this response is cacheable unless indicated otherwise". That is, unless the response also includes headers relating to cache control, a User Agent can assume that it's OK to cache this response and not re-request the original URL. It doesn't go into any more detail, and neither this sentence nor the one before it (about "clients with link editing capabilities") is couched in the standard RFC form of "User Agents MAY ..." (let alone "SHOULD"), but the clear message is "this old URI is irrelevant, just use the new one".</p>
<p>So, newer versions of Firefox, Chrome, and <a href="http://blogs.msdn.com/b/ie/archive/2010/07/14/caching-improvements-in-internet-explorer-9.aspx">Internet Explorer</a> have all started "obeying" this part of the standard, and opting to cache all HTTP 301 responses if not told otherwise.</p>
<h2>The Problem</h2>
<p>So, what's the big problem? Well, the fact that a "resource has been assigned a new permanent URI" doesn't mean that there won't be some point in time where someone wants to use the old URI for something else. Worse, people have an unhelpful habit of changing their decisions later.</p>
<p>Consider this scenario:</p>
<ol>
<li>A company has a product called a "Thingummy™", with a description at http://example.com/thingummy/</li>
<li>They decide that the name is too unwieldy, so re-brand it as "Widget™". Knowing that <a href="http://www.w3.org/Provider/Style/URI">cool URIs don't change</a>, the developers permanently redirect the old URL to the new page about the same product, at http://example.com/widget/. Old links remain valid, and direct customers to the information they were looking for.</li>
<li>A couple of years down the line, the company decides to boost sales by releasing a "classic" version of the Widget™ under the old Thingummy™ brand. They put up a new page at http://example.com/thingummy/. Sadly, some customers continue being redirected to http://example.com/widget/ by their browsers' cache.</li>
<li>With Thingummies™ massively out-selling Widgets™, the company comes full circle, and abandons the new brand. The developers put in a redirect that permanently points http://example.com/widget/ back to http://example.com/thingummy/. At this point, <strong>all hell breaks loose</strong>. Well, maybe not, but customers trying to access the product page find mysterious messages on their screen about "infinite redirects". Management are not impressed.</li>
</ol>
<div>
<h2>How Permanent is "Permanently"?</h2>
<p>The biggest problem in all this is that browser developers seem to have taken the word "permanent" rather too literally. The Random House dictionary lists four definitions of permanent; the first is:</p>
<blockquote><p>existing perpetually; everlasting, especially without significant change</p></blockquote>
<p>In theory, this seems a reasonable definition, but in reality <strong>nothing lasts forever</strong>. Nothing physical, nothing electronic, and certainly not the structure of a website. So clearly, when the HTTP specification says "a new permanent URI", it doesn't actually expect you to guarantee its perpetual existence. Let's try the second definition:</p>
<blockquote><p>intended to exist or function for a long, indefinite period without regard to unforeseeable conditions</p></blockquote>
<p>Ah, now that's more like it - by that definition, a 301 status indicates a redirect which will remain valid for "a long, indefinite period", unless there are "unforeseeable conditions".</p>
<p>I did a quick test earlier, using Firefox 7 to visit a URL which returned a 301 status and no cache instructions, then inspecting the resulting cache entry using the built-in "about:cache" page. Among the details is this - "expires: No expiration time". That's pretty permanent, by the first definition.</p>
<p>Effectively, having once seen that 301 response, the browser is not going to request the original URL <strong>ever</strong> again, simply because the web developer didn't specify an expiry date, because they didn't <em>foresee</em> any conditions in which they would want to change the decision. (And I thought 24-hour <abbr title="Time To Live">TTL</abbr>s on <abbr title="Domain Name System">DNS</abbr> entries were annoying...)</p>
<p style="padding-left: 30px;"><span style="color: #008000;"><strong>&rarr; Suggestion 1: Cache entries for an unadorned 301 response should have a reasonable default lifetime, not last forever.</strong></span></p>
<p style="padding-left: 30px;"><span style="color: #800000;">&rarr; Filed as <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=696595">Mozilla Bugzilla Bug 696595</a></span></p>
<h2>Chasing your own Tail</h2>
<p>If you're foolish enough to have created an immortal redirect, or unlucky enough to have inherited one, you might find yourself wanting to put a new redirect pointing back the other way, as in my example above. But if you do, any browser which saw (and cached) the old redirect will simply see the new one <em>as well</em>, and follow both back and forth, back and forth, back and forth ... until eventually it decides there's an infinite loop and throws the user an error.</p>
<p>So instead, you have to resort to all sorts of <a href="http://getluky.net/2010/12/14/301-redirects-cannot-be-undon/">confusing workarounds to keep everything working</a>.</p>
<p>But the browser <em>knows</em> something is wrong, and it's not the user's fault: it's bad data in the browser cache. (Depending on how you look at it, that's the web developer's fault, but a browser that doesn't cut developers some slack won't get very far rendering real-world HTML...)</p>
<p style="padding-left: 30px;"><strong><span style="color: #008000;">&rarr; Suggestion 2: When an infinite redirect is detected, try skipping the cache.</span></strong></p>
<p style="padding-left: 30px;"><span style="color: #800000;">&rarr; Filed as <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=696646">Mozilla Bugzilla Bug 696646</a></span></p>
</div>
<h2>Escaping the Trap</h2>
<p>The first hurdle for developers is that it's not obvious what the hell is going on, even when they're just testing out a few different URL schemes in their development copy. I think <a href="http://forums.mozillazine.org/viewtopic.php?f=38&amp;t=621842">whatever unacceptable language this developer used on MozillaZine</a> probably sums up the feeling most of us had when first encountering this invisible magic.</p>
<p>But even once you've figured it out, it's not that obvious what to do about it - <a href="http://www.sadev.co.za/content/redirected-down-one-way-clearing-internet-explorer-host-redirect-cache">do you do a deep clean of the browser's cache every time something's not quite right?</a> I see a sledgehammer approaching a nut. And if you've released your mistake to a non-technical user, perhaps on a preview version of the site, you're going to have to talk them through this process as well.</p>
<p>Caching is a pain sometimes, but it's been around for a long time, and we have UIs for dealing with it. The most common of these is the <a href="http://en.wikipedia.org/wiki/Wikipedia:Bypass_your_cache">"hard refresh"</a> - hold down Ctrl, or Shift, and hit the reload button or keyboard shortcut, and the cache is bypassed completely and content is reloaded. Brilliant. Oh, but it only reloads the <em>page you're looking at</em>, not the URI you originally requested, so it's useless for cached redirects.</p>
<p style="padding-left: 30px;"><span style="color: #008000;"><strong>&rarr; Suggestion 3: Make a "hard refresh" request the original URI navigated to, not the one currently being viewed.</strong></span></p>
<p style="padding-left: 30px;"><span style="color: #800000;">&rarr; Filed as <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=696650">Mozilla Bugzilla <abbr title="Request For Enhancement">RFE</abbr> 696650</a></span></p>
<h2>Doing the Right Thing</h2>
<p>Apparently, what we're all doing wrong, as developers, is not sending appropriate cache headers along with the 301 status code. If you're writing, say, a PHP script, and using <tt>header('Location: foo', true, 301)</tt> (by default, <tt>header('Location: foo')</tt> sends a 302), you should probably be doing it in some kind of wrapper function, so you can choose your own default expiry, and make sure you send a whole bunch of cache control headers whenever you redirect.</p>
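<p>As an illustration of the kind of wrapper function described above, here is a sketch in JavaScript for Node.js rather than PHP. The function names and the one-hour default are made up; the only real APIs used are <tt>Date</tt> and the <tt>writeHead</tt>/<tt>end</tt> methods of Node's <tt>http.ServerResponse</tt>.</p>
<pre>
```javascript
// Hypothetical redirect helper. The one-hour default is an arbitrary
// assumption; pick whatever suits the site.
const DEFAULT_REDIRECT_MAX_AGE = 60 * 60; // seconds

// Build the status and headers for a redirect with explicit caching.
function buildRedirect(location, { permanent = true, maxAge = DEFAULT_REDIRECT_MAX_AGE, now = new Date() } = {}) {
  return {
    status: permanent ? 301 : 302,
    headers: {
      Location: location,
      // max-age is what HTTP/1.1 caches look at...
      "Cache-Control": "max-age=" + maxAge,
      // ...and Expires (an HTTP-date) covers older HTTP/1.0 caches.
      Expires: new Date(now.getTime() + maxAge * 1000).toUTCString(),
    },
  };
}

// Send the redirect on a Node.js http.ServerResponse (or anything with
// the same writeHead/end methods).
function sendRedirect(res, location, options) {
  const { status, headers } = buildRedirect(location, options);
  res.writeHead(status, headers);
  res.end();
}

const example = buildRedirect("http://example.com/widget/", {
  maxAge: 3600,
  now: new Date(Date.UTC(2011, 9, 9, 20, 57, 18)),
});
console.log(example.status, example.headers.Expires); // 301 Sun, 09 Oct 2011 21:57:18 GMT
```
</pre>
<p>In a Node.js request handler, <tt>sendRedirect(res, "http://example.com/widget/")</tt> would then send a 301 that browsers are told to keep for an hour, rather than forever. Sending both headers is belt and braces: HTTP/1.1 caches give <tt>max-age</tt> priority over <tt>Expires</tt>.</p>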
<p>But a lot of redirects are not written in a rich server-side scripting language, they use specific tools built into the web server, like Apache's incredibly powerful mod_rewrite. I just checked, and <a href="http://httpd.apache.org/docs/trunk/mod/mod_rewrite.html">there is no function in the latest version of mod_rewrite</a> that lets you control caching, or send arbitrary HTTP headers, when it generates a 301 response. I'm sure there are ways of stringing together a whole bunch of Apache directives to achieve the desired effect, but it would take me a while to come up with the right combination, and it would look a mess.</p>
<p style="padding-left: 30px;"><span style="color: #008000;"><strong>&rarr; Suggestion 4: Web server redirect functions, such as mod_rewrite, should build in control over caching headers.</strong></span></p>
<p style="padding-left: 30px;"><span style="color: #800000;">&rarr; Posted to <a href="http://httpd.apache.org/lists.html#http-dev">Apache HTTPD dev list</a></span></p>
<h2>Inconsistent Behaviour</h2>
<p>Finally, let's assume we're using a tool for our redirects that lets us control the cache headers, and we don't have any old immortal redirects that could send browsers round in infinite loops. All is fine, and the browsers will do the right thing and save us some network traffic, right?</p>
<p>Well, according to <a href="http://www.stevesouders.com/blog/2010/07/23/redirect-caching-deep-dive/">this test result table from someone who is <em>in favour</em> of redirect caching</a>, probably not. Like everything out there in web land, you have to deal with every browser handling things just a little bit differently. Combinations which <em>should</em> work might or might not actually be reliable across browsers.</p>
<p>Even client-side caching of standard resources is, as <a href="http://blogs.atlassian.com/developer/2007/07/when_caching_is_not_caching.html#comment-109589">one comment I just came upon puts it</a>, "a bit of a black art".</p>
<p style="padding-left: 30px;"><span style="color: #008000;"><strong>&rarr; Suggestion 5: Somebody needs to work out what the most common scenarios for redirect caching actually are, and how to achieve them.</strong></span></p>
<h2>Conclusion</h2>
<p>My gut instinct the first, second, and twelfth times I ran up against redirect caching was that it was stupid, and the browser developers should take it out immediately. But I do see that it makes sense, and could save a lot of unnecessary network traffic and server load. So instead, all I ask is that people don't brush the problems off with "oh, well, you should have known what <em>permanent</em> means", but instead look at how to make redirect caching work in the real world.</p>
<hr />
<p>Update 2011-10-23: Bugs / <abbr title="Request For Enhancement">RFE</abbr>s filed against Firefox and Apache. I am less certain of how to test and report other browsers' behaviour.</p>
]]></content:encoded>
			<wfw:commentRss>http://rwec.co.uk/blog/2011/10/cached-redirects-considered-harmful/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Top 10 Things Not To Do with Links</title>
		<link>http://rwec.co.uk/blog/2011/06/top-10-bad-web-links/</link>
		<comments>http://rwec.co.uk/blog/2011/06/top-10-bad-web-links/#comments</comments>
		<pubDate>Sun, 12 Jun 2011 19:06:02 +0000</pubDate>
		<dc:creator>Rowan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[hyperlinks]]></category>
		<category><![CDATA[link]]></category>
		<category><![CDATA[links]]></category>
		<category><![CDATA[pet hates]]></category>
		<category><![CDATA[rant]]></category>
		<category><![CDATA[satire]]></category>
		<category><![CDATA[top 10]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://rwec.co.uk/blog/?p=185</guid>
		<description><![CDATA["Hyperlinks" are probably the single most important thing on the World Wide Web. They are, after all, what the "web" is woven from; they are what makes it something more than the document retrieval systems that came before.

And yet, some people seem to do their utmost to make all the hyperlinks in their documents entirely useless. Here are my Top 10 Things Not To Do with Links...]]></description>
			<content:encoded><![CDATA[<p>"Hyperlinks" are probably the single most important thing on the World Wide Web. They are, after all, what the "web" is woven from; they are what makes it something more than the document retrieval systems that came before.</p>
<p>And yet, some people seem to do their utmost to make all the hyperlinks in their documents entirely useless. Here are my Top 10 Things Not To Do with Links...<br />
<span id="more-185"></span></p>
<div id="bad_link_demo">
<ol>
<li>Don't have any. Refer to other websites by name, mention other blog posts, but leave your article entirely unlinked to the outside world, as though you've scanned it in from a typewriter.</li>
<li>Don't include any links or footnote markers in the main text, but at the end of the page have a long "useful links" section, sorted alphabetically by URL.</li>
<li>Only ever link to homepages of other sites, never the article you're talking about. It's also important not to mention the exact title of the article, or your readers would be able to find it too easily.</li>
<li class="invisible_links">Make your links <a href="http://www.thisamericanlife.org/radio-archives/episode/178/Superpowers">invisible</a> until hovered over, so that your article becomes a <a href="http://www.kingdomofloathing.com/">point-and-click adventure</a> where readers have to guess where links <em>might</em> be.</li>
<li class="fading_links">When hovered over, make your links <a href="http://www.flickr.com/photos/debbcollins/5021251012/">fade into the background</a>, to test your readers' memory skills.</li>
<li>Automatically link random <a href="http://rwec.co.uk/blog/tag/language">words</a> in the middle of unrelated sentences <a href="http://rwec.co.uk/q/wherefore">because</a> it's good for <abbr title="Search Engine Optimisation">SEO</abbr>. Make these look identical to <a href="http://www.positioniseverything.net/explorer.html">genuinely useful links</a> so that readers learn to ignore all inline links on your site (see point 1).</li>
<li>Use advertising software that automatically <a class="irritating_advert" href="http://www.youtube.com/watch?v=5pidokakU4I">picks<span class="dodgy_popup_content"><img src="http://rwec.co.uk/media/blog/link-thumbs/picks.png" alt="" /></span></a> keywords out of your text to <a class="irritating_advert" href="http://www.travisperkins.co.uk/">turn<span class="dodgy_popup_content"><img src="http://rwec.co.uk/media/blog/link-thumbs/turn.jpeg" alt="" /></span></a> into fake links which pop up an advert on top of the whole page when hovered over.</li>
<li class="link_thumbs">Install code that pops up a "preview" of any link the reader hovers over. This should preferably be easy to activate <span class="dodgy_popup_trigger">by mistake, <a href="http://tlipsum.appspot.com/">incredibly slow to load</a><span class="dodgy_popup_loading"><img title="P L E A S E    W A I T" src="http://rwec.co.uk/media/blog/link-thumbs/loading.gif" alt="P L E A S E    W A I T" /></span><span class="dodgy_popup_content"><img src="http://rwec.co.uk/media/blog/link-thumbs/essay.png" alt="Sorry, what's alt text for? I'm told I have to have some!" /></span>, and too</span> small to be in any way useful.</li>
<li>In a sentence which refers to a particular article, put the link not around the article reference, but around other parts of the sentence, crossing between <a href="http://specgram.com/">clauses so that the link</a> text makes no grammatical sense.</li>
<li><a class="external text" rel="nofollow" href="http://en.wikipedia.org/w/wiki.phtml?title=Hyperlink&amp;oldid=8089525">Link</a> <a class="mw-redirect" title="Every" href="http://en.wikipedia.org//wiki/Every">every</a> <a title="Word" href="http://en.wikipedia.org/wiki/Word">word</a> <a class="mw-redirect" title="In" href="http://en.wikipedia.org/wiki/In">in</a> <a class="mw-redirect" title="The" href="http://en.wikipedia.org/wiki/The">the</a> <a title="Text" href="http://en.wikipedia.org/wiki/Text">text</a> <a class="mw-redirect" title="To" href="http://en.wikipedia.org/wiki/To">to</a> <a title="A" href="http://en.wikipedia.org/wiki/A">a</a> <a title="Different" href="http://en.wikipedia.org/wiki/Different">different</a> <a title="Page" href="http://en.wikipedia.org/wiki/Page">page</a>, <a class="mw-redirect" title="Making" href="http://en.wikipedia.org/wiki/Making">making</a> <a title="Sentences" href="http://en.wikipedia.org/wiki/Sentences">sentences</a> <a class="mw-redirect" title="Unreadable" href="http://en.wikipedia.org/wiki/Unreadable">unreadable</a>.</li>
</ol>
</div>
<p><script src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js" type="text/javascript"></script> </p>
<p><script type="text/javascript">
jQuery(function(){
	// jQuery() is used throughout rather than $(), since $ may not refer
	// to jQuery when it runs in noConflict mode.
	// Item 7: fake "advert" links show their pop-up while hovered.
	jQuery('#bad_link_demo a.irritating_advert')
		.bind('mouseenter', function() {
			jQuery(this).find('.dodgy_popup_content').show();
		} )
		.bind('mouseleave', function() {
			jQuery(this).find('.dodgy_popup_content').hide();
		} );
	// Item 8: the "preview" shows a loading image, then the real content
	// after a random delay of between 5 and 30 seconds.
	var dodgy_timeout;
	jQuery('#bad_link_demo .dodgy_popup_trigger')
		.bind('mouseenter', function() {
			jQuery(this).find('.dodgy_popup_loading').show();
			var $this = jQuery(this);
			dodgy_timeout = setTimeout( function()
			{
				$this.find('.dodgy_popup_loading').hide();
				$this.find('.dodgy_popup_content').show();
			}, Math.floor(Math.random()*25000) + 5000 );
		} )
		.bind('mouseleave', function() {
			// Cancel any pending "load" and hide both pop-up states.
			clearTimeout(dodgy_timeout);
			jQuery(this).find('.dodgy_popup_loading').hide();
			jQuery(this).find('.dodgy_popup_content').hide();
		} );
});</script></p>
]]></content:encoded>
			<wfw:commentRss>http://rwec.co.uk/blog/2011/06/top-10-bad-web-links/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The Non-Semantic Web: A blog entry is not a database</title>
		<link>http://rwec.co.uk/blog/2010/01/a-blog-entry-is-not-a-database/</link>
		<comments>http://rwec.co.uk/blog/2010/01/a-blog-entry-is-not-a-database/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 20:58:38 +0000</pubDate>
		<dc:creator>Rowan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[markup]]></category>
		<category><![CDATA[semantic]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://rwec.co.uk/blog/?p=84</guid>
		<description><![CDATA[There has long been a hope - an expectation, even - that the Web will somehow develop into something "smart"; that it will move from being a mere store-house of information to something that will actually "know the answers". But the vision tends to overlook the nature of both computers and humans. On the one [...]]]></description>
			<content:encoded><![CDATA[<p>There has long been a hope - an <em>expectation</em>, even - that the Web will somehow develop into something "smart"; that it will move from being a mere store-house of information to something that will actually "know the answers".</p>
<p>But the vision tends to overlook the nature of both computers and humans. On the one hand, humans have a limited memory, and a flawed ability to apply consistent logic; on the other hand, we have an ability to interpret knowledge and ideas creatively that is far beyond the capacity of any computer so far designed.</p>
<p><span id="more-84"></span></p>
<h2>Big Calculators</h2>
<p>Artificial Intelligence is a fascinating area of research - it's what I studied at University - but it is one that has had something of a reality check over the years. At first, it seemed like only a matter of time before computers of some sort or another would exceed all human abilities, ushering in a technological utopia - or perhaps an apocalypse. But it turns out there are things computers are <em>really good at</em> and things they're <em>really bad at</em>.</p>
<p>The classic example is chess computers: when researchers started building them, the reasoning was that since it takes an intelligent human to play chess, a computer that could play chess would be intelligent too. Eventually, they built a computer that could consistently beat the best human chess players; unfortunately, they discovered that that's <em>all</em> they'd built - they hadn't broken through to a new level of computing, they'd just broken down chess into a very long list of sums. And <strong>computers are good at sums</strong>.</p>
<p>Another example is neural networks - the wonderfully futuristic sounding "Multi-Layer Perceptron" is a learning system based on simplified mathematical models of neurons, the building blocks of the human brain. You can "teach" such a network to recognise things, and it starts seeming very clever. But researchers realised that you don't really need the clever model at all, because it's all just maths - and indeed you can make a so-called "N-tuple Neural Network" that consists entirely of lookup tables, with no real maths involved, and certainly no "brain". And while there are now very handy machines that can recognise faces, it's still <strong>no use trying to have a conversation with one</strong>.</p>
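<p>To show how little "brain" is involved, here is a toy n-tuple classifier in JavaScript, loosely in the style of the WISARD design. The tuple layout and the 3x3 "images" are invented for the example; the point is that training and recognition are nothing but table writes, table lookups and a tally.</p>
<pre>
```javascript
// A minimal n-tuple classifier. Inputs are arrays of 0s and 1s; each tuple
// lists the input positions that address one lookup table. The tuple layout
// and the 3x3 images below are made up for illustration.
function makeNTupleClassifier(tuples) {
  const tables = new Map(); // class label -> one Set of seen addresses per tuple

  // Read the bits a tuple points at and join them into a table address.
  const address = (input, tuple) => tuple.map((i) => input[i]).join("");

  // Training just records which addresses each class has produced.
  function train(input, label) {
    if (!tables.has(label)) tables.set(label, tuples.map(() => new Set()));
    tables.get(label).forEach((table, t) => table.add(address(input, tuples[t])));
  }

  // Classification counts table hits per class: lookups and a tally,
  // with no weights or neuron model anywhere.
  function classify(input) {
    let best = null;
    let bestScore = -1;
    for (const [label, labelTables] of tables) {
      const score = labelTables.filter((table, t) => table.has(address(input, tuples[t]))).length;
      if (score > bestScore) {
        best = label;
        bestScore = score;
      }
    }
    return best;
  }

  return { train, classify };
}

// 3x3 images, row by row: a vertical bar and a horizontal bar.
const vertical = [0, 1, 0, 0, 1, 0, 0, 1, 0];
const horizontal = [0, 0, 0, 1, 1, 1, 0, 0, 0];

const net = makeNTupleClassifier([[0, 5, 7], [1, 3, 8], [2, 4, 6]]);
net.train(vertical, "vertical");
net.train(horizontal, "horizontal");

// Each test image has one stray pixel switched on.
console.log(net.classify([1, 1, 0, 0, 1, 0, 0, 1, 0])); // vertical
console.log(net.classify([0, 0, 0, 1, 1, 1, 0, 0, 1])); // horizontal
```
</pre>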
<h2>Whose Markup is it Anyway?</h2>
<p>The "Semantic Web", at the moment, is like that early AI research - it all seems just around the corner, if only we all try a little bit harder. In fact, the Web has already come a long way, and while the basic language of HTML remains the same, its usage has changed dramatically over the years.</p>
<p>At first, <em>HyperText Markup Language</em> was a very simple way of "marking up" documents in the big text repository that was the World Wide Web - separating out paragraphs, marking headlines and important bits, and, of course, turning text into "anchors" which "hyperlinked" documents together. As it grew, the Web demanded more, and prettier, pages, and HTML grew into a rich, <em>presentational</em> language - with colours, tables, images, even simple animations. Then the need to automate complex and interactive systems meant HTML had to become a <em>structural</em> language as well - one that could identify blocks of a page, take them apart, and join them back together in a different order.</p>
<p>At this stage, a question arises, which is just <strong>how much structure does a document need?</strong> Two technologies readily associated with <em>structural</em> HTML are XHTML - HTML with the added rigidity and tools of XML - and the <abbr title="Document Object Model">DOM</abbr> - a way of looking at a document as a big tree of "nodes". This is an extremely useful way of looking at a page with lots going on - you can work dynamically with the navigation controls, or the interactive comment system, without touching the main content area. But how useful is it <em>inside</em> the content? In the sentence "I <em>really</em> like structure", the word "really" is wrapped in its own HTML element, but not because of any <em>structural significance</em> - it's just there for presentation.</p>
<p>Now, we're told, HTML needs to evolve further, into a <em>semantic</em> markup, which doesn't just separate out the blocks, it labels them all <em>meaningfully</em>. I've used italics as an example deliberately, because a common misconception is that the &lt;i&gt; tag is bad - it's <em>presentational</em> - and &lt;em&gt; is better - it means <em>emphasise this</em>, rather than <em>draw this in italics</em>: much more <em>semantic</em>, surely? And so the Rich-Text control I'm typing in right now has a button labelled "I" - the recognised icon for italics - which inserts an &lt;em&gt; tag; it even replaces &lt;i&gt; with &lt;em&gt; if you edit the HTML! This is, obviously, a complete misunderstanding - <strong>a straight find-and-replace can't magically gain us meaning</strong> - but the fact is, it's what <em>people</em> are used to, and it's <em>people</em> that write the content.</p>
<p>So this is the second conflict facing the Web - authors don't want to add all this information. The real reason &lt;i&gt; is discouraged is that when it's <em>not</em> being used for emphasis, the machine has no way of knowing why it <em>is</em> being used: it might be to indicate a foreign phrase, like <em>in situ</em>, in which case we could indicate what language it is; or it might be the name of a publication, like <em>The Times</em>, in which case we could label it with some appropriate global identifier. But the fact is, a human reader won't gain anything from this, and <strong>content is written by humans, for humans</strong>.</p>
<h2>Playing to our Strengths</h2>
<p>In the end, the Semantic Web will not be built by insisting that I tell WordPress that <em>in situ</em> is Latin - who would it help if I did? A blog entry is not, and never will be, a database - it's a big block of text, or perhaps a few medium-sized blocks of text, with some decoration and links thrown in. The <em>structural</em> markup can help us divide up those blocks, and mark off the bits of the page which aren't part of the blog entry at all; the <em>semantic</em> markup will improve how those <em>blocks</em> get labelled.</p>
<p>If you want to find pages about a particular topic, what you need is a really big index, and that's a good job to give a computer; if you want to find "the right page" about a particular topic, it's up to you to decide what "right" means. We can help the computers to help us - by making articles that explicitly label their own topics, for instance (as long as you can trust them, but that's another issue). We can even decide that sometimes we do want to embed little bits of <em>data</em> in our <em>content</em> - if we know an address might be useful to someone, we could label it with an appropriate <a href="http://microformats.org/">microformat</a>, so they can more easily feed it to another piece of software. But if they want to look up what <a href="http://en.wiktionary.org/wiki/in_situ"><em>in situ</em></a> or <a href="http://foldoc.org/perceptron">Perceptron</a> actually means, we can leave it to a human to look it up in an appropriate index.</p>
<p>This may be a disappointing prognosis if you were hoping the web could tell you the answers without you having to read the articles, but think of it as a division of labour: the computers can get better at <em>categorising</em> and <em>filtering</em> the content, we can carry on being good at <em>understanding</em> it.</p>
]]></content:encoded>
			<wfw:commentRss>http://rwec.co.uk/blog/2010/01/a-blog-entry-is-not-a-database/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

