Your <body> is in your <head>

Michael Geary | Fri, 2006-04-28 19:16

I was chasing down a bizarre bug. My JavaScript code was working fine in IE, Firefox, and Safari, until I tried using it here on mg.to. It still worked in Firefox and Safari, but it blew up in IE, with behavior that made no sense at all. I was seeing duplicate copies of DOM elements I created using my DOM creation plugin for jQuery, and all kinds of strange behavior. It was almost as if something was fundamentally wrong with the way the DOM was working.

I’m usually pretty good at tracking down problems, but this one had me stumped. I wanted to look around the the DOM structure, but since this was IE, I couldn’t use any of the usual Firefox plugins such as the DOM Inspector or FireBug. Then I found a great tool for IE troubleshooting, the DebugBar.

I looked at the DOM with the DebugBar, and my jaw dropped when I saw this:

DebugBar DOM

(This screen shot and the others below are from simplified test cases.)

This had to be impossible! The document <body> was inside the <base> tag, which in turn was inside the <head>. The latter I’d expect, but why was <body> not a sibling of <head> as it should be? And how was <base> involved in this?

I got lucky with some searches and found these articles by Justin Rogers from the IE team:

This part of the “Implied tags” article seemed to explain what was going on:

The set of implied rules has impacts in other areas as well. You can, for instance, end up using document.writeln to prematurely terminate your HEAD element and move a bunch of stuff out into the BODY. So, if you are doing inline document writes you should probably do them where you want the content to go. Writing the content out in script blocks that appear in the head is the wrong way to go about it. You could hook up to some events or have a container element that you write into, and that is acceptable, but with inline writes you could get unexpected behavior.

Recently I noticed a site that was doing a document.writeln in their HEAD element about half-way through the head content. End result? Well, the content got moved into the BODY element and the object model tree for the page was completely wrong. Good thing they weren’t navigating the object model looking for stuff and good thing the extra META/LINK elements weren’t being used as well. With a static parse of the page you wouldn’t even notice these problems, but when DHTML becomes involved it can change the structure of your document on the fly and rewrite what the object model tree looks like.

Indeed, this site runs on Drupal, and Drupal 4.6.x does use a <base> tag. (The forthcoming Drupal 4.7 eliminates the <base> tag.)

Justin’s examples showed an unclosed <base> tag like this:

<base href="foo">

That should be OK; the W3C HTML specification defines the BASE element as EMPTY, so it shouldn’t require any kind of closing tag (except for XHTML compatibility).

The <base> tag that Drupal generates is self-closing in the XHTML style:

<base href="https://mg.to/" />

However, IE6 does not seem to recognize that the tag is closed (or EMPTY), and it puts everything after that inside the BASE element.

(You may also notice that strictly speaking, Drupal’s <base> tag is incorrect. It should include a filename, but it seems to work OK without it—except for the IE problem.)

On a hunch, I tried closing the tag the old fashioned way:

<base href="https://mg.to/"></base>

and presto! Everything started working, and DebugBar revealed that the <head> and <body> elements were siblings, both direct children of <html> as expected.

The bottom line: Every HTML document in the world that uses a <base> tag is being parsed in this odd way by IE, unless an explicit closing </base> tag is used. It doesn’t affect ordinary HTML rendering, but any kind of DOM manipulation may go haywire.

Here are the three test cases. First, the unclosed/empty <base> tag:

Unclosed BASE tag

The XHTML-style <base /> tag is no better:

Self-closing BASE tag

And the one that works, with a closing </base> tag:

BASE tag with closing tag

There is one remaining problem. If you validate your pages as HTML 4.01 Transitional, there is no way to use a <base> tag that works correctly in IE and also validates. The validator barfs on the closing </base> tag, because it figures the tag is already closed (being an EMPTY element).

If you use XHTML 1.0 (either Transitional or Strict), then you can use the closing </base> tag and it will validate. Since most people who validate their pages are probably using XHTML anyway, this shouldn’t be a problem for many.

However, the W3C’s XHTML/HTML Compatibility Guidelines offer this warning:

…use the minimized tag syntax for empty elements, e.g. <br />, as the alternative syntax <br></br> allowed by XML gives uncertain results in many existing user agents.

Well, that’s just great. The only syntax that works in IE and validates is <base></base>, but W3C warns against it. I haven’t actually seen any problems caused by using this syntax with the <base> tag, though, even in old browsers. So for now, I’m using it and hoping for the best.

Major thanks are due to the DebugBar for pointing me toward the problem, and Justin Rogers for explaining it.

Submitted by Ashutosh (not verified) on Fri, 2006-05-12 21:11.

Hi I looked up debugbar.com that you mentioned, and it appears to be a licensed software. I just came across a DOM viewer plugin by MS itself - IE Developer Toolbar. It did the job perfectly, and for free! Regards Ashutosh