Programming

Doing an IAmA

Michael Geary | Thu, 2011-05-05 16:08

I’m doing an IAmA on Reddit:

I was one of the first people Steve Jobs ever fired

Geek history! :-)

Annotate your YouTube video with AnnoTube

Michael Geary | Tue, 2008-06-03 19:57

AnnoTube is a jQuery plugin that makes it easy to embed a YouTube video in your page along with an index and notes that are displayed and synchronized with the video as it plays. Each note can be an HTML snippet, or a URL to be loaded into an IFRAME, or even a JavaScript function to be run when a specific time in the video is reached.

Here’s a demo: the Mapping the Votes talk I gave at Google in April, with annotations provided by AnnoTube. This solves a mistake I made in this talk: Like many speakers, I left the text too small in my code examples. It looked fine to the people in the room, but the code is really hard to read in the YouTube video. But with the annotations I can put big, readable text right next to the video at the right time.

Fair warning: At this moment, I’ve only annotated the first 15 minutes of the video, and all of the notes so far are actually web pages in an IFRAME. Also, the AnnoTube plugin itself isn’t quite ready for general use. This is all a bit of work in progress, which started with something I put together for fun during the Google I/O conference. I’m only posting now because the YouTube API team is mentioning it in a post of their own, so please watch this page for updates over the next few days.

For a start, here’s what the timeline for my talk looks like (with URLs shortened to avoid long lines):

$.annotube({
    "video": "QIPKmkeMuz4",
    "timeline": [
        "00:00|Introduction: Pamela Fox",
        "00:14|GAsync() API|https://mg.to/...",
        "00:58|Iowa Mapplet|https://maps.google.com/maps/mpl?moduleurl=...",
        "01:37|Mapplet Performance",
        "02:07|Iframes and Security",
        "02:46|The Hash Hack",
        "04:56|Iowa Maps API Map|https://gmaps-samples.googlecode.com/...",
        "05:55|Mouseovers in API and Mapplet",
        "06:25|New Hampshire API Map|https://gmaps-samples.googlecode.com/...",
        "07:06|My Biased Map",
        "07:35|Three Kinds of Bias",
        "07:54|Winner Takes All?",
        "08:42|Where are the Delegates?",
        "08:50|Big County, Little County",
        "09:45|A Good-Looking Map",
        "10:10|Another Form of Bias?",
        "11:18|Tiles and Tweets|https://maps.google.com/maps/mpl?moduleurl=...",
        "11:45|A Twittervision Clone",
        "13:05|Proportional Pins|https://maps.google.com/maps/mpl?moduleurl=...",
        "14:35|The Gadget Version|https://gmodules.com/ig/creator?synd=...",
        ""
    ]
});

As you can see, it’s just a simple list of times, titles, and links or HTML snippets. AnnoTube takes care of connecting the events and watching for the times you specify.

More details soon… Thanks for your interest and patience!

Pennsylvania Primary Google Map

Michael Geary | Tue, 2008-04-22 02:48

Another day, another primary, another Google map. This time we added a bunch of demographic information using little sparkline graphs, with help from Jim Barnes of the National Journal. I think the voter registration by age is especially interesting. Check it out:

(The map probably won’t load in an RSS feed, so click through to the article to see it.)

You can get this map for your own site!

Mapping the Votes - resources

Michael Geary | Thu, 2008-04-03 21:46

I want to thank everyone who came to my Mapping the Votes talk at Google. The talk is available on YouTube - with apologies for the small font size in the code samples!

Here are some links and information that I referred to in the talk.

Maps and mapplets

Decision 2008 - the current election mapplet
Decision 2008 Gadget - the election map as a Google Gadget
Iowa Republican Caucus - an early API map
Iowa mapplet - an early mapplet
Twitter election map - the Super Tuesday twitter map (showing tweets from that day)
Campaign Trail - candidate calendars
New Hampshire in Google Earth - a KML file

Editors and desktop tools

The editor I used for the code samples is the one I use every day, Komodo IDE. Komodo’s debuggers for Ruby, Python, and PHP make it really easy to test my batch/script/server code. I’m especially fond of coding in the debugger. For the code that converts shapefiles and vote data into JSON output, I’d write the input part first, set a breakpoint and stop in the debugger after it reads the data, then write the conversion code with live data to look at while I code. Komodo also has a JavaScript debugger that works equally well, but most of the time I just use Firebug because of its simplicity.

Komodo IDE isn’t cheap, but I figure it paid for itself really fast. There’s also a free Komodo Edit that everyone should install even if you already have a favorite editor. Both versions have real-time syntax checking, where you get squiggly red underlines for syntax errors and squiggly green underlines for warnings, just like the spelling and grammar checkers in a word processor. This has saved me literally thousands of page reloads when testing, since Komodo catches my syntax errors before I even save the file. Komodo runs on Linux, Mac, and Windows.

One nice thing about GUI editors is that the basic editing works the same in all of them (or should), so it’s easy to switch back and forth if some other editor has a feature you want to take advantage of. Besides Komodo, I also use PSPad (free, Windows only), mostly because of its nice HTML/XML pretty-printer. It cleans up unreadable web page source code real quick.

Another expensive-but-well-worth-it tool for Windows and Mac is Araxis Merge, a terrific file compare and merge program with live editing. I use Merge as the diff/merge program for TortoiseSVN, which makes source control a dream.

A couple of free Windows tools I use every day are Zoom+ for screen zooming and my own JKLmouse for precise cursor control with the keyboard of your notebook computer. With JKLmouse, I can use the TrackPoint for fast cursor motion and then the keyboard for fine pixel-by-pixel movement, seamlessly and with no “modes”. (Sorry, I had to brag!)

Source code

The election map code is open source and is in two Google Code projects. The current code is in the primary-maps-2008 project, and the code for earliest caucuses and primaries is in the gmaps-samples project. (We moved the code to a new project to avoid filling up gmaps-samples!)

If you look at the code, go easy on me: much of it was written under severe time pressure. I asked if the elections could be delayed when I wasn’t quite ready, but even the mighty Google couldn’t seem to arrange that.

Also, if you read the code using the links provided here, there’s an awful lot of indentation, thanks to Google Code displaying my tab indentation using 8 spaces per tab. Shades of K&R! (So, why do I use tabs instead of two-space indents like everyone else? Well, one of the other benefits of Komodo is that unlike most code editors, it lets me edit in a proportional font. Two spaces in a proportional font is almost like not indenting at all.)

Shapefiles

Shapefiles are a wacky file format used for geographic data. Be thankful that other people have already written programs to pick them apart, so you and I don’t have to.

At first, I was using shp2text to convert shapefiles to an easy-to-use XML format (using the --gpx option), but this loses some of the information in the shapefile. More recently, Zachary Forest Johnson, author of the interesting indiemaps blog, wrote shpUtils.py, which decodes shapefiles into usable Python data.

I extended shpUtils.py to calculate correct centroids, area and other information about the shapes, and to fix a few bugs. The updated version is in the primary-maps-2008 project.

Centroids

The election maps use the centroids of the state and county polygons to position markers for those states.

Centroids are one of those things that you think you understand and then find out you were completely wrong. My first guess was the same as Zachary’s, to take the arithmetic mean of all the points (X and Y separately). The Wikipedia article even seems to say this, but it’s talking about the centroid of the points, not the centroid of the polygon that those points define. If you read it carefully, the article does give the correct algorithm, but it’s better explained on this page, along with sample implementations in various languages.

Census bureau shapefiles

The state and county outlines in the election maps come from shapefiles provided by the Census Bureau. Most states report votes by county, but a few New England states report by town (County Subdivisions in the Census Bureau page), and a few other states report by congressional district.

Shapefile simplification

D’oh! I completely forgot to talk about this important topic. The Census Bureau shapefiles have too much detail to be usable in a browser-based map. If you draw polygons from them, it will be much too slow. A tile layer can handle more detail, but the graphic files will be larger than they could be, because of the excess detail.

MapShaper is a free online tool to simplify shapefiles. It is pretty neat—you can see the effect of your simplification in realtime as you try different settings. I used MapShaper for the election maps, with various levels of simplification: simpler for JavaScript and more detailed for tile layers. More recently I discovered the Map Simplification Program which looks ideal for programmed simplification.

The code that processes shapefiles for the election maps is in makepolys.py which generates JSON output, and maketiles.py which generates tiles from that JSON data using ImageMagick.

Votes and delegates

The code to convert vote data from the latest primaries is in voter.py. This processes CSV files provided by the Boston Globe and converts them to JSON data.

Twitter map

The Ruby script that gathers the Twitter updates uses the Jabber::Simple module written by Blaine Cook to create a custom Jabber client that talks to Twitter, and uses the Twittervision API to get geographic information. It parses the XML data with sweet Hpricot, then generates JSON data (but you probably saw that coming). If you like jQuery, you’ll like Hpricot.

Mapplet code

The election mapplet code is in decision2008.xml and map.js. The code for the Campaign Trail mapplet is in campaign-trail.xml and campaign-trail.js. The latter file has the latest versions of the Array.mapjoin(), Array.index(), Object.sort(), S(), and related functions that I talked about. They are at the top of the file, and not yet documented, but you can find examples of each in the code.

More to come

That’s it for now! I’ll be posting more detailed articles on some of these topics. If there is a particular area you’re interested in, please let me know in the comments.

Thanks!

Google Maps talk

Michael Geary | Wed, 2008-04-02 13:38

Update: I posted some notes and links from the talk.

I’m giving a talk at Google tonight at 6pm about the election maps I’ve been working on. I’ll be talking about:

  • How to use the same code for a mapplet, a Google Gadget, and a Maps API map
  • Turn shape files into map tiles, polygons, and markers
  • Collect voting results into JSON objects
  • Marker madness - can we make it fast enough?
  • Hosting on Google Code and Amazon S3
  • A custom Twitter map using Jabber to track keywords

I’ll follow up tomorrow with links to the resources I mention in the talk, and then will post a series of articles going into some of the topics in more detail. If you are at the talk or watch the YouTube video, let me know in the comments what areas are of most interest for follow-on articles.

To register for the talk: https://sv-gtug-4.eventbrite.com/

Thanks!

My little Google map

Michael Geary | Thu, 2008-02-07 20:39

I’ve been working on a project for Google this last month, a mapplet with primary election and caucus results. We’ve done different versions for the primaries so far. For previous states, the emphasis was on mapping the county-by-county results. The latest one is different, a bit of a Twittervision clone, but filtered for messages related to the elections instead of all Twitter messages.

Some people said it was lame and useless; others complained that they spent all day Tuesday watching it.

We report, you decide. :-)

There are plenty of stories to tell about this project, more later…

MakeProcInstance

Michael Geary | Thu, 2008-02-07 20:22

Wow, this was a blast from the past. Raymond Chen reminisces about a 16-bit Windows function, MakeProcInstance.

Thanks, Raymond, I think! I’ve been trying to forget the horror of 16-bit Windows programming. Be thankful that you don’t have to work with it, and neither does anyone else. :-)

Social Scripting from IBM

Michael Geary | Sat, 2007-09-08 22:56

Here is a script from IBM’s new CoScripter, to update your Facebook status:

* go to "https://www.facebook.com"
* enter your "e-mail address" (e.g. tlau@tlau.org) into the "Email:" textbox
* enter your password into the "Password:" textbox
* click the "Login" button
* click the "Profile" link
* click the "Update your status..." link
* enter your status into the status field

It reads just like the instructions you might write down for someone, but it’s an actual executable script. All the scripts are stored on a wiki so anyone can share and update them.

Very interesting… And definitely not your grandfather’s IBM.

Via Jon Udell.

Google adopts my GAsync() API

Michael Geary | Fri, 2007-07-27 11:02

Ben Appleton of the Google Maps API team posted today that Google has added my GAsync() function to the Mapplet API. I don’t see the function listed in the Mapplet API documentation yet, but it should be there soon.

In the meantime, you can read my post that describes the API and how to use it.

One thing not mentioned in Ben’s post: you can use GAsync() not only to improve your mapplet code, but also to write common code for both a mapplet and a regular Maps API application. To do this, you will probably need to include the GAsync() source code in your application—I don’t know if it’s been made part of the standard Maps API.

Thanks Ben and the Maps API team!

A fast and simple async API for Google Mapplets

Michael Geary | Wed, 2007-06-06 16:18

Update 2007-06-22: Version 2 now supports portable code that runs as both a mapplet and a Maps API app. Read about the update.

If you’re building a Google Mapplet that responds to map movement and resizing, you will soon find yourself writing code like this recent gem of mine:

map.getSizeAsync( function( size ) {
    map.getBoundsAsync( function( bounds ) {
        map.getCenterAsync( function( center ) {
            search( size, bounds, center );
        });
    });
});

What is going on here? I have a search() function that takes the current map size, bounds and center, runs a search and displays pins on the map. In a normal Google Maps application I could have simply coded:

search( map.getSize(), map.getBounds(), map.getCenter() );

But a Google Mapplet lives in a strange and different world. To isolate mapplet code from the Google domain, Google runs the mapplet in an IFRAME loaded from the gmodules.com domain. Cross-domain browser security prevents your code from communicating directly with the Google Maps frame loaded from maps.google.com.

The mapplet API uses the iframe fragment hack to allow limited communication between the mapplet and the Google map. This has two consequences:

  • The communication is asynchronous. This doesn’t affect the API for functions that simply set map state—they operate on a “fire and forget” basis. But functions that return information can’t do it directly. You have to provide a callback function that receives the information when it is ready.

  • The communication is slow. Everything is serialized through the fragment identifier (hash) of a hidden IFRAME. The map page and the mapplet frame each have interval timers running to watch for changes to this hash. A single getSomethingAsync() function call requires all these steps:

    1. Mapplet frame sets the hash to represent the function call.
    2. Map page timer wakes up, makes the actual Maps API call, and sets the hash to represent the return value.
    3. Mapplet frame timer wakes up, gets the value from the hash, and calls the callback function.

My code listed above makes three of these round trips to the maps page one after the other, because the callback for each function triggers the next step in the series. That’s a lot of timeouts—enough to cause a noticeable delay.

What if we could somehow combine all three information requests into a single round trip? That should speed things up quite a bit. Imagine a different Mapplet async API where you provide a list of Maps API functions and get back all of their responses in a single callback with multiple arguments. My three nested function calls and callbacks could be reduced to:

GAsync( map, 'getSize', 'getBounds', 'getCenter',
    function( size, bounds, center ) {
        search( size, bounds, center );
    });

(The sharp-eyed reader will note that the search() function could be used directly as the callback because it takes the same arguments:

GAsync( map, 'getSize', 'getBounds', 'getCenter', search );

But we’ll stick with the longer form for this discussion, because it makes it clear what the function arguments are.)

While we’re at it, we can provide a way to retrieve information for more than one object in a single call:

// Get the map center and the location of a marker,
// and find out if the marker is hidden
GAsync(
    map, 'getCenter',
    marker, 'getPoint', 'isHidden'
    function( mapCenter, markerPoint, markerHidden ) {
        // ...
    });

And for functions such as map.fromContainerPixelToLatLngAsync() which take an additional argument, we can allow an optional arguments array after any function name:

// Get the map center, top left corner, and zoom level
GAsync(
    map,
        'getCenter',
        'fromContainerPixelToLatLng', [ new GPoint(0,0) ],
        'getZoom'
    function( center, topleft, zoom ) {
        // ...
    });

Compare that with the equivalent nested functions using the existing API, which would take about three times longer to run:

map.getCenterAsync( function( center ) {
    map.fromContainerPixelToLatLngAsync( new GPoint(0,0), function( topleft ) {
        map.getZoomAsync( function( zoom ) {
            // ...
        });
    });
});

Good news: We don’t have to wait for Google to implement this zippy GAsync API or something like it. Although the public API only exposes individual xyzAsync() functions, the underlying iframe fragment dispatcher can queue up multiple function calls and return values into a single round trip.

The Google Maps team was kind enough to provide me with a nifty makeBarrier() function that allows us to queue up a number of async calls and get a single callback when all their values are ready. Using this function, my first example can be written as:

function makeBarrier( numCalls, callback ) {
    return function() {
        if( ! --numCalls ) callback();
    };
}

var size, bounds, center;

// The 3 here refers to the 3 callbacks that we want to synchronize below
var barrier = makeBarrier( 3, function() {
    search( size, bounds, center );
});

// Fire the 3 callbacks
map.getSizeAsync( function( returnedSize ) {
    size = returnedSize;
    barrier();
});

map.getBoundsAsync( function( returnedBounds ) {
    bounds = returnedBounds;
    barrier();
});

map.getCenterAsync( function( returnedCenter ) {
    center = returnedCenter;
    barrier();
});

As you can see, it’s up to us to count the functions and keep track of the values, but having done that, we can get all three values in a single round trip through the API. It’s literally three times faster than the nested API calls.

Armed with that information, could we code GAsync() as a layer on top of the existing mapplet async APIs? Indeed we can!

You can try out the code right now and see the speed difference with my test mapplet. Go to the Google Maps Developer Preview page, log into your Google account, and click the Add Content link under the Mapplets tab (or click the Browse Content button if that is what is there).

The next page will show a number of existing mapplets. Click the tiny Add by URL link next to the search button at the top of the page, and paste this URL into the URL box that opens up. You can also click this link to see the mapplet source code:

https://mg.to/mapplet/async/async.xml

(When you paste the link, make sure the https:// isn’t duplicated because of the text already in the box.)

Click the Add button and click OK on the confirmation dialog. Then click Back to Google Maps at the top left corner of the page, and you should see a new entry titled A fast simple mapplet async API. Click it to load the mapplet.

An info window should open in the map, displaying several items of information about the map, and the time it required to collect the information using the GAsync() API. Then try the Slow Async API radio button to see the performance using nested async calls.

The GAsync code used in the test mapplet is:

GAsync(
    map, 'getZoom', 'getSize', 'getBounds', 'getCenter',
        'fromContainerPixelToLatLng', [ new GPoint(0,0) ],
    marker, 'getPoint', 'getIcon',
    function info( zoom, size, bounds, center, topleft, point, icon ) {
        // ...
    });

and the corresponding nested async code is:

map.getZoomAsync( function( zoom ) {
    map.getSizeAsync( function( size ) {
        map.getBoundsAsync( function( bounds ) {
            map.getCenterAsync( function( center ) {
                map.fromContainerPixelToLatLngAsync( new GPoint(0,0), function( topleft ) {
                    marker.getPointAsync( function( point ) {
                        marker.getIconAsync( function( icon ) {
                            // ...
                        });
                    });
                });
            });
        });
    });
});
}

Finally, here is the GAsync source code. First, a compact version suitable for pasting into your own mapplet (or download async.js):

// GAsync v2 by Michael Geary
// Commented version and description at:
// https://mg.to/2007/06/22/write-the-same-code-for-google-mapplets-and-maps-api
// Free beer and free speech license. Enjoy!

function GAsync( obj ) {

    function callback() {
        args[nArgs].apply( null, results );
    }

    function queue( iResult, name, next ) {

        function ready( value ) {
            results[iResult] = value;
            if( ! --nCalls )
                callback();
        }

        var a = [];
        if( next.join )
            a = a.concat(next), ++iArg;
        if( mapplet ) {
            a.push( ready );
            obj[ name+'Async' ].apply( obj, a );
        }
        else {
            results[iResult] = obj[name].apply( obj, a );
        }
    }

    var mapplet = ! window.GBrowserIsCompatible;
    var args = arguments, nArgs = args.length - 1;
    var results = [], nCalls = 0;

    for( var iArg = 1;  iArg < nArgs;  ++iArg ) {
        var name = args[iArg];
        if( typeof name == 'object' )
            obj = name;
        else
            queue( nCalls++, name, args[iArg+1] );
    }

    if( ! mapplet )
        callback();
}

And a heavily commented version that explains how it works:

// GAsync v2 by Michael Geary
// https://mg.to/2007/06/22/write-the-same-code-for-google-mapplets-and-maps-api
// Free beer and free speech license. Enjoy!
//
// Call one or more xyzAsync() functions from the Google
// Mapplet API, with a single callback that receives the
// values from all of the called functions. The first argument
// to GAsync is the object to be used. The next arguments
// are each of the function names, without the 'Async'
// suffix. The last argument is the callback function.
// To call xyzAsync() functions for more than one object,
// list another object in the argument list and the function
// names after that will use that object.
// To call an xyzAsync() function that takes arguments of
// its own (other than the callback argument), place an
// array of those arguments after the function name in
// GAsync's argument list.

// Example calls, given existing map and marker objects
/*
    // Get the size, bounds, and center of the map
    GAsync( map, 'getSize', 'getBounds', 'getCenter',
        function( size, bounds, center ) {
            // ...
        });

    // Get the zoom level, size, bounds, and center for
    // the map, as well as the lat/long of the top left
    // corner of the map. Also get the point and icon
    // for marker.
    GAsync(
        map, 'getZoom', 'getSize', 'getBounds', 'getCenter',
            'fromContainerPixelToLatLng', [ new GPoint(0,0) ],
        marker, 'getPoint', 'getIcon',
        function( zoom, size, bounds, center, topleft, point, icon ) {
            // ...
        });

    // Equivalent code using nested xyzAsync calls:
    map.getSizeAsync( function( size ) {
        map.getBoundsAsync( function( bounds ) {
            map.getCenterAsync( function( center ) {
                // ...
            });
        });
    });

    map.getZoomAsync( function( zoom ) {
        map.getSizeAsync( function( size ) {
            map.getBoundsAsync( function( bounds ) {
                map.getCenterAsync( function( center ) {
                    map.fromContainerPixelToLatLngAsync( new GPoint(0,0), function( topleft ) {
                        marker.getPointAsync( function( point ) {
                            marker.getIconAsync( function( icon ) {
                                // ...
                            });
                        });
                    });
                });
            });
        });
    });
*/

function GAsync( obj ) {

    // Call the callback function provided in the GAsync() call
    function callback() {
        args[nArgs].apply( null, results );
    }

    // Queue a single xyzAsync() function call.
    // 'iResult' is the index into the final results array.
    // 'name' is the name of the Maps API function without
    // the Async suffix.
    // 'next' is the next argument to GAsync following name;
    // If next is an array, it contains the arguments to be
    // passed to xyzAsync().
    function queue( iResult, name, next ) {

        // Callback for the xyzAsync() function that was called
        // by this invocation of the queue() function.
        // value is the return value from the Maps API.
        function ready( value ) {

            // Save the result in the final results array
            results[iResult] = value;

            // If every async function has completed, call the
            // GAsync callback (in the last argument to GAsync)
            // with the final results array as its arguments.
            if( ! --nCalls )
                callback();
        }

        // Arguments array for the xyzAsync() call
        var a = [];

        // If 'next' is an array, it contains arguments to
        // be passed to xyzAsync()
        if( next.join )  // Arrays have .join, strings do not
            a = a.concat(next), ++iArg;  // append and skip

        // Call xyzAsync() in a mapplet or xyz() in the Maps API
        if( mapplet ) {
            // The callback for xyzAsync is its last argument
            a.push( ready );

            // Call xyzAsync() with arguments in 'a'
            obj[ name+'Async' ].apply( obj, a );
        }
        else {
            // Maps API, call xyz() and save its return value
            results[iResult] = obj[name].apply( obj, a );
        }
    }

    // Is this is a mapplet or the Maps API?
    var mapplet = ! window.GBrowserIsCompatible;

    // 'args' is a reference to GAsync's arguments array
    // that can be used in the nested functions.
    // 'nArgs' is is the number of arguments, not counting
    // the callback function at the end. (Thus, args[nArgs]
    // is a reference to the callback function.)
    var args = arguments, nArgs = args.length - 1;

    // 'results' is the final results array for the callback. It
    // will be populated from the individual ready() callbacks.
    // 'nCalls' is the total number of xyzAsync() calls. It is
    // incremented as those calls are queued, and then
    // decremented to discover when all the calls are done.
    var results = [], nCalls = 0;

    // Loop through GAsync()'s arguments, starting at
    // the first function name (after 'obj'), and ending
    // before the callback function.
    for( var iArg = 1;  iArg < nArgs;  ++iArg ) {

        // Get the name of the function to be called
        var name = args[iArg];

        // If the argument is an object, not a name,
        // switch to that object for subsequent calls.
        // If it is a name, count and queue the function.
        if( typeof name == 'object' )
            obj = name// change object
        else
            queue( nCalls++, name, args[iArg+1] );
    }

    // If using the Maps API, call the callback now
    if( ! mapplet )
        callback();
}

Of course, there are still cases where you will have to run one async call after another one. If you need one piece of information as input to a subsequent call, nested async functions are the way to do it. Even then, it may be possible to combine some other async calls into a single GAsync call, wherever they don’t depend on each other’s results. You’ll shave about a quarter second off your mapplet’s response time for every call you combine using GAsync.

Enjoy your simpler and faster mapplet code!

Thanks to Ben A. of Google for mapplet design and coding tips.

Your <body> is in your <head>

Michael Geary | Fri, 2006-04-28 19:16

I was chasing down a bizarre bug. My JavaScript code was working fine in IE, Firefox, and Safari, until I tried using it here on mg.to. It still worked in Firefox and Safari, but it blew up in IE, with behavior that made no sense at all. I was seeing duplicate copies of DOM elements I created using my DOM creation plugin for jQuery, and all kinds of strange behavior. It was almost as if something was fundamentally wrong with the way the DOM was working.

I’m usually pretty good at tracking down problems, but this one had me stumped. I wanted to look around the the DOM structure, but since this was IE, I couldn’t use any of the usual Firefox plugins such as the DOM Inspector or FireBug. Then I found a great tool for IE troubleshooting, the DebugBar.

I looked at the DOM with the DebugBar, and my jaw dropped when I saw this:

DebugBar DOM

(This screen shot and the others below are from simplified test cases.)

This had to be impossible! The document <body> was inside the <base> tag, which in turn was inside the <head>. The latter I’d expect, but why was <body> not a sibling of <head> as it should be? And how was <base> involved in this?

I got lucky with some searches and found these articles by Justin Rogers from the IE team:

This part of the “Implied tags” article seemed to explain what was going on:

The set of implied rules has impacts in other areas as well. You can, for instance, end up using document.writeln to prematurely terminate your HEAD element and move a bunch of stuff out into the BODY. So, if you are doing inline document writes you should probably do them where you want the content to go. Writing the content out in script blocks that appear in the head is the wrong way to go about it. You could hook up to some events or have a container element that you write into, and that is acceptable, but with inline writes you could get unexpected behavior.

Recently I noticed a site that was doing a document.writeln in their HEAD element about half-way through the head content. End result? Well, the content got moved into the BODY element and the object model tree for the page was completely wrong. Good thing they weren’t navigating the object model looking for stuff and good thing the extra META/LINK elements weren’t being used as well. With a static parse of the page you wouldn’t even notice these problems, but when DHTML becomes involved it can change the structure of your document on the fly and rewrite what the object model tree looks like.

Indeed, this site runs on Drupal, and Drupal 4.6.x does use a <base> tag. (The forthcoming Drupal 4.7 eliminates the <base> tag.)

Justin’s examples showed an unclosed <base> tag like this:

<base href="foo">

That should be OK; the W3C HTML specification defines the BASE element as EMPTY, so it shouldn’t require any kind of closing tag (except for XHTML compatibility).

The <base> tag that Drupal generates is self-closing in the XHTML style:

<base href="https://mg.to/" />

However, IE6 does not seem to recognize that the tag is closed (or EMPTY), and it puts everything after that inside the BASE element.

(You may also notice that strictly speaking, Drupal’s <base> tag is incorrect. It should include a filename, but it seems to work OK without it—except for the IE problem.)

On a hunch, I tried closing the tag the old fashioned way:

<base href="https://mg.to/"></base>

and presto! Everything started working, and DebugBar revealed that the <head> and <body> elements were siblings, both direct children of <html> as expected.

The bottom line: Every HTML document in the world that uses a <base> tag is being parsed in this odd way by IE, unless an explicit closing </base> tag is used. It doesn’t affect ordinary HTML rendering, but any kind of DOM manipulation may go haywire.

Here are the three test cases. First, the unclosed/empty <base> tag:

Unclosed BASE tag

The XHTML-style <base /> tag is no better:

Self-closing BASE tag

And the one that works, with a closing </base> tag:

BASE tag with closing tag

There is one remaining problem. If you validate your pages as HTML 4.01 Transitional, there is no way to use a <base> tag that works correctly in IE and also validates. The validator barfs on the closing </base> tag, because it figures the tag is already closed (being an EMPTY element).

If you use XHTML 1.0 (either Transitional or Strict), then you can use the closing </base> tag and it will validate. Since most people who validate their pages are probably using XHTML anyway, this shouldn’t be a problem for many.

However, the W3C’s XHTML/HTML Compatibility Guidelines offer this warning:

…use the minimized tag syntax for empty elements, e.g. <br />, as the alternative syntax <br></br> allowed by XML gives uncertain results in many existing user agents.

Well, that’s just great. The only syntax that works in IE and validates is <base></base>, but W3C warns against it. I haven’t actually seen any problems caused by using this syntax with the <base> tag, though, even in old browsers. So for now, I’m using it and hoping for the best.

Major thanks are due to the DebugBar for pointing me toward the problem, and Justin Rogers for explaining it.

Easy DOM creation for jQuery and Prototype

Michael Geary | Mon, 2006-02-27 11:07

Here is a jQuery plugin that makes it easy to build up a tree of DOM nodes. It lets you write code like this:

var table =
   $.TABLE({ Class:"MyTable" },
      $.TBODY({},
         $.TR({ Class:"MyTableRow" },
            $.TD({ Class:"MyTableCol1" }, 'howdy' ),
            $.TD({ Class:"MyTableCol2" },
               'Link: ',
               $.A({ Class:"MyLink", href:"https://www.example.com" },
                  'example.com'
               )
            )
         )
      )
   );

Basically, each function such as $.TABLE creates a DOM node and takes the following arguments:

The first argument is an object that list any attributes to be set on the node. You can specify the className attribute in a few different ways depending on your taste:

  className: text
  Class: text
  'class': text

Any additional arguments after the first one represent child nodes to be created and appended. These arguments can be DOM elements themselves (e.g. inline $.FOO calls as above), or they can be numbers or strings which are converted to text nodes, or they can be arrays, in which case each element of the array is handled in this same way.

This interface is inspired by Bob Ippolito’s MochiKit DOM API, although it doesn’t implement all of the features of that one (yet).

The code predefines most of the common tags; you can add additional tags by calling:

$.defineTag( tagName )// e.g. $.defineTag( 'dd' );
 

Or simply add the tag names to the tags list in the code.

One last definition is $.NBSP which defines a non-breaking space (same as &nbsp; in HTML).

$._createNode is an internal helper function used by the $.FOO functions. I would have hidden it away as a nested function, but I wanted to avoid any unnecessary closures.

This code doesn’t actually depend on any of the features of jQuery except for the presence of the $ function—and it uses $ only as a way to avoid cluttering the global namespace. I haven’t tested it with Prototype.js, but it should work equally well there. Or the code can be used with no library, by preceding it with:

var $ = {};

Here is the source code, or you can download it:

// DOM element creator for jQuery and Prototype by Michael Geary
// https://mg.to/topics/programming/javascript/jquery
// Inspired by MochiKit.DOM by Bob Ippolito
// Free beer and free speech. Enjoy!

$.defineTag = function( tag ) {
    $[tag.toUpperCase()] = function() {
        return $._createNode( tag, arguments );
    }
};

(function() {
    var tags = [
        'a', 'br', 'button', 'canvas', 'div', 'fieldset', 'form',
        'h1', 'h2', 'h3', 'hr', 'img', 'input', 'label', 'legend',
        'li', 'ol', 'optgroup', 'option', 'p', 'pre', 'select',
        'span', 'strong', 'table', 'tbody', 'td', 'textarea',
        'tfoot', 'th', 'thead', 'tr', 'tt', 'ul' ];
    for( var i = tags.length - 1;  i >= 0;  i-- ) {
        $.defineTag( tags[i] );
    }
})();

$.NBSP = '\u00a0';

$._createNode = function( tag, args ) {
    var fix = { 'class':'className', 'Class':'className' };
    var e;
    try {
        var attrs = args[0] || {};
        e = document.createElement( tag );
        for( var attr in attrs ) {
            var a = fix[attr] || attr;
            e[a] = attrs[attr];
        }
        for( var i = 1;  i < args.length;  i++ ) {
            var arg = args[i];
            if( arg == null ) continue;
            if( arg.constructor != Array ) append( arg );
            else for( var j = 0;  j < arg.length;  j++ )
                append( arg[j] );
        }
    }
    catch( ex ) {
        alert( 'Cannot create <' + tag + '> element:\n' +
            args.toSource() + '\n' + args );
        e = null;
    }

    function append( arg ) {
        if( arg == null ) return;
        var c = arg.constructor;
        switch( typeof arg ) {
            case 'number': arg = '' + arg;  // fall through
            case 'string': arg = document.createTextNode( arg );
        }
        e.appendChild( arg );
    }

    return e;
};

Use FILE_SHARE_DELETE in your shell extension

Michael Geary | Wed, 2004-09-29 18:24
category

A Windows shell extension that provides information from the contents of a file has to open the file to do it. Opening a file locks it to some extent or another, depending on the file sharing flags you use. Even if you open the file for only a moment, that can be long enough to interfere with another program's use of the file.

What happens when someone drags a file from one folder to another, and you have a shell extension that renders thumbnails for the selected file type? Windows calls your IExtractImage interface and you start rendering the thumbnail. Then as soon as your customer releases the mouse, Windows tries to move the file to the new folder. If they move fast enough, this can happen while you've still got the file open to render the thumbnail. That results in this lovely message:

Error Moving File

If they're lucky, they'll try again and go a little slower, and it will work! You've finished rendering the thumbnail, closed the file, and Windows can move it with no problem.

There's an easy way to fix this for Windows NT, 2000, and XP. In the CreateFile() call that opens the file, use FILE_SHARE_READ | FILE_SHARE_DELETE in the dwShareMode parameter.

The MSDN documentation doesn't make it clear at all, but FILE_SHARE_DELETE works with MoveFile() in the same way it does with DeleteFile(). In other words, it gives you Unix-style delete/rename semantics. Even while you have the file open, Windows can delete it or rename it right out from under you, but you can keep reading it—your handle to the file remains valid until you close it.

So, in the case above, Windows moves the file to the destination folder without interference from your thumbnail code.

Mike Mascari ran a test of this and posted the results in the comp.databases.postgresql.hackers newsgroup:

Well, here's the test:

foo.txt contains "This is FOO!"
bar.txt contains "This is BAR!"

Process 1 opens foo.txt
Process 2 opens foo.txt
Process 1 sleeps 7.5 seconds
Process 2 sleeps 15 seconds
Process 1 uses MoveFile() to rename "foo.txt" to "foo2.txt"
Process 1 uses MoveFile() to rename "bar.txt" to "foo.txt"
Process 1 uses DeleteFile() to remove "foo2.txt"
Process 2 awakens and displays "This is FOO!"

On the filesystem, we then have:

foo.txt containing "This is BAR!"

The good news is that this works fine under NT 4 using just MoveFile(). The bad news is that it requires the files be opened using CreateFile() with the FILE_SHARE_DELETE flag set. The C library which ships with Visual C++ 6 ultimately calls CreateFile() via fopen() but with no opportunity through the standard C library routines to use the FILE_SHARE_DELETE flag. And the FILE_SHARE_DELETE flag cannot be used under Windows 95/98 (Bad Parameter). Which means, on those platforms, there still doesn't appear to be a solution. Under NT/XP/2K, AllocateFile() will have to modified to call CreateFile() instead of fopen(). I'm not sure about ME, but I suspect it behaves similarly to 95/98.

Even two years after Mike's post, the C runtime hasn't got much better. The _fsopen() and _sopen() functions claim to support file sharing, but neither one supports FILE_SHARE_DELETE. For a shell extension, that may not matter; you may just use the Win32 file I/O functions directly. If you want the buffering that the stream I/O functions provide, you can use CreateFile() to open the file with FILE_SHARE_DELETE, then _open_osfhandle() to get a C runtime file descriptor, and _fdopen() to open a stream from that.

Sorry, FILE_SHARE_DELETE doesn't work on 95, 98, or Me; you have to leave the flag off. I'm not sure how you fix this problem for those OSes.

Invisible JavaScript functions

Michael Geary | Thu, 2004-09-16 16:24

If you’re writing document-level JavaScript code in a PDF file, any variables you declare or assign to outside a function are created in the global object, which is the PDF document.

Suppose we have a PDF with this code as a document script, which creates three variables in three different ways:

var myVar = 11;    // declare a variable
myNoVar = 22;      // assign to a global variable
this.myProp = 33// create a document property
 

If we load this PDF and examine its variables in the JavaScript console, we see that each of the three variables is defined both as a global variable and as a property of this, the document object. That seems surprising until you consider that in Acrobat JavaScript, the document object is the global object.

myVar
11
this.myVar
11
myNoVar
22
this.myNoVar
22
myProp
33
this.myProp
33
(Keyboard input is in bold, and we type Ctrl+Enter at the end of each line to evaluate the expression.) When you’re writing code, you don’t want to pollute the document object’s namespace needlessly. You can wrap up your code in a function, so that variables you create with a var statement are local to the function. Adobe Acrobat encourages this by creating a function for you when you add a document script. If you use Acrobat’s Document JavaScripts dialog to add a script called DocScript, you’ll get an empty function to fill in:
function DocScript()
{
}
Of course, you need to call this function, so you may end up with code like this:
function DocScript( doc )
{
    var myVar1 = 11;
    myNoVar1 = 22;
    doc.myProp1 = 33;
}

DocScript( this );
Besides creating a scope for local variables, the function lets the code refer to the document object as doc instead of this. We didn’t have to do that, but I like to be able to use doc.whatever instead of this.whatever for document properties. If we look at these variables in the console window, the results are different:
myVar1
ReferenceError: myVar1 is not defined
1:Console:Exec
undefined
this.myVar1
undefined
myNoVar1
22
this.myNoVar1
22
myProp1
33
this.myProp1
33
As expected, myVar1 does not show up either in the global object or the document object; it is local to the function. myVar2 does (because we didn’t use var), as does myProp1 (because we created it as a property of the doc object). The difference between myVar1 and this.myVar1 is expected too. It’s an error to reference a variable name that does not exist, so myVar1 throws a ReferenceError exception. But it’s not an error to reference a nonexistent property of an object—the reference merely returns the undefined value. So this.myVar1 returns undefined without error. But we’ve still put one name in the document object:
DocScript

function DocScript(doc) {
    var myVar1 = 11;
    myNoVar1 = 22;
    doc.myProp1 = 33;
}

this.DocScript

function DocScript(doc) {
    var myVar1 = 11;
    myNoVar1 = 22;
    doc.myProp1 = 33;

}

To avoid even this bit of namespace clutter, we can define an anonymous function and call it on the spot:

(function( doc )
{
    var myVar2 = 11;
    myNoVar2 = 22;
    doc.myProp2 = 33;
}
)( this );

The console results for myVar2, etc. are the same as the previous example with the named function, so I won’t repeat them. But unlike the named function, we haven’t added any symbols at all to the document or global object.

Why the extra parentheses? function(){} is the simplest possible anonymous function, but if we try to call it using function(){}() it’s a syntax error. However, if we wrap the entire function inside a pair of parentheses, then we can follow that with () to call it: (function(){})() is legal JavaScript.

The same trick works when you want to add a nested scope inside a function. The way you would do this in C doesn’t work in JavaScript:

function noscope()
{
    var i = 1;
    console.println( i );

    // This {} does not introduce a scope as in C
    {
        var i = 2;
        console.println( i );
    }

    console.println( i );
}

noscope();

That code prints:

1
2
2

because there is only one variable named i in the entire function, even though we’ve (incorrectly) tried to add an inner scope with its own variable i.

In fact, if you load that function into ActiveState Komodo, it puts a green squiggly line under the inner var i = 2; complaining “strict warning: redeclaration of var i”.

But if we use a nested anonymous function:

function withscope()
{
    var i = 1;
    console.println( i );

    (function()
    {
        var i = 2;
        console.println( i );
    })();

    console.println( i );
}

withscope();

it prints:

1
2
1

which indicates that the nested function introduced a new scope.

Bad Eggs

Michael Geary | Fri, 2004-09-10 09:12
category

Do you have static objects that dynamically allocate memory at startup and carefully free it when your app exits? I’ve used them in a few projects—more than I’d want to admit.

If you’re throwing out a carton of bad eggs, you could take the eggs out of the carton and put them in the trash one by one, followed by the empty carton. But it’s quicker and easier to just toss the whole thing in the trash.

Static objects are bad eggs.

I suppose the C++ runtime has to toss them all in the trash—call their destructors—because it doesn’t know if those destructors really need to be run. But if they’re just going to free memory there ought to be some way to skip all that.