Ruby
Mapping the Votes - resources
I want to thank everyone who came to my Mapping the Votes talk at Google. The talk is available on YouTube - with apologies for the small font size in the code samples!
Here are some links and information that I referred to in the talk.
Maps and mapplets
Decision 2008 - the current election mapplet
Decision 2008 Gadget - the election map as a Google Gadget
Iowa Republican Caucus - an early API map
Iowa mapplet - an early mapplet
Twitter election map - the Super Tuesday twitter map (showing tweets from that day)
Campaign Trail - candidate calendars
New Hampshire in Google Earth - a KML file
Editors and desktop tools
The editor I used for the code samples is the one I use every day, Komodo IDE. Komodo’s debuggers for Ruby, Python, and PHP make it really easy to test my batch/script/server code. I’m especially fond of coding in the debugger. For the code that converts shapefiles and vote data into JSON output, I’d write the input part first, set a breakpoint and stop in the debugger after it reads the data, then write the conversion code with live data to look at while I code. Komodo also has a JavaScript debugger that works equally well, but most of the time I just use Firebug because of its simplicity.
Komodo IDE isn’t cheap, but I figure it paid for itself really fast. There’s also a free Komodo Edit that everyone should install even if you already have a favorite editor. Both versions have real-time syntax checking, where you get squiggly red underlines for syntax errors and squiggly green underlines for warnings, just like the spelling and grammar checkers in a word processor. This has saved me literally thousands of page reloads when testing, since Komodo catches my syntax errors before I even save the file. Komodo runs on Linux, Mac, and Windows.
One nice thing about GUI editors is that the basic editing works the same in all of them (or should), so it’s easy to switch back and forth if some other editor has a feature you want to take advantage of. Besides Komodo, I also use PSPad (free, Windows only), mostly because of its nice HTML/XML pretty-printer. It cleans up unreadable web page source code real quick.
Another expensive-but-well-worth-it tool for Windows and Mac is Araxis Merge, a terrific file compare and merge program with live editing. I use Merge as the diff/merge program for TortoiseSVN, which makes source control a dream.
A couple of free Windows tools I use every day are Zoom+ for screen zooming and my own JKLmouse for precise cursor control with the keyboard of your notebook computer. With JKLmouse, I can use the TrackPoint for fast cursor motion and then the keyboard for fine pixel-by-pixel movement, seamlessly and with no “modes”. (Sorry, I had to brag!)
Source code
The election map code is open source and is in two Google Code projects. The current code is in the primary-maps-2008 project, and the code for the earliest caucuses and primaries is in the gmaps-samples project. (We moved the code to a new project to avoid filling up gmaps-samples!)
If you look at the code, go easy on me: much of it was written under severe time pressure. I asked if the elections could be delayed when I wasn’t quite ready, but even the mighty Google couldn’t seem to arrange that.
Also, if you read the code using the links provided here, there’s an awful lot of indentation, thanks to Google Code displaying my tab indentation using 8 spaces per tab. Shades of K&R! (So, why do I use tabs instead of two-space indents like everyone else? Well, one of the other benefits of Komodo is that unlike most code editors, it lets me edit in a proportional font. Two spaces in a proportional font is almost like not indenting at all.)
Shapefiles
Shapefiles are a wacky file format used for geographic data. Be thankful that other people have already written programs to pick them apart, so you and I don’t have to.
At first, I was using shp2text to convert shapefiles to an easy-to-use XML format (using the --gpx option), but this loses some of the information in the shapefile. More recently, Zachary Forest Johnson, author of the interesting indiemaps blog, wrote shpUtils.py, which decodes shapefiles into usable Python data.
I extended shpUtils.py to calculate correct centroids, area and other information about the shapes, and to fix a few bugs. The updated version is in the primary-maps-2008 project.
Centroids
The election maps use the centroids of the state and county polygons to position the markers for those states and counties.
Centroids are one of those things that you think you understand and then find out you were completely wrong. My first guess was the same as Zachary’s, to take the arithmetic mean of all the points (X and Y separately). The Wikipedia article even seems to say this, but it’s talking about the centroid of the points, not the centroid of the polygon that those points define. If you read it carefully, the article does give the correct algorithm, but it’s better explained on this page, along with sample implementations in various languages.
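To make the difference concrete, here is a minimal Ruby sketch (not the code from shpUtils.py; the names polygon_centroid and vertex_mean are made up for illustration). It compares the mean of the vertices with the area-weighted centroid of the polygon, using the standard signed-area formula. It assumes a simple, non-self-intersecting ring given as [x, y] pairs in planar coordinates.

```ruby
# Area-weighted centroid of a simple polygon given as [[x, y], ...].
# The ring may be listed clockwise or counterclockwise; the signs cancel.
def polygon_centroid(points)
  area2 = 0.0 # twice the signed area
  cx = 0.0
  cy = 0.0
  points.each_with_index do |(x0, y0), i|
    x1, y1 = points[(i + 1) % points.length] # wrap around to close the ring
    cross = x0 * y1 - x1 * y0
    area2 += cross
    cx += (x0 + x1) * cross
    cy += (y0 + y1) * cross
  end
  raise ArgumentError, "degenerate polygon (zero area)" if area2.zero?
  [cx / (3.0 * area2), cy / (3.0 * area2)]
end

# The naive guess: average the X and Y coordinates of the vertices.
def vertex_mean(points)
  n = points.length.to_f
  [points.sum { |x, _| x } / n, points.sum { |_, y| y } / n]
end

# A 4x4 square whose bottom edge happens to be sampled with extra points.
square = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [4, 4], [0, 4]]

p polygon_centroid(square) # the true center, [2.0, 2.0]
p vertex_mean(square)      # y is pulled toward the crowded bottom edge (8/7)
```

The extra vertices along the bottom edge do not change the shape at all, yet they drag the vertex mean downward. Real county outlines have unevenly spaced points all the time, which is why the two results can differ visibly on a map.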
Census bureau shapefiles
The state and county outlines in the election maps come from shapefiles provided by the Census Bureau. Most states report votes by county, but a few New England states report by town (County Subdivisions in the Census Bureau page), and a few other states report by congressional district.
Shapefile simplification
D’oh! I completely forgot to talk about this important topic. The Census Bureau shapefiles have too much detail to be usable in a browser-based map. If you draw polygons from them, it will be much too slow. A tile layer can handle more detail, but the graphic files will be larger than they could be, because of the excess detail.
MapShaper is a free online tool to simplify shapefiles. It is pretty neat: you can see the effect of your simplification in real time as you try different settings. I used MapShaper for the election maps, with various levels of simplification: simpler for JavaScript and more detailed for tile layers. More recently I discovered the Map Simplification Program, which looks ideal for programmed simplification.
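For a feel of what simplification means in code, here is a minimal Ruby sketch of the Douglas-Peucker algorithm, one common line simplification technique. This is an assumption chosen for illustration, not necessarily the method used for the election maps, and the function names and sample points are made up. It works on an open polyline; a closed polygon ring would need to be split into two polylines first.

```ruby
# Perpendicular distance from point pt to the line through a and b.
def perpendicular_distance(pt, a, b)
  px, py = pt
  ax, ay = a
  bx, by = b
  dx = bx - ax
  dy = by - ay
  length = Math.hypot(dx, dy)
  return Math.hypot(px - ax, py - ay) if length.zero?
  (dy * px - dx * py + bx * ay - by * ax).abs / length
end

# Drop every point that lies within `tolerance` of the simplified line.
def simplify(points, tolerance)
  return points if points.length < 3

  first = points.first
  last = points.last
  max_dist = 0.0
  index = 0
  points[1...-1].each_with_index do |pt, i|
    dist = perpendicular_distance(pt, first, last)
    if dist > max_dist
      max_dist = dist
      index = i + 1 # offset by one because the slice skips the first point
    end
  end

  if max_dist > tolerance
    # Keep the farthest point and simplify each half separately.
    left = simplify(points[0..index], tolerance)
    right = simplify(points[index..-1], tolerance)
    left[0...-1] + right # the split point appears in both halves
  else
    [first, last]
  end
end

line = [[0, 0], [1, 0.05], [2, 0], [3, 2], [4, 0]]
p simplify(line, 0.5) # the tiny wiggle at [1, 0.05] is dropped
```

A larger tolerance removes more points, which corresponds to the "simpler for JavaScript, more detailed for tile layers" trade-off mentioned above.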
The code that processes shapefiles for the election maps is in makepolys.py, which generates JSON output, and maketiles.py, which generates tiles from that JSON data using ImageMagick.
Votes and delegates
The code to convert vote data from the latest primaries is in voter.py. This processes CSV files provided by the Boston Globe and converts them to JSON data.
Twitter map
The Ruby script that gathers the Twitter updates uses the Jabber::Simple module written by Blaine Cook to create a custom Jabber client that talks to Twitter, and uses the Twittervision API to get geographic information. It parses the XML data with sweet Hpricot, then generates JSON data (but you probably saw that coming). If you like jQuery, you’ll like Hpricot.
Mapplet code
The election mapplet code is in decision2008.xml and map.js. The code for the Campaign Trail mapplet is in campaign-trail.xml and campaign-trail.js. The latter file has the latest versions of the Array.mapjoin(), Array.index(), Object.sort(), S(), and related functions that I talked about. They are at the top of the file, and not yet documented, but you can find examples of each in the code.
More to come
That’s it for now! I’ll be posting more detailed articles on some of these topics. If there is a particular area you’re interested in, please let me know in the comments.
Thanks!
Ruby iterators and C callback functions
Mike Sax wonders what’s the fuss about iterators. Aren’t they just a fancy use of function pointers? Indeed, Mike has hit the nail on the head. Consider the window iterator that’s been built into Windows since 1.0:
This function iterates through all of the top-level windows (children of the desktop window) and calls lpEnumFunc for each one, passing it the HWND of each window and the lParam that you passed to EnumWindows.
So lParam is how you get to provide some state that the enumeration function can make use of. Suppose you wanted to write a function that counted the number of visible top-level windows. Your C code might look like this:
This works, but it is rather tedious. So Windows 2.0 added the GetWindow function, which lets you simply ask for a window’s child or next sibling. That simplifies the overall structure of the code, especially if you use the GetFirstChild and GetNextSibling macros defined in windowsx.h:
That’s it, just one function, no callback function or struct definition needed. We don’t need the struct because the code inside the loop can directly reference the nVisible variable defined in the function.
But the simplification came at a price: We had to write the loop ourselves, asking explicitly for the first child of the desktop window and then the next sibling of each child window.
Also, it doesn’t work.
What if another application creates or destroys a top-level window, or just changes a window’s Z-order, while you’re in the loop? You’ll either miss a window, count one twice, or crash with an invalid window handle.
To handle these cases, you need a bit more complexity. If you had a way to temporarily lock all window creation and destruction, you could quickly create a list of all the windows and then release the lock, then enumerate from that list, perhaps also doing a last-minute check when you enumerate each window to skip any that get destroyed during enumeration. Or, you might set a Windows hook to notify you of any windows created, destroyed, or moved in the Z-order, so you could deal with them appropriately.
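The snapshot approach can be sketched in a few lines of Ruby. Everything here is hypothetical: WindowRegistry is a made-up stand-in for the system's window list (not a real Windows API), the handles are plain integers, and a Mutex plays the role of the temporary lock on window creation and destruction.

```ruby
# Hypothetical stand-in for the system's list of top-level windows.
class WindowRegistry
  def initialize(handles)
    @handles = handles.dup
    @lock = Mutex.new
  end

  def destroy(handle)
    @lock.synchronize { @handles.delete(handle) }
  end

  def alive?(handle)
    @lock.synchronize { @handles.include?(handle) }
  end

  # Copy the list while holding the lock, release the lock, then enumerate
  # from the copy, skipping any window destroyed in the meantime.
  def each_window
    snapshot = @lock.synchronize { @handles.dup }
    snapshot.each do |handle|
      next unless alive?(handle) # last-minute check
      yield handle
    end
  end
end

registry = WindowRegistry.new([101, 102, 103, 104])
seen = []
registry.each_window do |handle|
  seen << handle
  registry.destroy(103) if handle == 101 # a window vanishes mid-enumeration
end
p seen # [101, 102, 104]
```

The caller's loop body never sees the locking or the validity check, which is the kind of code isolation that a hand-written first-child/next-sibling loop makes hard to achieve.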
Whatever you did, it would be enough code that you wouldn’t want to duplicate it each time you wanted to write a window loop. The GetFirstChild/GetNextSibling style of loop doesn’t really facilitate that kind of code isolation. The EnumWindows style enumerator completely separates the code that does the iterating (EnumWindows itself) from the code that receives the iteration (your callback function). But, it makes it harder to share state between the callback function and the code that called EnumWindows.
If you had a way to use a callback function, but have it more easily share state with the calling function, you’d have a winner. In C# and JavaScript, you can do this by using an anonymous callback function nested inside the surrounding code. Because of lexical scoping, the callback function can access variables in the parent function as easily as it can access its own.
Both of those languages have enough extra syntactic cruft that when you look at a simple example using nested anonymous functions, it’s easy to be unimpressed. The payoff shows up in more complicated, real-life coding situations.
Code blocks in Ruby simplify this technique down to its essence, making it useful even for simple cases. Assuming a good Rubyesque Windows interface library, our function might be something like:
In this code, the enumWindows function takes a code block argument and calls that code block for each window, passing it the window as an argument. Because the code block is nested inside the countVisibleWindows function, it can access the nVisible variable directly.
This solves both our problems: The logic for iterating through the windows is separated out into the enumWindows function, and the callback function (code block) can access state variables cleanly and easily.
(In Ruby, a code block is like a callback function, but it’s not quite a full-fledged function. A code block is not walled off from the enclosing function’s variables: it can read and modify them directly, although a variable first assigned inside the block stays local to the block.)
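Here is a minimal, self-contained sketch of that idea which runs on any platform. The Window struct, the FAKE_WINDOWS list, and this enumWindows are stand-ins invented for the example, not a real Windows interface library; the method and variable names follow the ones used in the text rather than Ruby's usual snake_case.

```ruby
# Hypothetical stand-in for a window object; a real library would wrap an HWND.
Window = Struct.new(:title, :visible)

# Fake desktop contents so the sketch runs anywhere, not only on Windows.
FAKE_WINDOWS = [
  Window.new("Editor", true),
  Window.new("Hidden helper", false),
  Window.new("Browser", true)
].freeze

# Stand-in for the imagined enumWindows: calls the block once per window.
def enumWindows
  FAKE_WINDOWS.each { |window| yield window }
end

def countVisibleWindows
  nVisible = 0
  enumWindows do |window|
    # The block updates nVisible in the enclosing method directly;
    # no struct or lParam is needed to carry the state.
    nVisible += 1 if window.visible
  end
  nVisible
end

puts countVisibleWindows # prints 2
```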
Unfortunately, Ruby does not seem to have a Windows interface library that works like this. Ruby’s standard Win32 module provides a general way to call Windows DLL functions, but it doesn’t have a clean implementation of enumWindows that uses a code block.
However, MoonWolf has written a Ruby port of Perl’s Win32::GuiTest module that includes this kind of enumWindows function. It’s implemented in two parts: a low level function written in C that enumerates HWND values, and a higher level function written in Ruby that constructs Ruby window objects and enumerates them. The window object in Win32::GuiTest is a fairly thin wrapper that encapsulates an HWND and other window information.
The high-level enumWindows looks like this:
This code calls the low-level _enumWindows function, which passes an HWND to the code block enclosed in curly braces. This code block creates the window object, appends it to the ret array, and also yields the window object to a code block that was provided by the caller of enumWindows.
If I were implementing this, I think I would change it a bit. Typically a function like this either yields results to a code block, or it returns a value, but not both. And I would change the confusingly named createWindow function (which has no relation to the CreateWindow function in Windows):
Either way, our countVisibleWindows example ends up pretty much as I’d imagined:
The low-level _enumWindows function that enumerates HWND values is implemented in C. The initialization code to add the _enumWindows function is simply:
where mGuiTest is a reference to the Win32::GuiTest module.
The guitest_enumWindows function is:
and the EnumWindowsProc callback is:
This shows how easy it is to extend Ruby with C code, adding functions that work just like ones written in Ruby.
So, how do all the calls and callbacks stack up when we run the countVisibleWindows function? Something like this:
enumWindows
  _enumWindows
    EnumWindows
      EnumWindowsProc
        rb_yield
          (code block in enumWindows)
            yield
              (code block in countVisibleWindows)
In everyday use, of course, you don’t worry about that whole call stack, just the part of it you’re working with.
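The Ruby half of that layering can be sketched without any C code. This is a hypothetical illustration, not the Win32::GuiTest source: the _enumWindows below is a pure-Ruby stand-in that yields made-up handle values, WrappedWindow is an invented thin wrapper, and the high-level enumWindows follows the "yield or return, but not both" convention discussed earlier.

```ruby
# Stand-in for the C-level _enumWindows; the handle values are invented.
def _enumWindows
  [0x1001, 0x1002, 0x1003].each { |hwnd| yield hwnd }
end

# Thin wrapper around a raw handle, loosely modeled on the window object
# described in the text.
WrappedWindow = Struct.new(:hwnd)

# High-level enumerator: yields wrapped windows when a block is given,
# otherwise returns them as an array (one behavior or the other, not both).
def enumWindows
  windows = []
  _enumWindows do |hwnd|
    window = WrappedWindow.new(hwnd)
    if block_given?
      yield window # hands the object on to the caller's block
    else
      windows << window
    end
  end
  block_given? ? nil : windows
end

enumWindows { |window| puts window.hwnd } # prints 4097, 4098, 4099
p enumWindows.map(&:hwnd)                 # [4097, 4098, 4099]
```

Each yield here corresponds to one step down the call stack shown above: the low-level function yields a handle to the block inside enumWindows, which in turn yields a window object to the caller's block.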
More C#, Ruby, and Python Iterators, and JavaScript too
Making a valiant attempt to post code with my comment system (Sorry, Mike! :-(), Mike Roome points out:
The ruby example isn
Iterators in C#, Python, and Ruby
Matt Pietrek marvels at C# 2.0 iterators and dissects them right down to the CLR bytecode. I always learn something from Matt, and this whirlwind tour is no exception.
Matt says, “This was the beginning of my descent into the loopy world of C# 2.0 iterators. It took me awhile to wrap my head around them, and when I tried to explain them to other team members I got looks of total confusion.” I wonder if it would have been less confusing if Matt’s team had first been exposed to yield iterators in a language that makes them easier to use.
After using Python and Ruby, the iterators in C# feel right at home to me. They work the same in all three languages, but in Ruby and Python there’s not as much other code to get in the way of understanding them.
Let’s combine all of Matt’s examples into one, and compare the code in each language. First, in C#:
When you run that, it should print:
First
After First
0
1
2
Before Last
Last
After Last
Here’s how you would write the same code in Python:
And in Ruby, the code looks like this:
The one unfamiliar thing here may be the |name| notation, which is how a code block such as the body of a loop receives its argument. And the p statements are a kind of print statement.
This Ruby version is even more concise and equally readable once you’re comfortable with the |name| notation:
Either way, the Python and Ruby versions make it easier to see what the iterator function does and how yield interacts with the rest of the code.
You may note that the Python and Ruby versions don’t create and instantiate a SomeContainer class as the C# version does. That’s true, and it would make the code in those languages a bit longer (but still simpler than the C# code). But, if you don’t need to (and you especially don’t need to when you’re experimenting and trying to understand a radical new technique like yield iterators), why bother?
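As a rough illustration (not the original listing), here is a minimal Ruby sketch of a yield iterator that produces the output shown earlier. The method name each_item is an assumption, and puts is used instead of p so that strings print without quotes.

```ruby
# Yields a few values, printing a message between some of the yields so the
# interleaving of iterator code and loop body is visible.
def each_item
  yield "First"
  puts "After First"
  3.times { |i| yield i.to_s }
  puts "Before Last"
  yield "Last"
  puts "After Last"
end

# The block plays the role of the loop body: it runs once per yield.
each_item { |item| puts item }
```

Control bounces back and forth: each yield runs the block once, and when the block finishes, execution resumes in each_item right after that yield. That is why "After First" appears between "First" and "0" in the output.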