Infrequently Noted

Alex Russell on browsers, standards, and the process of progress.

Census 2: More Than Just A Pretty Graph

Benchmarks are hard, particularly for complex systems. As a result, the most hotly contested benchmarks tend not to be representative of what makes systems faster for real users. Does another 10% on TPC really matter to most web developers? And should we really pay any attention to how any JS VM does on synthetic language benchmarks?

Maybe.

These things matter only to the extent that they represent end-user workloads and that their findings can be trusted. The first is much harder than the second, and end-to-end benchmarking is pretty much the only way to get there. As a result, sites like Tom's Hardware focus on application-level benchmarks while still publishing "low level" numbers. Venerable test suites like SPECint have even moved toward running "full stack" style benchmarks which may emphasize a particular workload but are broad enough to capture the wider system effects which matter in the real world.

Marketing departments also like small, easily digestible, whole numbers. Saying something like "200% Faster!" sure sounds a lot better than "on a particular test which is part of a larger suite of tests, our system ran in X time vs. Y time for the competitor". Both may be true, but the second statement gives you some context. Preferably even that statement would occur above an actual table of numbers or graphs. Numbers without context are lies waiting to be repeated.
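To make the difference concrete, here is a minimal sketch of reporting a headline number together with the raw timings it came from. The function name, the test name, and the timings are all hypothetical and are not taken from any real benchmark run.

```python
def describe_speedup(ours_ms: float, theirs_ms: float, test_name: str) -> str:
    """Report both raw timings alongside the headline ratio.

    All inputs here are illustrative; nothing is measured from a real system.
    """
    if ours_ms <= 0 or theirs_ms <= 0:
        raise ValueError("timings must be positive")
    ratio = theirs_ms / ours_ms          # how many times as fast we are
    percent_faster = (ratio - 1) * 100   # the "N% faster" marketing figure
    return (
        f"{test_name}: {ours_ms:.0f} ms vs. {theirs_ms:.0f} ms "
        f"({ratio:.1f}x as fast, {percent_faster:.0f}% faster)"
    )


# Made-up numbers: 100 ms for "us", 300 ms for "them".
print(describe_speedup(100, 300, "hypothetical table render"))
```

With those made-up inputs, "200% faster" means 100 ms vs. 300 ms on one test. The same headline would also describe 1 ms vs. 3 ms, which is why the raw numbers need to travel with it.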

With all of this said, James Ward's Census benchmark makes a valiant stab at a full-stack test of data loading and rendering performance for RIA technologies. Last month Jared dug further into the numbers and found the methodology wanting, but given some IP issues he couldn't patch the sources himself. Since I wasn't encumbered in the same way, I thought I might as well try my hand at it. After hours of attempting to get the sources to build, I finally gave up and decided to re-write the tests. The result is Census 2.

There are several goals of this re-write:

The results so far have been instructive. On smaller data sets HTML wins hands-down for time-to-render, despite its disadvantage in over-the-wire size. For massive data sets, pagination saves even the most feature-packed of RIA Grids, allowing the Dojo Grid to best even XSLT and a more compact JSON syntax. Of similar interest is the delta between page cycle times on modern browsers vs. their predecessors. Flex can have a relatively even performance curve over host browsers, but the difference between browsers today is simply stunning.

Given the lack of an out-of-the-box paginating data store for Flex, RIAs built on that stack seem either beholden to Adobe's LCDS licensing or left to build ad-hoc pagination into apps by hand to get reasonable performance for data-rich business applications. James Ward has already exchanged some mail with me on this topic and it's my hope that we can show how to do pagination in Flex without needing LCDS in the near future.
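The general shape of a hand-built paginating store is small. What follows is a language-neutral sketch in Python, not Flex or Dojo code and not how either grid is implemented: the `PagedStore` class, its `fetch_page` callback, and the page size are all assumptions made for illustration. The idea is simply to load only the pages that cover the rows currently on screen and to cache them.

```python
from typing import Callable, Dict, List

Row = dict


class PagedStore:
    """Fetches rows one page at a time and caches pages already seen."""

    def __init__(self, fetch_page: Callable[[int, int], List[Row]],
                 page_size: int = 50) -> None:
        # fetch_page is a hypothetical backend call: (offset, count) -> rows
        self._fetch_page = fetch_page
        self._page_size = page_size
        self._pages: Dict[int, List[Row]] = {}
        self.requests = 0  # backend round trips so far, kept for inspection

    def _page(self, index: int) -> List[Row]:
        if index not in self._pages:
            self.requests += 1
            offset = index * self._page_size
            self._pages[index] = self._fetch_page(offset, self._page_size)
        return self._pages[index]

    def rows(self, start: int, count: int) -> List[Row]:
        """Return rows [start, start + count), loading only covering pages."""
        if count <= 0:
            return []
        end = start + count
        first = start // self._page_size
        last = (end - 1) // self._page_size
        out: List[Row] = []
        for index in range(first, last + 1):
            page = self._page(index)
            base = index * self._page_size
            out.extend(page[max(start - base, 0):end - base])
        return out


# Stand-in for a server holding a large data set.
DATA = [{"id": i} for i in range(20000)]
store = PagedStore(lambda offset, count: DATA[offset:offset + count])

visible = store.rows(120, 30)   # what a grid viewport might ask for
print(len(visible), store.requests)
```

A grid backed by something like this pays for the rows the user actually scrolls to rather than for the whole data set up front, which is the effect pagination has in the massive-data-set results.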

The tests aren't complete. There's still work to do to get some of the SOAP and AMF tests working again. If you have ideas about how to get this done w/o introducing a gigantic hairball of a Java toolchain, I'm all ears. Also on the TODO list is an AppEngine app for recording and analyzing test runs so that we can say something interesting about performance on various browsers.

Census 2 is very much an open source project and so if you'd like to get your library or technology tested, please don't hesitate to send me mail or, better yet, attach patches to the Bug Tracker.

Update: I failed to mention earlier that one of the largest changes in C2 vs. Census is that we report full page cycle times. Instead of reporting just the "internal" timings of an RIA which has been fully bootstrapped, the full page times report the full time from page loading to when the output is responsive to user action. This keeps JavaScript frameworks (or even Flex) from omitting from the reports the price that users pay to download their (often sizable) infrastructure. There's more work to do in reporting overall sizes and times (the "bandwidth" numbers don't report gzipped sizes, for example), but if you want the skinny on real performance, scroll down to the red bars. That's where the action is.
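Here is a rough sketch of why the two kinds of timing can disagree. The three-phase split, the `PageCycle` name, and every number below are assumptions invented for illustration; this is not how Census 2 instruments its pages and these are not measured results.

```python
from dataclasses import dataclass


@dataclass
class PageCycle:
    """Illustrative phases of one page load, in milliseconds."""
    download_ms: int    # fetching the page, its framework, and the data
    bootstrap_ms: int   # initializing the framework or plugin runtime
    render_ms: int      # building output until it responds to the user

    @property
    def internal_ms(self) -> int:
        # What a fully bootstrapped RIA would report about itself.
        return self.render_ms

    @property
    def full_ms(self) -> int:
        # What the user actually waits for.
        return self.download_ms + self.bootstrap_ms + self.render_ms


# Invented numbers for two hypothetical approaches.
heavy = PageCycle(download_ms=900, bootstrap_ms=400, render_ms=150)
light = PageCycle(download_ms=200, bootstrap_ms=0, render_ms=300)

for name, run in (("heavy framework", heavy), ("plain markup", light)):
    print(f"{name}: internal {run.internal_ms} ms, full cycle {run.full_ms} ms")
```

In this made-up case the heavier stack looks twice as fast on internal timing and nearly three times slower on the full cycle, which is the kind of gap full page reporting is meant to expose.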