Infrequently Noted

Alex Russell on browsers, standards, and the process of progress.

Half Lives

I'm headed to Austin soon for spring break SxSWi, and this year I'm lucky and grateful to be representing Chrome on the always-packed browser panel (more usable Lanyrd talk page here). The context for this year's panel is interesting to me -- a couple of years into a renewed era of browser competition, users have more choice but developers are still struggling with the same landscape, even as HTML5 starts to materialize as the platform of choice for most apps -- even the ones wrapped up in native wrappers to jump the various app-store-form distribution hurdles.

It's good to see MSFT belatedly trying to put IE6 out to pasture, but what about IE 7? Or 8? Let's take stock of where we really are and where we're likely to be in the next couple of years. First, remember that there's no IE 9 for Windows XP -- an OS that's currently the most popular in the world -- and no matter what happens with IE 6, IE 8 is the end of the upgrade road for XP. Unless you think half of the world's computers will be replaced/upgraded in the next couple of years, it seems likely that IE 8 will be with us for the foreseeable future.

And what about the folks who do get IE 9? Well, so far, there's nothing to make me believe that the uptake rate will be anything better than the IE 8 transition; a process which has taken 2 years to give ~30% of the market the latest version. If anything, we should expect that rate to be retarded somewhat by the XP hurdle.

MSFT's browser replacement rates bear understanding because they're the most popular and suffer from the longest half-lives. That is to say, the time it takes for an old version of IE to decay in the wild is much, much higher than for other browsers. Some part of this is surely due to sheer market share, but not all of it. The XP hurdle, for instance, is a form of structural drag on uptake rates -- a flaw that browsers that aren't tightly tied to OSes don't suffer from. For web developers, I dare say that half-life of popular browsers matters much, much more than the current or trending market share since it's predictive of our potential for browser improvement in the near future. It's one thing to get the new shiny, but how long will it take you to install it? If the shiny is old and dingy by the time it's in place, what good is that? It's this lens that makes browser market share stats interesting to me; i.e., what percentage of the web's users will get the new features soonest? 'Cause those are the folks we can start building super compelling content for.
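To make the half-life framing concrete, here's a minimal sketch that treats an old browser version's share as simple exponential decay. Both the model and the numbers are illustrative assumptions rather than measurements, and `remainingShare` and `monthsUntilBelow` are hypothetical helpers invented for this example:

```javascript
// Assumed model: the share of an old browser version decays exponentially.
//   share(t) = initialShare * 0.5^(t / halfLife)
function remainingShare(initialShare, halfLifeMonths, elapsedMonths) {
  return initialShare * Math.pow(0.5, elapsedMonths / halfLifeMonths);
}

// Months until the old version drops below a target share.
function monthsUntilBelow(initialShare, halfLifeMonths, targetShare) {
  return halfLifeMonths * Math.log(initialShare / targetShare) / Math.LN2;
}

// Made-up numbers: the same starting share with two different half-lives,
// compared after two years.
var slow = remainingShare(0.40, 12, 24); // two half-lives: a quarter remains
var fast = remainingShare(0.40, 2, 24);  // twelve half-lives: almost nothing remains
```

Under these assumptions, the browser with the 12-month half-life still holds a tenth of all users after two years, while the one with the 2-month half-life has effectively vanished. That gap is what decides when developers can stop thinking about the old version.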

The average half-life of the majority of browsers in the wild also gates the rate of progress in standards. When the process is working well, bugs in browsers or pre-standards implementations of features aren't permanent features of the landscape. Instead, they're the understandable and inevitable result of a process that prioritizes implementation experience and iteration over raw compliance with an academic spec that may or may not actually get it right on the first go 'round. But that iterative, feedback-rich process only works when browsers iterate quickly and web developers can target the future without thinking so hard about the past, else progress simply turns into something to resent and distrust. That's good for no one, and a shorter half-life is the key to making progress more than just a spec-tease.

I'm personally hopeful that when IE 9 is finally RTM'd, it includes some provisions for shortening its life expectancy in the ways that Chrome and Firefox have through aggressive auto-updating. Getting IE 9 out to the world will be a good thing, but only if it happens quickly and if IE 10 can follow it even faster.

There's obviously a lot more to talk about at the browser panel -- Chrome 10 just launched with Crankshaft, for instance -- but the fact that nearly every Chrome user will have those improvements this week and that if you're building a Chrome Web Store app, you'll get to target those improvements nearly instantly seems like the biggest, most interesting change from where we were just a couple of years ago.

Performance Innumeracy & False Positives

tl;dr version: the web is waaaay too slow, and every time you write something off as "just taking a couple of milliseconds", you're part of the problem. Good engineering is about tradeoffs, and all engineering requires environmental assumptions -- even feature testing. In any case, there are good, reliable ways to use UA detection to speed up feature tests in the common case, which I'll show, and to which the generic arguments about UA vs. feature testing simply don't apply. We can and should go faster. Update: Nicholas Zakas explains it all, clearly, in measured form. Huzzah!

Performance Innumeracy

I want to dive into concrete strategies for low-to-no false positive UA matching for use in caching feature detection results, but first I feel I need to walk back to some basics since I've clearly lost some people along the way. Here are some numbers every developer (of any type) should know, borrowed from Peter Norvig's indispensable "Teach Yourself To Program In Ten Years":

Approximate timing for various operations on a typical PC:

execute typical instruction 1/1,000,000,000 sec = 1 nanosec
fetch from L1 cache memory 0.5 nanosec
branch misprediction 5 nanosec
fetch from L2 cache memory 7 nanosec
Mutex lock/unlock 25 nanosec
fetch from main memory 100 nanosec
send 2K bytes over 1Gbps network 20,000 nanosec
read 1MB sequentially from memory 250,000 nanosec
fetch from new disk location (seek) 8,000,000 nanosec
read 1MB sequentially from disk 20,000,000 nanosec
send packet US to Europe and back 150 milliseconds = 150,000,000 nanosec

That data's a bit old -- 8ms is optimistic for a HD seek these days, and SSD changes things -- but the orders of magnitude are relevant. For mobile, we also need to know:

fetch from flash storage 1,300,000 nanosec
60hz time slice 16,000,000 nanosec
send packet outside of a (US) mobile carrier network and back 80-800 milliseconds = 80,000,000 - 800,000,000 nanosec

The 60hz number is particularly important. To build UI that feels not just fast, but instantly responsive, we need to be yielding control back to our primary event loop in less than 16ms, all the time, every time. Otherwise the UI will drop frames and the act of clicking, tapping, and otherwise interacting with the app will seem "laggy" or "janky". Framing this another way, anything your webapp blocks on for more than 16ms is the enemy of solid, responsive UI.
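One way to stay inside that budget is to break long-running work into slices and hand control back to the event loop between them. What follows is a minimal sketch, not a production scheduler: `runSlice` and `drain` are hypothetical helpers, and the injectable `now` clock exists only so the logic can be exercised without real timers.

```javascript
// One 60hz time slice, per the numbers above.
var FRAME_BUDGET_MS = 16;

// Hypothetical helper: run queued tasks until the budget is spent.
// Returns the number of tasks left for later slices.
function runSlice(tasks, budgetMs, now) {
  now = now || function() { return new Date().getTime(); };
  var start = now();
  // At least one task runs per slice, so the queue always makes progress.
  while (tasks.length) {
    tasks.shift()();
    if (now() - start >= budgetMs) { break; }
  }
  return tasks.length;
}

// Hypothetical helper: drain the queue across many turns of the event loop,
// yielding after each slice so input and painting can be serviced.
function drain(tasks, budgetMs) {
  if (runSlice(tasks, budgetMs) > 0) {
    setTimeout(function() { drain(tasks, budgetMs); }, 0);
  }
}
```

Note that a single task that runs longer than the budget will still block. Slicing only helps when the individual units of work are small, and in practice you'd likely pass a budget well under `FRAME_BUDGET_MS` to leave the browser room for its own work.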

Why am I blithering on and on about this? Because some folks continue to mis-prioritize the impact of latency and performance on user satisfaction. Google (my employer, who does not endorse this blog or my private statements in any way) has shown that seemingly minor increases in latency directly impact user engagement and that major increases in latency (> 500ms) can reduce traffic and revenue significantly. Latency then, along with responsiveness (do you drop below 60hz?), is a key metric for measuring the quality of a web experience. It's no accident that Google employs Steve Souders to help evangelize the cause of improving performance on the web, and has gone so far as to build products like Chrome & V8 which have as a core goal making the web faster. A faster web is a better web. Full stop.

That's why I get so deeply frustrated when we get straw-man based, data-challenged advocacy from the maintainers of important bits of infrastructure:

This stuff is far from easy to understand; even just the basics of feature detection versus browser detection are quite confusing to some people. That’s why we make libraries for this stuff (and, use browser inference instead of UA sniffing). These are the kind of efforts that we need, to help move the web forward as a platform; what we don’t need is more encouragement for UA sniffing as a general technique, only to save a couple of milliseconds. Because I can assure you that the Web never quite suffered, technologically, from taking a fraction of a second longer to load.

What bollocks. Not only did I not encourage UA sniffing "as a general technique", latency does in fact hurt sites and users -- all the time, every day. And we're potentially not talking about "a couple of milliseconds" here. Remember, in the context of mobile devices, the CPUs we're on are single-core and clocked in the 500mhz-1ghz range, which directly impacts the performance of single-threaded tasks like layout and JavaScript execution -- which by the way happen in the same thread. In my last post I said:

...if you’re a library author or maintainer, please please please consider the costs of feature tests, particularly the sort that mangle DOM and or read-back computed layout values

Why? Because many of these tests inadvertently force layout and style re-calculation. See for instance this snippet from has.js:

if(has.isHostType(input, "click")){
  input.type = "checkbox";
  input.style.display = "none";
  input.onclick = function(e){
    // ...
  };
  try{
    de.insertBefore(input, de.firstChild);
    input.click();
    de.removeChild(input);
  }catch(e){}
  // ...
}

Everything looks good. The element is display: none; so it shouldn't be generating render boxes when inserted into the DOM. Should be cheap, right? Well, let's see what happens in WebKit. Debugging into a simple test page with equivalent code shows that part of the call stack looks like:

#0	0x0266267f in WebCore::Document::recalcStyle at Document.cpp:1575
#1	0x02662643 in WebCore::Document::updateStyleIfNeeded at Document.cpp:1652
#2	0x026a89fd in WebCore::MouseRelatedEvent::receivedTarget at MouseRelatedEvent.cpp:152
#3	0x0269df03 in WebCore::Event::setTarget at Event.cpp:282
#4	0x026af889 in WebCore::Node::dispatchEvent at Node.cpp:2604
#5	0x026adbcb in WebCore::Node::dispatchMouseEvent at Node.cpp:2885
#6	0x026ae231 in WebCore::Node::dispatchSimulatedMouseEvent at Node.cpp:2816
#7	0x026ae3f1 in WebCore::Node::dispatchSimulatedClick at Node.cpp:2837
#8	0x02055bb5 in WebCore::HTMLElement::click at HTMLElement.cpp:767
#9	0x022587e6 in WebCore::HTMLInputElementInternal::clickCallback at V8HTMLInputElement.cpp:707
...

Document::recalcStyle() can be very expensive, and unlike painting, it blocks input and other execution. And the cost at page load is likely to be much higher than at other times, as there will be significantly more new styles streamed in from the network to satisfy for each element in the document when this is called. This isn't a full layout, but it's most of the price of one. Now, you can argue that this is a WebKit bug and I'll agree -- synthetic clicks should probably skip this -- but I'm just using this as an illustration to show that what browsers are doing on your behalf isn't always obvious. Once this bug is fixed, this test may indeed be nearly free, but it's not today. Not by a long shot.

Many layouts in very deep and "dirty" DOMs can take ten milliseconds or more, and if you're doing it from script, you're causing the system to do lots of work which it's probably going to need to throw away later when the rest of your markup and styles show up. Your average, dinky test harness page likely under-counts the cost of these tests, so when someone tells me "oh, it's only 30ms", my eyes bug out not only at the double-your-execution-budget-for-anything number, but also at the knowledge that in the real world, it's probably a LOT worse. Just imagine this happening in a deep DOM on a low-end ARM-powered device where memory pressure and a single core are conspiring against you.
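If you want to put a number on a particular test, something like the following sketch expresses its cost in 60hz frames rather than raw milliseconds. `costOf` is a hypothetical helper, and the injectable `now` clock is an assumption made so the arithmetic can be checked without a browser:

```javascript
// Hypothetical helper: time a synchronous test and report how many
// 16ms frames it blocked. `now` defaults to the wall clock.
function costOf(testFn, now) {
  now = now || function() { return new Date().getTime(); };
  var start = now();
  var result = testFn();
  var elapsedMs = now() - start;
  return {
    result: result,
    elapsedMs: elapsedMs,
    framesBlocked: elapsedMs / 16 // 16ms per 60hz time slice
  };
}
```

By this measure a "mere" 30ms test blocks nearly two full frames. Keep in mind that Date-based timing is millisecond-granular at best, so a single run on a toy page tells you little. Measure inside a realistic, deep DOM on the slowest hardware you care about.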

False Positives

My last post concerned how you can build a cache to eliminate many of these problems if and only if you build UA tests that don't have false positives. Some commenters can't seem to grasp the subtlety that I'm not advocating for the same sort of lazy substring matching that has deservedly gotten such a bad rap.

So how would we build less naive UA tests that can have feature tests behind them as fallbacks? Let's look at some representative UA strings and see if we can't construct some tests for them that give us sub-version flexibility but won't pass on things that aren't actually the browsers in question:

IE 6.0, Windows:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)

FF 3.6, Windows:

Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.13) Firefox/3.6.13

Chrome 8.0, Linux:

Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Ubuntu/10.10 Chromium/8.0.552.237 Chrome/8.0.552.237 Safari/534.10

Safari 5.0, Windows:

Mozilla/5.0 (Windows; U; Windows NT 6.1; sv-SE) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4

Some features start to jump out at us. The "platform" clauses -- that bit in the parens after the first chunk -- contain a lot of important data and a lot of junk. But the important stuff always comes first. We'll need to allow but ignore the junk. Next, stuff after platform clauses is good, has defined order, and can be used to tightly form a match for browsers like Safari and Chrome. With this in mind, we can create some regexes that don't allow much in the way of variance but do allow sub-minor version to match so we don't have to update these every month or two:

IE60 = /^Mozilla\/4\.0 \(compatible; MSIE 6\.0; Windows NT \d\.\d(.*)\)$/;
FF36 = /^Mozilla\/5\.0 \(Windows; U;(.*)rv\:1\.9\.2\.(\d{1,2})\)( Gecko\/(\d{8}))? Firefox\/3\.6(\.\d{1,2})?( \(.+\))?$/;
CR80 = /^Mozilla\/5\.0 \((Windows|Macintosh|X11); U;.+\) AppleWebKit\/534\.10 \(KHTML\, like Gecko\) (.+)Chrome\/8\.0\.(\d{3})\.(\d{1,3}) Safari\/534\.10$/;

These look pretty wordy, and they are, because they're designed NOT to let through things that we don't really understand. This isn't just substring matching on the word "WebKit" or "Chrome", this is a tight fit against the structure of the entire string. If it doesn't fit, we don't match, and our cache doesn't get pre-populated. Instead, we do feature detection. Remember, false positives here are the enemy, so we're using "^" and "$" matches to ensure that the string has the right structure all the way through, not just at some random point in the middle, which UA's that parade around as other browsers tend to do.
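To see the difference the anchoring makes, here's a small sketch comparing a naive substring check with the IE 6 regex above. The `masquerade` string is an illustrative assumption, written in the style of a browser that identifies itself as IE 6 inside its platform clause. `naiveIsIE6` and `strictIsIE6` are hypothetical helpers.

```javascript
// Regex copied from the text above.
var IE60 = /^Mozilla\/4\.0 \(compatible; MSIE 6\.0; Windows NT \d\.\d(.*)\)$/;

// The IE 6.0 sample string from earlier in this post.
var realIE6 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; " +
              ".NET CLR 1.1.4322; .NET CLR 2.0.50727)";

// Illustrative only: a UA that parades around as IE 6 but carries a
// trailing token after the platform clause.
var masquerade = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.0";

// Lazy substring matching: accepts anything that mentions "MSIE 6.0".
function naiveIsIE6(ua) { return ua.indexOf("MSIE 6.0") != -1; }

// Structural matching: the whole string has to fit, start to end.
function strictIsIE6(ua) { return IE60.test(ua); }
```

Both checks accept `realIE6`, but only `naiveIsIE6` accepts `masquerade`. The strict version rejects it because the string doesn't end at the closing paren, so that UA falls through to ordinary feature detection.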

Here's some sample code that incorporates the approach:

(function(global){

  // The map of available tests
  var featureTests = {
    "audio": function() {
      var audio = document.createElement("audio");
      return !!(audio && audio.canPlayType);
    },
    "audio-ogg": function() { /*...*/ }
    // ...
  };

  // A read-through cache for test results.
  var testCache = {};

  // An (exported) function to run/cache tests
  global.ft = function(name) {
    return testCache[name] = (typeof testCache[name] == "undefined") ?
                                featureTests[name]() :
                                testCache[name];
  };

  // Tests for 90+% of current browser usage

  var ua = (global.navigator) ? global.navigator.userAgent : "";

  // IE 6.0/WinXP (no <audio> support at all):
  var IE60 = /^Mozilla\/4\.0 \(compatible; MSIE 6\.0; Windows NT \d\.\d(.*)\)$/;
  if (ua.search(IE60) == 0) {
    testCache = { "audio": 0, "audio-ogg": 0 /* ... */ };
  }

  // IE 7.0
  // ...
  // IE 8.0
  // ...

  // IE 9.0 (updated with fix from John-David Dalton)
  var IE90 = /^Mozilla\/5\.0 \(compatible; MSIE 9\.0; Windows NT \d\.\d(.*)\)$/;
  if (ua.search(IE90) == 0) {
    testCache = { "audio": 1, "audio-ogg": 0 /* ... */ };
  }

  // Firefox 3.6/Windows
  var FF36 = /^Mozilla\/5\.0 \(Windows; U;(.*)rv\:1\.9\.2\.(\d{1,2})\)( Gecko\/(\d{8}))? Firefox\/3\.6(\.\d{1,2})?( \(.+\))?$/;
  if (ua.search(FF36) == 0) {
    testCache = { "audio": 1, "audio-ogg": 1 /* ... */ };
  }

  // Chrome 8.0
  var CR80 = /^Mozilla\/5\.0 \((Windows|Macintosh|X11); U;.+\) AppleWebKit\/534\.10 \(KHTML\, like Gecko\) (.+)Chrome\/8\.0\.(\d{3})\.(\d{1,3}) Safari\/534\.10$/;
  if (ua.search(CR80) == 0) {
    testCache = { "audio": 1, "audio-ogg": 1 /* ... */ };
  }

  // Safari 5.0 (mobile)
  var S5MO = /^Mozilla\/5\.0 \(iPhone; U; CPU iPhone OS \w+ like Mac OS X; .+\) AppleWebKit\/(\d{3,})\.(\d+)\.(\d+) \(KHTML, like Gecko\) Version\/5\.0(\.\d{1,})? Mobile\/(\w+) Safari\/(\d{3,})\.(\d+)\.(\d+)$/;
  if (ua.search(S5MO) == 0) {
    testCache = { "audio": 1, "audio-ogg": 0 /* ... */ };
  }

  // ...

})(this);

New versions of browsers won't match these tests, so we won't break libraries in the face of new UAs -- assuming the feature tests also don't break, which is a big if in many cases -- and we can go faster for the majority of users. Win.
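The same idea can also be sketched in a table-driven form that makes the fallback path explicit: a strict UA match pre-populates the cache, and anything unrecognized runs the real feature test once and remembers the answer. `makeFt`, `profiles`, and `tests` are hypothetical names invented for this example, not part of any library:

```javascript
// Hypothetical factory. `profiles` pairs a strict, anchored UA regex with
// known results; `tests` maps feature names to (possibly expensive) tests.
function makeFt(ua, profiles, tests) {
  var cache = {};
  for (var i = 0; i < profiles.length; i++) {
    if (profiles[i].pattern.test(ua)) {
      // Copy the results so the shared profile object is never mutated.
      for (var k in profiles[i].results) {
        if (profiles[i].results.hasOwnProperty(k)) {
          cache[k] = profiles[i].results[k];
        }
      }
      break;
    }
  }
  return function ft(name) {
    if (typeof cache[name] == "undefined") {
      // Unknown UA or unknown feature: fall back to a real feature test.
      cache[name] = tests[name]() ? 1 : 0;
    }
    return cache[name];
  };
}
```

With this shape, a recognized UA never pays for the tests it already has answers to, and an unrecognized one behaves exactly as if the UA table didn't exist.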
