Infrequently Noted

Performance Innumeracy & False Positives

February 3, 2011

tl;dr version: the web is waaaay too slow, and every time you write something off as "just taking a couple of milliseconds", you're part of the problem. Good engineering is about tradeoffs, and all engineering requires environmental assumptions -- even feature testing. In any case, there are good, reliable ways to use UA detection to speed up feature tests in the common case, which I'll show, and to which the generic arguments about UA vs. feature testing simply don't apply. We can and should go faster. Update: Nicholas Zakas explains it all, clearly, in measured form. Huzzah!

Performance Innumeracy

I want to dive into concrete strategies for low-to-no false positive UA matching for use in caching feature detection results, but first I feel I need to walk back to some basics since I've clearly lost some people along the way. Here are some numbers every developer (of any type) should know, borrowed from Peter Norvig's indispensable "Teach Yourself To Program In Ten Years":

Approximate timing for various operations on a typical PC:

execute typical instruction	1/1,000,000,000 sec = 1 nanosec
fetch from L1 cache memory	0.5 nanosec
branch misprediction	5 nanosec
fetch from L2 cache memory	7 nanosec
Mutex lock/unlock	25 nanosec
fetch from main memory	100 nanosec
send 2K bytes over 1Gbps network	20,000 nanosec
read 1MB sequentially from memory	250,000 nanosec
fetch from new disk location (seek)	8,000,000 nanosec
read 1MB sequentially from disk	20,000,000 nanosec
send packet US to Europe and back	150 milliseconds = 150,000,000 nanosec

That data's a bit old -- 8ms is optimistic for a HD seek these days, and SSD changes things -- but the orders of magnitude are relevant. For mobile, we also need to know:

fetch from flash storage	1,300,000 nanosec
60hz time slice	16,000,000 nanosec
send packet outside of a (US) mobile carrier network and back	80-800 milliseconds = 80,000,000 - 800,000,000 nanosec

The 60hz number is particularly important. To build UI that feels not just fast, but instantly responsive, we need to be yielding control back to our primary event loop in less than 16ms, all the time, every time. Otherwise the UI will drop frames and the act of clicking, tapping, and otherwise interacting with the app will seem "laggy" or "janky". Framing this another way, anything your webapp blocks on for more than 16ms is the enemy of solid, responsive UI.
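
To make the 16ms budget concrete, here's a minimal sketch of one way to stay under it: do work in slices and hand control back to the event loop between them. The helper name `processInChunks` and its options (`budgetMs`, `schedule`, `now`, `onDone`) are made up for illustration and aren't from any library. The default 10ms budget is an assumption that leaves some headroom inside a 16ms frame.

```javascript
// Hypothetical helper: process `items` in slices that stay under a time budget.
// `schedule` and `now` are injectable so the sketch can run outside a browser.
function processInChunks(items, worker, options) {
  options = options || {};
  var budgetMs = options.budgetMs || 10; // assumed headroom under a ~16ms frame
  var schedule = options.schedule || function (fn) { setTimeout(fn, 0); };
  var now = options.now || function () { return Date.now(); };
  var onDone = options.onDone || function () {};
  var index = 0;
  var slices = 0;

  function runSlice() {
    var start = now();
    slices++;
    // Always handle at least one item per slice so we make progress.
    do {
      worker(items[index], index);
      index++;
    } while (index < items.length && (now() - start) < budgetMs);

    if (index < items.length) {
      schedule(runSlice); // yield to the event loop, then pick up where we left off
    } else {
      onDone(slices);
    }
  }

  if (items.length > 0) {
    runSlice();
  } else {
    onDone(0);
  }
}
```

Note that a single slow `worker` call can still blow the budget; slicing only helps when the individual units of work are small.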

Why am I blithering on and on about this? Because some folks continue to mis-prioritize the impact of latency and performance on user satisfaction. Google (my employer, who does not endorse this blog or my private statements in any way) has shown that seemingly minor increases in latency directly impact user engagement and that major increases in latency (> 500ms) can reduce traffic and revenue significantly. Latency then, along with responsiveness (do you drop below 60hz?), is a key metric for measuring the quality of a web experience. It's no accident that Google employs Steve Souders to help evangelize the cause of improving performance on the web, and has gone so far as to build products like Chrome & V8 that have making the web faster as a core goal. A faster web is a better web. Full stop.

That's why I get so deeply frustrated when we get straw-man based, data-challenged advocacy from the maintainers of important bits of infrastructure:

This stuff is far from easy to understand; even just the basics of feature detection versus browser detection are quite confusing to some people. That’s why we make libraries for this stuff (and, use browser inference instead of UA sniffing). These are the kind of efforts that we need, to help move the web forward as a platform; what we don’t need is more encouragement for UA sniffing as a general technique, only to save a couple of milliseconds. Because I can assure you that the Web never quite suffered, technologically, from taking a fraction of a second longer to load.

What bollocks. Not only did I not encourage UA sniffing "as a general technique", latency does in fact hurt sites and users -- all the time, every day. And we're potentially not talking about "a couple of milliseconds" here. Remember, in the context of mobile devices, the CPUs we're on are single-core and clocked in the 500mhz-1ghz range, which directly impacts the performance of single-threaded tasks like layout and JavaScript execution -- which by the way happen in the same thread. In my last post I said:

...if you’re a library author or maintainer, please please please consider the costs of feature tests, particularly the sort that mangle DOM and/or read-back computed layout values

Why? Because many of these tests inadvertently force layout and style re-calculation. See for instance this snippet from has.js:

if(has.isHostType(input, "click")){
  input.type = "checkbox";
  input.style.display = "none";
  input.onclick = function(e){
    // ...
  };
  try{
    de.insertBefore(input, de.firstChild);
    input.click();
    de.removeChild(input);
  }catch(e){}
  // ...
}

Everything looks good. The element is display: none; so it shouldn't be generating render boxes when inserted into the DOM. Should be cheap, right? Well, let's see what happens in WebKit. Debugging into a simple test page with equivalent code shows that part of the call stack looks like:

#0	0x0266267f in WebCore::Document::recalcStyle at Document.cpp:1575
#1	0x02662643 in WebCore::Document::updateStyleIfNeeded at Document.cpp:1652
#2	0x026a89fd in WebCore::MouseRelatedEvent::receivedTarget at MouseRelatedEvent.cpp:152
#3	0x0269df03 in WebCore::Event::setTarget at Event.cpp:282
#4	0x026af889 in WebCore::Node::dispatchEvent at Node.cpp:2604
#5	0x026adbcb in WebCore::Node::dispatchMouseEvent at Node.cpp:2885
#6	0x026ae231 in WebCore::Node::dispatchSimulatedMouseEvent at Node.cpp:2816
#7	0x026ae3f1 in WebCore::Node::dispatchSimulatedClick at Node.cpp:2837
#8	0x02055bb5 in WebCore::HTMLElement::click at HTMLElement.cpp:767
#9	0x022587e6 in WebCore::HTMLInputElementInternal::clickCallback at V8HTMLInputElement.cpp:707
...

Document::recalcStyle() can be very expensive, and unlike painting, it blocks input and other execution. And the cost at page load is likely to be much higher than at other times, as there will be significantly more new styles streamed in from the network to satisfy for each element in the document when this is called. This isn't a full layout, but it's most of the price of one. Now, you can argue that this is a WebKit bug and I'll agree -- synthetic clicks should probably skip this -- but I'm just using this as an illustration to show that what browsers are doing on your behalf isn't always obvious. Once this bug is fixed, this test may indeed be nearly free, but it's not today. Not by a long shot.

Many layouts in very deep and "dirty" DOMs can take ten milliseconds or more, and if you're doing it from script, you're causing the system to do lots of work which it's probably going to need to throw away later when the rest of your markup and styles show up. Your average, dinky test harness page likely under-counts the cost of these tests, so when someone tells me "oh, it's only 30ms", not only do my eyes bug out at the double-your-execution-budget-for-anything number, but I also know that in the real world, it's probably a LOT worse. Just imagine this happening in a deep DOM on a low-end ARM-powered device where memory pressure and a single core are conspiring against you.
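
If you want to put a number on a given test, a rough sketch like the following reports how much of a 60hz frame it ate. The helper `timeFeatureTest` is hypothetical, and `Date.now()` only gives millisecond resolution, so treat the output as a coarse lower bound. As noted above, measuring on a bare harness page will tend to under-count what the same test costs in a real, deep DOM.

```javascript
var FRAME_MS = 1000 / 60; // roughly 16.67ms per frame at 60hz

// Hypothetical helper: run one feature test and report what it cost.
// `now` is injectable; by default it falls back to Date.now().
function timeFeatureTest(name, testFn, now) {
  now = now || function () { return Date.now(); };
  var start = now();
  var result = !!testFn();
  var elapsedMs = now() - start;
  return {
    name: name,
    result: result,
    elapsedMs: elapsedMs,
    framesUsed: elapsedMs / FRAME_MS, // 30ms works out to about 1.8 frames
    overBudget: elapsedMs > FRAME_MS
  };
}
```

A test that reports `overBudget: true` has, by itself, cost the user at least one dropped frame.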

False Positives

My last post concerned how you can build a cache to eliminate many of these problems if and only if you build UA tests that don't have false positives. Some commenters can't seem to grasp the subtlety that I'm not advocating for the same sort of lazy substring matching that has deservedly gotten such a bad rap.

So how would we build less naive UA tests that can have feature tests behind them as fallbacks? Let's look at some representative UA strings and see if we can't construct some tests for them that give us sub-version flexibility but won't pass on things that aren't actually the browsers in question:

IE 6.0, Windows:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)

FF 3.6, Windows:

Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.13) Firefox/3.6.13

Chrome 8.0, Linux:

Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Ubuntu/10.10 Chromium/8.0.552.237 Chrome/8.0.552.237 Safari/534.10

Safari 5.0, Windows:

Mozilla/5.0 (Windows; U; Windows NT 6.1; sv-SE) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4

Some features start to jump out at us. The "platform" clauses -- that bit in the parens after the first chunk -- contain a lot of important data and a lot of junk. But the important stuff always comes first. We'll need to allow but ignore the junk. Next, stuff after platform clauses is good, has defined order, and can be used to tightly form a match for browsers like Safari and Chrome. With this in mind, we can create some regexes that don't allow much in the way of variance but do allow sub-minor version to match so we don't have to update these every month or two:

IE60 = /^Mozilla\/4\.0 \(compatible; MSIE 6\.0; Windows NT \d\.\d(.*)\)$/;
FF36 = /^Mozilla\/5\.0 \(Windows; U;(.*)rv\:1\.9\.2\.(\d{1,2})\)( Gecko\/(\d{8}))? Firefox\/3\.6(\.\d{1,2})?( \(.+\))?$/;
CR80 = /^Mozilla\/5\.0 \((Windows|Macintosh|X11); U;.+\) AppleWebKit\/534\.10 \(KHTML\, like Gecko\) (.*)Chrome\/8\.0\.(\d{3})\.(\d{1,3}) Safari\/534\.10$/;

These look pretty wordy, and they are, because they're designed NOT to let through things that we don't really understand. This isn't just substring matching on the word "WebKit" or "Chrome", this is a tight fit against the structure of the entire string. If it doesn't fit, we don't match, and our cache doesn't get pre-populated. Instead, we do feature detection. Remember, false positives here are the enemy, so we're using "^" and "$" matches to ensure that the string has the right structure all the way through, not just at some random point in the middle, which UA's that parade around as other browsers tend to do.
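
The difference between the two styles of matching is easy to see in a small sketch. The `IE60` regex is the one above; the function names are hypothetical, and the second UA string is an illustrative example of the kind of string a browser masquerading as IE might send, not something captured for this post.

```javascript
// Regex copied from the text above.
var IE60 = /^Mozilla\/4\.0 \(compatible; MSIE 6\.0; Windows NT \d\.\d(.*)\)$/;

// The lazy approach: look for a token anywhere in the string.
function naiveIsIE6(ua) {
  return ua.indexOf("MSIE 6.0") !== -1;
}

// The strict approach: the whole string must have the expected shape.
function strictIsIE6(ua) {
  return IE60.test(ua);
}

// The IE 6.0 string quoted earlier in the post.
var realIE6 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; " +
              ".NET CLR 1.1.4322; .NET CLR 2.0.50727)";

// Illustrative (assumed) string from a browser pretending to be IE 6.
var masquerade = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50";

var checks = {
  naiveReal: naiveIsIE6(realIE6),
  naiveFake: naiveIsIE6(masquerade),   // substring match is fooled
  strictReal: strictIsIE6(realIE6),
  strictFake: strictIsIE6(masquerade)  // trailing junk breaks the "$" anchor
};
```

The strict version rejects the masquerading string because nothing is allowed after the closing paren, so that browser simply falls through to real feature tests.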

Here's some sample code that incorporates the approach:

(function(global){
  // The map of available tests
  var featureTests = {
    "audio": function() {
      var audio = document.createElement("audio");
      return audio && audio.canPlayType;
    },
    "audio-ogg": function() { /*...*/ }
    // ...
  };
  // A read-through cache for test results.
  var testCache = {};
  // An (exported) function to run/cache tests
  global.ft = function(name) {
    return testCache[name] = (typeof testCache[name] == "undefined") ?
                                featureTests[name]() :
                                testCache[name];
  };
  // Tests for 90+% of current browser usage
  var ua = (global.navigator) ? global.navigator.userAgent : "";
  // IE 6.0/WinXP (no <audio> element at all):
  var IE60 = /^Mozilla\/4\.0 \(compatible; MSIE 6\.0; Windows NT \d\.\d(.*)\)$/;
  if (ua.search(IE60) == 0) {
    testCache = { "audio": 0, "audio-ogg": 0 /* ... */ };
  }
  // IE 7.0
  // ...
  // IE 8.0
  // ...
  // IE 9.0 (updated with fix from John-David Dalton)
  var IE90 = /^Mozilla\/5\.0 \(compatible; MSIE 9\.0; Windows NT \d\.\d(.*)\)$/;
  if (ua.search(IE90) == 0) {
    testCache = { "audio": 1, "audio-ogg": 0 /* ... */ };
  }
  // Firefox 3.6/Windows
  var FF36 = /^Mozilla\/5\.0 \(Windows; U;(.*)rv\:1\.9\.2\.(\d{1,2})\)( Gecko\/(\d{8}))? Firefox\/3\.6(\.\d{1,2})?( \(.+\))?$/;
  if (ua.search(FF36) == 0) {
    testCache = { "audio": 1, "audio-ogg": 1 /* ... */ };
  }
  // Chrome 8.0
  var CR80 = /^Mozilla\/5\.0 \((Windows|Macintosh|X11); U;.+\) AppleWebKit\/534\.10 \(KHTML\, like Gecko\) (.*)Chrome\/8\.0\.(\d{3})\.(\d{1,3}) Safari\/534\.10$/;
  if (ua.search(CR80) == 0) {
    testCache = { "audio": 1, "audio-ogg": 1 /* ... */ };
  }
  // Safari 5.0 (mobile)
  var S5MO = /^Mozilla\/5\.0 \(iPhone; U; CPU iPhone OS \w+ like Mac OS X; .+\) AppleWebKit\/(\d{3,})\.(\d+)\.(\d+) \(KHTML\, like Gecko\) Version\/5\.0(\.\d{1,})? Mobile\/(\w+) Safari\/(\d{3,})\.(\d+)\.(\d+)$/;
  if (ua.search(S5MO) == 0) {
    testCache = { "audio": 1, "audio-ogg": 0 /* ... */ };
  }
  // ...
})(this);

New versions of browsers won't match these tests, so we won't break libraries in the face of new UAs -- assuming the feature tests also don't break, which is a big if in many cases -- and we can go faster for the majority of users. Win.
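
As the list of known UAs grows, the same idea can be written table-first. What follows is a sketch under assumptions, not the way any particular library does it: `createFeatureTester` and the shape of the `knownUAs` entries are invented for illustration, and the UA string and test functions are passed in so the logic can be exercised outside a browser.

```javascript
// Hypothetical factory; every name here is illustrative.
// knownUAs: array of { re: RegExp, results: { featureName: 0|1 } }
// featureTests: map of featureName -> function returning a truthy/falsy value
function createFeatureTester(ua, knownUAs, featureTests) {
  var cache = {};

  // Pre-populate from the first entry whose regex matches the whole UA string.
  for (var i = 0; i < knownUAs.length; i++) {
    if (knownUAs[i].re.test(ua)) {
      var results = knownUAs[i].results;
      for (var key in results) {
        if (Object.prototype.hasOwnProperty.call(results, key)) {
          cache[key] = !!results[key];
        }
      }
      break;
    }
  }

  // Read-through: answer from the cache, else run the test once and remember it.
  return function ft(name) {
    if (typeof cache[name] == "undefined") {
      var test = featureTests[name];
      cache[name] = test ? !!test() : false;
    }
    return cache[name];
  };
}
```

For a UA that matches the table, no feature test ever runs; for anything else, each test runs at most once. A feature missing from a matched entry's `results` still falls through to its test, so a partially filled table entry stays safe.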

Cutting The Interrogation Short

January 30, 2011

I've been having a several-day mail, IRC, and twitter discussion with various folks about performance and the feature detection ~~religion~~ technique, particularly on mobile where CPU ain't free. So what's the debate? I say you shouldn't be running tests in UA's where you can dependably know the answer a-priori.

Wait, what? Why does Alex Russell hate feature testing, kittens, and cute fuzzy ducklings?

I don't. Paul warned me that my approach isn't going to be popular at first glance, but hear me out. My assumptions are as follows:

Working is better than busted
Fast is better than slow
No browser vendor changes the web-facing features in a given version. Evar. Does not happen

If you buy those, then I think we can all get some satisfaction by retracing our steps and asking, seriously, what is the point of feature testing?

Ok, I'll go first: feature testing is motivated by a desire not to be busted, particularly in the face of new versions of UA's which will (hopefully) improve standards support and reduce the need for hacks in the first place. Sensible enough. Why should users wait for a new version of your library just 'cause a new browser was released or because you didn't test on some version of something?

Extra bonus: if you don't mind running them every time, you can write just the feature test and your work is done now and in the future! Awesome! Except some of us do mind. Yes, things are now resilient in the face of new UA's and new versions of old ones, but only on the back of testing for everything you need every time you load a library on a page. Slowly. Veeerrrrry slooowly.

Paul suggested that some library could use something like Local Storage to cache the results of these tests locally, but this hardly seems like an answer. First, what if the user upgrades their browser? Guess you have to cache and check against the UA string anyway. And what about the cost of going to storage? Paul reported that these tests can be wicked expensive to run at all, on the order of 30ms for the full suite (which you hopefully won't hit...but sheesh). Reported worst-case for has.js is even worse. But apparently going to Local Storage is also expensive. And we're still running all these tests in the fast path the first time anyway. If we think that they're so expensive that we want to cache the results, why don't we think they're so expensive that we don't want to run them in the common case?
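
For reference, the client-side caching idea under discussion might look something like this sketch. The helpers `loadResults` and `saveResults` and the `"ft-cache"` key are hypothetical; `storage` is assumed to be anything with `getItem`/`setItem`, such as `window.localStorage`, and a native `JSON` object is assumed to exist.

```javascript
// Hypothetical helpers; the key name is made up for this sketch.
var CACHE_KEY = "ft-cache";

function loadResults(storage, ua) {
  try {
    var raw = storage.getItem(CACHE_KEY);
    if (!raw) { return null; }
    var saved = JSON.parse(raw);
    // A browser upgrade changes the UA string, which invalidates the cache.
    return (saved && saved.ua === ua) ? saved.results : null;
  } catch (e) {
    return null; // storage disabled or corrupt entry: fall back to testing
  }
}

function saveResults(storage, ua, results) {
  try {
    storage.setItem(CACHE_KEY, JSON.stringify({ ua: ua, results: results }));
  } catch (e) {
    // Quota exceeded or storage unavailable: nothing useful to do here.
  }
}
```

Even in this form the UA string ends up as the cache key, the first visit still pays for every test, and every later visit pays for a storage read.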

Now for a modest proposal: feature tests should only ever be run when you don't know what UA you're running in.

Feature testing libraries should contain pre-built caches -- the kind that come with the library, not the kind that get built on the client -- but they should only be consulted for UA versions that you know you know. If we assume that behavior for a given UA/version combination never changes, we've got ourselves a get-out-of-jail-free card. Libraries can have O(1) behavior in the common case, and in the situations where feature testing would keep you from being busted, you're still not busted.

So what's the cost to this? Frankly, given the size of some of the feature tests I've seen, it's going to be pretty minimal vs. the bloat the feature tests add. All performance work is always a tradeoff, but if your library thinks it's important not to break and to be fast, then I don't see many alternatives. New versions of libraries can continue to update the caches and tests as necessary, keeping the majority of users fast, while at the same time keeping things working in hostile or unknown environments.

Anyhow, if you're a library author or maintainer, please please please consider the costs of feature tests, particularly the sort that mangle DOM and/or read-back computed layout values. Going slow hurts users, hurts the web, and hurts the culture of performance that's so critical to keeping the platform a viable contender for the next generation of apps. We owe it to users to go faster.

A quick aside: I hesitated writing this for the same reasons that Paul cautioned me about how unpopular this was going to be: there's a whole lot of know-nothing advocacy that's still happening in the JS/webdev/design world these days, and it annoys me to no end. I'm not sure how our community got so religious and fact-disoriented, but it has got to stop. If you read this and your takeaway was "Alex Russell is against feature testing", then you're part of the problem. Think of it like a feature test for bogosity. Did you pass? If so, congrats, and thanks for being part of the bits of the JavaScript universe that I like.
