Bedrock

April 1, 2012

Jetlag has me in its throes which is as good an excuse as any to share what has been keeping me up many nights over the past couple of years; a theory of the web as a platform.

I had a chance last week to share some of my thinking here to an unlikely audience at EclipseCon, a wonderful experience for which my thanks go to Mike Milinkovich and Ian Skerrett for being crazy enough to invite a "web guy" to give a talk.

One of the points I tried (and perhaps failed) to make in the talk was that in every platform that's truly a platform it's important to have a stable conceptual model of what's "down there". For Java that's not the language, it's the JVM. For the web...well...um. Yes, it bottoms out at C/C++, but that's mostly observable through spooky action at a distance. The expressive capacity of C/C++ show up as limitations and mismatches in web specs all the time, but the essential semantics — it's all just words in memory that C/C++ can do whatever it pleases to — are safely hidden away behind APIs and declarative forms that are unfailingly high-level. Until they aren't. And you can forget about composition most of the time.

For a flavor of this, I always turn back to Joel Webber's question to me several years ago: why can't I over-ride the rendering of a border around an HTML element?

It's a fair question and one I wrote off too quickly the first time he posed it. We have <canvas> which lets us draw lines however we like, so why can't we override the path painting for borders? Why isn't it just a method you implement like in Flex or Silverlight?

Put another way: there are some low level APIs in the web that suggest that such power should be in the hands of us mortals. When using a low-level thing, you pay-as-you-go since lower-level things need more code (latency and complexity)...but that's a choice. Today's web is often mysteriously devoid of the sort of sane layering, forcing you to re-build parallel systems to what's already in the browser to get a job done. You can't just subclass the right thing or plug into the right lifecycle method most of the time. Want a <canvas>? Fine. There you go. Want a <span>? Hot <span>s coming up! But don't go getting any big ideas about using the drawing APIs from <canvas> to render your <span>. Both are magic in their own right and for no reason other than that's the way it has always been.

The daftest part in all of this is that JavaScript does exist in the web so you can strictly speaking do whatever you want. Goodness knows that when the platform fails us today, we're all-too-willing to just throw JS at it. It's crazy, in this context then, that spec authors seem to be trying to uphold a golden principle: JavaScript doesn't exist. Writing it out of the story allows you to just claim that your bit of the system is magic and that it doesn't need an exposed lifecycle and plug-in architecture. New things can just be bolted onto the magic, no layering required. It's magical turtles all the way down.

You can see why people who think in terms of VM's and machine words might find this a bit ahem limiting.

But how much should we "web people" care about what they think? After all, "real programmers" have been predicting the imminent death of this toy browser thing for so long that I'm forgetting exactly when the hate took its various turns through the 7 stages; "Applets will save us from this insanity!"..."Ajax is a hack"..."just put a compiler in front of it and treat it as the dumbest assembler ever" (which is at least acceptance, of a sort). The web continues to succeed in spite of all of of this. So why the gnashing of teeth?

Thanks to Steve Souders, I have an answer: every year we're throwing more and more JS on top of the web, dooming our best intended semantic thoughts to suffocation in the Turing tar pit. Inexorably, and until we find a way to send less code down the wire, us is them, and more so every day.

Let that picture sink in: at 180KB of JS on average, script isn't some helper that gives meaning to pages in the breech, it is the meaning of the page. Dress it up all you like, but that's where this is going.

Don't think 180KB of JS is a lot? Remember, that's transfer size which accounts for gzipping, not total JS size. Oy. And in most cases that's more than 3x the size of the HTML being served (both for the page and for whatever iframes it embeds). And that's not all; it's worse for many sites which should know better. Check out those loading "filmstrip" views for gawker, techcrunch, and the NYT. You might be scrolling down, looking at the graphs, and thinking to yourself "looks like Flash is the big ticket item...", and while that's true in absolute terms, Flash isn't what's blocking page loads. JS is.

And what for? What's all that code doing, anyway?

It's there to:

Clean up the messes that browser vendors aren't willing or able to clean up for themselves
Provide an API that becomes the new platform
Help you deliver app-specific features

Only the last one is strictly valuable.

You're not including JQuery, Backbone, Prototype or Dojo into your pages just because you like the API (if you are, stop it). You're doing it because the combination of API and even behavior across browsers makes them the bedrock. They are the new lisp of application construction; the common language upon which you and your small team can agree; just don't expect anyone else to be able to pick up your variant without squinting hard.

This is as untenable as it is dangerous. It was this realization that set me and Dimitri Glazkov off to build a team to do something about it more than a year and a half ago. The results are showing up now in the form of Web Components and Shadow DOM, Mutation Observers as plumbing for Model Driven View, and a host of new CSS capabilities and JavaScript language expressiveness wins. If that sounds like a huge pile of seemingly un-related work, let me walk back to one of the motivating questions and then I'll fast forward to the approach:

What would it mean to be able to subclass an HTML Element?

We observed that most of what the current libraries and frameworks are doing is just trying to create their own "widgets" and that most of these new UI controls had a semantic they'd like to describe in a pretty high-level way, an implementation for drawing the current state, and the need to parent other widgets or live in a hierarchy of widgets.

Heeeeeyyyyyy....wait a minute...that sounds a lot like what HTML does! And you even have HTML controls which generate extra elements for visual styling but which you can't access from script. This, BTW, is what you want when building your own controls. Think the bullets of list items or the sliders generated by <input type="range">. There are even these handy (non-constructable!?!) constructors for the superclasses in JS already.

So what would you need access to in order to plug into that existing system? And how should it be described? This, by the way, is the danger zone. Right about this point in the logical chain most folks tend to fall back to what they know best: C++ hacker? Give 'em a crappy C++-inspired high-level-ish JS API that will make the people yelling loudest stop beating you up. Declarative guy? Force everyone to describe their components as separate documents and...yeah. XUL. You get the idea. JavaScript person? Demand the lowest level API and as much unwarranted power as possible and pretend you don't need the browser. JS libraries are the "fuck it, we'll do it live!" of the web.

None of these are satisfying. Certainly not if what we want is a platform of the sort you might consider using "naked". And if your "platform" always needs the same shims here and polyfills there, let me be candid: it ain't no platform. It's some timber and bolts out of which you can make a nice weekend DIY project of building a real platform.

So we need to do better.

What does better look like?

Better is layered. Better is being able to just replace what you need, to plug in your own bits to a whole that supports that instead of making you re-create everything above any layer you want to shim something into. This is why mutable root prototypes in JS and object mutability in general are such cherished and loathed properties of the web. It is great power. It's just a pity we need it so often. Any plan for making things better that's predicated on telling people "oh, just go pile more of your own parallel systems on top of a platform that already does 90% of what you need but which won't open up the API for it" is DOOMED

Thus began a archaeology project, one which has differed in scope and approach from most of the recently added web capabilities I can think of, not because it's high-level or low-level, but because it is layered. New high-level capabilities are added, but instead of then poking a hole nearly all the way down to C++ when we want a low-level thing, the approach is to look at the high-level thing and say:

How would we describe what it's doing at the next level down in an API that we could expose?

This is the reason low-level-only API proposals drive me nuts. New stuff in the platform tends to be driven by scenarios. You want to do a thing, that thing probably has some UI (probably browser provided), and might invoke something security sensitive. If you start designing at the lowest level, throwing a C++ API over the wall, you've turned off any opportunity or incentive to layer well. Just tell everyone to use the very fine JS API, after all. Why should anyone want more? (hint: graph above). Instead of opening doors, though, it's mostly burden. Everything you have to do from script is expensive and slow and prone to all sorts of visual and accessibility problems by default.

If the browser can provide common UI and interaction for the scenario, isn't that better most of the time? Just imagine how much easier it would be to build an app if the initial cut at location information had been <input type="location"> instead of the Geolocation API we have now. True, that input element would need lots of configuration flags and, eventually, a fine-grained API...if only there were a way to put an API onto an HTML element type...hrm...

In contrast, if we go declarative-only we get a lot of the web platform today. Fine at first but horrible to work with over time, prone to attracting API barnacles to fill perceived gaps, and never quite enough. The need for that API keeps coming back to haunt us. We're gonna need both sides, markup and imperative, sooner or later. A framework for thinking about what that might look like seems in order. Our adventure in excavation with Web Components has largely been a success, not because we're looking to "kernalize the browser" in JS -- good or bad, that's an idea with serious reality-hostile properties as soon as you add a network -- but because when you force yourself to think about what's already down there as an API designer, you start making connections, finding the bits that are latent in the platform and should be used to explain more of the high level things in terms of fewer, more powerful primitives at the next layer down. This isn't a manifesto for writing the whole world in JS; it's a reasonable and practical approach for how to succeed by starting high and working backwards from the 80% use-case to something that eventually has most of the flexibility and power that high-end users crave.

The concrete steps are:

Introduce new platform capabilities with high-level, declarative forms. I.e., invent new tags and attributes. DOM gives you an API for free when you do it that way. Everyone's a winner.
When the thing you want feels like something that's already "down there" somewhere, try to explain the bits that already exist in markup in terms of a lower-level JS or markup primitive. If you can't do that or you think your new API has no connection to markup, go back to step 1 and start again.
When it feels like you're inventing new language primitives in DOM just to get around JS language limitations, extend the language, not the API

On the web, JavaScript is what's down there. When it's not, we're doing it wrong. It has taken me a very long time to understand why the Java community puts such a high premium on the "pure java" label, and fundamentally what it says to others in the community is "I appealed to no gods and no magic in the construction of this, merely physics". That's a Good Thing (TM), and the sort of property that proper platforms should embody to the greatest extent possible.

And this brings me to my final point. C/C++ might be what's "down there" for web browsers, but that's also been true of Java. What separates the web and Java, however, is that the Java community sees their imperative abstraction that keeps them from having to think about memory correctness (the JVM) as an asset and many "web people" think of JS as pure liability. I argue that because of the "you're gonna need both sides" dynamic, trying to write JS out of the picture is a dumb as it is doomed to fail. JavaScript is what's "down there" for the web. The web has an imperative backbone and we're never going to expose C/C++ ABI for it, which means JS is our imperative successor. The archaeological dig which is adding features like Web Components is providing more power to JS by the day and if we do this right and describe each bit as a layer with an API that the one above builds on, we can see pretty clearly how the logical regress of the "you must use JS to implement the whole browser" isn't insane. JS itself is implemented as C/C++, so there's always room for the mother tongue and of course many of the APIs that we interact with from JS must be C/C++; you can't write it out of the story — but that doesn't mean we need to design our APIs there or throw bad design decisions over the wall for someone else to clean up.

It is high time we started designing low-level stuff for the web in idiomatic JS (not IDL), start describing the various plug-in points for what they are. We can provide power from our imperative abstraction to and through our declarative layer in ways that make both high and low-level users of the web platform more able to interoperate, build on each other's work, and deliver better experiences at reasonable cost. That's the difference between a minefield and a platform. Only one of them is reasonable to build on.

The trash truck just came by which means it's 6AM here in almost-sunny London. WordPress is likewise telling me that I'm dangerously out of column-inches, so I guess I'll see if I can't get a last hour or two of sleep before the weekend is truly over. The arguments here may not be well presented, and they are subtle, but layering matters. We don't have enough of it and when done well, it can be a powerful tool in ending the battle between imperative and declarative. I'll make the case some other time for why custom element names are a good idea, but consider it in the layered context: if I could subclass HTMLElement from JavaScript in the natural way, why can't I just put a new tag name in the map the parser is using to create instances of all the other element types? Aside from the agreement about the names, what makes the built-in elements so special, anyway?

Cognitive dissonance, ahoy! You're welcome ;-)

Note: this post has evolved in the several days since its initial posting, thanks largely to feedback from Annie Sullivan and Steve Souders. But it's not their fault. I promise.