Infrequently Noted

Reckoning: Part 2 — Object Lesson

What hath we wrought? A case study.

August 13, 2024

This is part two of the four-part series "Reckoning"

Other posts in the series:

- The Golden Wait
  - The Truth Is In The Trace
  - Zip It
- Near Peers
- Deep Breaths

The Golden Wait

BenefitsCal is the state of California's recently developed portal for families that need to access SNAP benefits (née "food stamps"):^[1]

BenefitsCal loading on a low-end device over a 9Mbps link with 170ms RTT latency via WebPageTest.org

Code for America's getcalfresh.org performs substantially the same function, providing a cleaner, faster, and easier-to-use alternative to California county and state benefits applications systems:

getcalfresh.org loading under the same conditions

The like-for-like, same-state comparison getcalfresh.org provides is unique. Few public sector services in the US have competing interfaces, and only Code for America was motivated to build one for SNAP applications.

getcalfresh.org finishes in 1/3 the time, becoming interactive not long after the BenefitsCal loading screen appears.

WebPageTest.org timelines document a painful progression. Users can begin to interact with getcalfresh.org before the first (of three) BenefitsCal loading screens finishes (scroll right to advance):

Google's Core Web Vitals data backs up test bench assessments. Real-world users are having a challenging time accessing the site:

BenefitsCal is a poor experience on real phones.

But wait! There's more. It's even worse than it looks.

The multi-loading-screen structure of BenefitsCal fakes out Chrome's heuristic for when to record Largest Contentful Paint.

On low-end devices, BenefitsCal appears to *almost* load at the 22 second mark, only to put up a second loading screen shortly after. Because this takes so long, Chromium's heuristics for Largest Contentful Paint are thrown off, incorrectly reporting the end of the first loading screen as the complete page.

The real-world experience is significantly worse than public CWV metrics suggest.

Getcalfresh.org uses a simpler, progressively enhanced, HTML-first architecture to deliver the same account and benefits signup process, driving nearly half of all signups for California benefits (per CalSAWS).

The results are night-and-day:^[2]

getcalfresh.org generates almost half of the new filings to the CalSAWS system. Its relative usability presumably contributes to that success.

And this is after the state spent a million dollars on work to achieve "GCF parity".

The Truth Is In The Trace

No failure this complete has a single father. It's easy to enumerate contributing factors from the WebPageTest.org trace, and a summary of the site's composition and caching makes for a bracing read:

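| File Type | First View: Wire (KB) | First View: Disk (KB) | First View: Ratio | Repeat View: Wire (KB) | Repeat View: Disk (KB) | Repeat View: Ratio | Cached |
|---|---|---|---|---|---|---|---|
| JavaScript | 17,435 | 25,865 | 1.5 | 15,950 | 16,754 | 1.1 | 9% |
| Other (text) | 1,341 | 1,341 | 1.0 | 1,337 | 1,337 | 1.0 | 1% |
| CSS | 908 | 908 | 1.0 | 844 | 844 | 1.0 | 7% |
| Font | 883 | 883 | N/A | 832 | 832 | N/A | 0% |
| Image | 176 | 176 | N/A | 161 | 161 | N/A | 9% |
| HTML | 6 | 7 | 1.1 | 4 | 4 | 1.0 | N/A |
| Total | 20,263 | 29,438 | 1.45 | 18,680 | 19,099 | 1.02 | 7% |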

The first problem is that this site relies on 25 megabytes of JavaScript (uncompressed, 17.4 MB over the wire) and loads all of it before presenting any content to users. This would be unusably slow for many, even if served well. Users on connections worse than the P75 baseline emulated here experience excruciating wait times. This much script also increases the likelihood of tab crashes on low-end devices.^[3]

Very little of this code is actually used on the home page, and loading the home page is presumably the most common thing users of the site do:^[4]

Red is bad. DevTools shows less than a quarter of the JavaScript downloaded is executed.

As bad as that is, the wait to interact at all is made substantially worse by inept server configuration. Industry-standard gzip compression generally results in 4:1-8:1 data savings for text resources (HTML, CSS, JavaScript, and "other") depending on file size and contents. That would reduce ~28 megabytes of text, currently served in 19MB of partially compressed resources, to between 3.5MB and 7MB.
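As a back-of-the-envelope check on those figures, here is a minimal Python sketch. It assumes 1MB = 1,000,000 bytes, reuses the 9Mbps link from the test setup described earlier, and ignores latency, TCP slow start, and CPU time, so the results are lower bounds rather than predictions. The function names are illustrative only.

```python
def gzip_size_range_mb(uncompressed_mb: float,
                       low_ratio: float = 4.0,
                       high_ratio: float = 8.0) -> tuple[float, float]:
    """Return (smallest, largest) expected size in MB after gzip,
    using the 4:1 to 8:1 range quoted in the text."""
    return uncompressed_mb / high_ratio, uncompressed_mb / low_ratio


def transfer_seconds(size_mb: float, link_mbps: float) -> float:
    """Lower bound on transfer time: megabytes * 8 bits / megabits per second."""
    return size_mb * 8 / link_mbps


smallest, largest = gzip_size_range_mb(28)  # ~28MB of text, per the trace
print(f"gzipped text: {smallest:.1f}MB to {largest:.1f}MB")
print(f"as served (19MB): {transfer_seconds(19, 9):.1f}s at 9Mbps")
print(f"if gzipped: {transfer_seconds(smallest, 9):.1f}s "
      f"to {transfer_seconds(largest, 9):.1f}s at 9Mbps")
```

Under these assumptions, the 19MB payload needs roughly 17 seconds of pure transfer time on that link, while a fully gzipped equivalent would need somewhere between about 3 and 6 seconds.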

But compression is not enabled for most assets, subjecting users to wait for 19MB of content. If CalSAWS built BenefitsCal using progressive enhancement, early HTML and CSS would become interactive while JavaScript filigrees loaded in the background. No such luck for BenefitsCal users on slow connections.

For as bad as cell and land-line internet service are in dense California metros, the vast rural population experiences large areas with even less coverage.

Thanks to the site's JavaScript-dependent, client-side rendered, single-page-app (SPA) architecture, nothing is usable until nearly the entire payload is downloaded and run, defeating built-in browser mitigations for slow pages. Had progressive enhancement been employed, even egregious server misconfigurations would have had a muted effect by comparison.^[5]

Zip It

Gzip compression has been industry standard on the web for more than 15 years, and more aggressive algorithms are now available. All popular web servers support compression, and some enable it by default. It's so common that nearly every web performance testing tool checks for it, including Google's PageSpeed Insights.^[6]

Gzip would have reduced the largest script from 2.1MB to a comparatively svelte 340K; a 6.3x compression ratio:

$ gzip -k main.2fcf663c.js
$ ls -lh
> total 2.5M
> ... 2.1M Aug  1 17:47 main.2fcf663c.js
> ... 340K Aug  1 17:47 main.2fcf663c.js.gz
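The same measurement can be scripted. What follows is a minimal Python sketch using the standard library's gzip module. The sample input is synthetic and far more repetitive than a real bundle, so its ratio says nothing about BenefitsCal's scripts. Level 6 is chosen to match the gzip command-line default.

```python
import gzip


def gzip_ratio(data: bytes, level: int = 6) -> float:
    """Return uncompressed size divided by gzipped size."""
    if not data:
        raise ValueError("need at least one byte to compute a ratio")
    compressed = gzip.compress(data, compresslevel=level)
    return len(data) / len(compressed)


# Synthetic stand-in for a script file; NOT taken from the site.
sample = b"function add(a, b) { return a + b; }\n" * 2000
print(f"{len(sample)} bytes, compression ratio {gzip_ratio(sample):.1f}x")
```

To measure a real file instead, pass its bytes in, e.g. `gzip_ratio(open("main.2fcf663c.js", "rb").read())`, assuming a local copy of the script exists.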

Not only does the site require a gobsmacking amount of data on first visit, it taxes users nearly the same amount every time they return.

Because most of the site's payload is static, the fraction cached between first and repeat views should be near 100%. BenefitsCal achieves just 7%.
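For context, here is a sketch of one common policy that gets repeat views close to fully cached. It is written in Python purely for illustration and is not CloudFront configuration. It assumes that the hex string in file names like main.2fcf663c.js is a content hash that changes whenever the file changes, which is typical of bundler output but not confirmed for this site. The regular expression and the function name are hypothetical.

```python
import re

# Assumption: a hex content hash of 8+ characters sits just before the
# extension, as in "main.2fcf663c.js". Such files never change in place.
FINGERPRINTED = re.compile(r"\.[0-9a-f]{8,}\.(?:js|css|woff2?|png|jpg|svg)$")


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header value for a response path."""
    if FINGERPRINTED.search(path):
        # Safe to cache for a year: new content ships under a new name.
        return "public, max-age=31536000, immutable"
    # HTML and other unversioned URLs: store, but revalidate before reuse.
    return "no-cache"


for path in ("/static/js/main.2fcf663c.js", "/index.html"):
    print(path, "->", cache_control_for(path))
```

With a policy along these lines, a returning visitor re-downloads only the small HTML document and any assets whose names have changed.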

This isn't just perverse; it's so far out of the norm that I struggled to understand how CalSAWS managed to so thoroughly misconfigure a web server modern enough to support HTTP/2.

The answer? Overlooked turnkey caching options in CloudFront's dashboard.^[7]

This oversight might have been understandable at launch. The mystery remains how it persisted for nearly three years (pdf). The slower the device and network, the more sluggish the site feels. Unlike backend slowness, the effects of ambush-by-JavaScript can remain obscured by the fast phones and computers used by managers and developers.

But even if CalSAWS staff never leave the privilege bubble, there has been plenty of feedback:

A Reddit user responding to product-level concerns with:
*'And it's so f'n slow.'*

Having placed a bet on client-side rendering, CalSAWS, Gainwell, and Deloitte staff needed additional testing and monitoring to assess the site as customers experience it. This obviously did not happen.

The most generous assumption is they were not prepared to manage the downsides of the complex and expensive JavaScript-based architecture they chose over progressive enhancement.^[8]^[9]

Near Peers

Analogous sites from other states point the way. For instance, Wisconsin's ACCESS system:

Six seconds isn't setting records, but it's a fifth as long as it takes to access BenefitsCal.

There's a lot that could be improved about WI ACCESS's performance. Fonts are loaded too late, and some of the images are too large. They could benefit from modern formats like WebP or AVIF. JavaScript could be delay-loaded and served from the same origin to reduce connection setup costs. HTTP/2 would left-shift many of the early resources fetched in this trace.

But the experience isn't on fire, listing, and taking on water.

Despite numerous opportunities for improvement, WI ACCESS's appropriate architecture keeps the site usable for all.

Because the site is built in a progressively enhanced way, simple fixes can cheaply and quickly improve on an acceptable baseline.

Even today's "slow" networks and phones are so fast that sites can commit almost every kind of classic error and still deliver usable experiences. Sites like WI ACCESS would have felt sluggish just 5 years ago but work fine today. It takes extra effort to screw up as badly as BenefitsCal has.

Blimey

To get a sense of what's truly possible, we can compare a similar service from the world leader in accessible, humane digital public services: gov.uk, a.k.a., the UK Government Digital Service (GDS).

gov.uk's Universal Credit page finishes loading before BenefitsCal's first loading screen even starts.

California enjoys a larger GDP and a reputation for technology excellence, and yet the UK consistently outperforms the Golden State's public services.

There are layered reasons for the UK's success:

- GDS's Service Manual is an enforceable guide for how government services should be built and delivered continuously. This liberates each department from reinventing processes or rediscovering how best to deliver through trial and error.
- The GDS Service Manual requires progressive enhancement.
- Progressive Enhancement is baked into infrastructure and practice by patterns long documented in the official GDS design system and reference implementation.
- The parallel Service Standard clearly articulates the egalitarian, open values that every delivery team is held to.
- The Service Manual and Service Standard nearly shout "do not write an omnibus contract!" if you can read between the lines. The interlocking processes, spend controls, and agile activism serve to grind down too-big-to-fail procurement into a fine powder, one contract at a time.^[10]

The BenefitsCal omnishambles should trigger a fundamental rethink. Instead, the prime contractors have just been awarded another $1.3BN over a long time horizon. CalSAWS is now locked in an exclusive arrangement with the very folks that broke the site with JavaScript. Any attempt at fixing it now looks set to reward easily-avoided failure.

Too-big-to-fail procurement isn't just flourishing in California; it's thriving across public benefits application projects nationwide. No matter how badly service delivery is bungled, the money keeps flowing.

JavaScript Masshattery

CalSAWS is by no means alone.

For years, I have been documenting the inequality-exacerbating effects of JS-based frontend development based on the parade of private-sector failures that cross my desk.^[11]

Over the past decade, those failures have not elicited a change in tone or behaviour from advocates for frameworks, but that might be starting to change, at least for high-traffic commercial sites.

Core Web Vitals is creating pressure on slow sites that value search engine traffic. It's less clear what effect it will have on public-sector monopsonies. The spread of unsafe-at-any-scale JavaScript frameworks into government is worrying as it's hard to imagine what will dislodge them. There's no comparison shopping for food stamps.^[12]

The Massachusetts Executive Office of Health and Human Services (EOHHS) seems to have fallen for JavaScript marketing hook, line, and sinker.

DTA Connect is the result, a site so slow that it frequently takes me multiple attempts to load it from a Core i7 laptop attached to a gigabit network.

From the sort of device a smartphone-dependent mom might use to access the site? It's lookin' slow.

Introducing the Can You Hold Your Breath Longer Than It Takes to Load DTA Connect? Challenge.

I took this trace multiple times, as WebPageTest.org kept timing out. It's highly unusual for a site to take this long to load. Even tools explicitly designed to emulate low-end devices and networks needed coaxing to cope.

The underlying problem is by now familiar:

You don't have to be a web performance expert to understand that [10.2MB of JS](https://www.webpagetest.org/breakdown.php?test=240804_AiDcFG_31Y&run=1&end=visual) is a tad much, particularly when it is served without compression.

Vexingly, whatever hosting infrastructure Massachusetts uses for this project throttles serving to 750KB/s. This bandwidth restriction combines with server misconfigurations to ensure the site takes forever to load, even on fast machines.^[13]

It's a small mercy that DTA Connect sets caching headers, allowing repeat visits to load in "only" several seconds. Because of the SPA architecture, nothing renders until all the JavaScript gathers its thoughts at the speed of the local CPU.

The slower the device, the longer it takes.^[14]

Even when everything is cached, DTA Connect takes multiple seconds to load on a low-end device owing to the time it takes to run this much JavaScript (yellow and grey in the 'Browser main thread' row).

A page this simple, served entirely from cache, should render in much less than a second on a device like this.^[15]

Maryland Enters The Chat

The correlation between states procuring extremely complex, newfangled JavaScript web apps and fumbling basic web serving is surprisingly high.

Case in point, the residents of Maryland wait seconds on a slow connection for megabytes of uncompressed JavaScript, thanks to the Angular 9-based SPA architecture of myMDTHINK.^[16]

Maryland's myMDTHINK loads its [5.2MB critical-path JS](https://www.webpagetest.org/breakdown.php?test=240805_BiDc66_389&run=1&end=visual) bundle sans gzip.

American legislators like to means test public services. In that spirit, perhaps browsers should decline to load multiple megabytes of JavaScript if a site can't manage gzip.

Chattanooga Chug Chug

Tennessee, a state with higher-than-average child poverty, is at least using JavaScript to degrade the accessibility of its public services in unique ways.

Instead of misconfiguring web servers, The Volunteer State uses Angular to synchronously fetch JSON files that define the eventual UI in an onload event handler.

The enormous purple line represents four full seconds of main thread unresponsiveness.

It's little better on second visit, owing to the CPU-bound nature of the problem.

Needless to say, this does not read to the trained eye as competent work.

SNAP? In Jersey? Fuhgeddaboudit

New Jersey's MyNJHelps.gov (yes, that's the actual name) mixes the old-timey slowness of multiple 4.5MB background stock photos with a nu-skool render-blocking Angular SPA payload that's 2.2MB on the wire (15.7MB unzipped), leading to first load times north of 20 seconds.

Despite serving the oversized JavaScript payload relatively well, the script itself is so slow that repeat visits take nearly 13 seconds to display fully:

What Qualcomm giveth, Angular taketh away.

Despite almost perfect caching, repeat visits take more than 10 seconds to render thanks to a slow JavaScript payload.

Debugging the pathologies of this specific page is beyond the scope of this post, but it is a mystery how New Jersey managed to deploy an application that triggers a debugger; statement on every page load with DevTools open whilst also serving a 1.8MB (13.8MB unzipped) vendor.js file with no minification of any sort.

One wonders if anyone involved in the deployment of this site is a developer, and if not, how it exists.

Hoosier Hospitality

Nearly half of the 15 seconds required to load Indiana's FSSA Benefits Portal is consumed by a mountain of main-thread time burned in its 4.2MB (16MB unzipped) Angular 8 SPA bundle.

Combined with a failure to set appropriate caching headers, both timelines look identically terrible:

Can you spot the difference?

Deep Breaths

The good news is that not every digital US public benefits portal has been so thoroughly degraded by JavaScript frameworks. Code for America's 2023 Benefits Enrollment Field Guide study helpfully ran numbers on many benefits portals, and a spot check shows that those that looked fine last year are generally in decent shape today.

Still, considering just the states examined in this post, one in five US residents will hit underperforming services, should they need them.

None of these sites need to be user hostile. All of them would be significantly faster if states abandoned client-side rendering, along with the legacy JavaScript frameworks (React, Angular, etc.) built to enable the SPA model.

GetCalFresh, Wisconsin, and the UK demonstrate a better future is possible today. To deliver that better future and make it stick, organisations need to learn the limits of their carrying capacity for complexity. They also need to study how different architectures fail in order to select solutions that degrade more gracefully.

Next: Caprock: Development without constraints isn't engineering.

Thanks to Marco Rogers and Frances Berriman for their encouragement in making this piece a series and for their thoughtful feedback on drafts.

If you work on a site discussed in this post, I offer (free) consulting to public sector services. Please get in touch.

The JavaScript required to render anything on BenefitsCal embodies nearly every anti-pattern popularised (sometimes inadvertently, but no less predictably) by JavaScript influencers over the past decade, along with the most common pathologies of NPM-based frontend development.

A perusal of the code reveals:
- Multiple reactive frontend frameworks, namely React, Vue, and RxJS.
- "Client-side routing" metadata for the entire site bundled into the main script.
- React components for all UI surfaces across the entire site, including:
  - Components for every form, frontloaded. No forms are displayed on the home page.
  - An entire rich-text editing library. No rich-text editing occurs on the home page.
  - A complete charting library. No charts appear on the home page.
  - Sizable custom scrolling and drag-and-drop libraries. No custom scrolling or drag-and-drop interactions occur on the home page.
- A so-called "CSS-in-JS" library that does not support compilation to an external stylesheet. This is categorically the slowest and least efficient way to style web-based UIs. On its own, it would justify remediation work.
- Unnecessary polyfills and transpilation overhead, including:
  - class syntax transpilation.
  - Generator function transpilation and polyfills independently added to dozens of files.
  - Iterator transpilation and polyfills.
  - Standard library polyfills, including obsolete userland implementations of ArrayBuffer, Object.assign() and repeated inlining of polyfills for many others, including a litany of outdated TypeScript-generated polyfills, bloating every file.
  - Obsolete DOM polyfills, including a copy of Sizzle to provide emulation for document.querySelectorAll() and a sizable colourspace conversion system, along with userland easing functions for all animations supported natively by modern CSS.
- No fewer than ~~2...wait...5~~...no, 6 large — seemingly different! — User-Agent parsing libraries that support browsers as weird and wonderful as WebOS, Obigo, and iCab. What a delightful and unexpected blast from the past! (pdf)
- What appears to be an HTML parser and userland DOM implementation!?!
- A full copy of Underscore.
- A full copy of Lodash.
- A full copy of core-js.
- A userland elliptic-curve cryptography implementation. Part of an on-page chatbot, naturally.
- A full copy of Moment.js, in addition to the custom date and time parsing functions already added via bundling of the (overlarge) react-date-picker library.
- An unnecessary OAuth library.
- An emulated version of the Node.js buffer class, entirely redundant on modern browsers.
- The entire Amazon Chime SDK, which includes all the code needed to do videoconferencing. This is loaded in the critical path and alone adds multiple megabytes of JS spread across dozens of webpack-chunked files. No features of the home page appear to trigger videoconferencing.
- A full copy of the AWS JavaScript SDK, weighing 2.6MB, served separately.
- Obviously, nothing this broken would be complete without a Service Worker that only caches image files.

This is, to use the technical term, whack.

The users of BenefitsCal are folks on the margins — often working families — trying to feed, clothe, and find healthcare for kids they want to give a better life. I can think of few groups that would be more poorly served by such baffling product and engineering mismanagement. ↩︎ ↩︎
getcalfresh.org isn't perfect from a performance standpoint.

The site would feel considerably snappier if the heavy chat widget it embeds were loaded on demand with the facade pattern and if the Google Tag Manager bundle were audited to cut cruft. ↩︎
Browser engineers sweat the low end because that's where users are^[17], and when we do a good job for them, it generally translates into better experiences for everyone else too. One of the most durable lessons of that focus has been that users having a bad time in one dimension are much more likely to be experiencing slowness in others.

Slow networks correlate heavily with older devices that have less RAM, slower disks, and higher taxes from "potentially unwanted software" (PUS). These machines may experience malware fighting with invasive antivirus, slowing disk operations to a crawl. Others may suffer from background tasks for app and OS updates that feel fast on newer machines but which drag on for hours, stealing resources from the user's real work the whole time.

Correlated badness also means that users in these situations benefit from any part of the system using fewer resources. Because browsers are dynamic systems, reduced RAM consumption can make the system faster, both through reduced CPU load from zram, as well as rebalancing in auto-tuning algorithms to optimise for speed rather than space.

The pursuit of excellent experiences at the margins is a deep teacher about the systems we program, and a frequently humbling experience. If you want to become a better programmer or product manager, I recommend focusing on those cases. You'll always learn something. ↩︎
It's not surprising to see low code coverage percentages on the first load of an SPA. What's shocking is that the developers of BenefitsCal confused it with a site that could benefit from this architecture.

To recap: the bet that SPA-oriented JavaScript frameworks make is that it's possible to deliver better experiences for users when the latency of going to the server can be shortcut by client-side JavaScript.

I cannot stress this enough: the premise of this entire wing of web development practice is that expensive, complex, hard-to-operate, and wicked-to-maintain JavaScript-based UIs lead to better user experiences.

It is more than fair to ask: do they?

In the case of BenefitsCal and DTA Connect, the answer is "no".

The contingent claim of potentially improved UI requires dividing any additional up-front latency by the number of interactions, then subtracting the average improvement-per-interaction from that total. It's almost impossible to imagine any app with sessions long enough to make 30-second up-front waits worthwhile, never mind a benefits application form.
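That accounting can be written down as a small Python sketch. The 30-second up-front wait comes from the paragraph above; the half-second saved per interaction is a made-up, generous figure chosen only to illustrate the arithmetic, and the function names are hypothetical.

```python
import math


def net_cost_per_interaction(extra_upfront_s: float,
                             interactions: int,
                             saved_per_interaction_s: float) -> float:
    """Amortised up-front latency minus the per-interaction saving.
    A positive result means users are worse off on average."""
    if interactions <= 0:
        raise ValueError("a session needs at least one interaction")
    return extra_upfront_s / interactions - saved_per_interaction_s


def break_even_interactions(extra_upfront_s: float,
                            saved_per_interaction_s: float) -> int:
    """Interactions per session needed before the up-front wait pays off."""
    if saved_per_interaction_s <= 0:
        raise ValueError("no per-interaction saving means no break-even point")
    return math.ceil(extra_upfront_s / saved_per_interaction_s)


# Illustrative inputs only: 30s extra wait, 0.5s saved per interaction.
print(net_cost_per_interaction(30, 10, 0.5))
print(break_even_interactions(30, 0.5))
```

With those illustrative inputs, a ten-interaction session leaves users 2.5 seconds worse off per interaction, and the wait only pays for itself after 60 interactions in a single session.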

These projects should never have allowed "frontend frameworks" within a mile of their git repos. That they both picked React (a system with a lurid history of congenital failure) is not surprising, but it is dispiriting.

Previous posts here have noted that site structure and critical user journeys largely constrain which architectures make sense:

Sites with short average sessions cannot afford much JS up-front.

These portals serve many functions: education, account management, benefits signup, and status checks. None of these functions exhibit the sorts of 50+ interaction sessions of a lived-in document editor (Word, Figma) or email client (Gmail, Outlook). They are not "toothbrush" services that folks go to every day, or which they use over long sessions.

Even the sections that might benefit from additional client-side assistance (rich form validation, e.g.) cannot justify loading all of that code up-front for all users.

The failure to recognise how inappropriate JavaScript-based SPA architectures are for most sites is an industry-wide scandal. In the case of these services, that scandal takes on a whole new dimension of reckless irresponsibility. ↩︎
JavaScript-based SPAs yank the reins away from the browser while simultaneously frontloading code at the most expensive time.

SPA architectures and the frameworks built to support them put total responsibility for all aspects of site performance squarely on the shoulders of the developer. Site owners who are even occasionally less than omniscient can quickly end up in trouble. It's no wonder many teams I work with are astonished at how quickly these tools lead to disastrous results.

SPAs are "YOLO" for web development.

Their advocates' assumption of developer perfection is reminiscent of C/C++'s approach to memory safety. The predictable consequences should be enough to disqualify them from use in most new work. The sooner these tools and architectures are banned from the public sector, the better. ↩︎
Confoundingly, while CalSAWS has not figured out how to enable basic caching and compression, it has rolled out firewall rules that prevent many systems like PageSpeed Insights from evaluating the page through IP blocks.

The same rules also prevent access from IPs geolocated to be outside the US. Perhaps it's also a misconfiguration? Surely CalSAWS isn't trying to cut off access to services for users who are temporarily visiting family in an emergency, right? ↩︎
There's a lot to say about BenefitsCal's CloudFront configuration debacle.

First, and most obviously: WTF, Amazon?

It's great that these options are single-configuration and easy to find when customers go looking for them, but they should not have to go looking for them. The default for egress-oriented projects should be to enable this and then alert on easily detected double-compression attempts.

Second: WTF, Deloitte?

What sort of C-team are you stringing CalSAWS along with? Y'all should be ashamed. And the taxpayers of California should be looking to claw back funds for obscenely poor service.

Lastly: this is on you, CalSAWS.

As the procurer and approver of delivered work items, the failure to maintain a minimum level of in-house technical skill necessary to call BS on vendors is inexcusable.

New and more appropriate metrics for user success should be integrated into public reporting. That conversation could consume an entire blog post; the current reports are little more than vanity metrics. The state should also redirect money it is spending with vendors to enhance in-house skills in building and maintaining these systems directly.

It's an embarrassment that this site is as broken as it was when I began tracing it three years ago. It's a scandal that good money is being tossed after bad. Do better. ↩︎
It's more likely that CalSAWS are inept procurers and that Gainwell + Deloitte are hopeless developers.

The alternative requires accepting that one or all of these parties knew better and did not act, undermining the struggling kids and families of California in the process. I can't square that with the idea of going to work every day for years to build and deliver these services. ↩︎
In fairness, building great websites doesn't seem to be Deloitte's passion.

Deloitte.com performs poorly for real-world users, a population that presumably includes a higher percentage of high-end devices than other sites traced in this post.

But even Deloitte could have fixed the BenefitsCal mess had CalSAWS demanded better. ↩︎
It rankles a bit that what the UK's GDS has put into action for the last decade is only now being recognised in the US.

If US-centric folks need to call these things "products" instead of "services" to make the approach legible, so be it! Better late than never. ↩︎
I generally have not posted traces of the private sector sites I have spent much of the last decade assisting, preferring instead to work quietly to improve their outcomes.

The exception to this rule is the public sector, where I feel deeply cross-pressured about the sort of blow-back that underpaid civil servants may face. However, sunlight is an effective disinfectant, particularly for services we all pay for. The tipping point in choosing to post these traces is that by doing so, we might spark change across the whole culture of frontend development. ↩︎
getcalfresh.org is the only direct competitor I know of to a state's public benefits access portal, and today it drives nearly half of all SNAP signups in California. Per BenefitsCal meeting notes (pdf), it is scheduled to be decommissioned next year.

Unless BenefitsCal improves dramatically, the only usable system for SNAP signup in the most populous state will disappear when it goes. ↩︎
Capping the effective bandwidth of a server is certainly one way to build solidarity between users on fast and slow devices.

It does not appear to have worked.

The glacial behaviour of the site for all implies managers in EOHHS must surely have experienced DTA Connect's slowness for themselves and declined to do anything about it. ↩︎
The content and structure of DTA Connect's JavaScript are just as horrifying as BenefitsCal's^[1:1] and served just as poorly. Pretty-printed, the main bundle runs to 302,316 lines.

I won't attempt nearly as exhaustive an inventory of the #fail it contains, but suffice to say, it's a Create React App special. CRAppy, indeed.

Many obsolete polyfills and libraries are bundled, including (but not limited to):
- A full copy of core-js
- Polyfills for features as widely supported as fetch()
- Transpilation down to ES5, with polyfills to match
- A full userland elliptic-curve cryptography library
- A userland implementation of BigInt
- A copy of zlib.js
- A full copy of the Public Suffix List
- A full list of mime types (thousands of lines).
- What appears to be a relatively large rainbow table.

Seasoned engineers reading this list may break out in hives, and that's an understandable response. None of this is necessary, and none of it is useful in a modern browser. Yet all of it is in the critical path.

Some truly unbelievable bloat is the result of all localized strings for the entire site occurring in the bundle. In every supported language.

Any text ever presented to the user is included in English, Spanish, Portuguese, Chinese, and Vietnamese, adding megabytes to the download.

A careless disregard for users, engineering, and society permeates this artefact. Massachusetts owes citizens better. ↩︎
Some junior managers still believe in the myth of the "10x" engineer, but this isn't what folks mean when they talk about "productivity". Or at least I hope it isn't. ↩︎
Angular is now on version 18, meaning Maryland faces a huge upgrade lift whenever it next decides to substantially improve myMDTHINK. ↩︎
Browsers port to macOS for CEOs, hipster developers, and the tech press. Macs are extremely niche devices owned exclusively by the 1-2%. macOS's ~5% browsing share is inflated by the 30% not yet online, almost none of whom will be able to afford Macs.

Wealth-related factors also multiply the visibility of high-end devices (like Macs) in summary statistics. These include better networks and faster hardware, both of which correlate with heavier browsing. Relatively high penetration in geographies with strong web use also helps. For example, Macs have 30% share of desktop-class sales in the US, vs 15% worldwide.

The overwhelming predominance of smartphones vs. desktops seals the deal. In 2023, smartphones outsold desktops and laptops by more than 4:1. This means that smartphones outnumber laptops and desktops to an even greater degree worldwide than they do in the US.

Browser makers keep Linux ports ticking over because that's where developers live (including many of their own). It's also critical for the CI/CD systems that power much of the industry.

Those constituencies are vocal and wealthy, giving them outsized influence. But iOS and macOS aren't real life; Android and Windows are, particularly their low-end, bloatware-filled expressions.

Them's the breaks. ↩︎

Reckoning: Part 1 — The Landscape

August 12, 2024

This is part one of the four-part series "Reckoning"

Instead of an omnibus mega-post, this investigation into JavaScript-first frontend culture and how it broke US public services has been released in four parts. Other posts in the series:

When you live in the shadow of a slow-moving crisis, it's natural to tell people about it. At volume. Doubly so when engineers can cheaply and easily address the root causes with minor tweaks. As things worsen, it's also hard not to build empathy for Cassandra.

In late 2011, I moved to London, where the Chrome team was beginning to build Google's first "real" browser for Android.^[1] The system default Android Browser had, up until that point, been based on the system WebView, locking its rate of progress to the glacial pace of device replacement.^[2]

In a world where the Nexus 4's 2GB of RAM and 32-bit, 4-core CPU were the high-end, the memory savings the Android Browser achieved by reusing WebView code mattered immensely.^[3] Those limits presented enormous challenges for Chromium's safer (but memory-hungry) multi-process sandboxing. Android wasn't just spicy Linux; it was an entirely new ballgame.

Even then, it was clear the iPhone wasn't a fluke. Mobile was clearly on track to be the dominant form-factor, and we needed to adapt. Fast.^[4]

Browsers made that turn, and by 2014, we had made enough progress to consider how the web could participate in mobile's app-based model. This work culminated in 2015's introduction of PWAs and Push Notifications.

Disturbing patterns emerged as we worked with folks building on this new platform. A surprisingly high fraction of them brought slow, desktop-oriented JavaScript frameworks with them to the mobile web. These modern, mobile-first projects neither needed nor could afford the extra bloat frameworks included to paper over the problems of legacy desktop browsers. Web developers needed to adapt the way browser developers had, but consistently failed to hit the mark.

By 2016, frontend practice had fully lapsed into wish-thinking. Alarms were pulled, klaxons sounded, but nothing changed.

It could not have come at a worse time.

By then, explosive growth at the low end was baked into the cake. Billions of feature-phone users had begun to trade up. Different brands endlessly reproduced 2016's mid-tier Androids under a dizzying array of names. The only constants were the middling specs and ever-cheaper prices. Specs that would set punters back $300 in 2016 sold for only $100 a few years later, opening up the internet to hundreds of millions along the way. The battle between the web and apps as the dominant platform was well and truly on.

Geekbench 5 single-core scores for 'fastest iPhone', 'fastest Android', 'budget', and 'low-end' segments.

Nearly all growth in smartphone sales volume since the mid '10s occurred in the 'budget' and 'low-end' categories.

But the low-end revolution barely registered in web development circles. Frontenders poured JavaScript into the mobile web at the same rate as desktop, destroying any hope of a good experience for folks on a budget.

Median JavaScript bytes for Mobile and Desktop sites.
As this blog has covered at length, median device specs were largely stagnant between 2014 and 2022. Meanwhile, web developers made sure the "i" in "iPhone" stood for "inequality."

Prices at the high end accelerated, yet average selling prices remained stuck between $300 and $350. The only way the emergence of the $1K phone didn't bump the average up was the explosive growth at the low end. To keep the average selling price at $325, three $100 low-end phones needed to sell for each $1K iPhone, which is exactly what happened.
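The arithmetic behind that ratio is easy to check. Here is a small Python sketch using only the round figures from the paragraph above ($100 low-end phones, $1K flagships, a $325 average); the function names are illustrative, and real markets obviously contain more than two price points.

```python
def average_selling_price(low_price, low_units, high_price, high_units):
    """Unit-weighted average price for a two-tier mix of phones."""
    revenue = low_price * low_units + high_price * high_units
    return revenue / (low_units + high_units)


def low_end_units_per_flagship(low_price, high_price, target_asp):
    """Low-end units that must sell per flagship to hold the average at target_asp.

    Solves (n * low + high) / (n + 1) = target for n.
    """
    return (high_price - target_asp) / (target_asp - low_price)


# Three $100 phones for every $1,000 phone: (3 * 100 + 1000) / 4
asp = average_selling_price(100, 3, 1000, 1)

# Working backwards from the $325 average: (1000 - 325) / (325 - 100)
ratio = low_end_units_per_flagship(100, 1000, 325)
```

Here `asp` comes out to 325.0 and `ratio` to 3.0, matching the three-to-one mix described in the text.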

And yet, the march of JavaScript-first, framework-centric dogma continued, no matter how incompatible it was with the new reality. Predictably, tools sold on the promise they would deliver "app-like experiences" did anything but.^[5]

Billions of cheap phones that always have up-to-date browsers found their CPUs and networks clogged with bloated scripts designed to work around platform warts they don't have.

Environmental Factors #

In 2019, Code for America published the first national-level survey of online access to benefits programs, which are built and operated by each state. The follow-up 2023 study provides important new data on the spread of digital access to benefits services.

One valuable artefact from CFA's 2019 research is a post by Dustin Palmer, documenting the missed opportunity among many online benefits portals to design for the coming mobile-first reality that was already the status quo in the rest of the world.

Worldwide mobile browsing surpassed desktop browsing sometime in 2016.

US browsing exhibited the same trend, slightly delayed, owing to comparatively high desktop and laptop ownership vs emerging markets.

Moving these systems online only reduces administrative burdens in a contingent sense; if portals fail to work well on phones, smartphone-dependent folks are predictably excluded:

28% of US adults in households with less than $30K/yr income are smartphone-dependent, falling to only 19% for families making $30-70K/yr.

But poor design isn't the only potential administrative burden for smartphone-dependent users.^[6]

The networks and devices folks use to access public support aren't latest-generation or top-of-the-line. They're squarely in the tail of the device price, age, and network performance distributions. Those are the overlapping conditions where the consistently falsified assumptions of frontend's lost decade have played out disastrously.

California is a rich mix of urban and hard-to-reach rural areas. Some of the poorest residents are in the least connected areas, ensuring they will struggle to use bloated sites.

It would be tragic if public sector services adopted the JavaScript-heavy stacks that frontend influencers have popularised. Framework-based, "full-stack" development is now the default in Silicon Valley, but should obviously be avoided in universal services. Unwieldy and expensive stacks that have caused agony in the commercial context could never be introduced to the public sector with any hope of success.

Right?

Next: Object Lesson: a look at California's digital benefits services.

Thanks to Marco Rogers and Frances Berriman for their encouragement in making this piece a series and for their thoughtful feedback on drafts.

A "real browser", as the Chrome team understood the term circa 2012, included:
- Chromium's memory-hungry multi-process architecture which dramatically improved security and stability
- Winning JavaScript performance using our own V8 engine
- The Chromium network stack, including support for SPDY and experiments like WebRTC
- Updates that were not locked to OS versions
↩︎
Of course, the Chrome team had wanted to build a proper mobile browser sooner, but Android was a paranoid fiefdom separate from Google's engineering culture and systems. And the Android team were intensely suspicious of the web, verging into outright hostility at times.

But internal Google teams kept hitting the limits of what the Android Browser could do, including Search. And when Search says "jump", the only workable response is "how high?"

WebKit-based though it was (as was Chrome), OS-locked features presented a familiar problem, one the Chrome team had solved with auto-update and Chrome Frame. A deal was eventually struck, and when Chrome for Android was delivered, the system WebView also became a Chromium-based, multi-process, sandboxed, auto-updating system. For most, that was job done.

This made a certain sort of sense. From the perspective of Google's upper management, Android's role was to put a search box in front of everyone. If letting Andy et al. play around with an unproven Java-based app model was the price, OK. If that didn't work, the web would still be there. If it did, then Google could go from accepting someone else's platform to having one it owned outright.^[7] Win/win.

Anyone trying to suggest a more web-friendly path for Android got shut down hard. The Android team always had legitimate system health concerns that they could use as cudgels, and they wielded them with abandon.

The launch of PWAs in 2015 was an outcome Android saw coming a mile away and worked hard to prevent. But that's a story for another day. ↩︎
Android devices were already being spec'd with more RAM than contemporary iPhones, thanks to Dalvik's chonkyness. This, in turn, forced many OEMs to cut corners in other areas, including slower CPUs.

This effect has been a silent looming factor in the past decade's divergence in CPU performance between top-end Android and iPhones. Not only did Android OEMs have to pay a distinct profit margin to Qualcomm for their chips, but they also had to dip into the Bill Of Materials (BOM) budget to afford more memory to keep things working well, leaving less for the CPU.

Conversely, Apple's relative skimpiness on memory and burning desire to keep BOM costs low for parts it doesn't manufacture are reasons to oppose browser engine choice. If real browsers were allowed, end users might expect phones with decent specs. Apple keeps that in check, in part, by maximising code page reuse across browsers and apps that are forced to use the system WebView.

That might dig into margins ever so slightly, and we can't have that, can we? ↩︎
It took browsers that were originally architected in a desktop-only world many years to digest the radically different hardware that mobile evolved. Not only were CPU speeds and memory budgets cut dramatically (never mind the need to port to ARM, including JS engine JITs that were heavily optimised for x86), but networks suddenly became intermittent and variable-latency.

There were also upsides. Where GPUs had been rare on the desktop, every phone had a GPU. Mobile CPUs were slow enough that what had felt like a leisurely walk away from CPU-based rendering on desktop became an absolute necessity on phones. Similar stories played out across input devices, sensors, and storage.

It's no exaggeration to say that the transition to mobile force-evolved browsers in a compressed time frame. If only websites had made the same transition. ↩︎
Let's take a minute to unpack what the JavaScript framework claims of "app-like experiences" were meant to convey.

These were code words for more responsive UI, building on the Ajax momentum of the mid-noughties. Many boosters claimed this explicitly and built popular tools to support these specific architectures.

As we wander through the burning wreckage of public services that adopted these technologies, remember one thing: they were supposed to make UIs better. ↩︎
When confronted with nearly unusable results from tools sold on the idea that they make sites easier, better, and faster to use, many technologists offer variants of "but at least it's online!" and "it's fast enough for most people". The most insipid version implies causality, constructing a strawman but-for defense: "but these sites might not have even been built without these frameworks."^[9]

These points can be both true and immaterial at the same time. It isn't necessary for poor performance to entirely exclude folks at the margins for it to be a significant disincentive to accessing services.

We know this because it has been proven and continually reconfirmed in commercial and lab settings. ↩︎
The web is unattractive to every Big Tech company in a hurry, even the ones that owe their existence to it.

The web's joint custody arrangement rankles. The standards process is inscrutable and frustrating to PMs and engineering managers who have only ever had to build technology inside one company's walls. Playing on hard mode is unappealing to high-achievers who are used to running up the score.

And then there's the technical prejudice. The web's languages offend "serious" computer scientists. In the bullshit hierarchy of programming language snobbery, everyone looks down on JavaScript, HTML, and CSS (in that order).

The web's overwhelmingly successful languages present a paradox: for the comfort of the snob, they must simultaneously be unserious toys beneath the elevated palates of "generalists" and also Gordian Knots too hard for anyone to possibly wield effectively. This dual posture justifies treating frontend as a less-than discipline, and browsers as anything but a serious application platform.

This isn't universal, but it is common, particularly in Google's C++/Java-pilled upper ranks.^[8] Endless budgetary space for projects like the Android Framework, Dart, and Flutter was the result. ↩︎
Someday I'll write up the tale of how Google so thoroughly devalued frontend work that it couldn't even retain the unbelievably good web folks it hired in the mid-'00s. Their inevitable departures after years of being condescended to went hand-in-hand with an inability to hire replacements.

Suffice to say, by the mid '10s, things were bad. So bad an exec finally noticed. This created a bit of space to fix it. A team of volunteers answered the call, and for more than a year we met to rework recruiting processes and collateral, interview loop structures, interview questions, and promotion ladder criteria.

The hope was that folks who work in the cramped confines of someone else's computer could finally be recognised for their achievements. And for a few years, Google's frontends got markedly better.

I'm told the mean has reasserted itself. Prejudice is an insidious thing. ↩︎
The but-for defense for underperforming frontend frameworks requires us to ignore both the 20 years of web development practice that preceded these tools and the higher OpEx and CapEx costs associated with React-based stacks.

Managers sometimes offer a hireability argument, suggesting they need to adopt these universally more expensive and harder to operate tools because they need to be able to hire. This was always nonsense, but never more so than in 2024. Some of the best, most talented frontenders I know are looking for work and would leap at the chance to do good things in an organisation that puts user experience first.

Others sometimes offer the idea that it would be too hard to retrain their teams. Often, these are engineering groups comprised of folks who recently retrained from other stacks to the new React hotness or who graduated boot camps armed only with these tools. The idea that either cohort cannot learn anything else is as inane as it is self-limiting.

Frontenders can learn any framework and are constantly retraining just to stay on the treadmill. The idea that there are savings to be had in "following the herd" into Next.js or similar JS-first development cul-de-sacs has to meet an evidentiary burden that I have rarely seen teams clear.

Managers who want to avoid these messes have options.

First, they can crib Kellan's tests for new technologies. Extra points for digesting Glyph's thoughts on "innovation tokens."

Next, they should identify the critical user journeys in their products. Technology choices are always situated in product constraints, but until the critical user journeys are enunciated, the selection of any specific architecture is likely to be wrong.

Lastly, they should always run bakeoffs. Once critical user journeys are outlined and agreed, bakeoffs can provide teams with essential data about how different technology options will perform under those conditions. For frontend technologies, that means evaluating them under representative market conditions.

And yes, there's almost always time to do several small prototypes. It's a damn sight cheaper than the months (or years) of painful remediation work. I'm sick to death of having to hand-hold teams whose products are suffocating under unusably large piles of cruft, slowly nursing their code-bases back to something like health as their management belatedly learns the value of knowing their systems deeply.

Managers that do honest, user-focused bakeoffs for their frontend choices can avoid adding their teams to the dozens I've consulted with who adopted extremely popular, fundamentally inappropriate technologies that have had disastrous effects on their businesses and team velocity. Discarding popular stacks from consideration through evidence isn't a career risk; it's literally the reason to hire engineers and engineering leaders in the first place. ↩︎
