Infrequently Noted

Alex Russell on browsers, standards, and the process of progress.

Comments for Long Term Web Semantics


Just started reading the article so nothing to comment on yet, but I think I've found a typo:

"This is a plea to stop using the phrase “semantic HTML” to mean “markup that does with it looks like it will do”."

I think you mean "does what it looks like"?

Fixed, thanks!

by alex

I have been thinking about this, and here's the thing: ultimately, common semantics make understanding a second page easier once you have the infrastructure for understanding the first.

Take language: learning a language will not guarantee that you will understand someone speaking it, and everyone uses words with slightly different meanings, emphasis, and contexts, but it is easier to adapt to a speaker's idiosyncrasies if you know the speaker's language.

Thus the failure of semantics to be universal is not a failure of common semantics. The key is making things easier, because the only way to make things automatic is to abstract the act of understanding itself, which is beyond the realm of content.

Going back to tables: tables often hold tabular data, sometimes with unnecessary markup, sometimes intermixed with other things, but if a document has tabular data, then looking for tables makes it easier to find that data, even if the semantics aren't perfect. Generally, even if some tables are also used for presentational purposes, there are ways of distinguishing them (how the markup around them is used), just as we distinguish homophones by context.

Stepping back a little more, what do we gain from this? For me, easier scraping: much of the web, including search engines, relies on scraping to a greater or lesser degree somewhere in the overall infrastructure of a site. So that's not a useless thing.

Secondly, maintainability. You asked about Constituencies; well, there are developers and then there are other developers. How easy is it for the next guy to pick up what you are doing and understand it? How easy is it for a guy who is not working on your project, but is interfacing with it, to understand it? These are key questions.

Ultimately, semantic HTML is not really for the end users. The overall HTML result, including all of the hacks, is for the end users. Even screen readers probably benefit more from developers using hacks to adapt their sites than from a proper HTML layout. The key here is re-use in the future and by others. And that is the core of the open web: anyone can take what you did and steal a few ideas from it, as well as some extra data.

Hi Alex,

This is a really interesting post. I've read through it several times now. Still trying to grok it all :)

I'm curious to know what you think about the semantic-ui (http://semantic-ui.com/) library.

It seems to be using the term "semantic" to mean, as you put it, "well written" or "copy edited." And there are some decisions which don't sit well with me. For instance, it's tag-agnostic, so there are examples of <div class="button">. That would seem to be a situation in which the HTML language does provide an expressive enough tag but it's being ignored. But maybe that's OK, kind of like your <table> example. I'd like to know your thoughts on that point.

Since the library is architected such that each piece of UI is standalone, it seems like it lends itself really well to being turned into Web Components. I really like this aspect of it, especially since I've spent a ton of time working with Bootstrap which can't easily be broken down into the individual components. This probably means the CSS is redundant in places but hopefully modular. What do you think about this approach?

One point to mention is that a nice, clean HTML table opens up very nicely in Excel (and probably in other spreadsheet programs as well). It is also very easy to export to CSV.
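As a rough illustration of how little work a clean table needs, here is a minimal sketch in Python using only the standard library (html.parser and csv). The TableParser class and table_to_csv helper are hypothetical names for this example, and it assumes a simple, well-formed table with explicit closing tags and no rowspan/colspan.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of <th>/<td> cells, row by row (assumes simple, well-formed tables)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None outside a <tr>
        self._cell = None   # text fragments of the open cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside cells (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html):
    """Hypothetical helper: turn the rows of a clean HTML table into CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)  # quotes fields containing commas
    return out.getvalue()


if __name__ == "__main__":
    page = """
    <table>
      <tr><th>Fruit</th><th>Count</th></tr>
      <tr><td>Apple</td><td>3</td></tr>
      <tr><td>Pear, ripe</td><td>5</td></tr>
    </table>
    """
    print(table_to_csv(page))
```

The point of the sketch is that when the markup really is a table of rows and cells, nothing beyond the tr/th/td structure has to be understood to recover the data; layout tables or div-based grids would need page-specific scraping instead.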

One key aspect of the idea of semantics in the web is the idea that people beyond the original developers can rework the information. I see that vision fading in the web today as people focus more on the immediate author-to-user communication, and not opening the door for possible side conversations that can be even more meaningful.

Luke touches on a point that I've always remembered when trying to determine what I mean if I want to wrap some content in something "semantic": the original Tim Berners-Lee dream of device independence.

Of course, we're not just reading documents anymore. Now that we need a wider variety of interfaces, device independence gets harder and harder.

And as you said in your article, the users are changing. How they use things changes, and they and developers invent new vocabulary (or steal from real life and SciFi stories) whenever new things come along.

It's obvious, indeed, that semantics can't mean anything if we're not all talking about the same language, in the same language.

by Stomme poes

Hi Luke,

First, I realise it was a long post, but I did mention assistive technologies. It wouldn't really be me if I hadn't -- I'm the guy who made a11y a priority for Chrome Frame and led the Dojo project when we were working with IBM to become one of the first toolkits with ARIA, key navigation, high-contrast, and low-motor-skills support. I didn't mention screen readers by name because I understand what a small (albeit important) sliver of the a11y world they are. The fact that HTML allows reasonable zooming and re-layout is just as enabling as any of the screen-reader-specific features are.

But you've quite missed the larger point: if the primary mode of interaction with the web were, say, screen readers, I'd be arguing that it was the interactions between humans and the content that is spoken that is the important semantic -- regardless of how it was marked up.

That you've come to value the alternative axes of communication that HTML enables is a laudable thing, but losing the plot isn't. Nor is holding up an unattainable ideal as something to be defended. If we're going to make progress, it will be in terms of a world which can exist. I've sought to create an outline of the motives of the players. To say that I'm not including some other modality is neither here nor there. My model includes them to the extent that they're used. Invalidating it requires constructing an alternative argument about how people will behave when, e.g., 3D displays show up. You could make such an argument on the basis of, perhaps, rational expectation theory. Or some local maximum in which 3D rules the roost. Any of that would have made an interesting counterpoint.

Regards

by alex

Phew, read everything. Long, but worth reading to understand your position.

I have the feeling that the post is mixing different ways of looking at "semantics", the word. A word can have different meanings ;) In most debates that I have witnessed or participated in myself, people had different assumptions. And indeed a Web developer, a browser implementer, a simple user, a poet, a librarian: each of them had a different notion of semantics. We use language (I'm not a native English speaker), and we don't come with the same cultural background when using these words. The same happens for the Web. For example, in these discussions I often see "academic" used by "engineers" (another assumption) with the assumed meaning "out of touch with reality". It's a bit like saying a poet is not part of the real world, or does not describe the real world. I usually prefer to think that there are different communities using the same tools in different ways. You also make the assumption that people working on ontologies are not aware that languages and meanings change. :) I think you need to talk with them more ;) The first rule of participation is to be inclusive and assume good will.

But I'll stop my babbling here about what I think others might think. :) Instead, here is how I think about the Web and HTML. Let's say I'm a power end user of HTML. I write code by hand because I care about the structure of the document (some people collect cheese labels). I want to be able to categorize the content in my documents, not only for others but for myself. It has a cost, and there is an endless path toward over-categorization. The important thing is to be able to do what you care about, with a feedback loop short enough that it makes sense to you. You singled out the element "a" for its very interesting capability to create links (I'm avoiding the word semantics here on purpose). I have my own preferred elements, such as "blockquote" and "cite", and attributes such as "cite". Another story for another day. What I miss more and more on the Web is pages that work without being online: too many pages break completely once you do "Save As". I'm talking about HTML pages where the HTML is just a shell wrapping some JSON and JS. Once you do "Save As", the content doesn't exist anymore. The same goes for View Source.

I want to be able to write documents and structures, and to attach meaning to them (through attributes or interactions). I want to be able to select a name in a Web page and say: here is this person, connect them to this ID in my address book. I want to be able to give a quote a cite="urn:isbn:1234567890" and configure my system to choose which reference system it will use for it. I want to be able to have a place element and/or place attribute, and to tie a map to it through a right click. But I do NOT want vendor lock-in, such as an element in the page that always brings up a Google map with no way to choose the map system I would like. I want to be able to associate terms in a document with a vocabulary and to be able to evolve it.

So, to come back to semantics: in the past I used to categorize elements in HTML (aka creating an ontology, or meaning, or semantics, yada, yada).

Structure: p, li, table, etc.
Action: a
Presentation: b, i, etc.
Meaning: title, blockquote, cite, samp, var, etc.
Hermaphrodite: span, div

A non-scientific classification, just a view on the world, aka semantics.

Aren't your points under "Semantic Evolution" in many ways covered by XML and its family of standards? Or at least it was headed that way.

So many times I see people wishing for custom elements and I automatically start thinking of XML, et al.

Hey Rob,

XML had a couple of key flaws -- notably a lack of forgiveness in the face of errors and some real nuttiness about DTDs derived from the SGML legacy -- and the communities that embraced it added other baggage that prevented them from winning the day. I don't think XML gets a do-over. It lost.

Here's why custom elements are different: they aren't trying to enable the idea of "first class" distributed extensibility. XML didn't have a language; it was a system for defining languages. HTML, on the other hand, IS a language, and custom elements are a way to introduce "slang" words into that language. This is entirely different from the dream of defining a brand-new language and hoping that speakers of multiple languages can co-exist.

Regards

by alex

Surely the whole point of "semantic html" is to convey information in a way which isn't dependent on how it looks. It tries to make a distinction between what belongs in markup and what belongs in a stylesheet. One problem with the tables and WYSIWYG editors of the early 2000s is that they tightly coupled style and substance. Your whole argument is based on the premise that the only way people want to consume the web is visually, on two-dimensional screens. The absence of any mention of screen readers is very notable. Screen readers are important, not only because they make the web accessible to more people right now, but also because they show that there is more than one way to consume content. It's very short-sighted (pardon the pun) to ignore their existence, as they may hint at other ways people will want to access the web in future. I'm not just talking about computers reading out what's on the page. There's potential for lots of other outputs: 3D displays, some sort of tactile response, maybe even interfacing directly with people's brains. Sure, that all sounds quite far-fetched, but the point is that we don't know what the future has in store. I think the best way to be prepared for that unknown is to keep the information we're trying to convey separate from how we'd like it to look. For all its faults, "semantic html" is a nice thing to aim for, even if we don't get it right every time.

For what it's worth, I think Alex is right on the money here. This notion of democratizing the growth of semantics in HTML is downright seminal. Alex is deftly removing a mammoth wrench from the cogs of sane scope expansion as Moore's law and consumer demand outpace the standards bodies' ability to innovate.