Fast DOM Queries in Today’s Browsers

What follows is a janky hack. If you do not have the stomach for things that are useful in the real world, please stop reading here. “But it’s not standards compliant!” comments will receive no sympathy. Validatorians, you’ve been warned.

If you’re still reading, you’re probably aware of the crappy primitives that the W3C has bestowed us with for scripting arbitrary collections of nodes. Things like Microsoft’s HTC and Mozilla’s XBL allow for browser-specific markup-upgrade paths but these aren’t really feasible in the “real world” since they require lots of code branching and different semantics for attaching a behavior. Tools only succeed where they lower our costs. This is why Dojo and Behavior (and even netWindows, back in The Day) work so hard to provide a portable basis for applying behaviors.

Given that the W3C has f’d us in the ear and that the browser manufacturers can’t get it together enough to come up with one non-standard way to apply scripts to collections of nodes, we’re back to iteration. Updating the collection to which a behavior should be applied when a DOM is updated presents a particular challenge. IE doesn’t throw mutation events for DOM changes nor does it provide client-side XSLT. Both hands our tied behind our back.

So we need something else. document.getElementsByName() would work quite well if the browsers paid attention to name attributes for any element. Alas, they don’t. Which brings us back to the one fast query browsers will support on any element: document.getElementById(). With it we can build a fast, efficient query function for every browser but Safari:

function elementsById(id){
  var nodes = [];
  var tmpNode = document.getElementById(id);
  while(tmpNode){
    nodes.push(tmpNode);
    tmpNode.id = "";
    tmpNode = document.getElementById(id);
  }
  for(var x=0; x<nodes .length; x++){
    nodes[x].id = id;
  }
  return nodes;
}

A permutation of this that caches the results and does not set the IDs back to the original value allows re-runs of the function to determine if new elements have been added to the group and/or if a node should be removed:

var groupCache = {};
function elementsById(id){
  if(!groupCache[id]){
    groupCache[id] = [];
  }
  var nodes = groupCache[id];
  for(var x=0; x<nodes .length; x++){
    if(nodes[x].id != ""){
      nodes.splice(x, 1);
      x--;
    }
  }
  var tmpNode = document.getElementById(id);
  while(tmpNode){
    nodes.push(tmpNode);
    tmpNode.id = "";
    tmpNode = document.getElementById(id);
  }
  return nodes;
}

Obviously, getting a list of added/removed nodes from this function might be preferable to receiving the full list, but I’ll leave a better API for this as an exercise. Safari is the only browser which appears to not support this method of constructing queries, but we can fall back to iteration. It’s certainly not going to be any slower than current methods. The hack is also made somewhat less useful by the W3C’s bone-headed decision to limit elements to a single ID.

The jury is still out as to whether or not this is going to prove useful, but I can already imagine it being an optional optimization for Dojo users looking to make their apps go like hell on pages with complex DOMs.

If only it weren’t necessary.

Update: After reading much WebCore source code, I’m not aware of a way to make elementsById() work on the current Safari. There is good news, however. On the latest Konqueror release and nightly Safari builds this hack works flawlessly. In short, this will very soon the the most widely available, fastest DOM query method. More news regarding elementsById() to follow shortly.

41 Comments

  1. wayneoo
    Posted March 20, 2006 at 11:34 am | Permalink

    Are you proposing we group elements by their ID? Thats a bit nuts? It defo won’t work in ASP.NET. I tried to get a decent getElementsByName() method for a web app I was building, it turned out to be nigh on impossible to get it optimised to a point where I could use it. I also tried using the getElementsByClass() method which is fast on one hit, but looping through the dom doing that? Nah! But at least it can handle multiple class names. I for one would definately give up a vital organ as payment for a resolution to this!!!

  2. Posted March 20, 2006 at 11:38 am | Permalink

    Yes, I’m proposing grouping elements by ID. As for ASP.NET, I can only say that you might strongly consider better/more flexible tools.

    Regards

  3. wayneo
    Posted March 20, 2006 at 3:50 pm | Permalink

    Actually, I’ve been thinking about this. It’s not really ASP.NETs fault this time ;)

    No really, what we have here is a lack of support in the underlying technology, ASP.NET is actually using the IDs as they were meant, it is us that are not. I mean, I barely use any of the ASP.NET framework (viewstate and all that giberish), I use it for the C# language, which I happen to like.

    I originally attempted this with multi class names and tried to get all elements with a class name in the class string, this is OK but again, not its intended use. We need a group name which can allow an element to be part of one or more groups and we _need_ a fast way of selecting those groups. The stuff I could do with that!!!! Oh man!!

  4. Posted March 21, 2006 at 3:18 pm | Permalink

    But you can add as many custom attributes to an element as you want right?

  5. Posted March 21, 2006 at 3:30 pm | Permalink

    Sure, but you can’t use them to quickly query the DOM. The big issue here is DOM parse speed. The elementsById() hack allows us to quickly query for an arbitrary collection. My tests are showing it to be an order of magnitude faster, and the relative gain increases as the complexity of the page does.

    Regards

  6. Posted March 22, 2006 at 5:09 am | Permalink

    I have something against this method. Not because it breaks standards, but because intuitionally a class stands for a group of objects, and an id refers to a single object.
    In my world, I wish there was a natively implemented document.getElementsByClassName(). That would make it possible to solve the problem that you’re pointing at. (For example by using multiple class names for a node, if necessary)
    But seeing how the world looks, your solution isn’t to bad after all.

  7. Posted March 26, 2006 at 7:14 am | Permalink

    Based on this Gecko bug:

    https://bugzilla.mozilla.org/show_bug.cgi?id=311681

    It looks like multiple elements with the same ID working is not guaranteed (though based on the resolution, will continue to for now).

  8. Posted March 27, 2006 at 11:51 am | Permalink

    IE already has getElementsByName:

    http://msdn.microsoft.com/workshop/author/dhtml/reference/methods/item.asp

  9. Posted March 27, 2006 at 1:10 pm | Permalink

    Dean: getElementsByName doesn’t work on arbitrary element types. It’s a total non-starter as the basis for a fast query language.

    Regards

  10. Posted March 27, 2006 at 3:08 pm | Permalink

    After reading this I’ve decided to start using the same primary key value for multiple rows in my database tables too. That way, I can query all the matching records a whole lot faster!

  11. Posted March 27, 2006 at 10:13 pm | Permalink

    Alex, read it again. You can do this:

    var byName = element.all.item("some-id");

    or:

    var byName = document.all.item("some-id");

    That’s pretty much what you are looking for isn’t it?

  12. Posted March 30, 2006 at 11:49 am | Permalink

    (Sorry for the repeat, last didn’t seem to make it through.)

    Alex,

    One solution that I’ve only tested on Internet Explorer 6 is to stream down a custom (unrecognized) element with some pre-determined position in relationship to the desired element class in question. Then you query for that custom element by it’s tag name, then use the predetermined position to get the actual element to which you want to attach a behavior.

    HTML:

    <Behavior:MyBehavior />
    <div id="SomeBehavingElement"></div>

    Behavior Code:


    (function()
    {

    (function()
    {
    var oBehaviorNodes = document.getElementsByTagName("Behavior:MyBehavior");
    for (var nIndex = 0; nIndex

  13. Posted March 30, 2006 at 11:50 am | Permalink

    (Third time’s a charm.)

    Alex,

    One solution that I’ve only tested on Internet Explorer 6 is to stream down a custom (unrecognized) element with some pre-determined position in relationship to the desired element class in question. Then you query for that custom element by it’s tag name, then use the predetermined position to get the actual element to which you want to attach a behavior.

    HTML:

    Behavior Code:

    (function()
    {

    (function()
    {
    var oBehaviorNodes = document.getElementsByTagName(“Behavior:MyBehavior”);
    for (var nIndex = 0; nIndex < oBehaviorNodes.length; nIndex++)
    {
    var oElement = oBehaviorNodes[nIndex].nextSibling;
    addBehavior(oElement.id);
    }
    })();

    function addBehavior(strID)
    {
    // …
    }
    })();

  14. Posted March 30, 2006 at 11:51 am | Permalink

    (Dang, make that fourth. I really apologize.)

    Alex,

    One solution that I’ve only tested on Internet Explorer 6 is to stream down a custom (unrecognized) element with some pre-determined position in relationship to the desired element class in question. Then you query for that custom element by it’s tag name, then use the predetermined position to get the actual element to which you want to attach a behavior.

    HTML:

    <Behavior:MyBehavior />
    <div id="SomeBehavingElement"></div>

    Behavior Code:

    (function()
    {

    (function()
    {
    var oBehaviorNodes = document.getElementsByTagName("Behavior:MyBehavior");
    for (var nIndex = 0; nIndex < oBehaviorNodes.length; nIndex++)
    {
    var oElement = oBehaviorNodes[nIndex].nextSibling;
    addBehavior(oElement.id);
    }
    })();

    function addBehavior(strID)
    {
    // ...
    }
    })();

  15. Posted March 31, 2006 at 8:15 am | Permalink

    Update: the above code works in FireFox 1.0.7 as well. If you try adding an xmlns:Behavior=”[GUID]” attribute to the HTML element on the page, IE localizes the namespace when performing tag name searches, and FireFox does not. This means you have to search for “MyBehavior” and “Behavior:MyBehavior” respectively.

  16. Sergio
    Posted April 1, 2006 at 12:59 pm | Permalink

    I’m with Douglas (#10). Let’s also pass a new law requiring all people with the same first name to have the same SSN.

  17. Posted April 3, 2006 at 7:17 am | Permalink

    Luckily, the DOM allows you to scope queries on ID, using #11’s syntax. That kind of weakens the primary key analogy, 10 and 17.

  18. Nikola Klaric
    Posted April 3, 2006 at 7:20 am | Permalink

    Alex,

    “IE doesn’t (…) provide client-side XSLT.”

    I’m not sure what you meant by this because, actually, IE does provide client-side XSLT by the interfaces:

    .selectSingleNode(XPathExpr)
    .selectNodes(XPathExpr)
    .transform()

    which perfectly implement all relevant portions of XSLT and XPath.

  19. Nikola Klaric
    Posted April 3, 2006 at 7:21 am | Permalink

    (entitites got lost)

    Alex,

    “IE doesn’t (…) provide client-side XSLT.?

    I’m not sure what you meant by this because, actually, IE does provide client-side XSLT by the interfaces:

    <node>.selectSingleNode(XPathExpr)
    <node>.selectNodes(XPathExpr)
    <processor>.transform()

    which perfectly implement all relevant portions of XSLT and XPath.

  20. Posted April 3, 2006 at 7:34 am | Permalink

    Nikola: those are only relevant to the MSXML DOM, not the HTML DOM, which is what we actually care about.

  21. Ryan Gahl
    Posted April 10, 2006 at 6:59 am | Permalink

    There has to be some serious flaws in the way you are implementing your UIs if you have to resort to this. And to say “The hack is also made somewhat less useful by the W3C’s bone-headed decision to limit elements to a single ID” is laughable. You’re saying, “we should be able to put more than one ID on an element”, and trying to pass that off as a great idea, because it supports this horrible horrible hack of yours.

    Yes, it’s horrible. Go ahead defend it all you want. Hack up your pages with a hundred elements that have the same ID, great idea.

  22. Posted April 10, 2006 at 8:21 am | Permalink

    Ryan: Once you get over your outrage, perhaps you’ll give the concept that multiple “keys” might point to the *same* peice of data some serious thought. You can apply multiple classes to any element, why not allow a single element to be addressed by multiple names?

    Even if you hate the elementsById() hack, you’re not making a rational case for why the DOM should represent a collision-free hash (which it clearly doesn’t anyway since elementsById() works!).

  23. Ryan Gahl
    Posted April 10, 2006 at 10:05 am | Permalink

    For the sake of this article, it’s unfortunate that you have garnered such respect in this industry, because there will be people that read this article and think it must be sound advice because it comes from you.

    Look, I’m all for speeding things up, but this is just rubbish. Instead of advocating unclean, non-standard hacks, create an architecture based on autonomous controls that handle setting up their event handlers as they are needed, incorporating lazy loading where applicable. I create applications which are used in hospitals, and I can all but gaurantee they are more complicated than the UIs you are dealing with, with hundreds of widgets active (with event handlers) on a page at a time, and I’ve never had problems with speed that would make me considering resorting to poor practices like this. As each widget is loaded, it handles it’s own DOM event handler wiring, and each DOM element is uniquely IDed and keyed to the widget which it belongs to (so many obviuos benefits here, and covered so many times by so many people I won’t even bother going into all that here). ID = identifier. 1 element, 1 ID. I’d hate to have to even look at your markup, let alone try to extract any semantic meaning from it if I was a developer on your team.

    This method is not scalable, not maintanable, not standard, not good.

    And I’m not outraged, just appalled. If you did something like this as a member of any team I was involved with and I had to clean up your mess, then I would be outraged.

    You knew all this before you posted the article, which is why you had to dedicate your first paragraph to defending your method Why not just present it as a hack that some people could try and leave it at that?, Rather than trying to say “here’s a hack, but I made it so it’s a GOOD hack… a hack that the industry should look at as something great.. and because the W3C’s standards don’t support this, they are a bunch of bone-heads”… your ego is way too big man.

    It’s just a hack.

  24. Posted April 10, 2006 at 10:53 am | Permalink

    Ryan: what I *think* I wrote (and please correct me if I’m wrong) is that the W3C hasn’t provided scripters with fast primtives to implement a set of patterns that we are relying ever-more-heavily on. I’ve been a proponent of what is now called “progressive enhancement” since about 2001 when I implemented the first widget parser for netWindows. The limiting factor in *real world applications* for this parser (and now behvior.js) is the speed of running the query over the DOM. The W3C clearly grokked that such iterations would be necessaray and therefore we have (variously): XPath, Tree Walkers, global searches (element by id), scoped searches (elements by tag name), and parent-child eneumeration.

    What I’m suggesting is that the W3C decided to go the abstract (but slow) route instead of concrete (but fast). Instead of providng a getElementsByClassName (which I’d be much in favor of!) they just said “oh, people can implement that in script”. Ever wonder why there’s no node.prependChild() method? Because the working group (in it’s infinite wisdom) decided that you could just determine a.) whether or not a node had children, b.) choose insertBefore() or appendChild() as necessaray and c.) wrap it up in your own interface if you liked. Working groups fuck up. I’m suggesting that the omission of a fast arbitrary-group primitive from the core DOM spec (and I don’t care what it is) is one such fuck-up.

    To understand why all the alternatives to XPath and/or elementsById() are slow, you have to know a little bit about how browsers themselves optimize things. Renderers like Mozilla create the JS/DOM bindings for a node only when it is accessed from script the first time. Let that sink in a bit.

    There is object creation/allocation cost for *every chaff node in an interative search*. That cost goes up (linearly, we hope) with the number of chaff nodes. Scripters are at a structural disadvantage from the C/C++ implementation of the DOM that the JS engine overlays. What I’m presenting here is an algorithm that appears to be O(N) and *not* O(M) (where N is the nodes of interest and M is total nodes in the tree). It is an interesting result regardless of whether or not you find it elegant.

    As for whether or not I’m suggesting that this hack is somehow “good”, please see the paragraph of deprecating remarks that clearly denotes it as something for people with hard, real-world problems to solve. You seem to think I’m somehow “defending” it as more elegant than it is (I’m not). Weirdly, you just skipped all that and just went straight on to the flaming.

    FWIW, I’m working on a permeutation of Dean’s excellent HTC+XPath approach for dojo.behavior, but it will still support this method for those who choose to use it. I’m *not* trying to break standards compliance left and right. I’d like a better solution, but applications are different than pages and we draw the lines differently in the kinds of apps I work on than perhaps the ones you develop.

    Regards

  25. Posted April 10, 2006 at 12:39 pm | Permalink

    Hi!

    Cool thing but I am a validatorian and I use Prototype for a similar thing ( I am so married to $()) … so it’s useless for me and I even think that’s not very recommendable to use

  26. Posted April 10, 2006 at 12:40 pm | Permalink

    .. whoops…
    Javascript that way. JS is a dirty langauge and tricks like that make it even dirtier. Try Prototype ;)

    Greets,
    .mario

  27. Posted April 10, 2006 at 2:12 pm | Permalink

    Hi all,

    I really don’t think doing that (Alex’s) hack will make accessing elements with same id faster. The faster way is to use document.all. Most of the browsers implemented document.all now. Please check the fastness by yourself with the following link:
    http://www.geocities.com/keelypavan/DOMFasterMethods.html .
    Please click on the second button first as Alex’s method is resetting ids. I tested in IE 6, Firefox 1.5, opera 8.54 and in all these browsers document.all method took almost 0ms. I know it’s not a W3C standard but when all the browsers implemented that, I don;t see any other reason why we should stop using that.

    Pavan Keely

  28. Posted April 10, 2006 at 2:48 pm | Permalink

    Pavan and Dean,

    That’s a great result! It works on nightly Safaris and Konq 3.5.1 as well.

    Outstanding.

  29. TI
    Posted April 11, 2006 at 7:11 am | Permalink

    Pavan, I found your code wouldn’t draw the div’s under Firefox 1.5.0.1. Mozilla does not like appending child nodes to document.body. To confirm further I modified your code with a simple document.write and it does not like document.all either.

    Meanwhile why do people feel the need to become so outraged? Alex put up the disclaimer that this was not a standards friendly manouver and even called it a ‘hack’.

    On the upside flaming does prompt some very informative justification posts. From Alex at least.

  30. Posted April 11, 2006 at 10:15 am | Permalink

    TI,

    The problem of not showing up Divs in Firefox is because I am using innerText which is not supported in Firefox, in place of that if you try using document.createTextNode, it displays the Divs and Firefox does support appendChild on document.body. But I have to agree with you that Forefox is not supporting Firefox. I think I somehow missed it because my plan was to test in 3 diff browsers.

    Pavan Keely

  31. Posted April 11, 2006 at 10:19 am | Permalink

    I apologize for that wrong statement “But I have to agree with you that Forefox is not supporting Firefox”..I meant, “But I have to agree with you that Firefox doesn’t support document.all”

    Pavan Keely

  32. Posted April 14, 2006 at 2:59 pm | Permalink

    Why is it that I always miss shizzo like this. A month later here I am “after the fact” reading about a query like this and a discussion between Dean and Alex… posh.

  33. Posted April 15, 2006 at 8:13 am | Permalink

    This is hack but if you use both ID and Name within a tag the elements will be properly collected by document.getElementsByName.

    Example 1

    Example 2

    document.write(document.getElementsByName(“example”).length)

  34. Posted April 15, 2006 at 8:57 am | Permalink

    This is hack but if you set both ID and Name within a tag the elements will be properly collected by document.getElementsByName.

    <B ID="example" NAME="example">Example 1</B>
    <B ID="example" NAME="example">Example 2</B>
    <SCRIPT>
    document.write(document.getElementsByName("example").length)
    </SCRIPT>

  35. Tapper
    Posted April 15, 2006 at 1:26 pm | Permalink

    Hacking with ID and breaking its uniqueness has other side-effects including potentially screwing up CSS as well as linking since ID is used in those technologies as well.

    I think the need for speed and appropriate collections to apply behaviors to is worthy, but this is surely not a reasonable way to accomplish it.

  36. Anonymous
    Posted April 19, 2006 at 4:32 am | Permalink

    Please read Pavan Keely’s blog
    http://keelypavan.blogspot.com/2006/04/faster-way-of-accessing-dom-elements.html . He gives a simple solution-no need for this “hideous” ( as someone has mentioned ) solution.

  37. erik
    Posted May 5, 2006 at 1:12 am | Permalink

    What about giving each element its own unique id but any elements you need to group provide a common prefix for? e.g. id=”group.1″, id=”group.2″, id=”group.3″ and simply have an integer that increments after you lookup each element by id until an element isn’t found?

    Your ids are all still unique and you get the performance boost of getElementById().

  38. Posted September 12, 2006 at 7:28 pm | Permalink

    Today I come across a problem with DOM manipulation performance and I tried your trick but didn’t work for me.

    My problem was that I had to resize almost all the rows of a table dynamically (it doesn’t matter why, we needed this). The strategy was to iterate through all the rows setting the heights to the apropiate values.

    This thing takes a *lot* of time. About 70 ms for a 600 rows table (= 42 seconds).

    After some fight with it, I managed to solve it hidding (style.display=’none’) the div containing the table before doing the resizing and showing it again afterwards (style.display=’inline’). This only little thing made the time drop to 2 ms per row (about 1.2 seconds). As you can see, a big improvement!

    Aparently, Internet Explorer defers the DOM drawing/update until it is shown.

    I hope this helps somebody!

  39. Posted December 6, 2006 at 8:57 pm | Permalink

    This is what I have, works with Safari/Konqueror and all other browsers, is fast, very fast should say, and it does not remove the ID, should work on IE5/IE4.

    The first query will fill the cache, from there on queries will be immediate.-

    The full example is at: my demo place if you want to see the timings.

    Hope the cut & paste works, enjoy and comment please.

    Here goes the snippet:

    /*
    * ElementsById
    *
    * Author: Diego Perini
    * Updated: 06/11/2006
    * Version: 0.0 (from parent)
    *
    * Extracted from latest versions of IPORT/STYLER engines.
    *
    * Returns an array of elements with specified ID.
    */
    function ElementsById($id) {
    var c=0,i=0,j=0,k=0;
    var nodes=[],storage=arguments.callee.storage;
    var childrens=document.body.childNodes,len=childrens.length;

    // only BODY elements are checked, no HTML or HEAD nodes/subnodes
    if (storage && storage.length && storage.length != 0) {
    k = $id;
    while (storage[k]) {
    nodes[nodes.length] = storage[k];
    k = $id + ‘_’ + (++i);
    }
    } else {
    storage = { length: 0 };
    while (len > i) {
    c = childrens[i];
    if ((k = c.id) == $id) {
    nodes[nodes.length] = c;
    if(storage[k]) {
    k = c.id + ‘_’ + (++j);
    }
    }
    i++;
    storage[k] = c;
    storage.length++;
    }
    arguments.callee.storage = storage;
    }
    return nodes;
    }

  40. Posted December 7, 2006 at 2:01 pm | Permalink

    Sorry for the mess…in the above example:

      document.body.childNodes

    should be:

       document.getElementsByTagName(‘*’)

    by cut & paste I took it from the wrong example, but it works, have a look in my site I am updating it.

    Diego

  41. Posted October 30, 2007 at 8:51 am | Permalink

    That’s a great result! It works on nightly Safaris and Konq 3.5.1 as well.
    Outstanding.

3 Trackbacks

  1. By Faster DOM Queries on March 28, 2006 at 7:15 am

    [...] Alex Russell, head honcho of the Dojo Foundation, suggests a hideous hack to speed up DOM queries. Why does he want to do this? He wants behavioral extensions to be applied as quickly as possible and he has a point. For a good user experience the code to support a web interface should be applied as quickly as possible. A delay can lead to an unresponsive interface, confusing the user about what works and what doesn’t. So, while I think speeding up DOM queries is a worthy aim, I think that his “id hack” is a step too far. I am supposed to be a standards advocate after all. [...]

  2. By Ajaxian » Faster DOM Queries on April 10, 2006 at 4:53 am

    [...] Alex started this off with a “janky hack” that uses our favourite document.getElementById in a naughty way, by grouping elements by one id. His hack includes a cached element version. [...]

  3. By dojo.foo » dojo.query: A CSS Query Engine For Dojo on February 4, 2007 at 12:10 pm

    [...] It’s an important goal, and ensuring that simple-looking queries don’t “hurt” disproportionately is a difficult task. After some initial work with my admittedly grotty getElementsById hack, my outlook wasn’t bright. Other systems weren’t looking much better. Using the behavior: expression(); hack degrades page performance something fierce and can’t be made synchronous, a key requirement to meet developer ease-of-use goals. Requiring a callback for every query result is a no-go. [...]