JS RegEx’s Are Slow

I’ve been trying to avoid regular expressions in dojo.query. It’s a difficult tradeoff since regexes are very space efficient vs. lots of indexOf operations. A little digging on the point turned up a fascinating article on why regexes in most languages are slower than they need to be.

Now if only there were some explanation of why regular expression string replacement, not just matching, is brutally slow across the board.

9 Comments

  1. jesse
    Posted February 5, 2007 at 10:17 pm | Permalink

    omg you’re sick for reading through all of that. …just sick  =p

  2. Posted February 6, 2007 at 6:02 am | Permalink

    To speed up regex matches, anchor your strings.

    /^foo$/i is faster than /foo/i when you want to match the whole string.

    string replacement would be slower than just matching because it’s a write operation rather than read.

  3. Posted February 6, 2007 at 6:41 am | Permalink

    Hey Philip,

    So my comment about replacement is mostly due to the apparent inefficiency in the process, not a simple-minded “why are they slower than matching?” kind of comment. Clearly they’ll be slower, but there’s no justification for them being as slow as they are today.

    For instance, when tuning the template system in Dojo, we looked at using regexps to speed up attribute matching, but the speed of replacement was such that it was faster to iterate recursively over the resulting DOM than to use regexps. Weirdly, it was the same story on FF and IE. There’s something rotten there.

    Regards

  4. Posted February 6, 2007 at 10:51 pm | Permalink

    Alex-dunno if this helps, but I did a lot of performance tuning with Regexes in Java, and one surprising finding was that in many cases constructing the regex was the main slowdown and using it to match wasn’t so bad in addition. So a little indexOf to test if we had a prayer of matching before taking the hit to construct the regex did wonders. That doesn’t answer the root question (why are they slow at all), but it may be of use in the short term.

    PS: Your cool WYSYWIG editor box for comments doesn’t work for me in IE7 (a normal textarea shows up underneath the submit button and when I submit it says nothing was entered.)

  5. Daniel
    Posted February 7, 2007 at 1:35 am | Permalink

    I think you’re being premature. Profiling suggests that in my code ‘toXPath’ takes about 2% of the execution time of the XPath version.

    I’m sure it could be optimized further, by replacing some of the regular expressions but I’d be very suprised if you could improved on the speed of the ‘tokenise’ function without sacrificing accuracy (of course, it can immediately be improved by removing the validation).

  6. Daniel
    Posted February 7, 2007 at 5:41 am | Permalink

    Oh dear, looks like something went wrong when I uploaded selector.js this morning and I won’t be able to update it for a day or two. Serves me right for using freebie hosting. Until I fix it, here’s a link to a working (ish) version.

  7. Posted February 7, 2007 at 8:15 am | Permalink

    I’ve found however, that some regex string replaces are faster than a substr.

  8. Mihai
    Posted February 17, 2007 at 9:38 am | Permalink

    Perl regexp matching is slow??

  9. Posted August 20, 2007 at 12:07 pm | Permalink

    Regexp compilation can slow things (as noted by Joseph Smarr), but the article that you cite is about regexp matching finding a first match for a given regexp.

    Replacement can be slow because there may be many matches for a given expression, and replacement often involves named subexpressions ($1, etc. which change the inherent execution time a lot).

    Any string changes will test the string representation implementation (and often the storage manager it uses). If the string representation can’t share structure with the original string, a new string has to be allocated — that’s slow, and the string has to be copied. If the number of matches grows with the size of the string, the performance is going to deteriorate in an n**2 fashion, because of all the copying.

    A string representation that uses sharing and pointer structures to avoid copying is more complex, and you will pay for the faster substitutions with slower access and updates to the string because it’s not an indexable buffer of character code points.

    Such structures can also introduce garbage-collection issues as portions deleted from a string may still be in memory, in the original character-buffers.

    This is an implementation area that’s been hard to optimize since the original classic papers on string-processing languages in the 1960’s. Look up Griswold and Snobol.

One Trackback

  1. By Ajax Performance » Catching up on the news bin on February 7, 2007 at 8:42 am

    […] In other Dojo news, Alex Russell posted a brief gripe about the incredible slowness of regular expressions, especially for replacements. The gripe included a link to a very detailed survey of regex suckage on multiple platforms, for those who want the deep dive. […]