Infrequently Noted

Alex Russell on browsers, standards, and the process of progress.

Comments for JS RegEx's Are Slow

omg you're sick for reading through all of that. ...just sick  =p
by jesse at
To speed up regex matches, anchor your strings.

/^foo$/i is faster than /foo/i when you want to match the whole string.

string replacement would be slower than just matching because it's a write operation rather than read.

Hey Philip,

So my comment about replacement is mostly due to the apparent inefficiency in the process, not a simple-minded "why are they slower than matching?" kind of comment. Clearly they'll be slower, but there's no justification for them being as slow as they are today.

For instance, when tuning the template system in Dojo, we looked at using regexps to speed up attribute matching, but the speed of replacement was such that it was faster to iterate recursively over the resulting DOM than to use regexps. Weirdly, it was the same story on FF and IE. There's something rotten there.


by alex at
Alex-dunno if this helps, but I did a lot of performance tuning with Regexes in Java, and one surprising finding was that in many cases constructing the regex was the main slowdown and using it to match wasn't so bad in addition. So a little indexOf to test if we had a prayer of matching before taking the hit to construct the regex did wonders. That doesn't answer the root question (why are they slow at all), but it may be of use in the short term.

PS: Your cool WYSYWIG editor box for comments doesn't work for me in IE7 (a normal textarea shows up underneath the submit button and when I submit it says nothing was entered.)

I think you're being premature. Profiling suggests that in my code 'toXPath' takes about 2% of the execution time of the XPath version.

I'm sure it could be optimized further, by replacing some of the regular expressions but I'd be very suprised if you could improved on the speed of the 'tokenise' function without sacrificing accuracy (of course, it can immediately be improved by removing the validation).

by Daniel at
Oh dear, looks like something went wrong when I uploaded selector.js this morning and I won't be able to update it for a day or two. Serves me right for using freebie hosting. Until I fix it, here's a link to a working (ish) version.
by Daniel at
I've found however, that some regex string replaces are faster than a substr.
by Mihai at
Regexp compilation can slow things (as noted by Joseph Smarr), but the article that you cite is about regexp matching finding a first match for a given regexp.

Replacement can be slow because there may be many matches for a given expression, and replacement often involves named subexpressions ($1, etc. which change the inherent execution time a lot).

Any string changes will test the string representation implementation (and often the storage manager it uses). If the string representation can't share structure with the original string, a new string has to be allocated -- that's slow, and the string has to be copied. If the number of matches grows with the size of the string, the performance is going to deteriorate in an n**2 fashion, because of all the copying.

A string representation that uses sharing and pointer structures to avoid copying is more complex, and you will pay for the faster substitutions with slower access and updates to the string because it's not an indexable buffer of character code points.

Such structures can also introduce garbage-collection issues as portions deleted from a string may still be in memory, in the original character-buffers.

This is an implementation area that's been hard to optimize since the original classic papers on string-processing languages in the 1960's. Look up Griswold and Snobol.