Comments for "JS RegExes Are Slow"
/^foo$/i is faster than /foo/i when you want to match the whole string.
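A quick way to see why (a minimal sketch; the loop count and input are arbitrary, and a real engine may optimize both cases): the ^ anchor lets the engine give up after a single failed attempt at position 0, while the unanchored pattern has to be retried at every position in the string.

    var anchored = /^foo$/i;
    var unanchored = /foo/i;
    var input = new Array(10001).join('x'); // a long string that never matches

    console.time('anchored');
    for (var i = 0; i < 100000; i++) anchored.test(input);
    console.timeEnd('anchored');

    console.time('unanchored');
    for (var j = 0; j < 100000; j++) unanchored.test(input);
    console.timeEnd('unanchored');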
String replacement will always be somewhat slower than matching alone, because it's a write operation rather than just a read.
So my comment about replacement is mostly about the apparent inefficiency of the process, not a simple-minded "why is it slower than matching?" complaint. Clearly replacement will be slower, but there's no justification for it being as slow as it is today.
For instance, when tuning the template system in Dojo, we looked at using regexps to speed up attribute matching, but replacement was so slow that it was faster to iterate recursively over the resulting DOM than to use regexps. Weirdly, it was the same story on both Firefox and IE. There's something rotten there.
Regards
PS: Your cool WYSIWYG editor box for comments doesn't work for me in IE7 (a normal textarea shows up underneath the submit button, and when I submit it says nothing was entered).
I'm sure it could be optimized further by replacing some of the regular expressions, but I'd be very surprised if you could improve on the speed of the 'tokenise' function without sacrificing accuracy (of course, it can immediately be sped up by removing the validation).
Replacement can be slow because there may be many matches for a given expression, and replacement often involves capture-group references ($1 and so on), which change the inherent execution time a lot.
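A sketch of what each match costs beyond the scan itself (the input and patterns here are just illustrative): with the g flag the engine has to find every match, expand the $1-style references in the replacement template, and splice the resulting pieces into a new string; a function replacer adds a call per match on top of that.

    var src = 'id=42; id=43; id=44';

    // capture-group reference expanded once per match
    var renamed = src.replace(/id=(\d+)/g, 'key=$1');

    // a function replacer costs an extra call per match
    var doubled = src.replace(/\d+/g, function (m) {
      return String(Number(m) * 2);
    });

    console.log(renamed); // "key=42; key=43; key=44"
    console.log(doubled); // "id=84; id=86; id=88"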
Any string change stresses the string-representation implementation (and often the storage manager it uses). If the representation can't share structure with the original string, a new string has to be allocated and the characters copied into it -- that's slow. And if the number of matches grows with the size of the string, performance deteriorates in an n**2 fashion because of all the copying.
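A sketch of the two shapes (the pattern and replacement are arbitrary): replacing one match per call recopies the whole string on every iteration, so the total work is roughly matches times length, while a single split/join pass copies each character only a constant number of times.

    // O(n**2): each replace() builds a brand-new string
    function replaceOneAtATime(s) {
      var out = s;
      while (out.indexOf('a') !== -1) {
        out = out.replace('a', 'bb'); // full copy on every iteration
      }
      return out;
    }

    // Roughly linear: one pass, pieces joined once at the end
    function replaceInOnePass(s) {
      return s.split('a').join('bb');
    }

On a string with thousands of matches the first version slows down visibly as the input grows; the second stays flat.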
A string representation that uses sharing and pointer structures to avoid copying is more complex, and you pay for the faster substitutions with slower access and updates, because the string is no longer an indexable buffer of character code points.
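Here's a minimal rope-style sketch of that trade-off (the class and names are mine, purely for illustration): a concat node shares its children instead of copying them, so splicing in a replacement is cheap, but reading one character becomes a tree walk instead of an array index.

    function Rope(left, right) {
      this.left = left;   // string or Rope
      this.right = right; // string or Rope
      this.length = left.length + right.length;
    }
    Rope.prototype.charAt = function (i) {
      // O(depth) pointer-chasing versus O(1) on a flat buffer
      if (i < this.left.length) {
        return typeof this.left === 'string'
          ? this.left.charAt(i) : this.left.charAt(i);
      }
      var j = i - this.left.length;
      return typeof this.right === 'string'
        ? this.right.charAt(j) : this.right.charAt(j);
    };

    // Replace the middle of a string without copying the outer parts:
    var original = 'hello cruel world';
    var patched = new Rope(original.slice(0, 6),
                           new Rope('kind', original.slice(11)));
    console.log(patched.charAt(6)); // "k"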
Such structures can also introduce garbage-collection issues, since portions deleted from a string may live on in memory inside the original character buffers.
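A hedged sketch of the hazard (whether slices actually share the parent's buffer, and whether split/join forces a flat copy, is entirely engine-dependent): if a small surviving slice shares storage with a huge parent string, it can pin the whole buffer in memory long after the rest is logically dead.

    function keepHeader(hugeLog) {
      return hugeLog.slice(0, 16); // may retain the entire underlying buffer
    }

    function keepHeaderCopied(hugeLog) {
      // split/join builds a fresh string; a crude, engine-dependent way
      // to detach the small result from the big parent buffer
      return hugeLog.slice(0, 16).split('').join('');
    }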
This is an implementation area that has been hard to optimize since the original classic papers on string-processing languages in the 1960s. Look up Ralph Griswold and SNOBOL.