After last post on turning off useless backtracking by using atomic grouping, I kept on reading the Regular Expressions Cookbook and ran another experiment to validate the performance difference. Let's start with the results:
Normal Regex - # of loops: 1000, # of matches: 0, time (ms): 66149
Atomic Grouping - # of loops: 1000, # of matches: 0, time (ms): 11196
Note that a normal regex was 6 times slower than atomic grouping to fail to match.
The example from the book is a regex to match a well-formed html page. I saved a Wikipedia page, made a few adjustments to the program used in the last post (e.g. to read the html contents from file), and ran the tests.
The regular expression used is:
<html>(?>.*?<head>)(?>.*?<title>)(?>.*?</title>)(?>.*?</head>)(?>.*?<body[^>]*>)(?>.*?</body>).*?</html>Every test I ran with this regex and with the version without atomic grouping, I can see the normal regex being 6 times slower. If you want to know more about using atomic grouping or a regular regex, please read my last post.
And this difference was found without setting the regex to be compiled. After setting this flag, these are the values I get:
Normal Regex - # of loops: 1000, # of matches: 0, time (ms): 49319
Atomic Grouping - # of loops: 1000, # of matches: 0, time (ms): 9471
Still a pretty significant change between normal regex and atomic grouping.
1 comments:
worth digesting: http://swtch.com/~rsc/regexp/regexp1.html
Post a Comment