What are Regular Expressions? They are a compact way to describe patterns for searching and matching text. I don't know all the specifics, even though I have read about them on Wikipedia.
I love Regular Expressions (RegEx). I know they aren't the best tool for everything, and I am not even that great at them, but I like that they are logical, terse, and strict.
I found a fun RegEx crossword site, and after completing all of the regular puzzles I am now working on the player-submitted ones. They should be much more challenging and force me to learn more of the syntax I don't know well.
A couple of things I use RegEx for are pulling content out of HTML/XML and checking email addresses.
These aren't perfect use cases, though. HTML/XML parsing has to account for many possible kinds of tags, so I might run several different expressions instead of one that covers everything. The email check is usually extremely simple. It doesn't actually validate the address; it just makes sure a couple of key components are there.
Below are some actual examples of simple regular expressions that I might use in code or to search through a large file in Notepad++.
Step 1:
<div.*?>.*?</div>
This will match either of these two texts: "<div>test</div>" and "<div class='test'>test</div>". It will fail, however, if the div spans multiple lines, because . does not match a newline character by default. That is why I chose the div tag here: divs often span several lines.
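To see the Step 1 behavior outside Notepad++, here is a small sketch using Python's built-in `re` module. Python's default matches the behavior described above: `.` does not match `\n` unless the `re.DOTALL` flag is set. The sample strings are my own, and Notepad++ uses a different regex engine, so treat this as an illustration rather than an exact reproduction.

```python
import re

# Step 1 pattern from the post. In Python, "." does not match a
# newline unless re.DOTALL is set.
STEP1 = re.compile(r"<div.*?>.*?</div>")

single_line = ["<div>test</div>", "<div class='test'>test</div>"]
multi_line = "<div>\ntest\n</div>"

for text in single_line:
    match = STEP1.search(text)
    # Each single-line sample is matched in full.
    print(match is not None and match.group(0) == text)

# The newline right after "<div>" stops ".*?", so nothing is found.
print(STEP1.search(multi_line))
```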
Step 2:
(<div.*?>)(.|\r|\n)*?(</div>)
This will match a div with anything in between, even across multiple lines. It still won't get everything, though: if there is a div within a div, it will not match the way you might want.
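Before looking at the nested case, here is a sketch of Step 2 in Python's `re` module on a single div that spans several lines. The sample markup with `\r\n` line endings is my own. The `re.DOTALL` variant at the end is a Python-specific alternative and is not part of the original approach.

```python
import re

# Step 2 pattern: capture the opening tag, lazily skip any character
# (including \r and \n), then capture the closing tag.
STEP2 = re.compile(r"(<div.*?>)(.|\r|\n)*?(</div>)")

markup = "<div class='test'>\r\nline one\r\nline two\r\n</div>"

match = STEP2.search(markup)
print(match.group(0) == markup)        # the whole multi-line div
print(match.group(1), match.group(3))  # the opening and closing tags

# Python-specific alternative: re.DOTALL lets "." match newlines,
# so the (.|\r|\n) alternation is not needed.
DOTALL_STEP2 = re.compile(r"(<div.*?>)(.*?)(</div>)", re.DOTALL)
print(DOTALL_STEP2.search(markup).group(0) == markup)
```

This works when there is only one div. Nesting is where the lazy `*?` stops too early.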
For the markup:
<div class="div1">
<div class='div2'>
test1
</div>
</div>
The above regex will match this text only:
<div class="div1">
<div class='div2'>
test1
</div>
This is because it finds the first opening div and keeps matching until it reaches the first closing div tag, which belongs to the inner div. I don't have a simple solution to that; I would likely have to run the replacement a couple of times to get the results I needed.
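Here is the nested case reproduced in Python. The sketch also shows one possible way to "replace a couple of times": match only innermost divs (divs whose content contains no other `<div`) using a negative lookahead, then repeat the substitution until the text stops changing. The `unwrap_divs` helper is hypothetical, and so is its choice to strip the div tags while keeping their contents. Both exist only to demonstrate the repeat-until-stable loop. This approach is also not a real HTML parser.

```python
import re

STEP2 = re.compile(r"(<div.*?>)(.|\r|\n)*?(</div>)")

markup = (
    '<div class="div1">\n'
    "<div class='div2'>\n"
    "test1\n"
    "</div>\n"
    "</div>"
)

# The lazy match stops at the FIRST closing tag, which belongs to the
# inner div, so the outer div's own "</div>" is left out.
first = STEP2.search(markup).group(0)
print(first)

# Hypothetical workaround: an "innermost" div is one whose body never
# contains "<div". The (?!<div) lookahead is checked before each
# character the body consumes.
INNERMOST = re.compile(r"<div[^>]*>((?:(?!<div)[\s\S])*?)</div>")

def unwrap_divs(text: str) -> str:
    """Hypothetical helper: strip div tags innermost-first, keeping contents."""
    while True:
        new_text = INNERMOST.sub(r"\1", text)
        if new_text == text:  # nothing left to replace
            return new_text
        text = new_text

print(repr(unwrap_divs(markup)))
```

Each pass removes one level of nesting, so two levels of divs need two passes before a third pass finds nothing to change.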
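If I could, I would match on something more specific than a bare div tag. The nice thing about capture groups (the parenthesized parts of the pattern) is that they make replacing text much easier: each group can be referred to by number in the replacement text, the way $1 and $3 are used below.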
Using the RegEx:
(<div id='test'>)(.|\r|\n)*?(</div>)
Input:
<div id='test'>
<label>first</label>
</div>
Replace:
$1second$3
Output:
<div id='test'>second</div>
Obviously it's not perfect, but I'm sure you can see how it's useful.
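The same capture-group replacement can be sketched in Python's `re` module. One difference: Python's replacement string refers to groups as `\1` or `\g<1>` rather than the `$1` form shown above. The sketch uses `\g<1>` so a group number can never run into digits that follow it.

```python
import re

PATTERN = re.compile(r"(<div id='test'>)(.|\r|\n)*?(</div>)")

source = "<div id='test'>\n<label>first</label>\n</div>"

# Keep group 1 (opening tag) and group 3 (closing tag) and replace
# everything between them with "second".
result = PATTERN.sub(r"\g<1>second\g<3>", source)
print(result)  # <div id='test'>second</div>
```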
The email example is much simpler.
.+?@.{2,}?\..{2,}
This isn't great for searching a text file, because it will match all kinds of things that contain an @ symbol, and since . also matches spaces, a match can swallow surrounding words on the same line. It works well for checking a single textbox input, though. It doesn't actually test that the address is valid; it only makes sure there is at least one character before the @, at least two characters after it, and then a period followed by at least two more characters.
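Here is a sketch of both situations in Python, with the pattern left unchanged. For the textbox case I assume the whole input must fit the pattern, so the sketch uses `re.fullmatch`. The helper name `looks_like_email` and the sample strings are hypothetical.

```python
import re

# The post's email pattern, unchanged.
EMAIL = re.compile(r".+?@.{2,}?\..{2,}")

def looks_like_email(text: str) -> bool:
    """Loose sanity check for a single input; not real validation."""
    # fullmatch() requires the entire input to fit the pattern.
    return EMAIL.fullmatch(text) is not None

for candidate in ["someone@example.com", "a@b.c", "no-at-sign.com", "@example.com"]:
    print(candidate, looks_like_email(candidate))

# Against free text, search() grabs the surrounding words too,
# because "." also matches spaces.
line = "Contact me at someone@example.com today."
print(EMAIL.search(line).group(0))
```

Only the first candidate passes. The last line shows the text-file problem: the "match" is the entire sentence, not just the address.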
I had planned to write a simple post about simple regular expressions, but as you can see, even simple ones get complicated quickly. Some people find RegEx hard to understand, and there are a number of ways a pattern can miss text you wanted or match text you didn't.