More fun with Regex!

Extending The Curly Quotes Module is fun! With a little understanding of how RegEx works, you can do all sorts of fun things. For instance, I added the following into my ‘Curly Quotes’ template:

<MTAddRegex>s|&([^#])|&#038;$1|g</MTAddRegex>

What does it do, you may ask? Well, it does the same thing as the Hivelogic URL Cleaner. It finds every instance of an & in the site and converts it to the equivalent numeric character reference (&#038;), except when the & is followed by a # (indicating it’s already part of a numeric reference). As an added bonus, this version will clean up your &’s all over the place, so your page will validate. Well, except for that pesky RDF thing.
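To see what the rule does to a few strings, here is a rough Python sketch of the same substitution. The function name `escape_ampersands` is made up for illustration, and Python's `re.sub` stands in for whatever the module does internally, so treat it as an approximation rather than the plugin's actual behavior.

```python
import re

def escape_ampersands(text: str) -> str:
    # Same pattern as the MTAddRegex rule: an "&" followed by any one
    # character other than "#". That character is captured and put back.
    return re.sub(r"&([^#])", r"&#038;\1", text)

print(escape_ampersands("Tom & Jerry"))            # Tom &#038; Jerry
print(escape_ampersands("&#8220;quoted&#8221;"))   # left unchanged
print(escape_ampersands("a &lt; b"))               # a &#038;lt; b
```

Two things are worth noticing. Because the pattern has to consume one character after the &, an & at the very end of the text is not matched, and in a run like && only the first one is converted. A lookahead such as `&(?!#)` would avoid both, assuming the regex engine supports it.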

There is a downside: I’m now forced to use the numeric equivalents instead of the handy shortcuts (like &#060; instead of &lt;), but I’m sure I’ll figure out some workaround for that too. Perhaps if I simply replace all of those with their numeric equivalents before replacing the ampersand? I’ll sleep on this one.
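The "replace the shortcuts first" idea could look something like the following Python sketch. It is only a sketch under assumptions: `named_to_numeric` and `clean` are hypothetical names, and the list of shortcuts comes from Python's standard `html.entities.name2codepoint` table rather than anything in the Curly Quotes module.

```python
import re
from html.entities import name2codepoint

def named_to_numeric(text: str) -> str:
    # Step 1: turn known shortcuts such as &lt; into numeric form (&#060;).
    def repl(match):
        name = match.group(1)
        if name in name2codepoint:
            return "&#%03d;" % name2codepoint[name]
        return match.group(0)  # unknown name: leave it for step 2
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)

def clean(text: str) -> str:
    # Step 2: the original rule, applied after the shortcuts are gone.
    return re.sub(r"&([^#])", r"&#038;\1", named_to_numeric(text))

print(clean("a &lt; b & c"))  # a &#060; b &#038; c
```

The order matters: run the ampersand rule first and &lt; has already become &#038;lt; by the time the shortcut step sees it.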

8 Replies to “More fun with Regex!”

  1. Oh – because I replace all &’s except those followed by #’s. The shortcuts, which don’t have the #’s, get converted too, so they won’t work anymore. I was thinking I could probably do something where I also leave alone any & followed by a series of characters ending in a ;. That should work as well, but it’s slightly more complex, so I didn’t do that. I’m still thinking on this one.

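The variation floated in the reply above, leaving alone anything that ends in a ;, might be sketched in Python with a negative lookahead. The name `escape_bare_ampersands` and the exact pattern are assumptions for illustration, not something the module provides.

```python
import re

def escape_bare_ampersands(text: str) -> str:
    # Replace "&" unless it starts something shaped like an entity:
    # an optional "#", then letters/digits, then ";".
    return re.sub(r"&(?!#?\w+;)", "&#038;", text)

print(escape_bare_ampersands("a &lt; b & c &#060; d"))
# a &lt; b &#038; c &#060; d
```

The trade-off is that anything that merely looks like an entity is skipped too, so text like "AT&T; more" would keep its bare &.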
