Automatically Hyperlinking URLs

Aug 27, 2009 Author: Developer

Most forum and blog software automatically converts URLs in posts and comments into hyperlinks. A simple way to implement this feature on your own site is to match http:// and the text that follows it, then use a backreference to wrap the match in an anchor tag. However, what if someone has already put the URL inside an anchor tag? Then you'd get a real mess!
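To make the "real mess" concrete, here is a minimal sketch of the naive approach in Python. The pattern, the function name naive_autolink, and the sample strings are assumptions for illustration only; they are not taken from the original code.

```python
import re

# Hypothetical naive pattern: a scheme followed by anything up to
# whitespace, a double quote, or an angle bracket.
NAIVE_URL = re.compile(r'(https?://[^\s"<>]+)')

def naive_autolink(text):
    # \1 is the backreference to the captured URL.
    return NAIVE_URL.sub(r'<a href="\1">\1</a>', text)

plain = "See http://example.com/page for details"
linked = 'See <a href="http://example.com/page">my page</a>'

print(naive_autolink(plain))
# See <a href="http://example.com/page">http://example.com/page</a> for details

print(naive_autolink(linked))
# See <a href="<a href="http://example.com/page">http://example.com/page</a>">my page</a>
```

The second result shows the problem: the URL inside the existing href attribute is wrapped again, leaving broken, nested markup.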

Therefore, you need a way to make sure that the URL is not already inside an anchor tag. You might think of using a negative lookahead group, written (?!...), which rejects a match if the group's pattern matches at that point. However, a lookahead only inspects the text that follows the current position, so it works only if the unwanted text comes after what you want to match. Here the unwanted text, href=", comes before the URL. What you need instead is a lookbehind assertion, which checks the text immediately preceding the current position. To denote a negative lookbehind assertion, use the (?<!...) group.
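The difference between the two assertions can be seen in a short Python sketch. The sample text and both patterns are assumptions chosen for illustration; Python's re module supports lookbehind as long as the lookbehind pattern has a fixed width, which href=" does.

```python
import re

text = 'href="http://a.example/ and http://b.example/'

# Negative lookahead: tests the text that FOLLOWS the current position.
# Placed in front of the URL, it tests the URL itself, which never
# starts with href=", so nothing is rejected.
lookahead = re.compile(r'(?!href=")https?://[^\s"<>]+')

# Negative lookbehind: tests the text that PRECEDES the current position,
# so a URL directly after href=" is skipped.
lookbehind = re.compile(r'(?<!href=")https?://[^\s"<>]+')

print(lookahead.findall(text))
# ['http://a.example/', 'http://b.example/']

print(lookbehind.findall(text))
# ['http://b.example/']
```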

That said, here's the code for autolinking URLs, where we don't want to disturb anything that is prefixed by the href=" that you'd find in an anchor tag:


             '<a href="$1">$1</a>',

Most of this regular expression is a character class containing the valid characters in a URL. Obviously, there are numerous variations on this theme: checking for valid domains, handling exceptions for other odd HTML, excluding a trailing dot or comma, and so on. You probably needn't get too carried away.
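As a rough illustration of how these pieces fit together, here is a minimal sketch in Python. It is not the original code from this article: the character class, the names URL_CHARS, AUTOLINK, and autolink, and the rule for trailing punctuation are all assumptions.

```python
import re

# Assumed set of characters allowed inside a URL (roughly the unreserved
# and reserved characters of a URI, without the square brackets).
URL_CHARS = r"A-Za-z0-9\-._~:/?#@!$&'()*+,;=%"

AUTOLINK = re.compile(
    r'(?<!href=")'             # negative lookbehind: not directly after href="
    r'(https?://'              # scheme
    r'[' + URL_CHARS + r']*'   # body of the URL
    r'[A-Za-z0-9/#=&_~\-])'    # final character: excludes a trailing dot or comma
)

def autolink(text):
    # \1 is the backreference to the captured URL.
    return AUTOLINK.sub(r'<a href="\1">\1</a>', text)

print(autolink("Visit http://example.com/a, or not."))
# Visit <a href="http://example.com/a">http://example.com/a</a>, or not.

print(autolink('<a href="http://example.com/">home</a>'))
# <a href="http://example.com/">home</a>
```

This sketch only guards against the exact prefix href=". A single-quoted attribute, or a URL used as the visible text of an existing link, would still be rewritten, which is the kind of variation mentioned above.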
