How to parse hashtag, mention and URL in a tweet

I was trying to make a simple Twitter application. With jQuery and AJAX, fetching tweets was not hard, but Twitter returns each tweet as plain text. I needed to turn mentions and hashtags into the correct links, and I also needed to link the URLs. After a little searching I came up with some regular expressions to find them, and then I wrote a simple function to parse a tweet. I needed the function in JavaScript as well as in PHP, and I want to share both versions here.

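function parseTwit(str)
{
    //parse URL: wrap each URL in a link to itself
    str = str.replace(/[A-Za-z]+:\/\/[A-Za-z0-9_-]+\.[A-Za-z0-9_\-:%&~?\/.=]+/g, function(s){
        return '<a href="'+s+'" >'+s+'</a>';
    });
    //parse user_name (link target is an assumed Twitter profile URL)
    str = str.replace(/[@]+[A-Za-z0-9_]+/g, function(s){
        var user_name = s.replace('@','');
        return '<a href="https://twitter.com/'+user_name+'" >'+s+'</a>';
    });
    //parse hashtag (link target is an assumed Twitter hashtag URL)
    str = str.replace(/[#]+[A-Za-z0-9_]+/g, function(s){
        var hashtag = s.replace('#','');
        return '<a href="https://twitter.com/hashtag/'+hashtag+'" >'+s+'</a>';
    });
    return str;
}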


The regular expressions used here are deliberately simple; you can replace them with more robust or advanced patterns if you need to.
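As one illustration of what "more robust" can mean, the sketch below tightens two of the patterns. A URL may no longer end in a period, colon or question mark, so a sentence-ending period stays outside the link, and a mention must start the tweet or follow a character other than a letter, digit, underscore or @, so an e-mail address such as me@example.com is left alone. This is not the original code: the function name linkifyStrict and the twitter.com link target are assumptions.

```javascript
// Hypothetical stricter variant of parseTwit's URL and mention handling.
function linkifyStrict(text) {
  // URL: same shape as parseTwit's pattern, but the final character may not be
  // '.', ':' or '?', so "see http://t.co/abc." links "http://t.co/abc" only.
  const urlRe = /[A-Za-z]+:\/\/[A-Za-z0-9_-]+\.[A-Za-z0-9_\-:%&~?\/.=]*[A-Za-z0-9_\-%&~\/=]/g;
  text = text.replace(urlRe, (url) => `<a href="${url}">${url}</a>`);

  // Mention: "@" must be at the start of the text or preceded by a character
  // that is not a letter, digit, "_" or "@". The captured prefix is kept as-is.
  const mentionRe = /(^|[^A-Za-z0-9_@])@([A-Za-z0-9_]+)/g;
  return text.replace(mentionRe, (_match, prefix, name) =>
    `${prefix}<a href="https://twitter.com/${name}">@${name}</a>`);
}
```

For example, linkifyStrict("mail me@example.com, ping @bob") links only @bob, because the @ in the e-mail address follows a letter.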


The same output can also be produced on the server side with PHP.

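function parseTwit($str)
{
    $patterns = array();
    $replace = array();
    //parse URL
    preg_match_all('/[A-Za-z]+:\/\/[A-Za-z0-9_-]+\.[A-Za-z0-9_\-:%&~?\/.=]+/', $str, $urls);
    foreach (array_unique($urls[0]) as $url) {
        $patterns[] = $url;
        $replace[] = '<a href="'.$url.'" >'.$url.'</a>';
    }
    //parse hashtag (link target is an assumed Twitter hashtag URL)
    preg_match_all('/[#]+([A-Za-z0-9_]+)/', $str, $hashtags);
    foreach (array_unique($hashtags[1]) as $hashtag) {
        $patterns[] = '#'.$hashtag;
        $replace[] = '<a href="https://twitter.com/hashtag/'.$hashtag.'" >#'.$hashtag.'</a>';
    }
    //parse mention (link target is an assumed Twitter profile URL)
    preg_match_all('/[@]+([A-Za-z0-9_]+)/', $str, $usernames);
    foreach (array_unique($usernames[1]) as $username) {
        $patterns[] = '@'.$username;
        $replace[] = '<a href="https://twitter.com/'.$username.'" >@'.$username.'</a>';
    }
    //replace now: str_replace works through the arrays in order; array_unique keeps
    //a repeated token from being wrapped twice (@bob can still match inside @bobby)
    $str = str_replace($patterns, $replace, $str);
    return $str;
}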

The PHP function preg_match_all finds every match of a regular expression in the string, much like the /g modifier in JavaScript. Each match and the HTML that should replace it are collected in the $patterns and $replace arrays, and once all URLs, hashtags and mentions have been found, a single str_replace call substitutes them all.
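Both versions rewrite the tweet in stages, so each later pattern also scans the HTML inserted by the earlier ones, and in the PHP version a short token such as @bob can also match inside a longer one such as @bobby. One way around this, sketched below as an alternative rather than the author's method, is to combine the three patterns into one regular expression with alternation and decide in a single callback how to link each match. The function name parseTweetOnePass and the twitter.com link targets are assumptions.

```javascript
// Hypothetical single-pass alternative to the staged replacements.
// At each position the alternatives are tried left to right: URL, mention, hashtag.
const TOKEN_RE = /([A-Za-z]+:\/\/[A-Za-z0-9_-]+\.[A-Za-z0-9_\-:%&~?\/.=]+)|@([A-Za-z0-9_]+)|#([A-Za-z0-9_]+)/g;

function parseTweetOnePass(str) {
  // Groups that did not take part in the match are passed as undefined.
  return str.replace(TOKEN_RE, (_match, url, user, tag) => {
    if (url !== undefined) {
      return `<a href="${url}">${url}</a>`;
    }
    if (user !== undefined) {
      return `<a href="https://twitter.com/${user}">@${user}</a>`;
    }
    return `<a href="https://twitter.com/hashtag/${tag}">#${tag}</a>`;
  });
}
```

Because String.prototype.replace continues searching after the end of each match, the inserted HTML is never rescanned, and a tweet containing both @bob and @bobby links each name to its own profile.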
Image source: hughlashbrooke

3 thoughts on “How to parse hashtag, mention and URL in a tweet”

  1. This works great! However, I wanted to mention that your URL regex will not parse properly if there is a period (.) in the tweet directly after the URL. To prevent this, use this regex instead:

    Since the URLs given out by Twitter's API are all shortened URLs, you don't have to worry about a period being a valid character at the end of the URL.

  2. Nicely done. Thanks.

    A hint: I think it's better to include "#" in hashtag searches. See here for criteria:

    I also tend to always wrap URL arguments in rawurlencode and HTML text in htmlspecialchars, à la "better safe than sorry", but in this case the regular expressions should filter out anything that would cause a conflict.

  3. How do I parse Unicode characters with this? When I enter #जलवायुपरिवर्तन it does not become a #tag hyperlink. I checked it in the demo.
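On the Unicode question in the last comment: the character classes above contain only ASCII letters, digits and underscores, so a Devanagari hashtag such as #जलवायुपरिवर्तन never matches. A possible fix in JavaScript, assuming an engine with Unicode property escapes (ES2018 or later, used together with the u flag), is to accept any letter, combining mark, number or underscore. The marks matter because Devanagari vowel signs and the virama are combining marks, not letters. The sketch also passes the tag through encodeURIComponent before putting it in the link, in the spirit of the rawurlencode suggestion in the second comment. The function name linkUnicodeHashtags and the twitter.com/hashtag/ link target are assumptions.

```javascript
// Hypothetical Unicode-aware hashtag linker (needs ES2018+ for \p{...} and the "u" flag).
// \p{L} = letters, \p{M} = combining marks (e.g. Devanagari vowel signs, virama),
// \p{N} = numbers; "_" is kept from the original ASCII pattern.
const HASHTAG_RE = /#([\p{L}\p{M}\p{N}_]+)/gu;

function linkUnicodeHashtags(str) {
  return str.replace(HASHTAG_RE, (_match, tag) => {
    // Percent-encode the tag (as UTF-8) so non-ASCII characters are safe in the href.
    const href = "https://twitter.com/hashtag/" + encodeURIComponent(tag);
    return `<a href="${href}">#${tag}</a>`;
  });
}
```

The same classes can be dropped into the mention and URL patterns if those also need to accept non-ASCII text.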

