Hi there,

Yesterday I found some spam on my website. By default, all user-generated links are detected using this regex:

    <a(.*?)href=["|'](.*?)["|'](.*?)</a>

The script detects the URL this way and outputs this code:

    <a href="http://someurl.com" rel="nofollow">SomE Anchor Text</a>

The only thing I do is add the rel="nofollow" attribute. This protects my site somewhat from being devalued by the search engines, and it discourages spammers from posting links on my site, since their links won't count toward their search engine rankings. This is done by the WordPress software.

However, some clever spammer managed to bypass the code by posting something like this:

    <a href="http://spammyurl.com <a href="http://anotherspammyurl.com"">Some Anchor</a>>...</a>

Apparently this bypasses the regex, and the following ends up directly in the page along with the rest of the code:

    <a href="http://spammyurl.com <a href="http://anotherspammyurl.com"">Some anchor</a> >...< a >

I have tried testing the spammer's code to find out what he really did to fool the regex, but it does not work when I try it. Any thoughts?
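To see what the pattern actually captures, here is a minimal sketch in Python (not the site's PHP; for this pattern the two regex engines should behave the same way, but that is an assumption). The helper name `captured` is made up for illustration. It also shows that `|` inside `[...]` is a literal pipe, so `["|']` accepts `|` as a "quote" character.

```python
import re

# The pattern quoted in the post, unchanged.
LINK_RE = re.compile(r"""<a(.*?)href=["|'](.*?)["|'](.*?)</a>""")

NORMAL = '<a href="http://someurl.com">SomE Anchor Text</a>'
SPAM = ('<a href="http://spammyurl.com <a href="http://anotherspammyurl.com"">'
        'Some Anchor</a>>...</a>')


def captured(html):
    """Return the three capture groups of the first match, or None."""
    m = LINK_RE.search(html)
    return m.groups() if m else None


if __name__ == "__main__":
    for sample in (NORMAL, SPAM, "<a href=|x|y</a>"):
        print(captured(sample))
```

For the spam string, the second group (the "URL") stops at the first quote it meets, so it comes out as `http://spammyurl.com <a href=`, and the second URL lands in the third group together with the anchor text. Whatever code rebuilds the link from those groups is then working with pieces that are not what it expects.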
Your code is vulnerable to XSS (look it up). Furthermore, the vulnerability looks like it's caused by the (.*?) groups in the regex: they match any character, so anyone can bypass the check by closing the tag and inserting malicious HTML and/or JavaScript.
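One way to avoid depending on a regex at all is to parse the submitted text and re-emit only what you allow. The following is a minimal Python sketch of that idea, not what WordPress itself does. The names `NofollowRewriter` and `sanitize` are hypothetical, and it assumes that only http/https links should survive and that every other tag can be dropped.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit only <a> tags, each rebuilt from scratch with rel="nofollow"."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_links = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # assumption: every other tag is dropped
        # Use the first href attribute with a value, if any.
        href = next((v for k, v in attrs if k == "href" and v), "")
        if not href.startswith(("http://", "https://")):
            return  # rejects javascript: and other schemes
        # The tag is rebuilt, never copied, so no user attribute survives.
        self.out.append('<a href="%s" rel="nofollow">' % escape(href, quote=True))
        self.open_links += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.open_links:
            self.out.append("</a>")
            self.open_links -= 1

    def handle_data(self, data):
        # Text is always escaped, so it cannot introduce new tags.
        self.out.append(escape(data))


def sanitize(html):
    parser = NofollowRewriter()
    parser.feed(html)
    parser.close()
    # Close any anchors the input left open.
    return "".join(parser.out) + "</a>" * parser.open_links


if __name__ == "__main__":
    print(sanitize('<a href="http://someurl.com">SomE Anchor Text</a>'))
```

Because every `<` in the output comes from a tag the code wrote itself, a malformed input like the nested-anchor one cannot produce a link without rel="nofollow", however the parser happens to split it up.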
I have followed the spammer's trail, and he has also spammed some WordPress blogs, successfully posting links without the rel="nofollow" attribute. I just do not know what this guy is doing.
See this related snippet. Does this code fail too?

    // Append rel="nofollow" to every opening <a ...> tag.
    function nofollow($text){
        return preg_replace('/(<a[\s\r\n]+[^>]+)>/i', '\\1 rel="nofollow">', $text);
    }

Try it, please.

Source: http://www.jooria.com/snippets?snippet=12
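For anyone who wants to try that snippet against the spammer's input, here is a rough Python port (an approximation: Python's `re` and PHP's PCRE should agree on this pattern, but I have not compared them). The sample strings are taken from the first post.

```python
import re

# Same pattern as the PHP snippet; re.IGNORECASE stands in for the /i flag.
A_OPEN_RE = re.compile(r'(<a[\s\r\n]+[^>]+)>', re.IGNORECASE)


def nofollow(text):
    """Insert rel="nofollow" before the closing > of each opening <a ...> tag."""
    return A_OPEN_RE.sub(r'\1 rel="nofollow">', text)


NORMAL = '<a href="http://someurl.com">SomE Anchor Text</a>'
SPAM = ('<a href="http://spammyurl.com <a href="http://anotherspammyurl.com"">'
        'Some Anchor</a>>...</a>')

if __name__ == "__main__":
    for sample in (NORMAL, SPAM, '<a href="x" rel="">y</a>'):
        print(nofollow(sample))
```

On the spam string the pattern matches once, from the first `<a` to the first `>`, so the attribute is appended after the doubled quote. Note also that the snippet does not check for an existing rel attribute: a link submitted with its own `rel=""` ends up with two rel attributes, and HTML parsers keep the first one, so this regex alone is probably not enough either.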