Since PHP 5.2 introduced the filter_var function, the days of monsters like this one are over (taken from here):
$urlregex = "^(https?|ftp)\:\/\/([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)*(\:[0-9]{2,5})?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@/&%=+\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?\$";
if (eregi($urlregex, $url)) {
    echo "good";
} else {
    echo "bad";
}
The simple, yet effective syntax:
filter_var($url, FILTER_VALIDATE_URL)
Filter flags can be passed as a third parameter. For URL validation, the following four flags are available:
FILTER_FLAG_SCHEME_REQUIRED
FILTER_FLAG_HOST_REQUIRED
FILTER_FLAG_PATH_REQUIRED
FILTER_FLAG_QUERY_REQUIRED
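The first two, FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED, are set by default. (PHP 7.3 deprecated both and PHP 8.0 removed them, because a scheme and a host are always required anyway.)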
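To get a feeling for the two optional flags, here is a minimal sketch. The helper name passes() is my own invention for this illustration and not part of PHP; the URLs are placeholders.

```php
<?php
// Hypothetical helper (not part of PHP): reports whether $url passes
// FILTER_VALIDATE_URL when the given flags are applied.
function passes(string $url, int $flags = 0): bool
{
    return filter_var($url, FILTER_VALIDATE_URL, $flags) !== false;
}

// Without extra flags, scheme and host are enough.
var_dump(passes('http://example.com'));                                  // bool(true)

// FILTER_FLAG_PATH_REQUIRED rejects a URL without a path component.
var_dump(passes('http://example.com', FILTER_FLAG_PATH_REQUIRED));       // bool(false)
var_dump(passes('http://example.com/', FILTER_FLAG_PATH_REQUIRED));      // bool(true)

// FILTER_FLAG_QUERY_REQUIRED rejects a URL without a query string.
var_dump(passes('http://example.com/a', FILTER_FLAG_QUERY_REQUIRED));    // bool(false)
var_dump(passes('http://example.com/a?b=1', FILTER_FLAG_QUERY_REQUIRED)); // bool(true)
```

Flags can be combined with the bitwise OR operator, for example FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED.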
Get started!
Alright, let’s look at some critical examples.
filter_var('http://example.com/"><script>alert("xss")</script>', FILTER_VALIDATE_URL) !== false; //true
Well, nobody said that filter_var was built to fight XSS. Let’s accept this and move on:
filter_var('php://filter/read=convert.base64-encode/resource=/etc/passwd', FILTER_VALIDATE_URL) !== false; //true
This one is far more critical: any scheme passes the filter. Accepting http(s) and ftp would have been fine, but accepting php:// is a problem, because filter_var is expected to cope with every nasty thing a URL can contain.
filter_var('foo://bar', FILTER_VALIDATE_URL) !== false; //true
And the best one:
filter_var('javascript://test%0Aalert(321)', FILTER_VALIDATE_URL) !== false; //true
Let’s take a closer look. The scheme is javascript. Enter javascript:alert(1+2+3+4); in the address bar of your browser and you’ll see an alert box showing 10.
This is how bookmarklets work, and it is no secret. But let’s move on: the double slash // starts an ordinary JavaScript comment and, at the same time, convinces filter_var that it is dealing with a valid URL scheme (see the examples above). After that comes the sequence %0A, which is exactly the output of the following code:
echo urlencode("\n");
Get it? The URL-encoded newline ends the JavaScript comment that // started, so whatever follows runs as arbitrary JavaScript. Imagine a dating site that validates user-supplied URLs with filter_var and displays them on the front page. Very evil. Try it yourself.
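For the dating site scenario, a rough sketch of a more defensive way to print a user URL could look like this. The function name render_user_link() and the http/https allowlist are my own assumptions, not something filter_var provides. The idea is to check the scheme against an allowlist first and then escape the URL for the HTML context.

```php
<?php
// Hypothetical helper for illustration: renders a user-supplied URL as a link
// only if it uses an allowed scheme, and escapes it for the HTML context.
function render_user_link(string $url): string
{
    $url = trim($url);
    $allowed = ['http', 'https']; // assumed allowlist, adapt to your situation

    // parse_url() returns null or false when no scheme can be extracted.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    if (!is_string($scheme)
        || !in_array(strtolower($scheme), $allowed, true)
        || filter_var($url, FILTER_VALIDATE_URL) === false
    ) {
        return ''; // render nothing for rejected URLs
    }

    // Escape quotes and angle brackets so the URL cannot break out of href="...".
    $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

    return '<a href="' . $safe . '">' . $safe . '</a>';
}

// The javascript: payload is dropped because its scheme is not on the allowlist.
var_dump(render_user_link('javascript://test%0Aalert(321)')); // string(0) ""

// An ordinary URL is rendered, with & escaped as &amp;.
echo render_user_link('http://example.com/?a=1&b=2'), "\n";
```

Comparing the scheme in lowercase means that HTTP://example.com is treated the same as http://example.com. The escaping step is there for URLs like the XSS example above, which filter_var lets through.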
And now?
The following wrapper around filter_var could be worthwhile:
function validate_url($url) {
    $url = trim($url);
    return (strpos($url, "http://") === 0 || strpos($url, "https://") === 0)
        && filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_HOST_REQUIRED) !== false;
}
But even with this wrapper function, the (at least very unusual) URL http://x passes validation. Maybe the regex monsters are not that bad after all ;). And before I forget: filter_var cannot handle multibyte characters. The perfectly valid URL http://스타벅스코리아.com is rejected:
var_dump(filter_var("http://스타벅스코리아.com", FILTER_VALIDATE_URL) !== false); //bool(false)
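One possible workaround, as a sketch only: convert the host to its ASCII (Punycode) form before calling filter_var. This assumes the intl extension is loaded, because idn_to_ascii() comes from there. The function name validate_idn_url() is hypothetical, and the simple regex assumes URLs of the shape scheme://host/... without userinfo or port.

```php
<?php
// Hypothetical helper (not part of PHP): converts an internationalized host
// name to ASCII before validating. Requires the intl extension.
// Assumption: the URL has the simple shape http(s)://host[/path...],
// without userinfo or a port.
function validate_idn_url(string $url): bool
{
    // Split into scheme prefix, host, and the rest; the u modifier makes
    // the match fail (and the URL be rejected) on invalid UTF-8.
    if (!preg_match('~^(https?://)([^/?#]+)(.*)$~isu', trim($url), $m)) {
        return false;
    }

    // idn_to_ascii() returns false if the host cannot be converted.
    $asciiHost = idn_to_ascii($m[2], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($asciiHost === false) {
        return false;
    }

    return filter_var($m[1] . $asciiHost . $m[3], FILTER_VALIDATE_URL) !== false;
}

var_dump(validate_idn_url("http://스타벅스코리아.com"));
var_dump(validate_idn_url("javascript://test%0Aalert(321)")); // bool(false)
```

The second call is rejected by the regex before idn_to_ascii() is reached, so the http/https restriction from the wrapper above is kept.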
To conclude: use filter_var with care, adapt it to your situation, and be aware of its weaknesses. Finally, I’d like to recommend this nice collection of filter_var tests for the different filter flags. Ah, and have a look at Symfony 2’s URL validator, if you like.