
Why URL validation with filter_var might not be a good idea


Since PHP 5.2 brought us the filter_var function, the days of monsters like this one have been over (taken from here):

$urlregex = "^(https?|ftp)\:\/\/([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)*(\:[0-9]{2,5})?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@/&%=+\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?\$";
if (eregi($urlregex, $url)) {echo "good";} else {echo "bad";}

The simple, yet effective syntax:

filter_var($url, FILTER_VALIDATE_URL)

As a third parameter, filter flags can be passed. For URL validation, the following four flags are available:

FILTER_FLAG_SCHEME_REQUIRED
FILTER_FLAG_HOST_REQUIRED
FILTER_FLAG_PATH_REQUIRED 
FILTER_FLAG_QUERY_REQUIRED 

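The first two, FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED, are the default. Both flags were deprecated in PHP 7.3 and removed in PHP 8.0, where a scheme and a host are always required.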
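To make the effect of the two optional flags concrete, here is a small sketch. The example.com URLs and the variable names are only illustrative.

```php
<?php
// Without flags, a bare scheme plus host is enough.
$plain = filter_var('http://example.com', FILTER_VALIDATE_URL) !== false;

// FILTER_FLAG_PATH_REQUIRED rejects a URL that has no path component ...
$noPath = filter_var('http://example.com', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED) !== false;
// ... and accepts it as soon as a path is present, even if it is just "/".
$withPath = filter_var('http://example.com/', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED) !== false;

// FILTER_FLAG_QUERY_REQUIRED does the same for the query string.
$noQuery = filter_var('http://example.com/', FILTER_VALIDATE_URL, FILTER_FLAG_QUERY_REQUIRED) !== false;
$withQuery = filter_var('http://example.com/?id=1', FILTER_VALIDATE_URL, FILTER_FLAG_QUERY_REQUIRED) !== false;

var_dump($plain, $noPath, $withPath, $noQuery, $withQuery);
// bool(true) bool(false) bool(true) bool(false) bool(true)
```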

Get started!

Alright, let’s look at some critical examples.

filter_var('http://example.com/"><script>alert("xss")</script>', FILTER_VALIDATE_URL) !== false; //true

Well, nobody said that filter_var was built to fight XSS. Let’s accept this and move on:

filter_var('php://filter/read=convert.base64-encode/resource=/etc/passwd', FILTER_VALIDATE_URL) !== false; //true

This is far more critical. Any scheme passes the filter. Accepting http(s) and ftp would have been fine, but accepting everything is a problem: a URL validator has to cope with all the evil a URL can contain, and filter_var leaves that job to you.

filter_var('foo://bar', FILTER_VALIDATE_URL) !== false; //true
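One way to close this gap is to check the scheme against an allowlist yourself. The following is only a sketch: has_allowed_scheme is a hypothetical helper, not part of PHP, and the default list of http and https is an assumption you would adapt to your application. It checks the scheme only, so it is meant to be combined with filter_var, not to replace it.

```php
<?php
// Hypothetical helper: accept a URL only if its scheme is on an allowlist.
function has_allowed_scheme($url, array $allowed = array('http', 'https'))
{
    // parse_url() returns the scheme as a string, null if there is none,
    // or false if the URL is seriously malformed.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    return is_string($scheme) && in_array(strtolower($scheme), $allowed, true);
}

var_dump(has_allowed_scheme('http://example.com/'));
var_dump(has_allowed_scheme('php://filter/read=convert.base64-encode/resource=/etc/passwd'));
var_dump(has_allowed_scheme('foo://bar'));
// bool(true) bool(false) bool(false)
```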

And the best one:

filter_var('javascript://test%0Aalert(321)', FILTER_VALIDATE_URL) !== false; //true

Let’s take a closer look: javascript is the scheme. Type javascript:alert(1+2+3+4); into the address bar of your browser and you’ll see:

[Screenshot: Javascript-URL]

This is how bookmarklets work, and it is no secret. But let’s move on: the double slash // starts an ordinary JavaScript comment and, at the same time, convinces filter_var that it is dealing with a valid URL of the form scheme://host, just like in the examples above. It is followed by the sequence %0A, which is exactly the output of the following code:

echo urlencode("\n");

Get it? The URL-encoded newline ends the JavaScript comment that was started with //, and whatever follows is executed as arbitrary JavaScript code. Imagine a dating site where user-supplied URLs are validated with filter_var and displayed as links on the front page. Very evil. Try it yourself.
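The mechanics can be made visible with a few lines of PHP. This is only an illustration of how the payload falls apart once it is URL-decoded; the variable names are made up for the example.

```php
<?php
$payload = 'javascript://test%0Aalert(321)';

// Once the URL is percent-decoded, %0A becomes a real newline.
$decoded = rawurldecode($payload);
$lines = explode("\n", $decoded);

// Line 1 is the scheme followed by a // comment, line 2 is live code.
print_r($lines);
// [0] => javascript://test
// [1] => alert(321)

// HTML escaping alone does not defuse it: the payload contains no
// characters that htmlspecialchars() would change.
var_dump(htmlspecialchars($payload) === $payload); // bool(true)
```

So escaping the value for an href attribute is not enough here; the scheme itself has to be rejected.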

And now?

The following wrapper around filter_var could be worthwhile:

function validate_url($url)
{
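	$url = trim($url);

	// Only http:// and https:// are accepted. Scheme and host are required by
	// default, so no flags are passed (the explicit *_REQUIRED flags for them
	// were removed in PHP 8.0).
	return ((strpos($url, "http://") === 0 || strpos($url, "https://") === 0) &&
		    filter_var($url, FILTER_VALIDATE_URL) !== false);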
}

But even with this wrapper function, the very unusual, to say the least, URL http://x passes validation. Maybe the regex monsters are not that bad after all ;). And before I forget: filter_var is not multibyte capable. The perfectly valid URL http://스타벅스코리아.com is rejected:

var_dump(filter_var("http://스타벅스코리아.com", FILTER_VALIDATE_URL) !== false); //bool(false)
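A possible workaround is to convert the host name to its ASCII (Punycode) form before validating. The sketch below assumes the intl extension, which provides idn_to_ascii(); validate_idn_url is a hypothetical helper name, and the regular expression that splits off the host is a simplification that does not convert hosts preceded by user:password@.

```php
<?php
// Hypothetical helper: validate an http(s) URL whose host may contain
// non-ASCII characters. Assumes ext-intl for idn_to_ascii(); without it
// the URL is handed to filter_var unchanged.
function validate_idn_url($url)
{
    $url = trim($url);

    // Split into "http(s)://", host, and the rest of the URL.
    if (!preg_match('~^(https?://)([^/?#:@]+)(.*)$~isu', $url, $m)) {
        return false;
    }

    $host = $m[2];
    if (function_exists('idn_to_ascii')) {
        // Convert the host to Punycode using the UTS #46 rules.
        $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
        if ($ascii === false) {
            return false;
        }
        $host = $ascii;
    }

    return filter_var($m[1] . $host . $m[3], FILTER_VALIDATE_URL) !== false;
}

var_dump(validate_idn_url('http://스타벅스코리아.com'));
```

Whether the converted URL or the original one should be stored afterwards is a separate design decision that this sketch leaves open.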

To conclude: use filter_var with care, adapt it to your situation and be aware of its weaknesses. Finally, I’d like to recommend this nice collection of filter_var tests, grouped by filter flag. Ah, and have a look at Symfony 2’s URL validator, if you like.