CSS 2.1 escape sequences and encodings

I know there's plenty of good work being done over at places like http://ha.ckers.org and http://www.thespanner.co.uk/. I have been researching CSS 2.1 and testing some very thorough and complex HTML and CSS filters myself, trying to find the stuff that gets through. These are the type of filters that consume a stream of HTML or CSS, parse it, and return a safe version without executable script. This approach usually involves some complex lexical parsing combined with whitelisting and blacklisting.

I found the following exploits really useful in bypassing filters:
body { margin: expression\(alert\(\'xss')\) }
body { margin: \65\78\70\72\65\73\73\69\6F\6E\28\61\6C\65\72\74\28'\78\73\73\49'\29\29 }

Most of the CSS parsing was being done well and according to the lexical model at http://www.w3.org/TR/CSS21/grammar.html.

For testing, of course, I had to refer to the character escapes the spec allows, described at http://www.w3.org/TR/REC-CSS2/syndata.html#escaped-characters.

Some of the more interesting details I found were in the CSS 2.1 Syntax and basic data types portion of the spec, especially http://www.w3.org/TR/CSS21/syndata.html#rule-sets

For example, I hadn't seen much like this being used before. Quoting the spec:
"Here is a more complex example. The first two pairs of curly braces are inside a string, and do not mark the end of the selector. This is a valid CSS 2.1 rule."

p[example="public class foo\
private int x;\
foo(int x) {\
this.x = x;\
}"] { color: red }

The quoted-string example is sort of similar to the 'content' attribute and property, whose values should be treated as literal text, but in some cases I'm seeing those values get interpreted by most browsers. More on that later if it proves useful: http://msdn2.microsoft.com/en-us/library/cc196962(VS.85).aspx
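To make the multi-line selector concrete, here is a minimal Python sketch of the one thing a filter has to get right for it: braces inside a quoted string, and anything preceded by a backslash, must not be treated as the start of the declaration block. The function name `find_block_start` is made up for illustration, and this is only a scanner for the opening brace, not a CSS parser.

```python
def find_block_start(css):
    """Return the index of the first '{' outside any quoted string, or -1."""
    quote = None  # quote character of the string we are currently inside
    i = 0
    while i < len(css):
        ch = css[i]
        if ch == "\\":
            i += 2  # skip the backslash and whatever it escapes (even a newline)
            continue
        if quote:
            if ch == quote:
                quote = None  # closing quote ends the string
        elif ch in "\"'":
            quote = ch
        elif ch == "{":
            return i
        i += 1
    return -1


# The rule from the spec; in a raw string each backslash-newline pair is kept.
RULE = r'''p[example="public class foo\
private int x;\
foo(int x) {\
this.x = x;\
}"] { color: red }'''

start = find_block_start(RULE)
print(RULE[start:])              # { color: red }
print(RULE.index("{") < start)   # True: the first brace is inside the string
```

A filter that splits on the first `{` it sees would cut this rule in the middle of the attribute value, which is exactly the kind of mismatch between filter and browser that is worth testing for.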

So what worked?

Well, the HTML filter held up pretty well. I threw everything on the http://ha.ckers.org/xssAttacks.xml list at it, plus a ton of other razor-sharp modifications.

The CSS 2.1 test cases at http://www.w3.org/Style/CSS/Test/CSS2.1/20061011/ came in very handy, especially the escape sequences like:

p.class\#id { background\: red; \}
p.c\lass { bac\kground: g\reen; }
p.c\00006Cas\000073 { back\000067round: gr\000065en; }
p.c\06C ass { back\67 round: gr\000065 en; }

This is the stuff that really screws with filters, and it confirms what the spec defines for escaping characters. Basically:

  • Backslashes are allowed anywhere in identifiers (class names, attribute names, property names, and keyword values) to indicate character escapes

  • A special CSS character's meaning gets canceled with a backslash, for example:

    • \" or \# or \{

  • Any ISO 10646 character (any Unicode character, basically) can be represented with a backslash followed by up to 6 hexadecimal digits. So the following sequences all represent a double quote:

    • \22

    • \0022

    • \000022

  • So the neat thing to remember is that unless the backslash is followed by a hexadecimal digit (0-9, A-F, or a-f), it's treated as a plain character escape. If the escaped character has no special meaning, the backslash is simply dropped by the parser. So the following sequences are treated as escapes, but the backslash has no effect because no special character is being escaped (note that it sits in front of a non-hex letter, p or s):

    • ex\pression

    • expres\sion\(alert())

You can see where this is going. The end result was a bypass for some of the filters that weren't allowing the special expression() value through:

body { margin: expre\ssion(alert()) }

But that wasn't enough. It turned out that the filter didn't allow anything followed by an open parenthesis, so this would never get through:

body { margin: any\thing() }

However, escaping worked here too. The final exploits were:
body { margin: expression\(alert\(\'xss')\) }
body { margin: \65\78\70\72\65\73\73\69\6F\6E\28\61\6C\65\72\74\28'\78\73\73\49'\29\29 }