List of characters for testing Unicode transformations and best-fit mapping to dangerous ASCII

I'm attaching two CSV files for use in test cases and tools. uni2asc.csv contains all of the Unicode characters that map to something in ASCII (< 0x80). bestfit.csv contains all of the known best-fit mappings to dangerous ASCII between legacy charsets and Unicode.

uni2asc.csv - for straight Unicode to Unicode mappings
bestfit.csv - for legacy charset to Unicode mappings

I gave these to Gareth so they may wind up in HackVertor.

The Unicode database contains metadata about every character, including compatibility mappings, normalization mappings, case mappings, and other decomposition data. When testing, it's useful to know which special Unicode characters may transform to dangerous ASCII. For example:

  • U+2134 SCRIPT SMALL O will transform to U+006F LATIN SMALL LETTER O in certain cases, such as compatibility normalization (NFKC or NFKD)
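A quick way to see this kind of transformation is Python's standard unicodedata module. This is only a sketch: the results depend on the Unicode version bundled with your Python build, and the KELVIN SIGN case-mapping line is an extra illustration that is not in the CSV discussion above.

```python
import unicodedata

script_o = "\u2134"  # SCRIPT SMALL O

# Canonical forms (NFC, NFD) leave the character alone;
# compatibility forms (NFKC, NFKD) fold it to plain ASCII "o".
results = {form: unicodedata.normalize(form, script_o)
           for form in ("NFC", "NFD", "NFKC", "NFKD")}

for form, value in results.items():
    print(form, repr(value), [f"U+{ord(c):04X}" for c in value])

# The raw decomposition field from the Unicode database.
print(unicodedata.decomposition(script_o))

# Case mapping is another route to ASCII: U+212A KELVIN SIGN lowercases to "k".
kelvin_lower = "\u212A".lower()
print(repr(kelvin_lower))
```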


Of course, if you're testing for SQL injection or XSS, you probably want to know what transforms to dangerous characters like ' and <. We attempted to automate some of this in our x5s tool, which has done a good job so far, and we have a big update for it coming soon.
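If you want to rebuild a rough version of this list yourself, something like the following Python sketch works. The DANGEROUS set and the find_lookalikes name are my own assumptions for illustration, not the method used to produce uni2asc.csv or anything from x5s, and it only covers normalization (not case mapping or best-fit).

```python
import sys
import unicodedata

# Assumed set of "dangerous" ASCII characters; adjust for your target.
DANGEROUS = set("'\"<>&/\\")

def find_lookalikes(targets=DANGEROUS, form="NFKC"):
    """Map each target character to the non-ASCII code points that
    normalize to exactly that character under the given form."""
    hits = {}
    for cp in range(0x80, sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogates, they are not characters
        norm = unicodedata.normalize(form, chr(cp))
        # Exact match only; characters that expand to a longer string
        # containing a target are not reported by this sketch.
        if norm in targets:
            hits.setdefault(norm, []).append(cp)
    return hits

if __name__ == "__main__":
    for target, cps in sorted(find_lookalikes().items()):
        print(repr(target), " ".join(f"U+{cp:04X}" for cp in cps))
```

The exact hits vary with the Unicode version in your Python build, so treat the output as a starting point for test cases and not as a complete list.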

In the bestfit.csv file you'll find all of the best-fit mappings from Unicode to dangerous ASCII < 0x80 (and vice versa) in many of the legacy charsets from http://unicode.org/Public/MAPPINGS/. There's some wild legacy stuff in here. For example:


  • In APL-ISO-IR-68, 0x27 maps to 0x5D in Unicode, and vice versa.
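To hunt for this kind of entry in a mapping table, a small parser is enough. The sketch below assumes the simple "0xBYTE<tab>0xUNICODE<tab># comment" line layout; the SAMPLE text is hand-written to mirror the APL-ISO-IR-68 example above, not copied from the real file, and the function names and DANGEROUS set are hypothetical. Some files under MAPPINGS (the best-fit ones in particular) use other layouts and would need their own parsing.

```python
# Assumed set of dangerous ASCII code points: " ' < >
DANGEROUS = {0x22, 0x27, 0x3C, 0x3E}

# Hand-written illustrative lines, NOT the real mapping file.
SAMPLE = (
    "# byte\tunicode\tcomment\n"
    "0x26\t0x0026\t# AMPERSAND\n"
    "0x27\t0x005D\t# RIGHT SQUARE BRACKET\n"
    "0x5D\t0x0027\t# APOSTROPHE\n"
)

def parse_mapping(text):
    """Return (byte, code point) pairs from a simple two-column table."""
    pairs = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments
        if len(fields) < 2:
            continue  # blank line, comment, or undefined byte
        try:
            pairs.append((int(fields[0], 16), int(fields[1], 16)))
        except ValueError:
            continue  # e.g. multi-code-point sequences; skipped here
    return pairs

def surprising(pairs, dangerous=DANGEROUS):
    """Keep pairs where one side is dangerous ASCII and the two sides differ."""
    return [(b, u) for b, u in pairs
            if b != u and (b in dangerous or u in dangerous)]

for byte, cp in surprising(parse_mapping(SAMPLE)):
    print(f"0x{byte:02X} <-> U+{cp:04X}")
```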


If you put these to use anywhere please let me know so I can pass the word along.
Yeah, they will :)

These are great, big thanks!