Unicode root-cause security issues for generating test cases

09 Sep 2008

When it comes to Unicode implementations, there's a rich set of test cases to perform. Recognizing that is the start. Automating the tests is the next step.

Most Unicode-related security bugs can be categorized into the following root causes:

Canonicalization

Interpreting non-shortest form (e.g. UTF-8 encoding trickery)

Other decoding issues
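As a starting point for non-shortest form test cases, here is a minimal Python sketch. The byte sequences are overlong encodings of "/" (U+002F). `naive_decode_2byte` is a hypothetical broken decoder written only for illustration, and `strict_rejects` simply reports whether Python's built-in UTF-8 decoder refuses the input.

```python
# Overlong (non-shortest form) encodings of "/" (U+002F).
# The only valid UTF-8 encoding of "/" is the single byte 0x2F.
OVERLONG_SLASH = [
    b"\xc0\xaf",          # 2-byte overlong form
    b"\xe0\x80\xaf",      # 3-byte overlong form
    b"\xf0\x80\x80\xaf",  # 4-byte overlong form
]


def naive_decode_2byte(data: bytes) -> str:
    """Hypothetical buggy decoder: combines the payload bits of a
    2-byte sequence without checking for shortest form."""
    lead, trail = data[0], data[1]
    return chr(((lead & 0x1F) << 6) | (trail & 0x3F))


def strict_rejects(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder refuses the input."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False
```

A test harness could feed each entry of `OVERLONG_SLASH` to the component under test and flag any case where a "/" comes out the other side.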

Absorption (over-consumption)

Over-consuming invalid byte sequences or correcting rather than failing

When <41 C2 C3 B1 42> becomes <41 42>
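The byte sequence above can be turned into a small test. The sketch below is only an illustration: `overconsuming_decode` is a hypothetical buggy decoder (not taken from any real library) that trusts the lead byte's declared length and silently drops the whole unit on failure. It is compared with Python's built-in decoder, which substitutes U+FFFD for the bad byte only.

```python
DATA = bytes([0x41, 0xC2, 0xC3, 0xB1, 0x42])


def overconsuming_decode(raw: bytes) -> str:
    """Hypothetical buggy decoder: takes as many bytes as the lead byte
    announces and drops them all if they do not decode."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b < 0x80:
            n = 1
        elif b >> 5 == 0b110:
            n = 2
        elif b >> 4 == 0b1110:
            n = 3
        elif b >> 3 == 0b11110:
            n = 4
        else:
            n = 1  # stray continuation byte or invalid lead byte
        chunk = raw[i:i + n]
        try:
            out.append(chunk.decode("utf-8"))
        except UnicodeDecodeError:
            pass  # Bug: all n bytes vanish, including a valid lead byte (0xC3)
        i += n
    return "".join(out)


# Python's decoder replaces only the ill-formed byte 0xC2 and keeps <C3 B1>.
REPLACED = DATA.decode("utf-8", errors="replace")
```

Here `overconsuming_decode(DATA)` yields "AB", while `REPLACED` keeps the "ñ" and marks the bad byte with U+FFFD.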

Character deletion and swallowing

“deletion of noncharacters” (UTR-36)

<scr[U+FEFF]ipt> becomes <script>

Use replacement characters instead!
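A minimal sketch of why deletion is dangerous and replacement is safer. The filter and helper names here (`naive_filter`, `strip_feff`, `replace_feff`) are hypothetical and exist only for this example.

```python
PAYLOAD = "<scr\ufeffipt>"


def naive_filter(s: str) -> bool:
    """Hypothetical security gate: passes input with no literal <script>."""
    return "<script>" not in s


def strip_feff(s: str) -> str:
    """Risky: deleting U+FEFF after the gate re-forms the blocked token."""
    return s.replace("\ufeff", "")


def replace_feff(s: str) -> str:
    """Safer: substitute U+FFFD so the token stays broken."""
    return s.replace("\ufeff", "\ufffd")
```

The payload passes `naive_filter`, yet `strip_feff(PAYLOAD)` produces a working `<script>`, whereas `replace_feff(PAYLOAD)` does not.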

Interpreting syntax replacements

white space and line feeds

E.g. when U+180E acts like U+0020
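To generate test cases for this class, one option is to probe how a runtime classifies candidate code points. The sketch below uses Python's own notion of white space as a stand-in for the component under test. The candidate list is an assumption and far from complete, and the result for U+180E depends on the Unicode version the interpreter ships with, so no output is claimed for it here.

```python
import unicodedata

# Assumed starting set of white-space-like code points; extend as needed.
CANDIDATES = [0x0020, 0x00A0, 0x180E, 0x2028, 0x2029, 0x3000]


def whitespace_report(code_points):
    """Report how this Python build treats each candidate code point."""
    rows = []
    for cp in code_points:
        ch = chr(cp)
        rows.append({
            "code_point": f"U+{cp:04X}",
            "category": unicodedata.category(ch),
            "isspace": ch.isspace(),
            # Does str.split() treat it as a token separator?
            "splits": len(f"a{ch}b".split()) == 2,
        })
    return rows
```

Any row where `isspace` or `splits` is true for a character other than U+0020 is a candidate input for a filter that only checks for the ASCII space.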

Best-fit mappings

When σ becomes s

When ′ (U+2032) becomes ' (U+0027)
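Python's standard codecs do not perform best-fit mapping, so the sketch below simulates it with a tiny hand-written table containing only the two mappings mentioned above. `BEST_FIT`, `best_fit`, and `has_quote` are hypothetical names for illustration, not a real code page table.

```python
# Illustrative subset only; real best-fit tables are much larger.
BEST_FIT = {
    "\u03c3": "s",  # GREEK SMALL LETTER SIGMA -> LATIN SMALL LETTER S
    "\u2032": "'",  # PRIME -> APOSTROPHE
}


def best_fit(s: str) -> str:
    """Simulate a lossy best-fit conversion using the table above."""
    return "".join(BEST_FIT.get(ch, ch) for ch in s)


def has_quote(s: str) -> bool:
    """Hypothetical filter that looks for an ASCII apostrophe."""
    return "'" in s


PAYLOAD = "x\u2032 OR 1=1"
```

`has_quote(PAYLOAD)` is false before conversion and true after `best_fit(PAYLOAD)`, which is the situation a filter placed before the conversion would miss.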

Buffer overruns

Incorrect assumptions about string sizes (chars vs. bytes)

Improper width calculations
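The chars-versus-bytes mismatch is easy to demonstrate. The following sketch (helper names `sizes` and `truncate_utf8` are made up for this example) counts the same string three ways and shows one possible way to truncate to a byte budget without splitting a character.

```python
def sizes(s: str) -> dict:
    """Count one string in code points, UTF-8 bytes, and UTF-16 code units."""
    return {
        "code_points": len(s),
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
    }


def truncate_utf8(s: str, max_bytes: int) -> str:
    """Cut to at most max_bytes of UTF-8, discarding a trailing partial
    sequence instead of emitting a broken character."""
    raw = s.encode("utf-8")[:max_bytes]
    return raw.decode("utf-8", errors="ignore")


SAMPLE = "a\u00f1\U0001F600"  # 'a', 'ñ', and an emoji outside the BMP
```

For `SAMPLE`, the three counts are 3, 7, and 4, so a buffer sized from any one of them can be wrong for the other two.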

Timing issues

Handling Unicode after security gates

Sometimes handling Unicode before a gate can be a problem too! E.g. BOM handling
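One way to picture the ordering problem is with normalization as the "Unicode handling" step. That choice is an assumption for this sketch, since the same issue applies to decoding or case folding. The gate `is_safe` and both pipeline functions are hypothetical.

```python
import unicodedata


def is_safe(s: str) -> bool:
    """Hypothetical security gate: reject ASCII angle brackets."""
    return "<" not in s and ">" not in s


def gate_then_normalize(s: str) -> str:
    """Risky order: the gate runs before NFKC changes the string."""
    if not is_safe(s):
        raise ValueError("blocked")
    return unicodedata.normalize("NFKC", s)


def normalize_then_gate(s: str) -> str:
    """Safer order: the gate sees the final form of the string."""
    normalized = unicodedata.normalize("NFKC", s)
    if not is_safe(normalized):
        raise ValueError("blocked")
    return normalized


# Fullwidth angle brackets (U+FF1C, U+FF1E) fold to "<" and ">" under NFKC.
PAYLOAD = "\uff1cscript\uff1e"
```

`gate_then_normalize(PAYLOAD)` returns a live `<script>`, while `normalize_then_gate(PAYLOAD)` raises `ValueError`.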

Log0

2008-09-18T06:28:37.000Z

That's very interesting! You really know a lot about international software security, it's something I'm trying to learn more about. =) Keep on writing, I'm feeding like hell.

So, what do you think of those multibyte characters? Say, in SQL, you inject a multibyte character whose byte sequence contains a single quote ('), and it is interpreted as a closing quote by the SQL engine but evades the filter. Do you think that can be grouped into the list above as well?

lookout.net