Unicode security attacks and test cases – fuzzing with Unicode

23 Apr 2009

When it comes to fuzzing parsers, protocols, and other software, I want the fuzzer to be capable of producing tests specific to Unicode. Here’s what it should do at a minimum:

Generate half of a surrogate pair (a lone high or low surrogate) in UTF-8 or UTF-16

Generate ill-formed byte sequences for UTF-8 and UTF-16

Generate overlong UTF-8 encodings, such as the two-byte form C0 AF for U+002F

Generate unassigned and reserved code points

Generate code points outside of the valid range (above U+10FFFF)

Generate interesting control characters and characters with special meaning, like the BOM and the bidirectional embedding and override characters
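To make the list above concrete, here is a minimal sketch in Python of what such a generator might look like. It is only an illustration, not the code mentioned in this post, and the names (raw_utf8, lone_surrogate_utf16, malformed_utf8_cases, unassigned_code_points, SPECIAL) are hypothetical. The set of unassigned code points it finds depends on the Unicode database version shipped with the Python interpreter.

```python
import unicodedata

# Lead-byte prefixes for 2..6 byte sequences. Lengths 5 and 6 come from the
# original UTF-8 design and are never valid in modern UTF-8.
_LEAD = {2: 0xC0, 3: 0xE0, 4: 0xF0, 5: 0xF8, 6: 0xFC}


def raw_utf8(cp: int, length: int) -> bytes:
    """Pack cp into a UTF-8-shaped sequence of `length` bytes, with no validity checks."""
    if length not in _LEAD:
        raise ValueError("length must be 2..6")
    if not 0 <= cp < 1 << (5 * length + 1):  # payload is 11, 16, 21, 26 or 31 bits
        raise ValueError("code point does not fit in that many bytes")
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (cp & 0x3F))  # low 6 bits go into a continuation byte
        cp >>= 6
    return bytes([_LEAD[length] | cp] + tail[::-1])


def lone_surrogate_utf16(cp: int, byteorder: str = "little") -> bytes:
    """Emit a single surrogate code unit with no partner (half a surrogate pair)."""
    if not 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a surrogate code point")
    return cp.to_bytes(2, byteorder)


def malformed_utf8_cases():
    """Return (label, bytes) pairs that a strict UTF-8 decoder should reject."""
    return [
        ("lone high surrogate", raw_utf8(0xD800, 3)),
        ("lone low surrogate", raw_utf8(0xDFFF, 3)),
        ("overlong '/' in 2 bytes", raw_utf8(0x2F, 2)),
        ("overlong '/' in 3 bytes", raw_utf8(0x2F, 3)),
        ("overlong NUL", raw_utf8(0x00, 2)),
        ("first code point past U+10FFFF", raw_utf8(0x110000, 4)),
        ("5-byte form", raw_utf8(0x200000, 5)),
        ("stray continuation byte", b"\x80"),
        ("truncated 3-byte sequence", b"\xe2\x82"),
        ("bytes that never appear in UTF-8", b"\xfe\xff"),
    ]


def unassigned_code_points(start: int = 0, stop: int = 0x110000, limit: int = 16):
    """Collect up to `limit` code points whose general category is Cn (unassigned)."""
    found = []
    for cp in range(start, stop):
        if unicodedata.category(chr(cp)) == "Cn":
            found.append(cp)
            if len(found) == limit:
                break
    return found


# Well-formed but interesting code points: controls, the BOM, bidi formatting
# characters, and two noncharacters.
SPECIAL = {
    "NUL": 0x0000,
    "BOM / zero width no-break space": 0xFEFF,
    "left-to-right embedding": 0x202A,
    "right-to-left embedding": 0x202B,
    "pop directional formatting": 0x202C,
    "left-to-right override": 0x202D,
    "right-to-left override": 0x202E,
    "noncharacter U+FFFE": 0xFFFE,
    "noncharacter U+FFFF": 0xFFFF,
}
```

Python's own strict decoder rejects every entry from malformed_utf8_cases(), which makes it a handy sanity check for the generator. The interesting part is feeding the same bytes to the parser under test and seeing whether it rejects them, replaces them, or silently accepts them.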

I’ve got some code that does most of these things. Maybe I should elaborate on them some more… Does Peach or another fuzzing framework provide this already?

lookout.net
