Browser user-agents and variable-width UTF-8 encoding issues

Table 3.1B from Unicode Corrigendum #1: UTF-8 Shortest Form (reproduced below) provides the basis for some interesting test cases. Hopefully I'll have something to report about this soon. In the meantime, John Hernandez and I are structuring tests across all browsers to look for new XSS vectors through character absorption, swallowing, and exclusion.
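
To make "shortest form" concrete, here is a minimal Python sketch of the kind of test input this suggests: the same ASCII character padded out into 2-, 3-, and 4-byte sequences that the corrigendum forbids. The helper name overlong_forms is hypothetical and only for illustration; it is not the test harness mentioned above.

```python
def overlong_forms(ch):
    """Return non-shortest 2-, 3-, and 4-byte encodings of one ASCII character.

    Hypothetical helper for building test inputs; every sequence it returns
    is illegal under the shortest-form rule.
    """
    cp = ord(ch)
    if cp > 0x7F:
        raise ValueError("this sketch only handles ASCII input")
    high = cp >> 6      # top bit of the 7-bit code point (0 or 1)
    low = cp & 0x3F     # low six bits
    two = bytes([0xC0 | high, 0x80 | low])
    three = bytes([0xE0, 0x80 | high, 0x80 | low])
    four = bytes([0xF0, 0x80, 0x80 | high, 0x80 | low])
    return [two, three, four]


if __name__ == "__main__":
    for form in overlong_forms("<"):
        try:
            form.decode("utf-8")
            verdict = "accepted"
        except UnicodeDecodeError:
            verdict = "rejected"
        print(form.hex(" "), verdict)
```

A strict decoder should reject all three forms. A lenient one that maps them back to "<" after an input filter has already run is the sort of behavior worth probing for.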

Table 3.1B. Legal UTF-8 Byte Sequences

Code Points           1st Byte   2nd Byte   3rd Byte   4th Byte
U+0000..U+007F        00..7F
U+0080..U+07FF        C2..DF     80..BF
U+0800..U+0FFF        E0         A0..BF     80..BF
U+1000..U+FFFF        E1..EF     80..BF     80..BF
U+10000..U+3FFFF      F0         90..BF     80..BF     80..BF
U+40000..U+FFFFF      F1..F3     80..BF     80..BF     80..BF
U+100000..U+10FFFF    F4         80..8F     80..BF     80..BF
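
The table translates almost directly into a byte-range check. The following is a minimal Python sketch, assuming the input is a bytes object; the names TABLE_3_1B and is_legal_utf8 are made up for this example. It follows the table exactly as printed, so it accepts ED A0..BF sequences (surrogate code points), which later versions of the Unicode Standard exclude.

```python
# Each row: (first-byte low, first-byte high, allowed range per trailing byte).
# Transcribed from Table 3.1B as printed above.
TABLE_3_1B = [
    (0x00, 0x7F, []),
    (0xC2, 0xDF, [(0x80, 0xBF)]),
    (0xE0, 0xE0, [(0xA0, 0xBF), (0x80, 0xBF)]),
    (0xE1, 0xEF, [(0x80, 0xBF), (0x80, 0xBF)]),
    (0xF0, 0xF0, [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    (0xF1, 0xF3, [(0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    (0xF4, 0xF4, [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]),
]


def is_legal_utf8(data):
    """Return True if every sequence in `data` matches a row of the table."""
    i = 0
    while i < len(data):
        first = data[i]
        for lo, hi, trail in TABLE_3_1B:
            if lo <= first <= hi:
                break
        else:
            # C0, C1, F5..FF, or a continuation byte in lead position
            return False
        tail = data[i + 1 : i + 1 + len(trail)]
        if len(tail) != len(trail):
            return False  # sequence is cut off before its last byte
        for byte, (tlo, thi) in zip(tail, trail):
            if not tlo <= byte <= thi:
                return False
        i += 1 + len(trail)
    return True
```

For example, is_legal_utf8(b"<") holds, while the two-byte form b"\xc0\xbc" fails because C0 never appears in the 1st Byte column, and b"\xe0\x80\xbc" fails because the byte after E0 must fall in A0..BF.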