Web browser charset label tests

Tests for Web browser supported charset encodings

This page loops through every charset alias (or label) registered with IANA and tests whether the Web browser supports it. For each alias, a hidden iframe is appended to the document. Loading it sends an HTTP request to a PHP page, which responds with that alias in the Content-Type header, for example:

Content-Type: text/plain; charset=UTF-16
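The PHP page itself is not reproduced here. As a rough illustration of what it has to do, the TypeScript sketch below models the endpoint as a pure function. It reads the label from a query parameter and echoes it into the Content-Type header. The parameter name charset and the path /charset.php are assumptions, not taken from the original page. It also refuses labels containing characters that could break the header, which any page echoing request input into a header should do.

```typescript
// Shape of the response the endpoint would send back.
interface CharsetResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical stand-in for the PHP page. The "charset" query parameter
// name is an assumption, not taken from the original test.
function charsetResponse(requestUrl: string): CharsetResponse {
  // Request URLs are usually path-only, so resolve against a dummy base.
  const url = new URL(requestUrl, "http://localhost");
  const label = url.searchParams.get("charset") ?? "";

  // Echo only visible ASCII without ';' or '"'. This rejects empty labels,
  // spaces and CR/LF (header injection) while leaving IANA names such as
  // "ISO_8859-5:1988" intact.
  const isSafe = /^[\x21-\x7E]+$/.test(label) && !/[;"]/.test(label);
  if (!isSafe) {
    return { status: 400, headers: { "Content-Type": "text/plain" }, body: "invalid charset label" };
  }

  return {
    status: 200,
    headers: { "Content-Type": `text/plain; charset=${label}` },
    // ASCII body with no byte order mark: a BOM would override the
    // declared charset when the browser decodes the response.
    body: "charset label test",
  };
}

console.log(charsetResponse("/charset.php?charset=UTF-16").headers["Content-Type"]);
// prints "text/plain; charset=UTF-16"
```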

The test then checks whether the iframe's contentDocument.charset (or contentDocument.characterSet) property matches the charset alias named in the test case. The comparison is case-insensitive, and the two names match when either a) they are identical, or b) they differ but belong to the same alias group (meaning they are simply aliases for one another). A match means the Web browser supports the encoding, and the test is a PASS. A mismatch is a FAIL: either the charset label is not supported by the Web browser, or it triggered a fallback to the default encoding, which is often UTF-8.
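The PASS rule can be sketched as follows. This is a minimal illustration, not the page's actual script: it assumes the IANA registry has already been loaded into arrays of alias names, and only two abbreviated groups modeled on that registry are included.

```typescript
// Abbreviated alias groups modeled on the IANA character-set registry.
// Each inner array lists names IANA treats as aliases of one another;
// the real table has many more groups.
const ALIAS_GROUPS: string[][] = [
  ["ISO_8859-5:1988", "iso-ir-144", "ISO_8859-5", "ISO-8859-5", "cyrillic", "csISOLatinCyrillic"],
  ["ISO_8859-7:1987", "iso-ir-126", "ISO_8859-7", "ISO-8859-7", "ELOT_928", "ECMA-118", "greek", "greek8", "csISOLatinGreek"],
];

// Map each lowercased label to the index of its alias group.
function buildAliasIndex(groups: string[][]): Map<string, number> {
  const index = new Map<string, number>();
  groups.forEach((group, id) => {
    for (const label of group) index.set(label.toLowerCase(), id);
  });
  return index;
}

// PASS rule: (a) identical names, ignoring case, or (b) both names
// belong to the same alias group.
function charsetsMatch(testCase: string, frameCharset: string, index: Map<string, number>): boolean {
  const a = testCase.toLowerCase();
  const b = frameCharset.toLowerCase();
  if (a === b) return true;
  const group = index.get(a);
  return group !== undefined && group === index.get(b);
}

const index = buildAliasIndex(ALIAS_GROUPS);
console.log(charsetsMatch("iso-ir-144", "ISO-8859-5", index)); // true
console.log(charsetsMatch("greek", "iso-8859-7", index)); // true
console.log(charsetsMatch("latin5", "windows-1254", index)); // false
```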

Caveats

The test compares alias groupings strictly according to IANA's documented charsets. For example, although "windows-1254" may be regarded as a superset of "latin5" (ISO-8859-9), the test reports a FAIL when the Web browser returns windows-1254 for the latin5 label. That is because IANA does not list them as aliases for one another, and rightly so: the two mapping tables differ in the range 0x80 to 0x9F.

The same strictness also cuts the other way: some labels that should be equivalent still FAIL. For example, "ISO-8859-6-E" and "ISO-8859-6" result in a FAIL because IANA does not list them as equivalent aliases, even though they seem to have identical mappings according to ICU.
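Both caveats can be previewed without an iframe. In JavaScript runtimes that implement the WHATWG Encoding Standard, the TextDecoder constructor resolves a label with the same label table that conforming Web browsers use when decoding documents, and its encoding property reports the resulting encoding name in lowercase. This is a cross-check, not how the test page works, and it assumes a runtime with full ICU data (such as current Node.js builds) so that legacy encodings are available.

```typescript
// Resolve a label as the WHATWG Encoding Standard does, returning the
// canonical encoding name (lowercase), or null for an unknown label.
function resolveLabel(label: string): string | null {
  try {
    return new TextDecoder(label).encoding;
  } catch {
    // The TextDecoder constructor throws a RangeError for labels it does
    // not support.
    return null;
  }
}

for (const label of ["latin5", "ISO-8859-6-E", "greek", "no-such-charset"]) {
  console.log(`${label} -> ${resolveLabel(label)}`);
}
// latin5 -> windows-1254
// ISO-8859-6-E -> iso-8859-6
// greek -> iso-8859-7
// no-such-charset -> null
```

Under that standard, latin5 is a label for windows-1254 and ISO-8859-6-E is a label for ISO-8859-6, which is why a conforming Web browser produces exactly the pairs that the strict IANA grouping counts as FAILs.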

TODO

1. Map alias groups together so that a test does not "fail" when the Web browser simply returns an alternative alias. For example, when the test case label "greek" results in the Web browser returning "iso-8859-7", that should be recognized as an alternative alias and the test should "pass".

2. Optionally hide all failures that fall back to UTF-8.

3. There appear to be a few spurious failures because of the caveats mentioned above. Replace or augment the IANA alias groupings with the groupings from the ICU project's Converter Explorer?
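TODO items 1 and 3 both come down to merging alias groups, possibly from more than one source. One way to sketch that is a disjoint-set (union-find) structure over lowercased labels, so that any group from a second source that shares a label with an IANA group is joined to it. The IANA groups below are an excerpt; the extra group standing in for ICU data is hypothetical and only there to show the merge.

```typescript
// Disjoint-set (union-find) over charset labels.
class DisjointSet {
  private parent = new Map<string, string>();

  find(x: string): string {
    if (!this.parent.has(x)) this.parent.set(x, x);
    let root = x;
    while (this.parent.get(root) !== root) root = this.parent.get(root)!;
    // Path compression: point every node on the path straight at the root.
    let node = x;
    while (node !== root) {
      const next = this.parent.get(node)!;
      this.parent.set(node, root);
      node = next;
    }
    return root;
  }

  union(a: string, b: string): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(ra, rb);
  }
}

// Merge alias groups from any number of sources; labels are lowercased
// so the comparison stays case-insensitive.
function mergeAliasGroups(...sources: string[][][]): DisjointSet {
  const sets = new DisjointSet();
  for (const groups of sources) {
    for (const group of groups) {
      const lower = group.map((label) => label.toLowerCase());
      lower.forEach((label) => sets.union(lower[0], label));
    }
  }
  return sets;
}

// Excerpt of two IANA groups.
const ianaGroups = [
  ["ISO_8859-6:1987", "iso-ir-127", "ISO_8859-6", "ISO-8859-6", "ECMA-114", "ASMO-708", "arabic", "csISOLatinArabic"],
  ["ISO_8859-6-E", "csISO88596E", "ISO-8859-6-E"],
];
// Hypothetical extra grouping standing in for ICU alias data.
const icuGroups = [["ISO-8859-6", "ISO-8859-6-E"]];

const merged = mergeAliasGroups(ianaGroups, icuGroups);
const sameGroup = (a: string, b: string): boolean =>
  merged.find(a.toLowerCase()) === merged.find(b.toLowerCase());

console.log(sameGroup("ISO-8859-6-E", "arabic")); // true
console.log(sameGroup("ISO-8859-6", "latin5")); // false
```

Union-find is used here rather than a simple label-to-group map because a single shared label is enough to join two whole groups, and the joins compose transitively across sources.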

Some references

The following references were useful in creating this test.

The ICU project Converter Explorer
Anne van Kesteren's label test
W3C Encoding specification and associated test cases
Gecko supported character sets
Legacy Encodings support in Opera Presto 2.8

Start the testing...

Either type in a charset alias to run a single test, or run the test for all charset aliases. Results are listed in either the PASS or the FAIL column, in the following format:

test_case, frame_charset

Here, 'test_case' is the charset alias used in the test, and 'frame_charset' is the encoding the Web browser ended up using to interpret the iframe document. For example, "iso-ir-144, ISO-8859-5" means that the test case used 'iso-ir-144' and the Web browser interpreted the document as 'ISO-8859-5'. Since the two labels are aliases for one another, the test passed. A FAIL indicates that the Web browser either used a fallback encoding to interpret the text, or threw an error.
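Running one test and running all of them can share the same loop, since a single test is just a one-element list. The sketch below assumes a hypothetical probe function that, in a Web browser, would load the hidden iframe and return contentDocument.characterSet; here it is replaced by a fake so the code runs anywhere. It formats each result as test_case, frame_charset and uses the placeholder (error) for a probe that threw. That placeholder is this sketch's choice, not the page's.

```typescript
// What one test produced: the label tried and the charset the iframe
// reported, or null when the probe threw.
interface ProbeResult {
  testCase: string;
  frameCharset: string | null;
}

// Hypothetical probe. In a browser it would append a hidden iframe, wait
// for its load event and return contentDocument.characterSet.
type Probe = (label: string) => Promise<string>;

// A single test is just a one-element list. Labels are probed one at a
// time, so only one request is in flight.
async function runTests(labels: string[], probe: Probe): Promise<ProbeResult[]> {
  const results: ProbeResult[] = [];
  for (const label of labels) {
    try {
      results.push({ testCase: label, frameCharset: await probe(label) });
    } catch {
      results.push({ testCase: label, frameCharset: null });
    }
  }
  return results;
}

// "test_case, frame_charset"; "(error)" is a placeholder for a failed probe.
function formatRow(result: ProbeResult): string {
  return `${result.testCase}, ${result.frameCharset ?? "(error)"}`;
}

// PASS only when the probe succeeded and the matcher accepts the pair.
function columnFor(result: ProbeResult, matches: (a: string, b: string) => boolean): "PASS" | "FAIL" {
  return result.frameCharset !== null && matches(result.testCase, result.frameCharset) ? "PASS" : "FAIL";
}

// Usage with a fake probe and an abbreviated one-group matcher.
const cyrillic = new Set(["iso-ir-144", "iso_8859-5", "iso-8859-5", "cyrillic"]);
const sameGroup = (a: string, b: string): boolean =>
  a.toLowerCase() === b.toLowerCase() ||
  (cyrillic.has(a.toLowerCase()) && cyrillic.has(b.toLowerCase()));
const fakeProbe: Probe = async (label) => {
  if (label === "bogus") throw new Error("frame could not be read");
  return "ISO-8859-5";
};

runTests(["iso-ir-144", "bogus"], fakeProbe).then((results) => {
  for (const r of results) console.log(`${columnFor(r, sameGroup)}: ${formatRow(r)}`);
  // PASS: iso-ir-144, ISO-8859-5
  // FAIL: bogus, (error)
});
```

The matcher is passed in rather than hard-coded, so the same loop works whether alias groups come from IANA alone or from IANA plus ICU.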

Hide results that fall back to UTF-8 or ISO-8859-1 (removes potentially uninteresting results)
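A possible sketch of the hide option follows. Like TODO item 2, it hides only failures, so a label such as utf-8 that genuinely passes with UTF-8 stays visible. It also assumes a Web browser that follows the WHATWG Encoding Standard, where a document declared as ISO-8859-1 reports windows-1252, so that name is treated as a fallback as well. The label x-unknown-label in the usage data is made up for illustration.

```typescript
interface Row {
  testCase: string;
  frameCharset: string | null;
  passed: boolean;
}

// Encodings treated as fallbacks. "windows-1252" is included on the
// assumption that the browser follows the WHATWG Encoding Standard, which
// reports an ISO-8859-1 document as windows-1252.
const FALLBACK_CHARSETS = new Set(["utf-8", "iso-8859-1", "windows-1252"]);

// Hide a row only if it failed and the frame ended up in a fallback
// encoding; passing rows such as "utf-8, UTF-8" stay visible.
function isUninterestingFallback(row: Row): boolean {
  return !row.passed && row.frameCharset !== null && FALLBACK_CHARSETS.has(row.frameCharset.toLowerCase());
}

function visibleRows(rows: Row[], hideFallbacks: boolean): Row[] {
  return hideFallbacks ? rows.filter((row) => !isUninterestingFallback(row)) : rows;
}

const rows: Row[] = [
  { testCase: "utf-8", frameCharset: "UTF-8", passed: true },
  { testCase: "x-unknown-label", frameCharset: "windows-1252", passed: false },
  { testCase: "latin5", frameCharset: "windows-1254", passed: false },
];
console.log(visibleRows(rows, true).map((row) => row.testCase)); // [ 'utf-8', 'latin5' ]
```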