confusables.js - Unicode confusables in JavaScript

The Unicode confusables have long been of interest in application security testing and social engineering. I work with Unicode often in tools and testing, and wanted the confusables data available in a JavaScript module, confusables.js. The Unicode confusables are characters that are visually similar to, and easily confused with, other characters. More information is available from the Unicode Consortium at http://www.unicode.org/reports/tr36/#visual_spoofing.

Because String.fromCodePoint is missing from many JavaScript implementations, confusables.js requires a polyfill for it; the String.fromCodePoint polyfill by Mathias Bynens works just fine.
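Why the polyfill matters: many confusables lie outside the Basic Multilingual Plane, and String.fromCharCode works on 16-bit code units, so such characters have to be built from a UTF-16 surrogate pair. The following is a minimal sketch of that conversion using a hypothetical helper named codePointToString. It is not the Mathias Bynens polyfill, which also accepts several code points at once and follows the specification more closely.

```javascript
// Hypothetical helper (not part of confusables.js): convert one Unicode
// code point to a string using only String.fromCharCode.
function codePointToString(codePoint) {
  if (typeof codePoint !== "number" || codePoint !== Math.floor(codePoint) ||
      codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError("Invalid code point: " + codePoint);
  }
  if (codePoint <= 0xFFFF) {
    // BMP characters fit in a single UTF-16 code unit.
    return String.fromCharCode(codePoint);
  }
  // Supplementary characters need a high surrogate and a low surrogate.
  var offset = codePoint - 0x10000;
  var high = 0xD800 + (offset >> 10);
  var low = 0xDC00 + (offset & 0x3FF);
  return String.fromCharCode(high, low);
}

console.log(codePointToString(0x0041).length);  // 1
console.log(codePointToString(0x1D400).length); // 2 (MATHEMATICAL BOLD CAPITAL A)
```

In practice, load the polyfill and call String.fromCodePoint directly; the sketch only shows what it has to do for characters above U+FFFF.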

Also known as homoglyphs, lookalikes, and spoofs, confusables are characters that visually resemble, or are indistinguishable from, another character. For example, the following entry from the confusables data maps two visually similar characters:

FF21 ; 0041 ; SA # ( Ａ → A ) FULLWIDTH LATIN CAPITAL LETTER A → LATIN CAPITAL LETTER A
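Each data line has a source code point, a target sequence of one or more code points, a type field, and a trailing comment. The following is a minimal sketch of decoding one line, assuming the semicolon-separated layout shown above. The function name parseConfusableLine is hypothetical and is not part of confusables.js. It relies on String.fromCodePoint, which is native or supplied by the polyfill.

```javascript
// Hypothetical parser for one line of the Unicode confusables data,
// assuming the "source ; target ; type # comment" layout shown above.
function parseConfusableLine(line) {
  // Drop the trailing comment, then split the three data fields.
  var data = line.split("#")[0];
  var fields = data.split(";").map(function (f) { return f.trim(); });
  if (fields.length < 3 || fields[0] === "") {
    return null; // blank line, comment-only line, or malformed entry
  }
  // Each field holds space-separated hexadecimal code points.
  function hexToString(field) {
    var cps = field.split(/\s+/).map(function (h) { return parseInt(h, 16); });
    return String.fromCodePoint.apply(String, cps);
  }
  return {
    source: hexToString(fields[0]),
    target: hexToString(fields[1]),
    type: fields[2]
  };
}

var entry = parseConfusableLine(
  "FF21 ; 0041 ; SA # ( \uFF21 \u2192 A ) FULLWIDTH LATIN CAPITAL LETTER A"
);
console.log(entry.target, entry.type); // A SA
```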

Sometimes during penetration testing we want to bypass word blacklists, spoof URLs or email addresses, or perform similar tasks. Being able to generate lookalike strings is useful in these cases, and we also know that bad guys apply the same tactics to get past antivirus and other security boundaries.
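To illustrate why lookalikes defeat simple filters, here is a deliberately naive blacklist check. The isBlocked function and its word list are hypothetical. The check compares strings code unit by code unit, so a word spelled partly with Cyrillic letters passes even though it renders almost identically.

```javascript
// Hypothetical, deliberately naive word filter: exact string comparison only.
var blacklist = ["paypal"];

function isBlocked(word) {
  return blacklist.indexOf(word.toLowerCase()) !== -1;
}

// "раypal" written with CYRILLIC SMALL LETTER ER (U+0440) and
// CYRILLIC SMALL LETTER A (U+0430) in place of Latin "p" and "a".
var spoofed = "\u0440\u0430ypal";

console.log(isBlocked("paypal")); // true
console.log(isBlocked(spoofed));  // false: the lookalike slips through
```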

If you need more capability than confusables.js provides, go check out the Unicode Consortium's utility for generating confusables.

Note that generating a full list of all confusable permutations is expensive and often unnecessary, so confusables.js generates only a single permutation, choosing a confusable at random for each character.
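To see why a full enumeration gets expensive, assume that each input character can either stay as-is or be replaced by any one of its k confusables. The number of distinct outputs is then the product of (k + 1) over all characters. The counts below are made up for illustration and are not taken from the confusables data.

```javascript
// Number of lookalike strings when each character can be kept or swapped
// for one of its confusables: the product of (k + 1) over all characters.
// The per-character counts passed in below are hypothetical.
function countPermutations(confusableCounts) {
  return confusableCounts.reduce(function (total, k) {
    return total * (k + 1);
  }, 1);
}

// A 9-character string in which every character has 6 confusables:
console.log(countPermutations([6, 6, 6, 6, 6, 6, 6, 6, 6])); // 40353607
```

Picking one random replacement per character needs just one choice per character, no matter how large that product is.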

Installation

The test page index.html is running at http://lookout.net/test/confusablesjs

In a browser:

<script src="js/fromcodepoint.js"></script>
<script src="js/confusables.data.js"></script>
<script src="js/confusables.js"></script>

Two public methods in confusables.js return the confusable data. You can pass in a string of characters and get back a string of randomly selected confusable characters, or you can pass in a code point or a single character and get back an array of all its confusables.

The confusables.utility.getConfusableString() method accepts a string of one or more characters as input and returns a string of confusable characters. Since each character of input can have several confusables, a random one is selected from the data set. This provides a quick and convenient way to select confusables without enumerating the entire set.

var input = "abcDEF123";
var output = confusables.utility.getConfusableString(input);
// output varies per call, e.g. "αƄсᎠᎬϜוƧЗ"
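The random selection can be sketched roughly as follows. This is not the library's own code. The table shape (a plain object mapping a character to an array of confusables, with a multi-character confusable stored as an array) and the names pickConfusables and sampleTable are assumptions for illustration. Iterating with for...of walks the string by code point, so characters outside the BMP are not split into surrogate halves.

```javascript
// Hypothetical sketch of per-character random substitution.
// table: { char: [confusable, ...] }, where a confusable made of several
// characters is stored as an array such as ["C", "'"].
function pickConfusables(input, table, random) {
  random = random || Math.random; // injectable for repeatable tests
  var output = "";
  for (var ch of input) {         // iterates by code point, not code unit
    var choices = table[ch];
    if (!choices || choices.length === 0) {
      output += ch;               // no known confusable: keep the original
      continue;
    }
    var pick = choices[Math.floor(random() * choices.length)];
    output += Array.isArray(pick) ? pick.join("") : pick;
  }
  return output;
}

// Tiny hand-made table, not the real data set.
var sampleTable = {
  a: ["\u03B1"],            // GREEK SMALL LETTER ALPHA
  c: ["\u0441"],            // CYRILLIC SMALL LETTER ES
  "\u0187": [["C", "'"]]    // a two-character confusable
};

console.log(pickConfusables("abc", sampleTable)); // αbс ("b" has no entry)
```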

The confusables.utility.getConfusableCharacters() method accepts a single character or a code point value (decimal or hex) and returns all of its confusable characters in an array. When a confusable is made up of several characters, that element is itself an array, so the result can be nested:

var codePoint = 0x0041;  // or "A" or 65
var output = confusables.utility.getConfusableCharacters(codePoint); 
// output is ['A', 'A', 'Α', 'А', 'Ꭺ', 'ᗅ']
// and could contain arrays of characters as values, e.g.:
// [["C", "'"], "Ƈ"]
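Because the method accepts either a character or a numeric code point, and its result may mix plain characters with arrays, callers may want to normalize both the input and the output. The following is a minimal sketch using two hypothetical helpers, toCodePoint and flattenConfusables. Neither is part of confusables.js. Note that 0x0041 and 65 are the same number in JavaScript, so hex and decimal input need no separate handling.

```javascript
// Hypothetical helpers for working with getConfusableCharacters()-style data.

// Accept a number (65 or 0x0041) or a single character ("A")
// and return its code point.
function toCodePoint(value) {
  if (typeof value === "number") {
    return value; // 0x0041 and 65 are the same number
  }
  if (typeof value === "string" && value.length > 0) {
    return value.codePointAt(0); // first code point, surrogate-aware
  }
  throw new TypeError("Expected a character or a code point");
}

// Turn entries like ["C", "'"] into the string "C'" so every element
// of the result is a plain string.
function flattenConfusables(entries) {
  return entries.map(function (entry) {
    return Array.isArray(entry) ? entry.join("") : entry;
  });
}

console.log(toCodePoint("A") === toCodePoint(0x0041));          // true
console.log(flattenConfusables([["C", "'"], "\u0187"]).join(" | ")); // C' | Ƈ
```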