Confusables, also known as homoglyphs, lookalikes, and spoofs, are characters that visually resemble, or are indistinguishable from, another character. For example, the following two characters are visually similar and easily confused:
FF21 ; 0041 ; SA # ( Ａ → A ) FULLWIDTH LATIN CAPITAL LETTER A → LATIN CAPITAL LETTER A
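As a rough illustration of how a data line like the one above can be read, here is a minimal sketch. It assumes the layout shown: semicolon-separated fields followed by a `#` comment. `parseConfusableLine` is a hypothetical helper written for this example and is not part of confusables.js.

```javascript
// Hypothetical helper: parse one line of the form
//   "<source hex> ; <target hex> ; <type> # <comment>"
// Returns null for blank or comment-only lines.
function parseConfusableLine(line) {
  const data = line.split("#")[0]; // drop the trailing comment
  const fields = data.split(";").map(function (s) { return s.trim(); });
  if (fields.length < 2 || fields[0] === "") {
    return null;
  }
  // A field may hold several space-separated hex code points.
  const hexToString = function (hexList) {
    return hexList
      .split(/\s+/)
      .map(function (h) { return String.fromCodePoint(parseInt(h, 16)); })
      .join("");
  };
  return {
    source: hexToString(fields[0]),
    target: hexToString(fields[1]),
    type: fields[2] || ""
  };
}

const entry = parseConfusableLine(
  "FF21 ; 0041 ; SA # FULLWIDTH LATIN CAPITAL LETTER A → LATIN CAPITAL LETTER A"
);
console.log(entry.source, "→", entry.target, "(" + entry.type + ")");
```

The two strings in `entry` render almost identically but compare as unequal, which is exactly what makes the pair useful for spoofing tests.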
Sometimes during penetration testing, we want to bypass word blacklists, spoof URLs, spoof email addresses, or perform other tasks. Being able to generate lookalike strings can be quite useful in these cases, and we know that bad guys will apply the same tactics to bypass antivirus or other security boundaries.
Note that generating a full list of all confusable permutations is expensive and often unnecessary, so confusables.js only generates a single permutation from randomly selected characters.
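To give a feel for why full enumeration gets expensive, the sketch below counts the possible permutations for an input and then picks just one at random. It is not the library's implementation: `sampleTable`, `countPermutations`, and `randomPermutation` are hypothetical names, and the table is a tiny made-up subset, assumed only for illustration.

```javascript
// Hypothetical three-entry table; the real data set is far larger.
const sampleTable = {
  a: ["a", "α", "а"],
  b: ["b", "Ƅ", "Ь"],
  c: ["c", "с", "ϲ"]
};

// The number of distinct lookalike strings is the product of the
// number of choices available for each input character.
function countPermutations(input, table) {
  let total = 1;
  for (const ch of input) {
    total *= (table[ch] || [ch]).length;
  }
  return total;
}

// Build a single permutation by making one random choice per character.
// Characters missing from the table pass through unchanged.
function randomPermutation(input, table, random) {
  const rand = random || Math.random;
  let out = "";
  for (const ch of input) {
    const options = table[ch] || [ch];
    out += options[Math.floor(rand() * options.length)];
  }
  return out;
}

console.log(countPermutations("abc", sampleTable));       // 27
console.log(countPermutations("abcabcabc", sampleTable)); // 19683
console.log(randomPermutation("abc", sampleTable));
```

With only three choices per character, a nine-character input already has 19683 permutations, while a single random pick costs one lookup per character.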
The test page
index.html is running at http://lookout.net/test/confusablesjs
In a browser:
<script src="js/confusables.data.js"></script>
<script src="js/confusables.js"></script>
<script src="js/fromcodepoint.js"></script>
Two public methods are available with confusables.js to return the confusable data. You can pass in a string of characters and get a randomly selected string of confusable characters returned, or you can pass in a code point or single character and get an array of all confusables for that character.
The confusables.utility.getConfusableString() method accepts a string of one or more characters as input and returns a string of confusable characters. Since each character of input can have several confusables, a random one is selected from the data set. This provides a quick and convenient way to select confusables without enumerating the entire set.
var input = "abcDEF123";
var output = confusables.utility.getConfusableString(input);
// output is random; one possible result is "αƄсᎠᎬϜוƧЗ"
The confusables.utility.getConfusableCharacters() method accepts a single character or code point value (decimal or hex) as input and returns all of its confusable characters in an array, which could be multidimensional when several characters combine to create a single confusable:
var codePoint = 0x0041; // or "A" or 65
var output = confusables.utility.getConfusableCharacters(codePoint);
// output is ['A', 'Ａ', 'Α', 'А', 'Ꭺ', 'ᗅ']
// and could contain arrays of characters as values, e.g.:
// [["C", "'"], "Ƈ"]
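Because the returned array can mix plain characters with nested arrays, callers may want to normalize it before use. The sketch below shows one way to do that. It assumes each nested array lists, in order, the characters that together form one confusable. `toConfusableStrings` is a hypothetical helper, not part of confusables.js.

```javascript
// Hypothetical helper: flatten a result such as [["C", "'"], "Ƈ"]
// into plain strings, joining each nested array into one sequence.
function toConfusableStrings(confusableList) {
  return confusableList.map(function (item) {
    return Array.isArray(item) ? item.join("") : item;
  });
}

// Sample value taken from the comment in the example above.
const sampleResult = [["C", "'"], "Ƈ"];
const flattened = toConfusableStrings(sampleResult);
console.log(flattened); // two strings: "C'" and "Ƈ"
```

After this step every entry is a string, so the list can be substituted into a test payload without checking each element's type.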