Unicode attacks and test cases - Visual Spoofing, IDN homograph attacks, and the Confusables

Let's face it, playing tricks that mess with people's perception can be fun.  With Unicode, there are lots of fun tricks to be had.  What's to stop someone from believing the following is what it appears to be:

www.аmazon.com

Looks like amazon.com of course, but it's not.  The first 'a' is the Cyrillic small letter a, not the English, or Latin rather, small letter 'a'.  They look identical, but they're from two different scripts.   Confused?  Good.  Now hover your mouse over the link above.  Don't click it, because I don't know where it goes, but it probably isn't nice.  In your browser's status bar you should see the Punycode encoded version of the domain name:

http://www.xn--mazon-3ve.com/

Because DNS host names do not support Unicode (only a subset of ASCII is allowed: letters, digits, and the hyphen), we have IDN (Internationalized Domain Name) standards which define how domain names with Unicode characters should be encoded.  Punycode is the name of the encoding mechanism, and each encoded label carries the xn-- prefix you see above.
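
To see the encoding for yourself, here's a minimal sketch using only Python 3's standard library. It assumes the built-in `idna` codec, which implements the older IDNA 2003 rules, so take it as an illustration and not as what any particular browser does. The variable names are my own.

```python
import unicodedata

# The spoofed name from above: Cyrillic 'a' (U+0430) followed by Latin "mazon".
spoofed = "www.\u0430mazon.com"
genuine = "www.amazon.com"

# The two strings render alike but are different code point sequences.
print(spoofed == genuine)            # False
print(unicodedata.name(spoofed[4]))  # CYRILLIC SMALL LETTER A
print(unicodedata.name(genuine[4]))  # LATIN SMALL LETTER A

# The "idna" codec encodes each non-ASCII label as "xn--" plus its
# Punycode form; all-ASCII labels such as "www" and "com" pass through.
ascii_form = spoofed.encode("idna").decode("ascii")
print(ascii_form)                    # www.xn--mazon-3ve.com
```

Comparing the ASCII form against the name you expected is a cheap way to notice that a familiar-looking domain is not the one you think it is.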

The above is often referred to as an IDN homograph attack.  Aside from spoofing with lookalike characters from completely different alphabets, we can do a bunch of spoofing just within our own alphabet.  For example, certain fonts make some combinations of characters hard to tell apart.  The letters 'r' and 'n' together can look like the letter 'm' (rn == m), zeros can look like 'O', and the number 1 can look like a lowercase 'l'.  So you wind up with lots of clever visual attacks:


  • www.rnu11ets.com looks a lot like www.mullets.com


  • www.rnu11ets.com looks a lot like www.mullets.com


  • www.rnu11ets.com looks a lot like www.mullets.com


  • www.rnu11ets.com looks a lot like www.mullets.com


  • www.rnu11ets.com looks a lot like www.mullets.com


  • www.rnu11ets.com looks a lot like www.mullets.com


  • www.rnu11ets.com looks a lot like www.mullets.com


  • www.rnu11ets.com looks a lot like www.mullets.com


I've listed the same text here in several different fonts, because in some fonts, you wouldn't be able to tell the visual difference between the two words.  The visual appearance of characters has a lot to do with the fonts used to display the glyph, not just the alphabet.
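
One rough way to catch this kind of same-alphabet spoofing is to fold both names down to a common "skeleton" and compare. The sketch below is only that, a sketch: the `LOOKALIKES` table and the `skeleton` and `looks_like` helpers are hypothetical names I made up, and the table covers nothing beyond the three substitutions mentioned above.

```python
# Hypothetical, deliberately tiny lookalike table: only the substitutions
# mentioned in the text. Real confusable data is far larger.
LOOKALIKES = [("rn", "m"), ("0", "o"), ("1", "l")]


def skeleton(name: str) -> str:
    """Fold a name so that visually similar ASCII spellings compare equal."""
    folded = name.lower()
    for lookalike, canonical in LOOKALIKES:
        folded = folded.replace(lookalike, canonical)
    return folded


def looks_like(candidate: str, trusted: str) -> bool:
    """True when the names differ but fold to the same skeleton."""
    return candidate != trusted and skeleton(candidate) == skeleton(trusted)


print(skeleton("www.rnu11ets.com"))                       # www.mullets.com
print(looks_like("www.rnu11ets.com", "www.mullets.com"))  # True
print(looks_like("www.mullets.com", "www.mullets.com"))   # False
```

Folding is lossy on purpose, so it will also flag innocent pairs (a name containing "rn" folds the same way as one containing "m"). It tells you two names could be confused, not that one of them is malicious.
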

Very interesting article. I did some poking around on Punycode and found this nice Unicode/Punycode converter.

https://www.dw-formmailer.de/index.php?action=convert

Interestingly, Microsoft Word 2007 will also do the conversions if you enter a letter and then press Alt+X, or enter the corresponding number value and hit Alt+X.

E.g. entering 0430 and then hitting Alt+X provides the Cyrillic a (а)

-Michael

Cool info about Word, I didn't realize that hotkey function existed. How'd you figure that out? I usually use BabelMap for just about everything when I need access to certain characters or character properties.

I just sort of stumbled onto the Word hotkey for characters. At first I thought it was some sort of strange bug. Although now I don't think it's a bug, I really have no idea why they thought to include it. Guess it's nice for those of us that like to tinker.

-Michael

I know that some browsers are now showing just the ASCII versions. And this is not a fault of Punycode, but of visual perception only.

Too bad. So it seems to me any sort of IDN will have little hope of doing it?

I suppose it's safest to convert everything to Punycode rather than try to decipher whether there's spoofing and foul play. Though that would be possible in some situations, I think the browsers take the biggest risk for spoofing attacks. At this point in time Punycode saves the end user.
