String handling when marshalling from .Net to a platform invoke

I've been looking into this recently, and Michael Eddington's post on the subject inspired me to write a bit more about it.

By default, the .Net runtime marshals a string (including a string field in a value type) to a platform invoke (P/Invoke) function as an LPStr. The .Net framework and runtime, however, handle strings internally as UTF-16: each two-byte code unit usually represents a single Unicode code point, or more familiarly a single character, although characters outside the Basic Multilingual Plane need a surrogate pair of two code units. An LPStr, on the other hand, is a pointer to a null-terminated string of ANSI characters, so to convert, the runtime performs a best-fit conversion to the system's ANSI code page, which on Western-locale systems is the classic windows-1252 code page. That best-fit mapping is documented here:

http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt
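To make the code unit versus character distinction concrete, here's a small C# sketch (assuming C# 9 top-level statements on .NET 5 or later) comparing a character inside the Basic Multilingual Plane with one outside it:

```csharp
using System;
using System.Text;

// U+00E9 (e with acute accent) sits in the Basic Multilingual Plane.
string bmp = "\u00E9";
// U+1F600 (grinning face) lies outside the BMP.
string supplementary = "\U0001F600";

// string.Length counts UTF-16 code units; Encoding.Unicode is UTF-16 little-endian.
Console.WriteLine($"{bmp.Length} code unit(s), {Encoding.Unicode.GetByteCount(bmp)} bytes");
// prints "1 code unit(s), 2 bytes"
Console.WriteLine($"{supplementary.Length} code unit(s), {Encoding.Unicode.GetByteCount(supplementary)} bytes");
// prints "2 code unit(s), 4 bytes"

// The two code units of the supplementary character form a surrogate pair.
Console.WriteLine(char.IsSurrogatePair(supplementary[0], supplementary[1])); // True
```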

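This might not be surprising to people in tune with Unicode, but it can lead to serious security problems wherever a security filter inspects the string. The best-fit mapping can turn characters a filter ignores into ones it cares about (fullwidth forms, for example, collapse to their ASCII equivalents), so if you're performing HTML filtering or file path canonicalization, you need to do so after the conversion to LPStr, on the string the native code actually receives.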
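Here's a rough sketch of why the order matters. It doesn't call the real marshaller; it simulates the conversion with a tiny, hypothetical table (bestFitSubset) holding three fullwidth forms that, as far as I can tell from bestfit1252.txt, map to their ASCII counterparts (check the file before relying on them). Any non-ASCII character not in the table becomes '?', which is a simplification, since real windows-1252 maps many more characters directly. LooksSafe and SimulateBestFit are hypothetical helpers for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical subset of the windows-1252 best-fit table (illustrative only;
// see bestfit1252.txt for the authoritative mapping).
var bestFitSubset = new Dictionary<char, char>
{
    ['\uFF1C'] = '<', // FULLWIDTH LESS-THAN SIGN
    ['\uFF1E'] = '>', // FULLWIDTH GREATER-THAN SIGN
    ['\uFF0F'] = '/', // FULLWIDTH SOLIDUS
};

// Simulates the lossy UTF-16 -> ANSI step: ASCII passes through, table entries
// are best-fit mapped, everything else becomes '?'.
string SimulateBestFit(string s)
{
    var sb = new StringBuilder(s.Length);
    foreach (char c in s)
    {
        if (c < '\u0080')
            sb.Append(c);
        else if (bestFitSubset.TryGetValue(c, out char mapped))
            sb.Append(mapped);
        else
            sb.Append('?');
    }
    return sb.ToString();
}

// Naive HTML filter that rejects ASCII angle brackets.
bool LooksSafe(string s) => s.IndexOfAny(new[] { '<', '>' }) < 0;

string input = "\uFF1Cscript\uFF1Ealert(1)\uFF1C\uFF0Fscript\uFF1E";

// Filtering the managed UTF-16 string lets the payload through...
Console.WriteLine(LooksSafe(input)); // True

// ...but the converted ANSI string contains real angle brackets.
string ansi = SimulateBestFit(input);
Console.WriteLine(ansi);             // <script>alert(1)</script>
Console.WriteLine(LooksSafe(ansi));  // False
```

Running the same LooksSafe check on the converted string, rather than the managed one, is what catches the payload.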

This default marshalling behavior is documented at: http://msdn2.microsoft.com/en-us/library/system.runtime.interopservices.marshalasattribute(VS.71).aspx
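For contrast, here's a sketch of what the default looks like in a declaration. It assumes C# 9 or later (for attributes on top-level local functions) and uses kernel32's lstrlenA purely as a convenient ANSI-string function; the local function names are hypothetical, and the calls are guarded so they only run on Windows.

```csharp
using System;
using System.Runtime.InteropServices;

// No CharSet and no MarshalAs: DllImport's CharSet defaults to Ansi, so the
// string argument is marshalled as LPStr (converted to the ANSI code page).
// EntryPoint is spelled out rather than relying on the local function's name.
[DllImport("kernel32.dll", EntryPoint = "lstrlenA", ExactSpelling = true)]
static extern int AnsiLengthDefault(string lpString);

// The same default, written out explicitly.
[DllImport("kernel32.dll", EntryPoint = "lstrlenA", ExactSpelling = true, CharSet = CharSet.Ansi)]
static extern int AnsiLengthExplicit([MarshalAs(UnmanagedType.LPStr)] string lpString);

// kernel32.dll only exists on Windows, so only call it there.
if (OperatingSystem.IsWindows())
{
    Console.WriteLine(AnsiLengthDefault("hello"));  // 5
    Console.WriteLine(AnsiLengthExplicit("hello")); // 5
}
```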

To deal with this properly and more safely, you can use the MarshalAsAttribute class to specify LPWStr instead of LPStr. For example:
[MarshalAs(UnmanagedType.LPWStr)]

Because LPWStr is a pointer to a null-terminated string of UTF-16 (wide) characters, marshalling as LPWStr involves no code-page conversion at all, so the Unicode code points are preserved across the call.
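Putting the attribute in context, here's a sketch of a complete declaration. It assumes C# 9 or later and uses kernel32's lstrlenW only as a convenient wide-string function (WideLength is a hypothetical name). The second half uses Marshal.StringToHGlobalUni to build a buffer with the same layout LPWStr marshalling hands to native code (null-terminated UTF-16), so the preserved code units can be checked on any platform.

```csharp
using System;
using System.Runtime.InteropServices;

// LPWStr: native code receives a pointer to a null-terminated UTF-16 string,
// so no ANSI code-page (and no best-fit) conversion takes place.
[DllImport("kernel32.dll", EntryPoint = "lstrlenW", ExactSpelling = true)]
static extern int WideLength([MarshalAs(UnmanagedType.LPWStr)] string lpString);

// Fullwidth '<', "script", fullwidth '>': eight UTF-16 code units.
string payload = "\uFF1Cscript\uFF1E";

// kernel32.dll only exists on Windows, so only call it there.
if (OperatingSystem.IsWindows())
{
    Console.WriteLine(WideLength(payload)); // 8
}

// A buffer with the same layout shows the original code units surviving intact.
IntPtr buffer = Marshal.StringToHGlobalUni(payload);
try
{
    Console.WriteLine((char)Marshal.ReadInt16(buffer, 0) == '\uFF1C'); // True
    Console.WriteLine(Marshal.PtrToStringUni(buffer) == payload);     // True
}
finally
{
    Marshal.FreeHGlobal(buffer);
}
```

If a native API only accepts ANSI strings, so LPWStr isn't an option, the BestFitMapping = false and ThrowOnUnmappableChar = true fields on DllImportAttribute are worth a look: together they make the marshaller throw on an unmappable character instead of silently substituting a best-fit or '?' replacement.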