Injecting new line characters (e.g. CR LF) into security logs with Unicode

11 May 2011

Today I was asked whether ESAPI's approach to sanitizing log messages against CRLF (carriage return, line feed) injection was sound. "CRLF injection" here describes an attack in which textual content, such as the records in a security log, can be forged. Imagine a plain-text security log that separates entries with two CRLF sequences. I'm using plain text to keep things simple; hopefully real logs use some form of markup. In hex the separator would look like 0x0D 0x0A 0x0D 0x0A. If the input validation routines did not sanitize CR and LF characters, an attacker could craft input that appears to start new records in the log. Here's a snippet from ESAPI that attempts to protect against this:

// ensure no CRLF injection into logs for forging records
String clean = message.replace('\n', '_').replace('\r', '_');
if (ESAPI.securityConfiguration().getLogEncodingRequired()) {
    // note: this encodes `message`, not `clean`, so the '_' replacements
    // above are discarded whenever log encoding is enabled
    clean = ESAPI.encoder().encodeForHTML(message);
    if (!message.equals(clean)) {
        clean += " (Encoded)";
    }
}

Note: I have never worked with or tested ESAPI. I don't know exactly what getLogEncodingRequired() and encodeForHTML(message) do, so I can't say whether ESAPI is vulnerable to the attacks I'm about to describe. Maybe someone from the ESAPI project can jump in. I'm only using ESAPI to make the example more realistic.
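To make the record-forging attack concrete without depending on ESAPI, here is a minimal sketch of the plain-text log described above, where records are separated by two CRLF sequences. The class and method names (`ForgedRecordDemo`, `appendRecord`, `replaceCrLf`, `records`) are hypothetical; `replaceCrLf` only mirrors the first line of the ESAPI snippet, not ESAPI's full behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-text log: records are separated by two CRLF sequences
// (0x0D 0x0A 0x0D 0x0A), as in the scenario described above.
public class ForgedRecordDemo {
    static final String RECORD_SEPARATOR = "\r\n\r\n";

    // Appends one record; the message is written verbatim (no sanitizing).
    static void appendRecord(StringBuilder log, String user, String message) {
        log.append("user=").append(user)
           .append(" msg=").append(message)
           .append(RECORD_SEPARATOR);
    }

    // Mirrors the first line of the ESAPI snippet: CR and LF become '_'.
    static String replaceCrLf(String message) {
        return message.replace('\n', '_').replace('\r', '_');
    }

    // Splits the log on the separator, the way a naive reader or parser would.
    static List<String> records(CharSequence log) {
        List<String> out = new ArrayList<>();
        for (String r : log.toString().split(RECORD_SEPARATOR)) {
            if (!r.isEmpty()) out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        // Attacker-controlled input that embeds the record separator.
        String attack = "login failed\r\n\r\nuser=admin msg=login succeeded";

        StringBuilder raw = new StringBuilder();
        appendRecord(raw, "mallory", attack);
        System.out.println("unsanitized records: " + records(raw).size());

        StringBuilder cleaned = new StringBuilder();
        appendRecord(cleaned, "mallory", replaceCrLf(attack));
        System.out.println("sanitized records:   " + records(cleaned).size());
    }
}
```

Running it prints `unsanitized records: 2` and then `sanitized records:   1`: the unsanitized message produces a forged `user=admin` record, while the CR/LF replacement collapses it back into a single entry.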

ESAPI is concerned here with the visual (human-readable) appearance of log entries, not with how software processes the characters in those entries. There seem to be three vectors that could screw up ESAPI's logic for protecting against CRLF injection:

1. Unicode normalization that decomposes and maps a character (or sequence of characters) to either CR or LF
* Not a problem.

2. Charset best-fit mappings that map input characters to either CR or LF during transcoding
* Unpredictable problem.

3. Unicode characters that produce the same visual effect as CR and LF
* Definitely a problem.

#1: You don't have to worry about this one. None of the four Unicode normalization forms (NFC, NFD, NFKC, NFKD) maps any character to CR or LF.
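If you want to check #1 against the Unicode version your own JDK ships with, a brute-force scan over every code point is cheap. This is a minimal sketch using `java.text.Normalizer`; the class and method names are hypothetical, and it only tests single code points, which is enough here because CR and LF are never the result of a composition.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class NormalizationNewlineScan {
    static final Normalizer.Form[] FORMS = {
        Normalizer.Form.NFC, Normalizer.Form.NFD,
        Normalizer.Form.NFKC, Normalizer.Form.NFKD
    };

    // Returns "U+XXXX -> FORM" for every code point (other than CR/LF themselves)
    // whose normalized form contains a CR or an LF.
    static List<String> findNewlineMappings() {
        List<String> hits = new ArrayList<>();
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (cp == 0x0A || cp == 0x0D) continue;
            // Skip lone surrogate code points; they are not characters on their own.
            if (Character.getType(cp) == Character.SURROGATE) continue;
            String s = new String(Character.toChars(cp));
            for (Normalizer.Form form : FORMS) {
                String n = Normalizer.normalize(s, form);
                if (n.indexOf('\n') >= 0 || n.indexOf('\r') >= 0) {
                    hits.add(String.format("U+%04X -> %s", cp, form));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> hits = findNewlineMappings();
        System.out.println(hits.isEmpty() ? "no mappings to CR or LF" : hits.toString());
    }
}
```

If #1 holds for your JDK's Unicode data, the program prints `no mappings to CR or LF`.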

#2: Best-fit mappings are hard to predict because they can differ per platform. Below is the set of characters I know of that best-fit map to either U+000A (LF) or U+000D (CR) in the given charset (e.g. CP424).

Maps to  Input  Charset   Comment
000A     008E   CP424     #REPEAT
000A     25D9   IBMGRAPH  --
000A     008E   CP037     #CONTROL
000A     008E   CP1026    #CONTROL
000A     008E   CP500     #CONTROL
000A     008E   CP875     #CONTROL
000A     2326   KEYBOARD  # ERASE TO THE RIGHT # Delete right (right-to-left text)
000D     266A   IBMGRAPH  02
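Because these mappings vary by platform, one rough way to probe a given environment is to encode every code point and flag any that come out as the exact bytes used for LF or CR. The sketch below does that with the JVM's own encoders via `String.getBytes(Charset)`. Treat it as an assumption-laden approximation: Java's charset tables are not the Windows best-fit tables listed above, so results can differ from that table. The class name, the `findNewlineMappings` helper, and the `IBM037` default (which requires the JDK's extended charsets) are all hypothetical choices.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CharsetNewlineScan {
    // Lists code points (other than CR/LF) that the JVM's encoder for `cs`
    // turns into exactly the bytes it uses for LF or CR.
    static List<String> findNewlineMappings(Charset cs) {
        List<String> hits = new ArrayList<>();
        if (!cs.canEncode()) return hits; // decode-only charsets cannot be probed
        byte[] lf = "\n".getBytes(cs);
        byte[] cr = "\r".getBytes(cs);
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (cp == 0x0A || cp == 0x0D) continue;
            if (Character.getType(cp) == Character.SURROGATE) continue;
            // getBytes() substitutes the charset's replacement bytes for unmappable input.
            byte[] b = new String(Character.toChars(cp)).getBytes(cs);
            if (Arrays.equals(b, lf)) hits.add(String.format("U+%04X -> LF", cp));
            else if (Arrays.equals(b, cr)) hits.add(String.format("U+%04X -> CR", cp));
        }
        return hits;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "IBM037";
        // Fall back to US-ASCII if this JVM does not ship the requested charset.
        Charset cs = Charset.isSupported(name) ? Charset.forName(name) : StandardCharsets.US_ASCII;
        System.out.println(cs.name() + ": " + findNewlineMappings(cs));
    }
}
```

For US-ASCII or ISO-8859-1 the list is empty, since unmappable characters become `?`; the interesting cases are legacy charsets such as the EBCDIC code pages in the table.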

#3: This is the most practical and most obvious attack. Each of the following Unicode characters (code points) creates a visual “new line” effect.

U+000A  LINE FEED (LF)
U+000B  LINE TABULATION
U+000C  FORM FEED (FF)
U+000D  CARRIAGE RETURN (CR)
U+0085  NEXT LINE (NEL)
U+2028  LINE SEPARATOR
U+2029  PARAGRAPH SEPARATOR

This means ESAPI should filter out all of these as well if it intends to handle Unicode input.
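As a minimal sketch of what that filtering could look like, the hypothetical `sanitizeForLog` below extends the snippet's `'_'` substitution from CR and LF to all seven code points listed above. It is still a blacklist, with the usual caveats; it is not ESAPI code.

```java
public class LogLineSanitizer {
    // The code points listed above that can render as a line break.
    private static final int[] NEWLINE_CODE_POINTS = {
        0x000A, // LINE FEED (LF)
        0x000B, // LINE TABULATION
        0x000C, // FORM FEED (FF)
        0x000D, // CARRIAGE RETURN (CR)
        0x0085, // NEXT LINE (NEL)
        0x2028, // LINE SEPARATOR
        0x2029  // PARAGRAPH SEPARATOR
    };

    static boolean isNewline(int cp) {
        for (int n : NEWLINE_CODE_POINTS) {
            if (n == cp) return true;
        }
        return false;
    }

    // Replaces every newline-like code point with '_', as the ESAPI snippet does for CR/LF.
    // Iterates by code point so supplementary characters pass through intact.
    static String sanitizeForLog(String message) {
        StringBuilder sb = new StringBuilder(message.length());
        message.codePoints().forEach(cp -> {
            if (isNewline(cp)) sb.append('_');
            else sb.appendCodePoint(cp);
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String attack = "login failed\u2028user=admin msg=login succeeded";
        System.out.println(sanitizeForLog(attack));
    }
}
```

This prints `login failed_user=admin msg=login succeeded`. Note that a CRLF pair becomes two underscores, exactly as with the original `replace` calls.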

Of course, there's a #4 I didn't mention: the target locale and character encoding of the logs.

I assume this ESAPI function is concerned with logs written in Latin characters in a Western locale. I tend to agree that blacklisting is not the best answer, but sometimes it makes sense and works. If the logs are written out as plain text in UTF-8 or another Unicode encoding, then #3 above would be a problem.
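A quick way to see why the log's encoding matters is to look at the bytes the non-ASCII newline characters from #3 become. This sketch (class name hypothetical) uses ISO-8859-1 as a stand-in for a Western single-byte log encoding and compares it with UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LogEncodingCheck {
    // Formats bytes as space-separated uppercase hex, e.g. "E2 80 A8".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] codePoints = {0x0085, 0x2028, 0x2029}; // NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR
        Charset[] charsets = {StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8};
        for (int cp : codePoints) {
            String s = new String(Character.toChars(cp));
            for (Charset cs : charsets) {
                System.out.printf("U+%04X in %s: %s%n", cp, cs.name(), hex(s.getBytes(cs)));
            }
        }
    }
}
```

In UTF-8 all three survive (`C2 85`, `E2 80 A8`, `E2 80 A9`), so a Unicode-aware viewer may render them as line breaks. In ISO-8859-1, U+2028 and U+2029 degrade to `3F` (`?`), while NEL still survives as the single byte `85`; whether a viewer treats that byte as a line break depends on how it decodes the file.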

Isn’t the wacky world of Unicode and internationalization fun?

lookout.net
