Tag Archives: encodings
Ultrafast UTF-8 decoder by Bjoern Hoehrmann
I believe this is still getting tested by several parties, but it’s obviously a highly optimized implementation of a UTF-8 decoder. Bjoern Hoehrmann recently released his Flexible and Economical UTF-8 Decoder; check it out: // Copyright (c) 2008-2009 Bjoern Hoehrmann … Continue reading
Detecting ill-formed UTF-8 byte sequences in HTML content
One issue I’ve come across, pretty infrequently, is the existence of ill-formed UTF-8 byte sequences in HTML content. As far as I can tell nobody’s ever really tried to find this type of bug. Huh, so what’s up? UTF-8 is … Continue reading
Surrogates, supplementary characters, double-byte, multi-byte, and variable-width encoding ranges in Unicode and ANSI code pages
When I started digging into Unicode I was lost. It started to clear up for me when I eventually found a lot of terms that are synonymous and used interchangeably all over the place. For starters, “code page” might be … Continue reading
CSS 2.1 escape sequences and encodings
I know there’s plenty of good work being done over at places like http://ha.ckers.com and http://www.thespanner.co.uk/. I have been researching CSS 2.1 and testing some very thorough and complex HTML and CSS filters myself, and trying to find the stuff … Continue reading