Encodings part 1: Unicode, ASCII, UTF-8 and... Latin-1?

What is Unicode, why is it used everywhere, and what are the options? As software developers, we use string data types daily, yet there is a lot under the surface that we no longer think about. That makes character encodings seem basic, but are they simple? This is part 1 of a series of posts on Unicode and encodings: "Encodings Part 1: Unicode, ASCII, UTF-8 and… Latin-1?", "Encodings Part 2: The down and dirty of UTF-8", and "Encodings Part 3: The down and dirty of UTF-16". Encoding, you say?...

November 3, 2021 · 8 min · Lucas Viana

Encodings part 2: The down and dirty of UTF-8

What the double UTF is UTF-8? Let’s decode this, bit by bit. This is part 2 of a series of posts on Unicode and encodings. Unicode and UTF-8: in part 1, we dug a little deeper into the reasons Unicode exists and found out that Unicode is an immense character table containing all the characters in the world....

November 3, 2021 · 4 min · Lucas Viana

Encodings part 3: The down and dirty of UTF-16

This is part 3 of a series of posts on Unicode and encodings. A very short background: at the end of the ’80s, there were two competing efforts to standardize character sets, ISO 10646 and Unicode. ISO was idealistic and Unicode was pragmatic. ISO created a character set containing more than 2 billion code points (of course, most were not allocated)....

November 3, 2021 · 5 min · Lucas Viana