Skip to main content

HTML Character Sets

ISO Character Sets

It is the International Standards Organization (ISO) that defines the standard character-sets for different alphabets/languages.
The different character-sets being used around the world are listed below:
Character set Description Covers
ISO-8859-1 Latin alphabet part 1 North America, Western Europe, Latin America, the Caribbean, Canada, Africa
ISO-8859-2 Latin alphabet part 2 Eastern Europe
ISO-8859-3 Latin alphabet part 3 SE Europe, Esperanto, miscellaneous others
ISO-8859-4 Latin alphabet part 4 Scandinavia/Baltics (and others not in ISO-8859-1)
ISO-8859-5 Latin/Cyrillic part 5 The languages that are using a Cyrillic alphabet such as Bulgarian, Belarusian, Russian and Macedonian
ISO-8859-6 Latin/Arabic part 6 The languages that are using the Arabic alphabet
ISO-8859-7 Latin/Greek part 7 The modern Greek language as well as mathematical symbols derived from the Greek
ISO-8859-8 Latin/Hebrew part 8 The languages that are using the Hebrew alphabet
ISO-8859-9 Latin 5 part 9 The Turkish language. Same as ISO-8859-1 except Turkish characters replace Icelandic ones
ISO-8859-10 Latin 6 Lappish, Nordic, Eskimo The Nordic languages
ISO-8859-15 Latin 9 (aka Latin 0) Similar to ISO 8859-1 but replaces some less common symbols with the euro sign and some other missing characters
ISO-2022-JP Latin/Japanese part 1 The Japanese language
ISO-2022-JP-2 Latin/Japanese part 2 The Japanese language
ISO-2022-KR Latin/Korean part 1 The Korean language


The Unicode Standard

Because the character-sets listed above are limited in size, and are not compatible in multilingual environments, the Unicode Consortium developed the Unicode Standard.
The Unicode Standard covers all the characters, punctuations, and symbols in the world.
Unicode enables processing, storage and interchange of text data no matter what the platform, no matter what the program, no matter what the language.

The Unicode Consortium

The Unicode Consortium develops the Unicode Standard. Their goal is to replace the existing character-sets with its standard Unicode Transformation Format (UTF).
The Unicode Standard has become a success and is implemented in XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc. The Unicode standard is also supported in many operating systems and all modern browsers.
The Unicode Consortium cooperates with the leading standards development organizations, like ISO, W3C, and ECMA.
Unicode can be implemented by different character-sets. The most commonly used encodings are UTF-8 and UTF-16:
Character-set Description
UTF-8 A character in UTF8 can be from 1 to 4 bytes long. UTF-8 can represent any character in the Unicode standard. UTF-8 is backwards compatible with ASCII. UTF-8 is the preferred encoding for e-mail and web pages
UTF-16 16-bit Unicode Transformation Format is a variable-length character encoding for Unicode, capable of encoding the entire Unicode repertoire. UTF-16 is used in major operating systems and environments, like Microsoft Windows 2000/XP/2003/Vista/CE and the Java and .NET byte code environments
Tip: The first 256 characters of Unicode character-sets correspond to the 256 characters of ISO-8859-1.
Tip: All HTML 4 processors already support UTF-8, and all XHTML and XML processors support UTF-8 and UTF-16!

Comments

Popular posts from this blog

Toshiba Canada enters all-in-one PC market

Toshiba of Canada has thrown its hat into the all-in-one personal computer (PC) ring with the launch this week of the Toshiba DX730, a 23” all-in-one machine designed for users who want a large display and multimedia features in an environment where space is at a premium. It's Toshiba's first foray into this form factor, said Mini Saluja, national training manager with Toshiba of Canada, building on its experience in the laptop market. The DX730 has a 23” full HD multitouch display with a glossy black finish on an aluminum stand. It comes with a matching Bluetooth keyboard and mouse, and boasts Onkyo stereo speakers with Waves MaxxAudio sound processing. Two models of the DX730 will initially be available. The $899 model features a second-generation Intel Core i3 processor with 4GB of DDR3 memory, a 1TB 7200 RPM hard drive, a DVD SuperMulti Drive and HDMI in. For $1,049, you can move up to a model with an NVIDIA Geforce GT 540M processor and Intel Core i5, as we...

Xiaomi Mi 11 Ultra review

  Introduction Now that the Pro moniker has gone mainstream, it's Ultra that has come to represent the cream of the crop, and the Xiaomi Mi 11 Ultra can wear that badge proudly. Limited to its home market last year, the ultimate Mi has gone global this time around, and we're happy to have it for review today. We're torn whether it's the camera system's physical appearance that is more striking or the hardware inside. A simply massive raised area on the back looks bolted on, almost after the fact, it's hard to miss, and it's a great conversation starter even if it's not everyone's cup of tea. But its size is warranted - the main camera packs the largest sensor used on a modern-day smartphone, and next to it - two more modules unmatched in their own fields, in one way or another. Oh, and yes, there's also a display here - because why not, but also because it can be useful. There's a lot more than 1.1 inches of ...

JavaScript Where To

JavaScript in <body> The example below writes the current date into an existing <p> element when the page loads: Example <html> <body><h1>My First Web Page</h1> <p id="demo"></p> <script type="text/javascript"> document.getElementById("demo").innerHTML=Date(); </script> </body> </html>