A journey in IBM systems: Character Sets

At the most basic level, a database stores text using a character set. A character set maps each character to a numeric code, which is then stored as 1s and 0s so that computers can work with it. Two major character sets are in wide use today, and a third is worth mentioning. They are ASCII, Unicode, and, because this is a blog about IBM systems, EBCDIC.
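To make the "characters into 1s and 0s" idea concrete, here is a small Python sketch. The helper name to_bits is made up for illustration, and cp037 is only one of several EBCDIC code pages, chosen here because Python ships a codec for it.

```python
# Show the bit pattern that a character set assigns to a character.
# to_bits is a hypothetical helper name, not part of any library.
def to_bits(text, encoding):
    """Return the encoded bytes of `text` as 8-bit binary strings."""
    return [format(byte, "08b") for byte in text.encode(encoding)]

# The same letter gets a different code in each character set.
ascii_bits = to_bits("A", "ascii")    # ASCII code 65 (hex 41)
ebcdic_bits = to_bits("A", "cp037")   # EBCDIC code 193 (hex C1) in code page 037

print("ASCII :", ascii_bits)
print("EBCDIC:", ebcdic_bits)
```

The same letter ends up as two different bit patterns, which is why data moved between a mainframe and other systems usually has to be converted.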
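EBCDIC is a character set that was developed by IBM to work with mainframes. It is rarely used outside of IBM systems and is not well liked in the modern computer industry as a whole. One criticism of EBCDIC is that the letter codes are not contiguous. The letters a through z are split into three blocks (a to i, j to r, and s to z) with gaps between them, so code that treats the alphabet as one unbroken range breaks, and sorted data comes out in a different order than it does in ASCII (lowercase before uppercase, and letters before digits). Another problem is that EBCDIC stores each character in 8 bits, so a code page holds at most 256 characters, and it is therefore difficult to use for languages other than English.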
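The gaps in the EBCDIC alphabet are easy to see with a short Python sketch. It assumes code page 037 (cp037), one common EBCDIC variant; other code pages keep the same letter layout but may differ in other positions.

```python
import string

letters = string.ascii_lowercase
# Encode a-z with EBCDIC code page 037 and collect the numeric codes.
codes = list(letters.encode("cp037"))

# Record every place where the next letter is not the next code.
gaps = []
for i in range(len(letters) - 1):
    step = codes[i + 1] - codes[i]
    if step != 1:
        gaps.append((letters[i], letters[i + 1], step))

print(gaps)

# Sorting by raw code gives a different order in each character set.
sample = ["b", "A", "1"]
ebcdic_order = sorted(sample, key=lambda s: s.encode("cp037"))
ascii_order = sorted(sample, key=lambda s: s.encode("ascii"))
print(ebcdic_order, ascii_order)
```

The jumps show up between i and j and between r and s, and the sorted sample puts lowercase first in EBCDIC while ASCII puts digits first.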
The big hitter in the computer character set family is ASCII. Unlike EBCDIC, it stores the letters in a contiguous, logical human order. It is one of the most commonly used character sets today, but it shares a shortcoming with EBCDIC: it is too small. Standard ASCII uses only 7 bits and defines 128 characters, and even its 8-bit extensions top out at 256. In an expanding global market, a new way of storing data needed to be invented.
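Both points about ASCII, the unbroken alphabet and the small size, can be checked with a few lines of Python. This is only a sketch; fits_in_ascii is an illustrative helper name, not a library function.

```python
import string

# In ASCII the lowercase letters occupy one unbroken run of codes.
codes = list(string.ascii_lowercase.encode("ascii"))
contiguous = codes == list(range(codes[0], codes[0] + 26))
print(codes[0], codes[-1], contiguous)

# Hypothetical helper: True if every character has an ASCII code.
def fits_in_ascii(text):
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(fits_in_ascii("hello"))        # plain English text
print(fits_in_ascii("caf\u00e9"))    # "café", with an accented e
print(fits_in_ascii("\u4e2d\u6587")) # the Chinese word for "Chinese"
```

Anything beyond unaccented English letters, digits, and basic punctuation fails the check, which is the limitation that Unicode was created to address.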
Unicode will probably be the character set that shoves ASCII out of the limelight. Unicode was developed to cover the world's writing systems, including Asian languages, many of which are not alphabet based and need thousands of distinct characters. Unicode can represent far more characters, at the cost of more storage space. A character that costs 8 bits in ASCII or EBCDIC costs 16 bits in a 16-bit Unicode encoding such as UTF-16, and some characters need 32. The UTF-8 encoding keeps ASCII characters at 8 bits and uses 16 to 32 bits for everything else. As a best practice, use Unicode if memory is not an issue or if you will be storing data in multiple languages, especially languages like Chinese.
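How much space a Unicode character actually takes depends on the encoding chosen. The Python sketch below compares three standard encodings for three sample characters; storage_bytes is a made-up helper name, and the "-le" codec variants are used so that no byte order mark is added to the count.

```python
# Hypothetical helper: number of bytes needed to store `text`.
def storage_bytes(text, encoding):
    return len(text.encode(encoding))

samples = {
    "A": "A",             # basic Latin letter
    "e-acute": "\u00e9",  # accented Latin letter
    "zhong": "\u4e2d",    # Chinese character meaning "middle"
}

sizes = {}
for name, ch in samples.items():
    sizes[name] = {
        enc: storage_bytes(ch, enc)
        for enc in ("utf-8", "utf-16-le", "utf-32-le")
    }
    print(name, sizes[name])
```

For mostly English data UTF-8 costs nothing extra over ASCII, while for Chinese text UTF-16 is often the more compact choice.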

Sunday, December 12, 2010
