In this post, I discuss a small database theory topic: character sets and collations.
As database professionals, most of us have already come across “COLLATE” in SQL Server and MySQL.
I have found that freshers and intermediate database professionals often have doubts and questions about character sets and collations.
Let me clear this up with a short note.
What is a Character Set?
A character set is simply a set of symbols together with their encodings.
For example, latin1 and UTF-8 are two of the most popular character sets.
Using latin1, you can write every English word, because latin1 contains all ASCII characters, and those are sufficient for English. With ASCII alone, by contrast, you cannot write all the words of Western European languages, because characters such as ‘ë’, ‘õ’, and ‘Ñ’ are missing; latin1 includes them.
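To see this difference in MySQL itself, here is a small sketch. I build ‘ë’ from its UTF-8 bytes (0xC3AB) with a character set introducer so that the result should not depend on your client connection settings; the column aliases are just names I picked for illustration.

```sql
-- 'ë' (U+00EB) written as its UTF-8 bytes, then converted to two character sets.
SELECT
  CONVERT(_utf8mb4 X'C3AB' USING ascii)       AS in_ascii,   -- ASCII has no code for this character
  HEX(CONVERT(_utf8mb4 X'C3AB' USING latin1)) AS in_latin1;  -- latin1 stores it in a single byte
```

I would expect in_ascii to come back as a ‘?’ replacement character and in_latin1 as EB, which shows that latin1 can hold the character while ASCII cannot.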
A character set encodes characters so that they can be stored in memory as bytes. For example, in latin9 (ISO-8859-15) the euro symbol, €, is encoded as the single byte 0xA4, while in UTF-8 it takes three bytes, 0xE282AC.
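You can check encodings like this yourself with HEX() and CONVERT(). This is only a sketch: as far as I know MySQL does not ship a latin9 character set, and its latin1 is based on cp1252, so I use latin1 here, where the euro symbol should show up as 0x80 instead. The euro symbol is again built from its UTF-8 bytes so that the client character set does not matter.

```sql
-- The euro sign (U+20AC) given as UTF-8 bytes, shown in two encodings.
SELECT
  HEX(CONVERT(_utf8mb4 X'E282AC' USING utf8mb4)) AS euro_utf8mb4,  -- three bytes
  HEX(CONVERT(_utf8mb4 X'E282AC' USING latin1))  AS euro_latin1;   -- one byte
```

The same character occupies a different number of bytes depending on the character set, which is exactly what "encoding" means here.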
What is a Collation?
A collation is a set of rules for comparing the characters in a character set. It also defines the rules for sorting characters, because the proper order of two characters varies from language to language.
A collation compares two strings, for example to decide whether one word is greater than another, and sorts them accordingly.
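Here is a minimal sketch of how the collation changes a comparison and a sort. I use two latin1 collations that ship with MySQL, latin1_swedish_ci (case-insensitive) and latin1_bin (compares by byte value); the aliases and the tiny derived table are only for illustration.

```sql
-- Same two strings, two collations, two different answers.
SELECT
  (_latin1'a' COLLATE latin1_swedish_ci) = _latin1'A' AS equal_ci,   -- 1: case is ignored
  (_latin1'a' COLLATE latin1_bin)        = _latin1'A' AS equal_bin;  -- 0: bytes differ

-- Sorting by byte value puts uppercase letters before lowercase ones: A, B, a, b.
SELECT w
FROM (
  SELECT _latin1'b' AS w UNION ALL
  SELECT _latin1'A' UNION ALL
  SELECT _latin1'a' UNION ALL
  SELECT _latin1'B'
) AS t
ORDER BY w COLLATE latin1_bin;
```

If you change the ORDER BY to use latin1_swedish_ci, the two a's sort together ahead of the two b's, regardless of case.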
If you are using the “latin1” character set, you can use the “latin1_swedish_ci” collation, which is MySQL's default collation for latin1.
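To find out which collations are available for a character set, you can ask the server. A quick sketch:

```sql
-- The character set, with its description and default collation.
SHOW CHARACTER SET LIKE 'latin1';

-- Every collation that belongs to latin1.
SHOW COLLATION WHERE Charset = 'latin1';
```

In the collation names, the suffix tells you how comparison works: _ci is case-insensitive, _cs is case-sensitive, and _bin compares by binary value.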
You have to choose the right collation, because a wrong collation can change your comparison and sort results and may also affect your database performance.
Now create a database in MySQL with an explicit character set and collation:
CREATE DATABASE DatabaseName CHARACTER SET latin1 COLLATE latin1_swedish_ci;
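After creating the database, you may want to confirm what it actually got. A small sketch, assuming the database name DatabaseName from the statement above:

```sql
-- Switch to the new database and read its default character set and collation.
USE DatabaseName;
SELECT @@character_set_database AS db_charset,
       @@collation_database     AS db_collation;
```

Tables created in this database without their own CHARACTER SET or COLLATE clause inherit these defaults.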