☛ Hey look!

How, you ask, did I do that?
No, I didn’t plunk in a small graphic of a pointing hand. I used Unicode.

You need more explanation, you say?
Okay, here goes. Any time you tap a key on your keyboard, the tap is translated into a code. The specific code depends on two things:
1) The key you tapped (and whether you used the shift key or not)
2) The language your computer keyboard is set to emulate.
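
If you have Python handy, here is a small sketch of the idea. It shows the character codes that end up in your document, not the raw electrical signal from the key, and the variable names are just mine for illustration.

```python
# ord() reports the numeric code behind a single character.
plain_a = ord("a")      # the A key tapped by itself
shifted_a = ord("A")    # the same key with shift held down
e_acute = ord("\u00e9")  # the accented e that a Spanish layout can type

# Same physical key, different codes, plus one letter a U.S. layout lacks.
print(plain_a, shifted_a, e_acute)
```

The three numbers printed are 97, 65 and 233.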

For most of us in the United States, the keyboard we get when we buy a computer is set to produce U.S. English. In fact, most of us never even think about it unless we are students taking a foreign language in high school (or their teachers). If I were a student taking Spanish, I would benefit if I could type my class assignments with the extra letters used regularly in Spanish. As a quick example, I’m supposed to write a dialog with a friend named José. Look closely at that final letter in the name. It has an accent mark above it.

é

I just made it bigger so you could see the letter with its accent. Even small, though, the accent shows up to a person who speaks Spanish. After all, your friend José wouldn’t want to take a chance that his misspelled name would also be mispronounced “hose” because the accent wasn’t there.

Early computers used a character code called American Standard Code for Information Interchange (ASCII). Strictly speaking, ASCII defines only 128 codes, which represent the basic letters, numbers and punctuation symbols on the keyboard, plus useful extras like the carriage return. Computers soon used a full byte for each character, which doubled the space to 256 codes. 256 codes are enough to handle the English alphabet, and in some of those one-byte character sets (the original IBM PC’s, for one) there was even room for some characters called “dingbats” such as the symbols for the four card suits: heart ♥, diamond ♦, spade ♠ and club ♣. The world’s many other languages don’t all fit, though. They need letters in many different shapes, way beyond 256 codes. Now, 255 is the biggest number you can represent in a byte, and a byte was all the early computers could handle at once. Along with a zero value, that gives 256 possible codes.

Technically a byte is a binary value between 00000000 and 11111111: eight electronic switches, each of which is either off or on. The code value of all those zeros is pretty obvious (I hope). The eight “on” switches represented by 11111111 have the value 255. 255 is what we call a decimal value, which most of us just call a number because we are so used to counting with our ten fingers. Ten fingers: a decimal counting system. Computers use those binary codes, with the switches on or off. Codes in between like 00000001, 00000010, 00000011, … are 1, 2, 3 … and so on. You can only have a one or a zero, 1 or 0, when you work with binary numbers.
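
For the curious, the switch counting can be sketched in a few lines of Python. The function name bits_to_number is my own invention; Python’s built-in int(pattern, 2) does the same job.

```python
def bits_to_number(bits: str) -> int:
    """Turn a string of 0s and 1s into the number it stands for."""
    total = 0
    for bit in bits:
        # Moving one place to the left doubles everything so far,
        # then the new switch adds 1 if it is on.
        total = total * 2 + (1 if bit == "1" else 0)
    return total

for pattern in ["00000000", "00000001", "00000010", "00000011", "11111111"]:
    print(pattern, "=", bits_to_number(pattern))
```

The last line printed is 11111111 = 255, the biggest value eight switches can hold.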

Whew, this is getting pretty technical, can’t you keep it simple?
Well, I’ll try, but don’t actually hold your breath. The post goes on for a while.

Even a full byte with its 256 codes doesn’t have enough room. We need more room!

Unicode, usually stored in an encoding called UTF-8, gives us the code space we need.
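
Here is a rough Python sketch of how UTF-8 finds that extra room: it simply spends more than one byte on a character when it has to. The helper name utf8_hex is mine, and the three sample characters are a plain e, the accented é and the heart suit.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of one character, written in hexadecimal."""
    return ch.encode("utf-8").hex()

# A plain e, an accented e, and the heart suit symbol.
for ch in ["e", "\u00e9", "\u2665"]:
    encoded = utf8_hex(ch)
    byte_count = len(encoded) // 2   # two hexadecimal digits per byte
    print("code", ord(ch), "needs", byte_count, "byte(s):", encoded)
```

The plain e fits in one byte, the accented é takes two, and the heart takes three.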

Using LibreOffice 3.3 or OpenOffice 3.3, you get access to these fancy character codes with the Insert menu and the Special Character choice. In the illustration below, I’ve scrolled down far enough to choose the copyright symbol.

[Illustration: the Special Characters tool]

Now if you look at the bottom-right corner of the illustration, you will see the copyright symbol’s Unicode value (and its decimal value in parentheses).

Now look back at the top of the illustration. The current font is shown. On my system, the font is “Liberation Serif” and you should also note the subset chooser. The basic subset is called “Basic Latin” and the screen you see is mainly part of that subset group. The row in which the copyright symbol is found is actually the beginning of the “Latin-1” subset. A little further right in that same row, you’ll see the ® registered trademark symbol, which you will often see on product packages next to the pictures or fancy name symbols of the product. For example, the design of the Coca-Cola name is a registered trademark and you will typically see ® on the cardboard boxes that hold cans or bottles in “six packs.” I drink Dr. Pepper so I cannot check a Coke can right this minute.

The last possible Latin-1 character, number 255 and the last code that fits in a single byte, is the symbol shown in the next illustration.

The next character, Ā, is code 256, just beyond the one-byte limit. In the Special Character tool, the decimal value no longer even shows. (Anybody know where you’d use the Ā symbol? I don’t.)
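
You can poke at that boundary yourself with a short Python sketch. The function fits_in_one_byte is a name I made up; it leans on Python’s “latin-1” encoding, which covers exactly the codes 0 through 255.

```python
import unicodedata

def fits_in_one_byte(ch: str) -> bool:
    """True when the character's code is small enough for a single byte."""
    try:
        ch.encode("latin-1")   # latin-1 only knows codes 0 to 255
        return True
    except UnicodeEncodeError:
        return False

for code in (255, 256):
    ch = chr(code)             # chr() turns a code back into a character
    print(code, unicodedata.name(ch), fits_in_one_byte(ch))
```

Code 255 reports True and code 256 reports False, each alongside the character’s official Unicode name.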

The further down the list of characters you go, the more you need to be ready for what are called hexadecimal codes.

Bah, I wanted this kept simple.
Sorry, I’m trying my best.

As you look in the bottom right of the last illustration, you see the hexadecimal code. There are four “digits” in the code. They run up from here in a base-16 counting method. Each “place” can be 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F. That’s how we write hexadecimal digits. The A is next after 9 and has the same value as our typical decimal 10. B is 11, C is 12, D is 13, E is 14 and F is 15. Base-16 (hexadecimal) has 16 values from 0 to F. Hexadecimal works well as a binary (computer switches) representation because each hexadecimal digit stands for exactly four switches. Typically you don’t need to type capital letters: “f” is the same as “F” in practice.
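
If the base-16 counting still feels slippery, this little Python sketch walks through it one place at a time. The function hex_to_number is my own illustration; Python’s built-in int(code, 16) would give the same answers.

```python
def hex_to_number(code: str) -> int:
    """Convert hexadecimal text to a number, one place at a time."""
    digits = "0123456789abcdef"   # position in this string = value of the digit
    total = 0
    for ch in code.lower():       # lower() makes "F" and "f" behave the same
        total = total * 16 + digits.index(ch)
    return total

print(hex_to_number("0100"))                    # the code just past 255
print(hex_to_number("F"), hex_to_number("f"))   # capitals do not matter
print(format(0xF, "04b"))                       # one hex digit as four switches
```

The three lines printed are 256, then 15 15, then 1111.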

From code 0100 (256) up, the code is only shown in its Unicode (hexadecimal) version. Keep looking down your Special Character tool list to find the characters you might want to use. The common ones will be there in one of the subsets, along with many you won’t have a clue how to use any more than I do.

When you tire of the mousing around in the menus for the characters you use a lot, you’ll want the next technique.

Many programs allow you to directly enter the unicode values by a keyboard trick:
Hold down the Shift and Ctrl keys and, while holding them, tap the letter U on the keyboard (an underlined u will usually show up on the line where you’ve typed). Now let all three go, type the Unicode value 0100 and finally tap the Enter key. This trick works with WordPress.com, but you probably figured that out because I’m blogging on WordPress. Kompozer, my preferred tool for rapid Web page development, works with the U-code method. Let me know what other tools you try that also work.

That pointy hand is a unicode character (symbol), too. It has the code 261b (remember to use the ctrl-shift-u.)

Twitter lets you use unicode. If you are using the Web page to enter your tweets, give it a try.

I’ll know you succeeded if you send this tweet to me: @algotruneman “I ♥ unicode.”

The heart code is 2665, usually written U+2665.
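
If you would rather let a program do the typing, here is a minimal Python sketch that builds the same characters from their codes. The helper from_code is a name I invented, and I am assuming your terminal can display the symbols when they are printed.

```python
def from_code(hex_code: str) -> str:
    """Turn a hexadecimal Unicode value such as '2665' into its character."""
    return chr(int(hex_code, 16))

hand = from_code("261b")    # the pointing hand from the top of this post
heart = from_code("2665")   # the heart suit

tweet = f"@algotruneman I {heart} unicode."
print(hand, tweet)
```

Upper or lower case in the code makes no difference here either, so from_code("261B") gives the same hand.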

Don’t overdo the Twitter unicode. It might cost you followers the way all caps would.

I want more!
Okay. Don’t forget to get more technical background from the links in the text above. In addition to them, here are a couple for you to check, too.

http://home.tiscali.nl/t876506/utf8tbl.html
The column you want to look at is “U-hex” for the codes

http://www.utf8-chartable.de/
Look in the third row of the table and you’ll see the chooser for many more pages of codes than the one in LibreOffice or the tiscali.nl site.

Finally, I’ll link to a page I’ve made so I can look up the codes for the characters I expect to use most often. It may change over time as I juggle the list.
http://runeman.org/articles/unicode/utf8_codes.html
