Composing ‘extended’ characters
Those folk who live in the same parish (see ‘parochialism’) as the people at ANSI who decided on ASCII all those years ago may not think that this is much of an issue, but for those of us who live in other countries or who communicate with people from other countries (international projects, anyone?), this is a ROYAL PAIN. One usually ends up using fewer than ten of these characters in a particular context, and in time you find you have committed their codes to memory. That's the reality when dealing with multiple languages and computer keyboards. Unicode standardised this whole lot, so it is important to be able to access these characters by their codes across UIs and OSs. Sure, on Windows you press these keys and on Mac those before you enter the codes, but on Linux… well, are you in GNOME, KDE, using LibreOffice, …?
This is a real pain in KDE Plasma, as you cannot use Unicode values, even if you know them. Instead you have to “compose” the character in question. In theory that sounds great, until you have no idea what the composition looks like.
First you need to tell KDE which key you'll be using as the compose key. Do so via System Settings > Input Devices > Keyboard > Advanced > Position of compose key. I chose Scroll Lock, as it is available on the external keyboards I use as well as on the laptop directly, and it is a key I don't use for anything else.
Now that's done, when I press Scroll Lock, whatever I press thereafter creates a composed character, provided it is a valid sequence (see below). The valid sequences are usefully [!] listed in /usr/share/X11/locale/en_US.UTF-8/Compose. The thing is, there are over 5,890 of them in the file on my system! Worse, the entries are described by character name, and the code points are written without the usual ‘U+’ prefix, so good luck finding the key combination you require for, say, an s with an upside-down circumflex. I use it as an example, as a friend of mine of Eastern European extraction has one in her name and I want to write her an ‘instant’ message.
This is how I did it:
Run KCharSelect
Choose European Scripts and start looking through the lists, in which the font size is really small, by the way. On my laptop screen it's really difficult to see the difference between Ē, Ĕ, and Ě at the font size used, so I have to click on each to see a larger representation.
Is it in Basic Latin? No.
Is it in Latin-1 Supplement? No.
Is it in Latin Extended-A? Ah, yes; there we go!
Click on it to discover that it is described thus:
Character: š U+0161
Name: LATIN SMALL LETTER S WITH CARON
Annotations and Cross References
Notes:
Czech, Estonian, Finnish, Slovak, and many other languages
Equivalents:
s U+0073 LATIN SMALL LETTER S ̌ U+030C COMBINING CARON
General Character Properties
Block: Latin Extended-A
Unicode category: Letter, Lowercase
Various Useful Representations
UTF-8: 0xC5 0xA1
UTF-16: 0x0161
C octal escaped UTF-8: \305\241
XML decimal entity: &#353;
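As an aside, the ‘Various Useful Representations’ listed above all follow mechanically from the code point. Here is a minimal Python sketch that reproduces them with the standard library only. The function name describe is my own invention for illustration; it has nothing to do with KCharSelect's internals.

```python
import unicodedata


def describe(ch):
    """Return the representations KCharSelect shows for one character."""
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")
    # Split the UTF-16 bytes into 16-bit code units.
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    return {
        "code": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "decomposition": unicodedata.decomposition(ch),
        "utf8": " ".join(f"0x{b:02X}" for b in utf8),
        "utf16": " ".join(f"0x{u:04X}" for u in units),
        "octal": "".join(f"\\{b:o}" for b in utf8),
        "xml": f"&#{ord(ch)};",
    }


# U+0161 is the s with caron discussed above.
for key, value in describe("\u0161").items():
    print(f"{key}: {value}")
```

The decomposition field corresponds to the ‘Equivalents’ line: 0073 is the plain s and 030C is the combining caron.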
So, under ‘Various Useful Representations’, this KDE program (remember, KDE won't let you use codes to enter extended characters) lists — yes, you guessed it — the codes used to enter this character. Thankfully it does give the name (‘LATIN SMALL LETTER S WITH CARON’), so now I can run grep "LATIN SMALL LETTER S WITH CARON" /usr/share/X11/locale/en_US.UTF-8/Compose to see which keys to press after (in my case) Scroll Lock to get KDE to succumb and display the character I need. Let's give it a go:
grep "LATIN SMALL LETTER S WITH CARON" /usr/share/X11/locale/en_US.UTF-8/Compose
<dead_caron> <s> : "š" U0161 # LATIN SMALL LETTER S WITH CARON
<Multi_key> <c> <s> : "š" U0161 # LATIN SMALL LETTER S WITH CARON
<Multi_key> <less> <s> : "š" U0161 # LATIN SMALL LETTER S WITH CARON
<Multi_key> <s> <less> : "š" U0161 # LATIN SMALL LETTER S WITH CARON
<dead_abovedot> <scaron> : "ṧ" U1E67 # LATIN SMALL LETTER S WITH CARON AND DOT ABOVE
<Multi_key> <period> <scaron> : "ṧ" U1E67 # LATIN SMALL LETTER S WITH CARON AND DOT ABOVE
<dead_abovedot> <dead_caron> <s> : "ṧ" U1E67 # LATIN SMALL LETTER S WITH CARON AND DOT ABOVE
<dead_abovedot> <Multi_key> <c> <s> : "ṧ" U1E67 # LATIN SMALL LETTER S WITH CARON AND DOT ABOVE
<Multi_key> <period> <dead_caron> <s> : "ṧ" U1E67 # LATIN SMALL LETTER S WITH CARON AND DOT ABOVE
<dead_caron> <sabovedot> : "ṧ" U1E67 # LATIN SMALL LETTER S WITH CARON AND DOT ABOVE
<dead_caron> <dead_abovedot> <s> : "ṧ" U1E67 # LATIN SMALL LETTER S WITH CARON AND DOT ABOVE
Um…
Well, the first four lines are the ones I am interested in (I don't want a dot above), but how to interpret the bit before the colon? I don't know what a ‘dead caron’ is (that famous Monty Python sketch comes to mind), but the <Multi_key> ones might mean ‘press your compose key, followed by what is between the angle brackets’.
Let's give it a go:
<Scroll Lock> <c> <s> : š
<Scroll Lock> <less> <s> : š
<Scroll Lock> <s> <less> : š
Yes!
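The filtering I did by eye (keep the <Multi_key> lines, skip the rest) can be sketched in Python. The dead_* names refer to dead keys, which exist on some national keyboard layouts but not on mine, so the sketch treats those sequences as untypeable. Treat this as a rough sketch: the regular expression only covers lines shaped like the grep output above, the sample is a shortened copy of that output, and the function names are my own. To run it against the real file, pass the lines of /usr/share/X11/locale/en_US.UTF-8/Compose instead of the sample.

```python
import re
import unicodedata

# A few lines copied from the grep output above.
SAMPLE = '''\
<dead_caron> <s> : "š" U0161 # LATIN SMALL LETTER S WITH CARON
<Multi_key> <c> <s> : "š" U0161 # LATIN SMALL LETTER S WITH CARON
<Multi_key> <less> <s> : "š" U0161 # LATIN SMALL LETTER S WITH CARON
<Multi_key> <s> <less> : "š" U0161 # LATIN SMALL LETTER S WITH CARON
<dead_abovedot> <scaron> : "ṧ" U1E67 # LATIN SMALL LETTER S WITH CARON AND DOT ABOVE
'''

# One or more <keysym> tokens, a colon, then the quoted result.
LINE_RE = re.compile(r'^\s*((?:<[^>]+>\s*)+):\s*"((?:[^"\\]|\\.)*)"')


def sequences_for(codepoint, lines):
    """Return every key sequence whose result is the given code point."""
    target = chr(codepoint)
    found = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and unicodedata.normalize("NFC", m.group(2)) == target:
            found.append(re.findall(r"<([^>]+)>", m.group(1)))
    return found


def typeable(keys):
    """True if the sequence starts with the compose key and needs no dead key."""
    return keys[0] == "Multi_key" and not any(k.startswith("dead_") for k in keys)


for keys in filter(typeable, sequences_for(0x0161, SAMPLE.splitlines())):
    print(" ".join(keys))
```

Because it searches by the resulting character and not by name, you only need the code point from KCharSelect.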
For the fun of it, I also tried this, since I've experienced that the order of the keys after the compose key doesn't matter for other composed characters I have used:
<Multi_key> <s> <c> : <beep>
Mmm… so order does matter here. Oh well, I'll go with < and s in either combination, so I don't have to remember a permutation.
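One way to picture why order matters: each line in the Compose file is its own ordered entry, so < then s and s then < both work only because both are listed, while s then c is simply not there. A toy model in Python, assuming nothing about how the real lookup is implemented (the table and the function name are made up for illustration):

```python
# Toy compose table holding only the three <Multi_key> entries found above.
# Keys are ordered tuples, so ("c", "s") and ("s", "c") are different entries.
COMPOSE_TABLE = {
    ("c", "s"): "\u0161",     # š
    ("less", "s"): "\u0161",  # š
    ("s", "less"): "\u0161",  # š
}


def compose(*keys):
    """Return the composed character, or None (the beep) if the sequence is unlisted."""
    return COMPOSE_TABLE.get(tuple(keys))


print(compose("c", "s"))     # listed
print(compose("s", "less"))  # listed, the reverse order has its own entry
print(compose("s", "c"))     # not listed, so None
```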
I shudder to think what ‘normal’ users are expected to think of Linux when they have to go through all of this just to type, for example, a ΰ. I copied and pasted that, because I don't know which keys to press for this recipe: <Multi_key> <apostrophe> <dead_diaeresis> <Greek_upsilon>. The character is called ‘GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS’ and its Unicode value is U+03B0. I suspect that if I want to communicate with my friend Nikos and need one of these, I will have to resort to copying and pasting in my KDE environment, because all 11 ways of getting this character require a Greek keyboard to begin with: my keyboard doesn't have the <Greek_upsilondieresis> key, the <dead_diaeresis> key, or the <Greek_upsilon> key. If I were allowed to enter the Unicode code, it would be a simple matter of typing it. Who knows? I might even remember it.
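Since copying and pasting is what I am reduced to anyway, a few lines of Python can at least turn a remembered code into something to copy. This is a stopgap sketch, not a KDE feature: the function name from_code and the accepted spellings (U+03B0, U03B0, 0x03B0, or bare hex) are my own choices.

```python
import re
import unicodedata

# Optional "U+", "U" or "0x" prefix followed by one to six hex digits.
CODE_RE = re.compile(r"(?:U\+?|0x)?([0-9A-F]{1,6})", re.IGNORECASE)


def from_code(text):
    """Turn a code such as 'U+03B0' or 'U03B0' into the character itself."""
    m = CODE_RE.fullmatch(text.strip())
    if not m:
        raise ValueError(f"not a Unicode code: {text!r}")
    return chr(int(m.group(1), 16))


ch = from_code("U+03B0")
print(ch, unicodedata.name(ch))
```

Printing the name alongside the character is a cheap check that the code you remembered is the one you meant.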
In conclusion, trawling through forums makes it clear that the community has been desperate for years to be able to use Unicode codes when entering extended characters. Keep the current compose approach, but please: add the ability to enter the codes as well. Until then, sorry Nikos.