Revision as of 05:15, 6 February 2008 by Ogg.k.ogg.k (followup to unicode and language)
Kate is going to have support for all languages in the world, right?
(This can be useful for making a video where the user can choose the right subtitle language.)
- OggKate supports Unicode (UTF-8), so yes. Martin Leese 15:46, 29 January 2008 (PST)
- With the right fonts, I have a test stream that displays Japanese, Arabic, Chinese, as well as Latin characters. The only thing left open there is how to deal with languages like Arabic which are written right to left. The language in a stream is set in the header as a language/region tag, such as en_US, or just en. User:ogg.k.ogg.k Wed Jan 30 18:20:49 UTC 2008
- Right to left now supported (in my local version of xine). Language directionality can be overridden for each data packet from the default given in the headers. User:ogg.k.ogg.k Thurs Jan 31 13:29 UTC 2008
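As a rough illustration of the "header default, per-packet override" idea described above, here is a minimal sketch in Python. The names `StreamHeader`, `DataPacket` and `effective_right_to_left` are hypothetical and are not part of libkate; the sketch only assumes that a packet either carries its own directionality or falls back to the one given in the headers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamHeader:
    """Hypothetical stand-in for the stream headers."""
    language: str        # language/region tag, such as "en_US" or just "en"
    right_to_left: bool  # default directionality for the whole stream


@dataclass
class DataPacket:
    """Hypothetical stand-in for one data packet."""
    text: str
    right_to_left: Optional[bool] = None  # None means "no override"


def effective_right_to_left(header: StreamHeader, packet: DataPacket) -> bool:
    """Return the packet's own directionality if set, else the header default."""
    if packet.right_to_left is not None:
        return packet.right_to_left
    return header.right_to_left
```

With an Arabic stream whose header default is right to left, a packet holding a Latin-script line could set `right_to_left=False` for itself without affecting the other packets.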
- Be careful to make sure you're using the
- latest ISO standard (the one with the biggest number) about languages.
- There are already several of them, so it is easy to pick the wrong one.
- Apart from that, Kate looks so good ;)
- Well, I am not certain about this - assuming you are referring to the latest
- RFC about language identification (the latest one is RFC 4646 I believe), then
- it is kinda complex, and I plan on supporting only part of it (yes, I know this
- is probably the standard's bane to have partial implementations).
- A full language tag can be quite long, and that RFC suggests a max "sane" size
- of 42 bytes. I have actually looked at what I'll do with that this weekend and
- am currently going with a 15 character string, which should easily handle things
- like primary tag and one (or two small) secondary tags, like "en_GB". Language
- plus country should cover most needs.
- However, it is possible to specify a language override in each data packet, if
- precision is required. User:ogg.k.ogg.k Wed Feb 6 12:08:03 UTC 2008
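To make the 15 character limit concrete, here is one possible way a long tag could be shortened: drop trailing subtags until what remains fits. This is only a sketch of the idea, not what libkate does; the function name `fit_language_tag` and the choice to cut at subtag boundaries are assumptions made for illustration.

```python
MAX_TAG_LENGTH = 15  # the limit mentioned above; hypothetical constant name


def fit_language_tag(tag: str, max_length: int = MAX_TAG_LENGTH) -> str:
    """Drop trailing subtags until the tag fits in max_length characters.

    Both "_" and "-" are accepted as subtag separators. The tag is only
    ever cut at a separator, so a subtag is never split in half.
    """
    while len(tag) > max_length:
        cut = max(tag.rfind("_"), tag.rfind("-"))
        if cut <= 0:
            # Nothing left to drop: the primary subtag alone is too long.
            raise ValueError("primary language subtag does not fit")
        tag = tag[:cut]
    return tag
```

Short tags such as "en_GB" pass through unchanged, while a 16 character tag such as "sl-Latn-IT-nedis" loses its last subtag and becomes "sl-Latn-IT".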
- I believe it would be useful if you went away and learnt something about Unicode. From the Unicode website, "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language." There are around 7000 written languages in the world. (There is no agreement on an exact figure.) Unicode does the lot. Martin Leese 10:06, 5 February 2008 (PST)
- As a note, Kate uses UTF-8 only at the moment. By default it accepts code points
- up to 0x10ffff (ie, the 17 planes currently defined), and it supports the full
- 31 bit UCS space if a define is set (off by default). I haven't quite ruled out UTF-16 and UTF-32, but
- if I add them in, libkate will have an auto conversion option for client code.
- Note that Kate doesn't concern itself with rules of ligatures, etc, defined by
- Unicode; that is up to the rendering client. User:ogg.k.ogg.k Wed Feb 6 12:16:05 UTC 2008
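The difference between the 0x10ffff limit and the 31 bit UCS space can be sketched with a small UTF-8 encoder. This is not libkate code: the function name `encode_utf8` and the `allow_31bit` flag (standing in for the compile-time define mentioned above) are hypothetical, and rejecting surrogate code points is an assumption of this sketch. The 5 and 6 byte forms come from the original UTF-8 definition and are not valid in current Unicode.

```python
UNICODE_MAX = 0x10FFFF      # last code point of the currently defined planes
UCS_31BIT_MAX = 0x7FFFFFFF  # last code point of the 31 bit UCS space


def encode_utf8(code_point: int, allow_31bit: bool = False) -> bytes:
    """Encode one code point as UTF-8, optionally allowing the 31 bit range."""
    limit = UCS_31BIT_MAX if allow_31bit else UNICODE_MAX
    if not 0 <= code_point <= limit:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not encodable in UTF-8")
    if code_point < 0x80:
        return bytes([code_point])
    # (payload bits, lead byte marker, number of continuation bytes)
    forms = ((11, 0xC0, 1), (16, 0xE0, 2), (21, 0xF0, 3),
             (26, 0xF8, 4), (31, 0xFC, 5))
    for bits, lead, n_cont in forms:
        if code_point < (1 << bits):
            out = [lead | (code_point >> (6 * n_cont))]
            for i in range(n_cont - 1, -1, -1):
                out.append(0x80 | ((code_point >> (6 * i)) & 0x3F))
            return bytes(out)
    raise AssertionError("unreachable: range was checked above")
```

With the flag off, anything above 0x10ffff is refused; with it on, the same value is encoded, using up to 6 bytes for the largest code points.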