From XiphWiki
Revision as of 11:15, 4 May 2009 by Vmol (talk | contribs) (corrected spelling mistake, sentence construct)

Kate is going to support all languages in the world, right?

(This could be useful for making a video in which the viewer can choose the subtitle language.)

OggKate supports Unicode (UTF-8), so yes. Martin Leese 15:46, 29 January 2008 (PST)
With the right fonts, I have a test stream that displays Japanese, Arabic and Chinese as well as Latin characters. The only open question is how to handle languages such as Arabic, which are written right to left. The language of a stream is set in the header as a language/region tag, such as en_US, or just en. User:ogg.k.ogg.k Wed Jan 30 18:20:49 UTC 2008
Right-to-left text is now supported (in my local version of xine). The default directionality given in the headers can be overridden in each data packet. User:ogg.k.ogg.k Thurs Jan 31 13:29 UTC 2008
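A minimal sketch of the override rule described above: if a data packet carries its own language or direction it is used, otherwise the default from the stream header applies. The class and field names (`StreamHeader`, `DataPacket`, `direction`) are hypothetical and do not reflect libkate's actual API.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical structures for illustration only; not libkate's real types.
@dataclass
class StreamHeader:
    language: str           # stream-wide language tag, e.g. "en_US" or "ar"
    direction: str = "ltr"  # default text direction: "ltr" or "rtl"


@dataclass
class DataPacket:
    text: str
    language: Optional[str] = None   # None means: inherit from the header
    direction: Optional[str] = None  # None means: inherit from the header


def effective_language(header: StreamHeader, packet: DataPacket) -> str:
    """Per-packet language override, falling back to the header default."""
    return packet.language if packet.language is not None else header.language


def effective_direction(header: StreamHeader, packet: DataPacket) -> str:
    """Per-packet direction override, falling back to the header default."""
    return packet.direction if packet.direction is not None else header.direction
```

A renderer would call these once per packet, so a mostly left-to-right English stream can still carry an occasional Arabic line without changing its headers.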

Be careful to use the latest ISO standard for languages (the one with the highest number). There are already several, so it would be easy to end up using the wrong one. Otherwise, Kate looks very good ;)

Well, I am not certain about this. Assuming you are referring to the latest RFC on language identification (RFC 4646, I believe), it is rather complex, and I plan to support only part of it (yes, I know partial implementations are probably the bane of standards). A full language tag can be quite long, and that RFC suggests a maximum "sane" size of 42 bytes. I looked at how to handle this over the weekend and am currently going with a 15-character string, which should easily handle a primary tag plus one (or two short) secondary tags, such as "en_GB". Language plus country should cover most needs. However, a language override can be specified in each data packet if more precision is required. User:ogg.k.ogg.k Wed Feb 6 12:08:03 UTC 2008
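To illustrate a fixed-size tag field, here is a hypothetical sketch that validates a tag and packs it into 16 bytes (up to 15 characters plus NUL padding). The field layout, the subtag length limits (1 to 8 characters, following RFC 4646) and accepting both `_` and `-` as separators are assumptions for illustration, not Kate's actual header format.

```python
import re

MAX_TAG_LEN = 15              # maximum tag length, as discussed above
FIELD_SIZE = MAX_TAG_LEN + 1  # assumed layout: 15 characters plus NUL padding

_PRIMARY = re.compile(r"[A-Za-z]{2,8}")    # primary language subtag, e.g. "en"
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")  # RFC 4646 subtags are at most 8 chars


def pack_language_tag(tag: str) -> bytes:
    """Validate a language tag and pack it into a fixed-size, NUL-padded field."""
    if len(tag) > MAX_TAG_LEN:
        raise ValueError(f"language tag longer than {MAX_TAG_LEN} characters: {tag!r}")
    subtags = re.split(r"[-_]", tag)
    if not _PRIMARY.fullmatch(subtags[0]) or not all(
        _SUBTAG.fullmatch(s) for s in subtags[1:]
    ):
        raise ValueError(f"malformed language tag: {tag!r}")
    # The regexes only accept ASCII letters/digits, so this encode cannot fail.
    return tag.encode("ascii").ljust(FIELD_SIZE, b"\0")


def unpack_language_tag(field: bytes) -> str:
    """Recover the tag by stripping the NUL padding."""
    return field.rstrip(b"\0").decode("ascii")
```

Tags that do not fit, such as a 16-character "zh-Hant-CN-x-abc", are rejected rather than truncated, which is where a per-packet override would come in.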
I believe it would be useful if someone went away and learnt something about Unicode. From the Unicode website, "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language." There are around 7000 written languages in the world. (There is no agreement on an exact figure.) Unicode does the lot. Martin Leese 10:06, 5 February 2008 (PST)
As a note, Kate currently uses UTF-8 only. It supports code points up to 0x10FFFF (the BMP plus the 16 supplementary planes currently defined) and, if a compile-time define is set (off by default), the full 31-bit UCS space. I haven't entirely ruled out UTF-16 and UTF-32, but if I add them, libkate will offer an automatic conversion option for client code. Note that Kate does not concern itself with the rules for ligatures etc. defined by Unicode; that is up to the rendering client. User:ogg.k.ogg.k Wed Feb 6 12:16:05 UTC 2008
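For reference, a sketch of strict UTF-8 decoding with the two limits mentioned above: code points up to 0x10FFFF by default, or the full 31-bit space (using the original 5- and 6-byte forms) when `allow_31bit` is set. The `allow_31bit` flag is a hypothetical stand-in for the compile-time define; this is not libkate's decoder.

```python
from typing import List


def decode_utf8(data: bytes, allow_31bit: bool = False) -> List[int]:
    """Decode UTF-8 into code points, rejecting malformed input.

    allow_31bit=False limits code points to 0x10FFFF; allow_31bit=True also
    accepts 5- and 6-byte sequences up to 0x7FFFFFFF.
    """
    max_cp = 0x7FFFFFFF if allow_31bit else 0x10FFFF
    out = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # Lead byte gives the sequence length, its payload bits, and the
        # smallest code point that legitimately needs that length.
        if b0 < 0x80:
            length, cp, min_cp = 1, b0, 0
        elif 0xC0 <= b0 < 0xE0:
            length, cp, min_cp = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 < 0xF0:
            length, cp, min_cp = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 < 0xF8:
            length, cp, min_cp = 4, b0 & 0x07, 0x10000
        elif 0xF8 <= b0 < 0xFC:
            length, cp, min_cp = 5, b0 & 0x03, 0x200000
        elif 0xFC <= b0 < 0xFE:
            length, cp, min_cp = 6, b0 & 0x01, 0x4000000
        else:
            raise ValueError(f"invalid lead byte 0x{b0:02x} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < min_cp:
            raise ValueError(f"overlong encoding at offset {i}")
        if cp > max_cp:
            raise ValueError(f"code point 0x{cp:x} out of range at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code point at offset {i}")
        out.append(cp)
        i += length
    return out
```

Ligatures, shaping and bidirectional reordering are deliberately absent here, matching the point above that they belong to the rendering client.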

Embedding of bitmap fonts

Embedding bitmap fonts in the stream seems a very odd idea to me in this day and age, when display resolutions increase constantly and output devices vary so much (desktop display, mobile phone, internet tablet etc.). What's the point of it? I think this idea should just be dropped. (TimMüller)

For simple text subtitles this is very true, but the idea was to allow control over the presentation
of the screen for other uses. Since images are supported, adding bitmap fonts is trivial, as a bitmap
font is just a mapping from code point to bitmap index. The goal is not to encourage custom fonts
but to allow them if needed.
Another point was that people wanting control over the font might use bitmaps directly to fake text
in a particular font, which would result in visible text that could not be interpreted (e.g., by
text-to-speech software).
That said, I do agree with your argument. Ogg.k.ogg.k
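As a rough illustration of "a mapping from code point to bitmap index", a bitmap font could be stored as ranges of consecutive code points, each pointing at the first of a run of bitmaps. The `GlyphRange` and `BitmapFont` names and this range layout are assumptions for illustration, not the Kate bitstream format; ranges are assumed not to overlap.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GlyphRange:
    first: int         # first code point covered
    last: int          # last code point covered (inclusive)
    first_bitmap: int  # bitmap index used for `first`; later code points follow on


class BitmapFont:
    """Hypothetical code point -> bitmap index mapping built from ranges."""

    def __init__(self, ranges: List[GlyphRange]):
        self.ranges = sorted(ranges, key=lambda r: r.first)
        self._starts = [r.first for r in self.ranges]

    def bitmap_index(self, code_point: int) -> Optional[int]:
        """Return the bitmap index for code_point, or None if it is not covered."""
        # Find the last range starting at or before code_point.
        pos = bisect.bisect_right(self._starts, code_point) - 1
        if pos < 0:
            return None
        r = self.ranges[pos]
        if code_point > r.last:
            return None
        return r.first_bitmap + (code_point - r.first)
```

Code points outside every range return `None`, so a renderer could fall back to a regular font for them; the code points themselves stay in the text, which keeps it readable by tools such as text-to-speech software.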