Kate is going to have support for all languages in the world, right?
(This can be useful to make a video where a user can choose the right subtitle language.)
- OggKate supports Unicode (UTF-8), so yes. Martin Leese 15:46, 29 January 2008 (PST)
- With the right fonts, I have a test stream that displays Japanese, Arabic, Chinese, as well as Latin characters. The only thing left open there is how to deal with languages like Arabic which are written right to left. The language in a stream is set in the header as a language/region tag, such as en_US, or just en. User:ogg.k.ogg.k Wed Jan 30 18:20:49 UTC 2008
- Right to left now supported (in my local version of xine). Language directionality can be overridden for each data packet from the default given in the headers. User:ogg.k.ogg.k Thurs Jan 31 13:29 UTC 2008
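- To illustrate the header default and per-packet override described above, here is a minimal sketch. The `StreamHeader` and `DataPacket` classes, their field names, and the `"ltr"`/`"rtl"` values are hypothetical and are not libkate's API; they only model the rule that a packet may override what the headers say.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamHeader:
    # Stream-wide defaults, as set once in the headers (hypothetical fields).
    language: str          # e.g. "en_US", or just "en"
    directionality: str    # "ltr" or "rtl"


@dataclass
class DataPacket:
    text: str
    # None means "no override, use the header default".
    language: Optional[str] = None
    directionality: Optional[str] = None


def effective_directionality(header: StreamHeader, packet: DataPacket) -> str:
    """Return the packet's own directionality if set, else the header default."""
    if packet.directionality is not None:
        return packet.directionality
    return header.directionality


def effective_language(header: StreamHeader, packet: DataPacket) -> str:
    """Return the packet's own language if set, else the header default."""
    if packet.language is not None:
        return packet.language
    return header.language
```

With a header of `StreamHeader("en", "ltr")`, a plain packet resolves to left to right, while a packet created with `directionality="rtl"` resolves to right to left without affecting any other packet.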
Be careful to use the latest ISO standard for languages (the one with the highest number). There are already several of them, so it would be easy to pick the wrong one. Otherwise, Kate looks very good ;)
- Well, I am not certain about this. Assuming you are referring to the latest RFC about language identification (the latest one is RFC 4646, I believe), it is kinda complex, and I plan on supporting only part of it (yes, I know partial implementations are probably a standard's bane). A full language tag can be quite long, and that RFC suggests a max "sane" size of 42 bytes. I have actually looked at what I'll do with that this weekend and am currently going with a 15 character string, which should easily handle things like a primary tag and one (or two small) secondary tags, like "en_GB". Language plus country should cover most needs. However, it is possible to specify a language override in each data packet, if precision is required. User:ogg.k.ogg.k Wed Feb 6 12:08:03 UTC 2008
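- As a rough illustration of fitting a tag into the 15 character limit mentioned above, the sketch below drops whole subtags from the end until the tag fits. The function name `fit_language_tag` is hypothetical, the actual field layout in Kate is not shown here, and no RFC 4646 validation is attempted; it only shows the length arithmetic.

```python
import re

MAX_TAG_LEN = 15  # limit taken from the discussion above


def fit_language_tag(tag: str, max_len: int = MAX_TAG_LEN) -> str:
    """Shorten a language tag to max_len characters by dropping trailing subtags.

    Subtags may be separated by "-" or "_"; the result keeps the separator
    style of the input ("_" if the input contains one, "-" otherwise).
    Returns an empty string if even the primary subtag does not fit.
    """
    subtags = [s for s in re.split(r"[-_]", tag) if s]
    sep = "_" if "_" in tag else "-"
    kept = []
    length = 0
    for sub in subtags:
        # Every subtag after the first also costs one separator character.
        extra = len(sub) + (len(sep) if kept else 0)
        if length + extra > max_len:
            break
        kept.append(sub)
        length += extra
    return sep.join(kept)
```

Short tags such as "en" or "en_GB" pass through unchanged, while a long tag such as "sl-Latn-IT-nedis-rozaj" is cut back to "sl-Latn-IT".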
- I believe it would be useful if the person who asked the question went away and learnt something about Unicode. From the Unicode website, "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language." There are around 7000 written languages in the world. (There is no agreement on an exact figure.) Unicode does the lot. Martin Leese 10:06, 5 February 2008 (PST)
- As a note, Kate uses UTF-8 only at the moment, and supports the 31-bit UCS space (if a define is set; off by default) as well as the currently defined code points up to 0x10ffff (i.e., the 17 planes currently defined). I haven't quite ruled out UTF-16 and UTF-32, but if I add them in, libkate will have an auto conversion option for client code. Note that Kate doesn't concern itself with the rules for ligatures, etc., defined by Unicode; that is up to the rendering client. User:ogg.k.ogg.k Wed Feb 6 12:16:05 UTC 2008
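- To make the two code point ranges above concrete, here is a minimal UTF-8 decoding sketch with a switch between the 0x10ffff limit and the older 31-bit space (sequences of up to 6 bytes). This is not libkate's code; the function name `decode_utf8` and the `allow_31bit` flag are hypothetical stand-ins for the compile-time define mentioned above.

```python
from typing import List

# Smallest code point that legitimately needs each sequence length;
# anything smaller is an overlong encoding and is rejected.
MIN_FOR_LENGTH = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000, 5: 0x200000, 6: 0x4000000}


def decode_utf8(data: bytes, allow_31bit: bool = False) -> List[int]:
    """Decode UTF-8 bytes into a list of code points.

    By default code points above 0x10FFFF are rejected; with allow_31bit
    set, 5 and 6 byte sequences up to 0x7FFFFFFF are accepted as well.
    """
    limit = 0x7FFFFFFF if allow_31bit else 0x10FFFF
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            cp, length = lead, 1
        elif 0xC0 <= lead < 0xE0:
            cp, length = lead & 0x1F, 2
        elif 0xE0 <= lead < 0xF0:
            cp, length = lead & 0x0F, 3
        elif 0xF0 <= lead < 0xF8:
            cp, length = lead & 0x07, 4
        elif 0xF8 <= lead < 0xFC:
            cp, length = lead & 0x03, 5
        elif 0xFC <= lead < 0xFE:
            cp, length = lead & 0x01, 6
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02x} at offset {i}")
        tail = data[i + 1:i + length]
        if len(tail) != length - 1:
            raise ValueError(f"truncated sequence at offset {i}")
        for byte in tail:
            if byte & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i}")
            cp = (cp << 6) | (byte & 0x3F)
        if cp < MIN_FOR_LENGTH[length]:
            raise ValueError(f"overlong encoding at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code point at offset {i}")
        if cp > limit:
            raise ValueError(f"code point 0x{cp:x} exceeds limit 0x{limit:x}")
        code_points.append(cp)
        i += length
    return code_points
```

For example, the bytes F4 8F BF BF decode to 0x10ffff in both modes, whereas the 5 byte sequence F8 88 80 80 80 (code point 0x200000) raises `ValueError` unless `allow_31bit=True` is passed.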
- I have learned some things about Unicode. It would have been stupid of me to ask whether Unicode supports this, if that is what you thought I was asking.
I was asking about Kate supporting all Unicode features. (I didn't know about Unicode having a language, country, ... mapping.) If you want to tell me that Unicode has a region, language, currency, ... mapping on top of a character mapping, then say it clearly.
Reading this page:  All the localization material falls under the name CLDR, the Unicode Common Locale Data Repository. It does a lot more than just language and region mapping. In fact, the other things are also very useful to have.
(e.g. Engineers and the whole scientific community would be very pleased with the number localization.) (Because of the decimal and thousands separator issue:  )
There is a region definitions header present in Kate. For the CLDR information, there needs to be a new header. Will there be a CLDR Definitions header or extended Unicode Definitions header somewhere in the future? Please?
- While I have not looked in depth at the CLDR, I don't think it's something that matters here. It seems to be more useful for localizing programs. The CLDR would then be more useful in a possible Kate editor, for instance. Once applied, text would go in the Kate stream and the CLDR would not be useful anymore. Feel free to correct me if I'm missing your point though. Ogg.k.ogg.k
Embedding of bitmap fonts
Embedding bitmap fonts in the stream seems a very odd idea to me in this day and age, where display resolutions increase constantly and the range of output devices varies so much (desktop display, mobile phone, internet tablet, etc.). What's the point of it? I think this idea should just be dropped. (TimMüller)
- For simple text subtitles, this is very true, but the idea was to allow control over the presentation of the screen for other uses. Since images are supported, adding bitmap fonts is trivial, since it is just a mapping from code point to bitmap index. The goal is not to encourage using custom fonts but to allow it if needed.
- Another point was that people wanting control over the font might use bitmaps directly to fake text in a particular font, and this would result in visible text that couldn't be interpreted (eg, by text to speech software).
- That said, I do agree with your argument. Ogg.k.ogg.k
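- The "mapping from code point to bitmap index" idea in the reply above can be sketched as follows. The `BitmapFont` class and its method are hypothetical and do not reflect the actual Kate stream format; the point is only that the text stays available as text (for example to text to speech software) while the renderer looks up a bitmap per code point.

```python
from typing import Dict, List, Optional


class BitmapFont:
    """Hypothetical bitmap font: a table from Unicode code point to bitmap index."""

    def __init__(self, mapping: Dict[int, int]) -> None:
        self.mapping = dict(mapping)

    def glyph_indices(self, text: str) -> List[Optional[int]]:
        """Return one bitmap index per character, or None where the font has no glyph.

        A renderer could fall back to a system font for the None entries.
        """
        return [self.mapping.get(ord(ch)) for ch in text]
```

A font built with `BitmapFont({ord("A"): 0, ord("B"): 1})` maps the text "ABC" to `[0, 1, None]`, and the original string "ABC" is still there for any client that needs the text itself.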
Possible additions restrictions trap
Please don't restrict your codec to a maximum of 256 colors in a bitmap. Applications can do it anyway, and weren't these codecs meant to create freedom in the first place? There is nothing wrong with allowing these things and MNG in overlays. OggSpots is meant for having a timed image track, not for image overlays. OggKate isn't duplicating it because it just isn't in the same scope. High-quality images in OggSpots will look very weird with low-quality 256-color image overlays. It probably won't look good.
Please do allow the embedding of shared data (fonts). That's a fantastic idea you've got there, don't let it slide. It would be great to be able to make a font, add it to the file and use it for subtitles. It would solve the platform dependency issue with fonts, which is currently a big deal. (It would even add more freedom to your codec.)
Please add support for SVG fonts. They have the same advantages for fonts as SVG has for images. They are vector based, which means very good-looking fonts.
What is your opinion about SVG fonts?