VorbisComment: Difference between revisions

From XiphWiki
Revision as of 08:48, 14 September 2007

The goal for this page is to discuss how to improve the Vorbis Comment specification.

It has been proposed to replace Vorbis comments with an XML-based format like MDMF.

Field names

Some proposals for extra field names:

Comments are intended to be free-form, but for the purposes of interoperability, it is helpful to define tag sets for particular applications, and provide some guidelines for machine parsing.
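As a starting point for machine parsing: each Vorbis comment is a UTF-8 string of the form `NAME=value`, field names are case-insensitive, and a name may appear more than once. The sketch below assumes the comments have already been read out of the header packet and decoded into Python strings (the vendor string and the length fields are not handled), and `parse_comments` is a hypothetical helper name, not part of any library.

```python
from collections import defaultdict


def parse_comments(comments):
    """Group already-decoded 'NAME=value' comment strings by field name.

    Names are compared case-insensitively, so they are folded to upper case.
    The value is everything after the first '=', so it may itself contain '='.
    Repeated names keep every value, in order.
    """
    fields = defaultdict(list)
    for comment in comments:
        name, sep, value = comment.partition("=")
        if not sep:
            # No '=' at all: not a well-formed field, so skip it.
            continue
        fields[name.upper()].append(value)
    return dict(fields)


tags = parse_comments([
    "TITLE=Example",
    "artist=First Artist",
    "ARTIST=Second Artist",
    "COMMENT=x=1",
])
print(tags)
# {'TITLE': ['Example'], 'ARTIST': ['First Artist', 'Second Artist'], 'COMMENT': ['x=1']}
```

Keeping every value in a list, rather than letting a later field overwrite an earlier one, preserves multiple ARTIST entries.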

Character encoding

The goal is to offer better support for more languages and to make machine processing faster.

The specification should be a little stricter to achieve this.

Proposals

All field names must be UTF-8 and all UPPERCASE for easier machine processing.

Allowing tag names to be UTF-8 instead of ASCII is a backwards-incompatible spec change. If we did this, requiring that the case mapping happen in the tagging application rather than in decoders is reasonable, since case mapping in Unicode is non-trivial.

The original argument for ASCII was that we need standardized tag names for interoperability, so there's no point in being able to localize them, and we might as well go with our native prejudice. Localized values should instead be marked by appending a language code to the tag name, since this stays machine-readable, whereas translated tag names could collide with each other.
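To illustrate what "case mapping in the tagging application" means under the current ASCII rule, here is a minimal sketch. It assumes the existing spec's field-name range (characters 0x20 through 0x7D, excluding `=`), and `normalize_field_name` is a hypothetical name.

```python
def normalize_field_name(name: str) -> str:
    """Validate a field name under the current ASCII rule and upper-case it.

    The existing spec allows characters 0x20 through 0x7D, excluding '='
    (0x3D). Upper-casing here, in the tagging application, lets decoders
    compare names exactly instead of doing their own case mapping.
    """
    for ch in name:
        if (not 0x20 <= ord(ch) <= 0x7D) or ch == "=":
            raise ValueError(f"invalid character {ch!r} in field name {name!r}")
    # Only ASCII gets this far, so upper() is just the a-z to A-Z mapping.
    return name.upper()


print(normalize_field_name("Artist"))  # ARTIST
# By contrast, full Unicode case mapping can change the length of a string:
print("straße".upper())  # STRASSE
```

The second line shows one concrete reason Unicode case mapping is non-trivial: a decoder could not simply compare lengths or map character by character.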

Dates and time

The goal is to specify a format for describing dates.

Proposals

The date format for any field describing a date must follow the ISO 8601 scheme: YYYY-MM-DD, or shortened to YYYY-MM or simply YYYY.

We have been recommending this usage with the DATE tag for some time. It is proposed that the spec be amended to include this information for machinability.
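A tagger or player could enforce this rule with a small validator. This sketch accepts exactly the three forms listed above and also rejects calendar-impossible values; `parse_tag_date` is a hypothetical name, and returning `None` for omitted parts is a design choice, not something the proposal specifies.

```python
import re
from datetime import date

# YYYY, YYYY-MM or YYYY-MM-DD; ASCII digits only.
DATE_RE = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def parse_tag_date(value: str):
    """Return (year, month, day), with None for parts that were left out."""
    m = DATE_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"not YYYY, YYYY-MM or YYYY-MM-DD: {value!r}")
    year = int(m.group(1))
    month = int(m.group(2)) if m.group(2) else None
    day = int(m.group(3)) if m.group(3) else None
    # Let datetime reject impossible values such as 2019-00, 2019-13 or 2019-02-30.
    date(year, 1 if month is None else month, 1 if day is None else day)
    return year, month, day


print(parse_tag_date("2007-09"))  # (2007, 9, None)
```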

The time format for any field except track duration must be written with a T before the time and must end with a time zone. Schemas with and without a date: YYYY-MM-DDTHH:MM:SS+TZ and THH:MM+TZ.
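A parser for these two schemas might look like the sketch below. The proposal writes the zone only as "TZ", so the zone syntax used here (`Z` or `+HH:MM` / `-HH:MM`, as in ISO 8601) is an assumption, as is allowing optional seconds in both forms; `parse_tag_time` is a hypothetical name.

```python
import re
from datetime import date, datetime, time, timedelta, timezone

# Optional date, mandatory "T", HH:MM with optional :SS, then a zone.
# Optional seconds and the zone syntax ("Z" or +HH:MM / -HH:MM) are assumptions
# borrowed from ISO 8601; the proposal only writes the zone as "TZ".
TIME_RE = re.compile(
    r"(?P<date>[0-9]{4}-[0-9]{2}-[0-9]{2})?"
    r"T(?P<hour>[0-9]{2}):(?P<minute>[0-9]{2})(?::(?P<second>[0-9]{2}))?"
    r"(?P<zone>Z|[+-][0-9]{2}:[0-9]{2})"
)


def parse_tag_time(value: str):
    """Return an aware datetime if a date is given, else an aware time."""
    m = TIME_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"expected [YYYY-MM-DD]THH:MM[:SS] plus a zone: {value!r}")
    zone = m.group("zone")
    if zone == "Z":
        tz = timezone.utc
    else:
        offset = timedelta(hours=int(zone[1:3]), minutes=int(zone[4:6]))
        tz = timezone(-offset if zone[0] == "-" else offset)
    t = time(int(m.group("hour")), int(m.group("minute")),
             int(m.group("second") or 0), tzinfo=tz)
    if m.group("date") is None:
        return t
    year, month, day = map(int, m.group("date").split("-"))
    # combine() keeps the tzinfo attached to the time object.
    return datetime.combine(date(year, month, day), t)


print(parse_tag_time("2007-09-14T08:48:00+02:00"))
print(parse_tag_time("T08:48Z"))
```

Rejecting values without a zone, rather than guessing local time, is what makes the values comparable across machines.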

Improving license data

The goal is to provide a method for proclaiming license and copyright information (basically clarifying ‘distribution rights (if any) and ownership’).

The specification document describes LICENSE and COPYRIGHT fields, but it is not clear enough about whether these should be machine-readable.

We should consider working together with Creative Commons to have complementary and interlinked information on the CC and Xiph wikis. Refer to the Ogg page in the CC wiki.

New RIGHTS field name proposal

One proposal is to replace the COPYRIGHT and LICENSE field names with RIGHTS. RIGHTS must be a human-readable copyright statement. Basic example:

RIGHTS=Copyright © Recording Company Inc. All distribution rights reserved.

But this is not machine-readable. Adding two complementary field names should do the trick: RIGHTS-DATE, describing the date of copyright, and RIGHTS-URI, providing a way to link to a license. Software agents can assume that multiple songs use the same URIs, as is the case with Creative Commons licenses. Full example:

RIGHTS=Copyright © 2019 Recording Company Inc. All distribution rights reserved.
RIGHTS-DATE=2019-04
RIGHTS-URI=http://somewhere.com/license.xhtml

Software for multimedia management and playback is encouraged to display the RIGHTS statement as a phrase linked to RIGHTS-URI.
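A player that renders HTML could produce such a linked phrase as sketched below. The input shape (upper-case names mapped to single string values), the function name `rights_link`, and the choice to link only http and https URIs are assumptions for illustration; escaping matters because both values come from the file.

```python
from html import escape
from urllib.parse import urlsplit


def rights_link(tags: dict) -> str:
    """Render RIGHTS as HTML, linked to RIGHTS-URI when one is present.

    `tags` is assumed to map upper-case field names to single string values.
    Both values come from the file, so both are HTML-escaped.
    """
    rights = tags.get("RIGHTS")
    if rights is None:
        return ""
    text = escape(rights)
    uri = tags.get("RIGHTS-URI")
    # Only link plain web URLs; other schemes (e.g. javascript:) get no link.
    if uri is None or urlsplit(uri).scheme.lower() not in ("http", "https"):
        return text
    return f'<a href="{escape(uri)}">{text}</a>'


print(rights_link({
    "RIGHTS": "Copyright © 2019 Recording Company Inc. All distribution rights reserved.",
    "RIGHTS-URI": "http://somewhere.com/license.xhtml",
}))
```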

RIGHTS-DATE does not need to be displayed, since international copyright agreements already require the date in the human-readable statement. RIGHTS-DATE can be used to determine when a copyrighted work enters the public domain, and related matters. (The Beatles' copyright on their original studio recordings, not the remixes, is soon expiring. So mechanisms such as RIGHTS-DATE are indeed required in music management and file-sharing software!)

To remain machine-readable it would be required to have at most one instance of each RIGHTS field name. All fields would of course remain optional.
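A tagger could check the "at most one of each" rule, plus the date shape of RIGHTS-DATE, with a short validator like the sketch below. It assumes the comments have already been grouped into a mapping from upper-case field names to lists of values; `check_rights_fields` is a hypothetical name, and reporting problems as strings is just one possible design.

```python
import re

RIGHTS_FIELDS = ("RIGHTS", "RIGHTS-DATE", "RIGHTS-URI")
# Same YYYY / YYYY-MM / YYYY-MM-DD shapes as the proposed date rule.
DATE_SHAPE = re.compile(r"[0-9]{4}(?:-[0-9]{2}(?:-[0-9]{2})?)?")


def check_rights_fields(fields: dict) -> list:
    """Return a list of problems with the proposed RIGHTS fields.

    `fields` is assumed to map upper-case field names to lists of values.
    Every RIGHTS field is optional, but each may appear at most once.
    """
    problems = []
    for name in RIGHTS_FIELDS:
        count = len(fields.get(name, []))
        if count > 1:
            problems.append(f"{name} appears {count} times; at most one is allowed")
    for value in fields.get("RIGHTS-DATE", []):
        if not DATE_SHAPE.fullmatch(value):
            problems.append(f"RIGHTS-DATE {value!r} is not YYYY, YYYY-MM or YYYY-MM-DD")
    return problems


print(check_rights_fields({"RIGHTS-URI": ["http://a.example/", "http://b.example/"]}))
```

An empty list means the fields are acceptable, which includes the case where none of them is present.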

The Dublin Core Metadata Initiative recommends the use of ‘rights’ to describe license and copyright matters. The web feed format Atom 1.0 has implemented a rights element in its specification.

Improving existing fields proposal

Similar to the DATE tag above, we have generally recommended that a URL uniquely identifying the license be included in the LICENSE field to allow machine identification of the license. This agrees with the proposal in the CC wiki. Since the COPYRIGHT field is a human-readable statement of the copyright, like the proposed RIGHTS tag above, some people include a license URL there. Therefore, if no URL can be found in a LICENSE tag, applications should use one from the COPYRIGHT tag, if present. Contact information for verification, attribution, relicensing, etc. can be obtained from the COPYRIGHT field, but CC also recommends a separate CONTACT tag for this information. This is reasonable, so we propose it be included.
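The LICENSE-then-COPYRIGHT lookup order could be implemented roughly as follows. The URL pattern is deliberately loose and is an assumption (it is not a full URI parser), the grouped-fields input shape and the name `find_license_url` are hypothetical, and the COPYRIGHT text in the example is made up.

```python
import re

# A deliberately loose http(s) URL pattern, not a full RFC 3986 parser.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def find_license_url(fields: dict):
    """Return the first URL in LICENSE, else the first in COPYRIGHT, else None.

    `fields` is assumed to map upper-case field names to lists of values.
    """
    for name in ("LICENSE", "COPYRIGHT"):
        for value in fields.get(name, []):
            match = URL_RE.search(value)
            if match:
                # Drop punctuation that usually ends the sentence, not the URL.
                return match.group(0).rstrip(".,;)")
    return None


example = {
    "COPYRIGHT": ["2007 Example Band. Licensed under "
                  "http://creativecommons.org/licenses/by/3.0/."],
}
print(find_license_url(example))  # http://creativecommons.org/licenses/by/3.0/
```

Trimming trailing punctuation is a heuristic for URLs embedded in a sentence; a URL that genuinely ends in one of those characters would be shortened.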

Attributing involved parties

The goal is to attribute more persons and organisations involved in audio and music productions to make room for more advanced search and sorting.

NO PROPOSALS! Needs much extending beyond just ARTIST field name. See work at proposed XML replacement for Vorbis Comments, MDMF.