Coordination Action for the integration of Solar System Infrastructures and Science

File Metadata

Parameter Naming

The names of parameters should be meaningful and should be used consistently throughout a data system; the names and their meanings should be defined in the data model.

In order to ensure interoperability, names in the common part of a file should conform to a standard data model.

Some file formats have built-in complications in regard to naming.

Restrictions in the FITS format on the length of keywords limit the ability to create meaningful parameter names, and this has resulted in some unintended difficulties. For example, there are many different contractions of the word wavelength in order to make it fit into 8 characters; there are additional difficulties if the name of the parameter is also being used to express a relationship with other parameters.
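The effect of the 8-character limit can be sketched in a few lines of Python. This is only an illustration: the list of contractions is hypothetical and is not taken from any particular archive, and the check covers only the keyword syntax (upper-case letters, digits, hyphen and underscore, at most 8 characters).

```python
import re

# FITS keyword syntax: 1 to 8 characters from A-Z, 0-9, hyphen, underscore.
FITS_KEYWORD = re.compile(r"[A-Z0-9_-]{1,8}")


def is_valid_fits_keyword(name):
    """Return True if `name` satisfies the FITS keyword syntax."""
    return FITS_KEYWORD.fullmatch(name) is not None


# Hypothetical spellings that different data providers might choose
# for the same physical quantity.
candidates = ["WAVELENGTH", "WAVELNTH", "WAVELEN", "WAVE_LEN", "LAMBDA"]

# The full word is too long; every contraction passes the syntax check,
# so the format itself cannot tell a reader that they mean the same thing.
valid = [k for k in candidates if is_valid_fits_keyword(k)]
```

All four contractions are syntactically valid, which is exactly why a data model is needed to state that they refer to the same parameter.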

This demonstrates an inherent weakness of the FITS format: it is too loose a standard, with few rules to constrain its formulation. There is no formalized set of names for FITS keywords (although there are many recommendations). Also, nothing defines the relationship between the parameters, if there is one. An external piece of information is required to decode the file; this is assumed knowledge.

Parameter Annotation

Metadata are more interoperable if the parameters are fully and properly described; the VOTable format developed by the IVOA comes to our aid in this context.

The VOTable format is an XML standard for the interchange of data represented as a set of tables. In this context, a table is an unordered set of rows, each of a uniform structure, as specified in the table description (the table metadata). Each row in a table is a sequence of table cells, and each of these contains either a primitive data type, or an array of such primitives.

The table metadata is the header section of a VOTable file that contains a field record for each parameter. The field record can have many components: the most obvious are the name and a description; other components can include the units and two further annotations known as a UCD and a utype.
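As a rough illustration of a field record, the following Python sketch builds a single VOTable FIELD element with the standard library. The parameter name, unit and description are invented for the example, the utype value is a hypothetical placeholder rather than a term from a real data model, and the XML namespace that a complete VOTable document would declare is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# One field record: name, datatype, unit, UCD and utype are attributes;
# the human-readable description is a child element.
field = ET.Element(
    "FIELD",
    {
        "name": "wavelength",
        "datatype": "double",
        "unit": "nm",
        "ucd": "em.wl",
        # Hypothetical utype, used only to show where the attribute goes.
        "utype": "example:Spectrum.wavelength",
    },
)
description = ET.SubElement(field, "DESCRIPTION")
description.text = "Central wavelength of the observation"

# Serialise the record as it would appear in the table metadata.
xml_text = ET.tostring(field, encoding="unicode")
```

Because the description, unit and UCD travel with the parameter, a reader of the file does not need outside knowledge to interpret the column.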

The Unified Content Descriptor (UCD) is a formal vocabulary for astronomical data that is controlled by the International Virtual Observatory Alliance (IVOA). The vocabulary is restricted in order to avoid proliferation of terms and synonyms, and controlled in order to avoid ambiguities as far as possible. It is intended to be flexible, so that it is understandable to both humans and computers. UCDs describe astronomical quantities, and they are built by combining words from the controlled vocabulary.
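The way a UCD is built from vocabulary words can be shown with a small Python sketch. It assumes the UCD1+ syntax, in which words are separated by semicolons and each word is made of dot-separated atoms; the function name is an illustrative choice, and no check against the controlled vocabulary itself is attempted.

```python
def split_ucd(ucd):
    """Split a UCD into words (on ';') and each word into atoms (on '.')."""
    words = [word.strip() for word in ucd.split(";")]
    return [word.split(".") for word in words if word]


# "pos.eq.ra" describes a right ascension; "meta.main" marks it as the
# principal position in the table.
parts = split_ucd("pos.eq.ra;meta.main")

# The first word carries the primary meaning; later words qualify it.
primary = ".".join(parts[0])
```

A program can therefore locate, for example, every column whose primary word begins with pos.eq without knowing the column names chosen by the data provider.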

In many contexts, it is important to specify that parameters convey the values defined in an external data model; the utype attribute makes it possible to unambiguously define the meaning of the parameter. The utype attribute is especially useful to specify the spatial and temporal coordinates present in the table when it contains astronomical events: these parameters are essential to most applications that process multi-wavelength data.

External Annotation

One intermediate solution might be to define an external XML file that provides a standard way of describing the metadata and maps into the existing file headers. Such a solution would be a way of increasing interoperability and should be able to accommodate any type of file format, even for existing data.
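A minimal sketch of such an external description is given below in Python. The XML layout (a mapping element containing parameter elements), the attribute names and the header values are all assumptions made for illustration; no existing schema is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical external annotation file: each entry maps a keyword found
# in an existing file header to a meaningful name, a unit and a UCD.
MAPPING_XML = """
<mapping>
  <parameter keyword="WAVELNTH" name="wavelength" unit="nm" ucd="em.wl"/>
  <parameter keyword="DATE-OBS" name="observation_start" ucd="time.start"/>
</mapping>
"""


def load_mapping(xml_text):
    """Index the parameter entries by the header keyword they describe."""
    root = ET.fromstring(xml_text.strip())
    return {p.get("keyword"): dict(p.attrib) for p in root.iter("parameter")}


def annotate(header, mapping):
    """Return the described header values, keyed by their standard names."""
    annotated = {}
    for keyword, value in header.items():
        meta = mapping.get(keyword)
        if meta is None:
            continue  # keyword not covered by the external description
        annotated[meta["name"]] = {
            "value": value,
            "unit": meta.get("unit"),
            "ucd": meta.get("ucd"),
        }
    return annotated


# Invented header values, standing in for an existing data file.
header = {"WAVELNTH": 17.1, "DATE-OBS": "2011-01-01T00:00:00", "NAXIS": 2}
result = annotate(header, load_mapping(MAPPING_XML))
```

The data file itself is left untouched; only the external description needs to be written, which is why the approach can also be applied to existing archives.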