Coordination Action for the integration of Solar System Infrastructures and Science

General Issues

Need for good Metadata

In order to relate data from different domains it is important that metadata should be as complete and accurate as possible, and that it stands by itself with minimal need to refer to auxiliary information.

In order to identify what information is needed it is essential that we think of the bigger picture in considering what the data might be used for. To do this we need to ask:

  • Can someone from outside the experiment team, but within the project, understand the data without assistance?
  • Can someone from outside the project, but within the domain, understand the data without assistance?
  • Can someone from a different domain understand and use the data without assistance?

Assumed Knowledge

One of the problems with a lot of metadata of all types is the amount of knowledge that is assumed.

An example of this is where something is not properly described because everyone using it knows what it means. The problems can start when the experiment team disbands at the end of the project; even before then, someone who is not familiar with the data can have a lot of problems using it because they need information that they do not necessarily have access to.

Looking back on previous missions, some datasets are almost impossible to find, never mind access. The Solar Maximum Mission flew in the 1980s and the only datasets that are easy to find are those of the Coronagraph-Polarimeter and the Hard X-ray Burst Spectrometer. Both of these instruments were built by teams that had a long history of working on this type of instrument and that still exist today, albeit with different personnel. Institutionally they have maintained an interest in the instruments and this has translated into reasonable access for everyone. The instruments that were built by teams that came together to develop something that was "state of the art" have for the most part disappeared from view.

Now it can be argued that things changed during the 1990s. On the solar side, a more comprehensive approach to data from a mission was adopted starting with the Yohkoh mission in 1991. This was then reinforced with SOHO (1995) and subsequent missions. Yohkoh used the same format for data from all instruments; this was extremely efficient but the format was unfortunately proprietary. SOHO adopted FITS as a standard and defined a standard set of keywords. This worked reasonably well, although not all instruments truly adhered to the standard, and even those that did adhere did not necessarily provide all the information in the file that they should have.

Parameter Annotation

An issue with some file formats (e.g. FITS) is that it is difficult to structure the metadata in a way that completely describes the meaning of the parameters. This problem could easily be addressed if the metadata were in XML, where it is easy to form constructs that show the relationship between pieces of information. For example, in the VOTable format each parameter is defined in a FIELD record that contains many different clauses.
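As an illustration of how a FIELD record groups the pieces of information that describe one parameter, the following minimal Python sketch builds such an element with the standard library. The attribute names (name, datatype, unit, ucd) and the DESCRIPTION child element are part of VOTable; the parameter itself, its values and the helper make_field are assumptions made purely for the example.

```python
import xml.etree.ElementTree as ET


def make_field(name, datatype, unit, ucd, description):
    """Build a VOTable-style FIELD element that annotates one parameter.

    make_field is a hypothetical helper written for this example only.
    """
    field = ET.Element(
        "FIELD",
        {"name": name, "datatype": datatype, "unit": unit, "ucd": ucd},
    )
    # The free-text meaning of the parameter travels with its other clauses.
    ET.SubElement(field, "DESCRIPTION").text = description
    return field


# Illustrative values only; they are not taken from any real dataset.
field = make_field(
    name="count_rate",
    datatype="float",
    unit="ct/s",
    ucd="phot.count",
    description="Hard X-ray count rate summed over all energy channels",
)

xml_text = ET.tostring(field, encoding="unicode")
print(xml_text)
```

The point of the structure is that the name, type, unit, semantic label and description of a parameter are held together in one record, rather than being spread over separate keywords whose relationship has to be inferred by the reader.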

The annotation of parameters, and a way in which existing file metadata could be enhanced using an associated external XML file, are discussed in XXXX
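To make the idea of an associated external XML file more concrete, the sketch below pairs the keywords of a file header with annotations read from a separate XML document. It is not the scheme described in the referenced discussion: the layout of the annotation file (annotations, keyword, description), the function annotate and the header values are all assumptions invented for this example, and the header is represented as a plain dictionary rather than being read from a real FITS file.

```python
import xml.etree.ElementTree as ET

# Hypothetical external annotation file: one <keyword> element per header keyword.
SIDECAR = """
<annotations>
  <keyword name="EXPTIME" unit="s">
    <description>Exposure duration of the image</description>
  </keyword>
  <keyword name="WAVELNTH" unit="Angstrom">
    <description>Central wavelength of the observing passband</description>
  </keyword>
</annotations>
"""


def annotate(header, sidecar_xml):
    """Pair each header keyword with its external annotation, if one exists."""
    root = ET.fromstring(sidecar_xml)
    notes = {elem.get("name"): elem for elem in root.iter("keyword")}
    result = {}
    for key, value in header.items():
        note = notes.get(key)
        result[key] = {
            "value": value,
            "unit": note.get("unit") if note is not None else None,
            "description": note.findtext("description") if note is not None else None,
        }
    return result


# Stand-in for the keywords of an existing file (illustrative values).
header = {"EXPTIME": 2.5, "WAVELNTH": 195, "OBJECT": "Sun"}
annotated = annotate(header, SIDECAR)

for key, info in annotated.items():
    print(key, info)
```

Keywords without an entry in the external file are kept with empty annotation fields, so the original file remains usable unchanged while the extra description is added where it is available.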