Coordination Action for the integration of Solar System Infrastructures and Science

Derived Metadata

Much information useful for queries is locked away in the data themselves, for example:
  • Where were the sunspots?
  • When were there flares?
  • How strong was the spectral line?
The information must be extracted from the observations; this derived metadata can be stored as additional catalogues that are used in the search and analysis process.

Event Lists

Event lists are generated from time series data or a set of images; an event is essentially an observed change of some kind.

It is difficult to consider any event list as definitive – each list is created by comparing the data to criteria defined by an observer and a variety of criteria can be considered to be equally valid. It is therefore essential that the purpose of a list and the criteria used are known and understood; it is also useful if the common parts of any lists are sufficiently similar that they can be easily compared.
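The dependence of an event list on its criteria can be illustrated with a minimal sketch. The function `find_events`, the `EventList` container, the simple threshold criterion and the sample flux values are all assumptions made for illustration; they do not describe any particular instrument or published list. The point is only that the criteria travel with the list, so that two lists built from the same data can be compared.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Event = Tuple[float, float, float]  # (start time, end time, peak value)


@dataclass
class EventList:
    """An event list that keeps the criteria used to create it (hypothetical layout)."""
    criteria: Dict[str, float]
    events: List[Event] = field(default_factory=list)


def find_events(times: Sequence[float], values: Sequence[float],
                threshold: float, min_samples: int = 1) -> EventList:
    """Record one event per run of at least min_samples consecutive values above threshold."""
    result = EventList(criteria={"threshold": threshold, "min_samples": min_samples})
    run: List[int] = []  # indices of the current run above the threshold

    def close_run() -> None:
        # Keep the run as an event only if it is long enough, then reset it.
        if len(run) >= min_samples:
            peak = max(values[i] for i in run)
            result.events.append((times[run[0]], times[run[-1]], peak))
        run.clear()

    for i, value in enumerate(values):
        if value > threshold:
            run.append(i)
        else:
            close_run()
    close_run()  # a run may still be open at the end of the series
    return result


# Illustrative data only: the same series examined with two equally valid criteria.
times = list(range(10))
flux = [1, 1, 5, 9, 4, 1, 3, 3, 1, 1]
loose = find_events(times, flux, threshold=2.0)
strict = find_events(times, flux, threshold=4.5)
```

With these made-up values the looser threshold yields two events and the stricter one yields a single, shorter event. Neither list is wrong, but they can only be reconciled because each records the criteria that produced it.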

Event lists ought to be relatively straightforward if they relate to a single observatory, but this would require collaboration that may occur but is not consistently adopted or implemented.

Gaps in the data can lead to confusion if there is no way of knowing that they exist; instruments can also saturate or even stop working during certain types of events.
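One way to make gaps and saturation visible alongside an event list is sketched below. It assumes a nominally regular cadence and a known saturation level; the function names, the tolerance factor and the sample values are hypothetical and chosen only to illustrate the idea.

```python
from typing import List, Sequence, Tuple


def find_gaps(times: Sequence[float], cadence: float,
              tolerance: float = 1.5) -> List[Tuple[float, float]]:
    """Return (last sample before, first sample after) for each interval
    between consecutive samples that is longer than tolerance * cadence."""
    gaps = []
    for earlier, later in zip(times, times[1:]):
        if later - earlier > tolerance * cadence:
            gaps.append((earlier, later))
    return gaps


def find_saturated(times: Sequence[float], values: Sequence[float],
                   saturation_level: float) -> List[float]:
    """Return the times of samples at or above the assumed saturation level."""
    return [t for t, v in zip(times, values) if v >= saturation_level]


# Illustrative data only: nominal 10 s cadence, detector assumed to saturate at 255.
times = [0, 10, 20, 30, 80, 90, 100]
counts = [12, 15, 240, 255, 255, 40, 13]
gaps = find_gaps(times, cadence=10)
saturated = find_saturated(times, counts, saturation_level=255)
```

In this made-up series the samples on either side of the gap are saturated, which is the kind of situation in which an event could be under-reported or missed entirely.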

Observational Catalogues

An observational catalogue is a summary of the observations made by an instrument – such a catalogue would normally be derived from the metadata associated with the observations (i.e., in the file headers). Here we differentiate between an observational catalogue and an observing log – the former is generated from the observational data while the latter should be a record created by the observer as the observations were being made (see the section about Observational Metadata).
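As a minimal sketch of how such a catalogue might be assembled from header metadata, the example below uses plain dictionaries in place of real file headers. The keyword names (`DATE-OBS`, `INSTRUME`, `WAVELNTH`, `EXPTIME`), the file names and the function `build_catalogue` are assumptions for illustration; an actual instrument may use different keywords and would read them from the files themselves.

```python
from typing import Dict, List

# Header keywords copied into the catalogue (assumed names, not a standard list).
CATALOGUE_KEYS = ("DATE-OBS", "INSTRUME", "WAVELNTH", "EXPTIME")


def build_catalogue(headers: Dict[str, Dict[str, object]]) -> List[Dict[str, object]]:
    """Summarise one row per file, ordered by observation start time."""
    rows: List[Dict[str, object]] = []
    for filename, header in headers.items():
        row: Dict[str, object] = {"FILE": filename}
        for key in CATALOGUE_KEYS:
            # A missing keyword is recorded as None rather than guessed.
            row[key] = header.get(key)
        rows.append(row)
    # ISO 8601 time strings sort chronologically; files without a time sort first.
    rows.sort(key=lambda row: str(row["DATE-OBS"] or ""))
    return rows


# Illustrative headers only.
headers = {
    "obs_0002.fits": {"DATE-OBS": "2010-06-01T12:10:00", "INSTRUME": "EXAMPLE",
                      "WAVELNTH": 195, "EXPTIME": 2.0},
    "obs_0001.fits": {"DATE-OBS": "2010-06-01T12:00:00", "INSTRUME": "EXAMPLE",
                      "WAVELNTH": 171},
}
catalogue = build_catalogue(headers)
```

Recording a missing keyword explicitly, rather than dropping the file or filling in a default, keeps the catalogue honest about what the headers actually contained.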

One problem that has been encountered is that only a few instruments generate such catalogues. While it is not difficult to generate these catalogues, this should ideally be done by the experiment team, or by a group that knows the data and has access to a complete copy of them. Catalogues generated from partial copies of the data can cause confusion when used in the search for observations.

Creating the catalogues "on the fly" from the data that are available can give a biased perspective. The site being accessed may not have a complete or up-to-date copy of the data; the observational catalogue would then contain only what had been found, not what observations were made.

This touches on the issue of data provenance. If there are multiple copies of the data, which is the master, and what are the completeness and quality of the copies? This concerns a whole extra set of Structural Metadata; it has not been handled properly (if at all) in the past and ought to be addressed for future datasets.
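The completeness question can be made concrete with a small sketch that compares the holdings of a copy against a master list. It assumes that a master file list exists and that files can be matched by name; both are assumptions, as are the function name `compare_copy` and the file names used.

```python
from typing import Dict, Iterable, Optional


def compare_copy(master_files: Iterable[str],
                 copy_files: Iterable[str]) -> Dict[str, object]:
    """Report which master files a copy lacks and what fraction it holds."""
    master = set(master_files)
    copy = set(copy_files)
    # Completeness is undefined (None) if the master list itself is empty.
    fraction: Optional[float] = len(master & copy) / len(master) if master else None
    return {
        "missing": sorted(master - copy),     # in the master but not in the copy
        "unexpected": sorted(copy - master),  # in the copy but unknown to the master
        "completeness": fraction,
    }


# Illustrative holdings only.
master = ["obs_0001.fits", "obs_0002.fits", "obs_0003.fits", "obs_0004.fits"]
mirror = ["obs_0001.fits", "obs_0003.fits", "obs_0004.fits"]
report = compare_copy(master, mirror)
```

A report of this kind, stored with the copy, would let a user distinguish between an observation that was never made and one that is simply absent from the site being searched.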

Engineering Logs

Data coverage is rarely 100% and there can be many reasons why observations were not made during a particular time interval – by design, planned or unplanned (engineering) downtime of the instrument or observatory, loss of telemetry, radiation belt passage (spacecraft), weather (terrestrial), etc.

After a project has ended, and even while it is active, it can be very difficult to determine why observations are not available unless information describing this type of occurrence is properly recorded. If this is not done there is a danger that a scientific user may jump to the wrong conclusion about why nothing was seen – just because you do not have observations does not mean that an event did not happen.

It is therefore essential to keep records – in electronic form – that describe all times that the instruments were not operating normally.

Unfortunately, instrument teams do not always create engineering logs at present, but the need to do so should be strongly emphasized. It is desirable that the requirement for this type of logging is built into the data system from the outset, since it involves information that may be difficult to reconstruct even relatively shortly after anomalies have occurred; the system should flag any time that an unexpected gap in the data has occurred.
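A minimal sketch of such flagging is given below, assuming that data gaps have already been identified and that the engineering log is available as a list of intervals with a stated reason. The `LogEntry` layout, the function `unexplained_gaps` and the dates are hypothetical. The rule used here, that a gap is explained only if a single log entry covers it completely, is a deliberate simplification.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

Interval = Tuple[datetime, datetime]  # (start, end) of a gap in the data


@dataclass
class LogEntry:
    """One engineering-log record of abnormal operation (hypothetical layout)."""
    start: datetime
    end: datetime
    reason: str


def unexplained_gaps(gaps: List[Interval], log: List[LogEntry]) -> List[Interval]:
    """Return the gaps that no single log entry covers completely."""
    flagged = []
    for gap_start, gap_end in gaps:
        explained = any(entry.start <= gap_start and gap_end <= entry.end
                        for entry in log)
        if not explained:
            flagged.append((gap_start, gap_end))
    return flagged


# Illustrative log and gaps only.
log = [LogEntry(datetime(2010, 1, 1, 2, 30), datetime(2010, 1, 1, 5, 30),
                "radiation belt passage")]
gaps = [
    (datetime(2010, 1, 1, 3, 0), datetime(2010, 1, 1, 5, 0)),
    (datetime(2010, 1, 2, 12, 0), datetime(2010, 1, 2, 13, 0)),
]
to_review = unexplained_gaps(gaps, log)
```

Running a check like this soon after the data arrive means that an unexplained gap can be queried while the people who know the cause are still available.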

At a ground-based observatory there will often be a permanent team responsible for operating the telescope; this team may set the telescope up before handing over to the "visiting" observer. As a matter of course the team would record details of the configuration, weather, viewing, etc. in an observing log, along with a note of anything out of the ordinary that happened.

Space-based observatories generally operate differently. Sets of commands that represent the observing plans can be generated hours or even days before they are executed, and there is often little chance to make adjustments in real time; it can then take days to retrieve the data, with variable delays depending on the ground stations used. The consequence is that things become decoupled; there are often different people responsible for planning and executing the observations and for monitoring the data, and some actions take place long after the observations were planned.

While this may seem a trivial problem, the consequence is that operational issues are not always properly recorded, and even a short time after the observations were made it can be difficult to establish why something did or did not happen. This can cause particular problems for people external to a project, especially those from a different domain, or when someone is looking at the data after the project has finished and instrument personnel are no longer available.