
Metadata

Metadata is sometimes defined literally as "data about data," but the term is normally understood to mean structured data about digital (and non-digital) resources that can be used to support a wide range of operations. These include, for example, resource description and discovery, the management of information resources (including rights management) and their long-term preservation. Most services that make use of metadata are now delivered over the Web, although this was not always the case.

Author: Steph Taylor
Last updated on 2 May 2013 - 3:29pm

Past and Present

The term 'metadata' originated in contexts related to digital information (chiefly databases), but the general understanding of the term subsequently broadened to include any kind of standardised descriptive information about resources, including non-digital, physical ones. With the increasing importance of digital resources on the Web, e.g. online journals, and the rise of specialised systems for managing content and metadata, such as repositories, metadata about digital resources once again came to the fore. Looking to the future Web, proponents of linked data argue that metadata about physical and non-physical resources alike will be of vital significance to the provision of commercial services, for example via handheld devices as well as the traditional Web. In this world view, metadata can constitute an alternative resource that provides information about a physical object, place or service as a real-world resource. This may add value to the resource, for example when the metadata is used as the basis for augmented reality services and devices.

Metadata Formats: Standards and Schemas

Metadata standards in most domains on the Web were previously based on implementations of the Standard Generalised Markup Language (SGML), such as the CIMI Document Type Definition (DTD), but are now more usually expressed in the Extensible Markup Language (XML) and/or the Resource Description Framework (RDF). These allow metadata to be expressed at various levels of complexity, according to the needs of the particular service or application.

In the context of digital resources, there exists a wide variety of metadata formats, usually called schemas. Viewed on a continuum of increasing complexity, these range from the basic records used by robot-based Internet search services, through relatively simple formats like the Dublin Core Metadata Element Set (DCMES) and the more detailed Text Encoding Initiative (TEI) header and MARC formats, to highly specific formats like the FGDC Content Standard for Digital Geospatial Metadata, the Encoded Archival Description (EAD) and the Data Documentation Initiative (DDI) Codebook.
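
To make the simpler end of this continuum concrete, the following is a minimal sketch of a Dublin Core record expressed in XML, built with Python's standard library. The namespace URI is the one published for the Dublin Core Metadata Element Set; the wrapper element name `record` and the function name `build_dc_record` are assumptions made for illustration, and the field values are taken from this page.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set (DCMES).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Return a <record> element with one dc:* child per (name, value) pair."""
    record = ET.Element("record")  # wrapper element name is an assumption
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Values describing this page itself.
fields = [
    ("title", "Metadata"),
    ("creator", "Steph Taylor"),
    ("date", "2013-05-02"),
]

record = build_dc_record(fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A richer schema such as EAD or DDI would add many more elements and nesting, but the basic pattern of namespaced elements carrying values stays the same.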

Application profiles (APs) are constructed for specific or localised purposes and may consist of elements from more than one metadata schema; Dublin Core Application Profiles (DCAPs) are one example. Widely used Web standards such as FOAF can be viewed as metadata schemas with particular, limited technical purposes, whereas RSS, Atom and ePUB, for example, can be seen as effectively being or containing specialist application profiles. Consequently, metadata standards and the Web standards used by software developers have come to overlap considerably, and metadata is increasingly relevant to developers.
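
The idea of an application profile drawing on more than one schema can be sketched as follows. This is an illustrative assumption, not a DCAP implementation: the `PROFILE` dictionary, its `required` flag and the `validate` function are hypothetical, while the Dublin Core and FOAF namespace URIs are the published ones.

```python
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical profile: the properties a record may use, drawn from two
# schemas, and whether each one is mandatory.
PROFILE = {
    DC + "title": {"required": True},
    DC + "creator": {"required": True},
    DC + "date": {"required": False},
    FOAF + "homepage": {"required": False},
}


def validate(record, profile):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    # Every property used must be allowed by the profile.
    for prop in record:
        if prop not in profile:
            problems.append(f"property not in profile: {prop}")
    # Every required property must be present and non-empty.
    for prop, rule in profile.items():
        if rule["required"] and not record.get(prop):
            problems.append(f"missing required property: {prop}")
    return problems


good = {
    DC + "title": "Metadata",
    DC + "creator": "Steph Taylor",
    FOAF + "homepage": "http://example.org/",
}
bad = {
    DC + "title": "Metadata",
    "http://example.org/terms/colour": "blue",  # not part of the profile
}

print(validate(good, PROFILE))
print(validate(bad, PROFILE))
```

Real application profiles usually also constrain value types, vocabularies and repeatability; this sketch checks only which properties are present.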

Knowledge Organisation Systems (KOS) are a class of metadata that have traditionally been used in library cataloguing systems but are also used in classifying digital resources for the purposes of efficient search and retrieval in software such as repositories and knowledge management applications.

Metadata Usage

Metadata is not only used for resource description and discovery purposes. For example, it can be used to help administer and manage resources, e.g. to record information about their location and acquisition. It can also be used to record any intellectual property rights vested in resources and to help manage user access to them. Some metadata can be organised into complex hierarchies for managing related resources for specific purposes, for example CERIF for Research Information Management. Other metadata might be technical in nature, documenting how resources relate to particular software and hardware environments or recording digitisation parameters. The creation and maintenance of metadata is also seen as an important factor in the long-term preservation management of digital resources and in helping to preserve the context and authenticity of resources. Examples of these 'richer' understandings of metadata are the development of an Australian Recordkeeping Metadata Schema (RKMS), the MPEG-7 Multimedia Content Description Interface standard for audio-visual resources and the NISO draft definition of technical metadata for digital still images.

Implementation

Metadata standards in most domains on the Web are most usually expressed in the Extensible Markup Language (XML) and/or the Resource Description Framework (RDF). These allow metadata to be expressed at various levels of complexity, according to the needs of the particular service or application. Other, more readable technical formats such as JSON, which can also contain structured metadata, are in widespread use.
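
As a rough illustration of structured metadata carried in JSON, the sketch below serialises a small record and reads it back using Python's standard `json` module. The field names and the `subjects` values are assumptions chosen for the example rather than part of any particular schema.

```python
import json

# A small descriptive record; the keys are illustrative, not from a standard.
record = {
    "title": "Metadata",
    "creator": "Steph Taylor",
    "date": "2013-05-02",
    "subjects": ["metadata", "linked data"],  # illustrative values
}

# Serialise to JSON text, then parse it back into Python structures.
json_text = json.dumps(record, indent=2, sort_keys=True)
print(json_text)

restored = json.loads(json_text)
```

Because JSON supports nested objects and lists, the same approach extends from flat records like this one to more deeply structured metadata.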

Where is Metadata Used?

So, for example in the context of digital preservation, library catalogues, abstracting and indexing services, archival finding aids and museum documentation might all contain relevant metadata, which can be exposed on the Web. The advantages of this are twofold. Firstly, it allows librarians, archivists and museum documentation specialists to co-operate usefully across professional boundaries. Secondly, it enables the cultural heritage professions to communicate more effectively with those domains that also have an interest in metadata: e.g., software developers, publishers, the recording industry, television companies, the producers of digital educational content and those concerned with geographical and satellite-based information.

Metadata is used so widely in so many different systems, predominantly but not necessarily on the Web, that it is impossible to list them all here. In order to demonstrate how diverse these can be, however, a short list of examples follows in no particular order:

  • Agricultural metadata
  • Business data visualisation
  • Statistics and census services
  • Library and information science
  • Healthcare
  • Data warehousing
  • Linked data services
  • Geo-tagging
  • Learning objects and educational resources
  • Images e.g. architecture, art, medicine
  • Time-based media e.g. film, audio archives
  • Scientific and research metadata
  • Digital preservation, archives and cultural heritage
  • Genealogical information

Competing Standards

One consequence of there being such a wide range of communities who have an interest in metadata is that there are a bewildering number of standards and formats in existence or under development. The library world, for example, developed the MARC formats as a means of encoding metadata defined in cataloguing rules and has also defined descriptive standards in the International Standard Bibliographic Description (ISBD) series. More recently, with the advent of complex Web services dealing with diverse types and collections of resources, the number of relevant standards for any given application has mushroomed to the point where it can be difficult to know which is the best or most widely implemented technical standard for any given purpose.

Complexity and Scale of Data

The level of granularity of metadata for any given purpose remains a significant issue, particularly for repositories and similar Web content services. These have to reconcile two demands: acquiring content in practice, with metadata created by humans or machines, and making that content more easily discoverable via complex relationships between resources and the entities connected with their creation. Acquisition, especially when human metadata entry is required, demands relatively simple metadata in order to avoid creating a barrier of time and effort that will tend to discourage content depositors. On the other hand, deeper structured metadata can assist in resource discovery and preservation, as well as in providing relationships between resources that can be used as quality linked data for a variety of modern Web and mobile Web services.

Proponents of linked data argue that metadata about both physical and non-physical resources, objects and services are increasingly vital to the provision of services via the Web, particularly on handheld devices, and that such services will soon depend on quality metadata being available as linked data.

Another area in which linked data may be important (and potentially application profiles as part of it) is in providing accurate information about the increasingly complex relationships between related resources, for example in knowing which versions, translations or sources a digital resource might have, or equally in the relationships between physical, real-world resources and services and other sources of information about a particular resource or topic.
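
Relationships of this kind are commonly modelled as subject, predicate, object statements. The following is a minimal sketch under stated assumptions: the `example.org` resources are hypothetical, the helper functions `related` and `reachable` are invented for illustration, and the predicates `hasVersion`, `hasFormat` and `source` are taken from the DCMI Metadata Terms namespace.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical resources, identified by example.org URIs.
REPORT = "http://example.org/report"
REPORT_V2 = "http://example.org/report/v2"
REPORT_PDF = "http://example.org/report.pdf"
DATASET = "http://example.org/dataset"

# Each statement is a (subject, predicate, object) triple.
TRIPLES = [
    (REPORT, DCTERMS + "hasVersion", REPORT_V2),
    (REPORT, DCTERMS + "hasFormat", REPORT_PDF),
    (REPORT_V2, DCTERMS + "source", DATASET),
]


def related(subject, predicate, triples):
    """Objects linked to `subject` by exactly `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def reachable(start, triples):
    """All resources reachable from `start` by following any relationship."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for s, _, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


print(related(REPORT, DCTERMS + "hasVersion", TRIPLES))
print(sorted(reachable(REPORT, TRIPLES)))
```

A production service would normally use an RDF library and a triple store rather than a Python list, but the traversal logic is the same in outline.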

Large data sets and research data are also an area demanding increasingly complex and subject-specific metadata to be exposed as linked data. How these can be made re-usable by other services and re-used as new resources remains an area of active development. OAI-ORE can be used to describe any kind of resource by providing a wrapper for other resources that can in this way be re-purposed as a new resource, for example in producing a learning object for educational purposes out of a range of pre-existing resources or parts of resources.
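
The wrapper idea behind OAI-ORE can be sketched as a resource map that describes an aggregation, which in turn aggregates existing resources. This is a simplified illustration, not a conforming ORE serialisation: the `example.org` URIs and the function names are hypothetical, while `describes` and `aggregates` are terms from the ORE vocabulary namespace.

```python
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"


def build_resource_map(rem_uri, aggregation_uri, title, parts):
    """Return triples for a resource map describing one aggregation."""
    triples = [
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, DCTERMS + "title", title),
    ]
    # The aggregation re-purposes pre-existing resources as a new whole.
    for part in parts:
        triples.append((aggregation_uri, ORE + "aggregates", part))
    return triples


def aggregated_parts(aggregation_uri, triples):
    """List the resources that the given aggregation brings together."""
    return [o for s, p, o in triples
            if s == aggregation_uri and p == ORE + "aggregates"]


# Hypothetical learning object assembled from three existing resources.
parts = [
    "http://example.org/article.pdf",
    "http://example.org/figure1.png",
    "http://example.org/dataset.csv",
]
triples = build_resource_map(
    "http://example.org/learning-object/rem",
    "http://example.org/learning-object",
    "Introductory learning object",
    parts,
)

for triple in triples:
    print(triple)
```

The aggregated resources are referenced rather than copied, which is what allows the same item to appear in several aggregations.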

Future Development

From a technical perspective, there are some issues surrounding exactly how metadata will develop in the future. There are proponents of a new but little implemented RDF-based mark-up language for linked data called the Description Set Profile (DSP) language. However, it has not yet been fully demonstrated in practical software solutions how it will bring additional functionality over and above what can be achieved with RDF, or whether such a marginal standard will ever achieve Web scale.

Linked data rely on mark-up such as microformats within Web resources, which is machine-readable but not visible to a human via the browser interface. Opponents of linked data solutions used to argue that a critical mass of Web mark-up (HTML, XHTML, XML etc.) would never exist to provide a sufficient corpus on which to construct useful Web services. However, the rise of content management systems and frameworks such as Drupal and Joomla increasingly gives ordinary users the ability to expose their resources as linked data, or indeed does this seamlessly behind the scenes without their intervention. Consequently, more and more Web tools and services are coming to depend on the automatic provision and consumption of linked data, particularly on the social Web.
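
As one illustration of machine-readable metadata that a browser does not display, the sketch below reads Dublin Core values from HTML `meta` elements using Python's standard `html.parser` module. Note the assumptions: it uses the `DC.`-prefixed `meta` convention rather than microformats, and the sample page and the `MetaExtractor` class are hypothetical.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect content values of <meta> elements whose name starts with 'DC.'."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Keep only Dublin Core style names; a property may repeat.
        if name.lower().startswith("dc.") and content is not None:
            self.metadata.setdefault(name, []).append(content)


# Hypothetical page: the meta elements are not shown to a human reader.
PAGE = """<html><head>
<title>Metadata</title>
<meta name="DC.title" content="Metadata">
<meta name="DC.creator" content="Steph Taylor">
<meta name="viewport" content="width=device-width">
</head><body><p>Visible text.</p></body></html>"""

extractor = MetaExtractor()
extractor.feed(PAGE)
print(extractor.metadata)
```

A content management system could emit such elements automatically, which is the kind of behind-the-scenes provision described above.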


