Dublin Core metadata has attracted the attention of formal resource description communities such as museums, libraries, government agencies and commercial organizations across the world. It is widely used across the Internet, although Dublin Core metadata schemas are far from the only ones available for resource description in such contexts. The choice of metadata schema depends on a complex array of factors in any given application but, broadly speaking, Dublin Core is often a good choice where there is a need for a relatively simple and flexible formal description of Web resources.
The Dublin Core is positioned as a simple information resource description format. However, it also aims to provide a basis for semantic interoperability between other, often more complicated, formats. The Dublin Core benefits from active participation and promotion in over 50 countries across the world.
Much Dublin Core use is hidden behind the scenes (as it should be, certainly from a user point of view), so by definition is difficult to discover. Often small parts of the standard are used. This is particularly true in the linked data world. Mashups have become commonplace and DCMI itself is often unaware of certain uses. However it is of course desirable that standards are taken and used together with other technologies to create new services. Creators of open standards cannot dictate how they are used.
Many repository managers use the ‘out of the box’ solutions provided with their chosen repository software and do not customise the schemas and templates designed and supplied by the developers. Consequently, the majority of users of DSpace and EPrints, which represent 63% of UK repositories, use DC metadata. Most other repository platforms do likewise, although some may support mappings to schemas such as METS, MARC and bibliographic models such as BIBO and others.
Although DC was originally designed for ‘document like objects’ it is also used for other resource types. The EPrints model offers a wide range of templates for describing many different resource types. Some repository software platforms allow other schemas apart from DC to be used for more specialist resource description.
In general simple Dublin Core is not able to provide the level of granularity that libraries need to make the fine grained, but important, distinctions between resources in their collections, especially national libraries such as the British Library, National Library of Wales and the Bodleian. Schemas such as MARC21 and METS are still used instead. However it still has certain specific applications. Two examples of current uses of DC metadata in the British Library are provided below:
- In the UK Web Archive, the Web Curator Tool schedules crawls to capture snapshots of websites. The title field is currently populated automatically, whereas the other fields need to be populated manually. This is a fairly basic use of DC metadata.
- Search Our Catalogue, a resource discovery interface, is an implementation of Ex Libris' PRIMO system. This uses a form of qualified DC metadata as a means of aggregating metadata from different sources.
The unqualified and qualified variants of DC metadata are still widely used, despite the latter having been superseded by DC-Terms according to DCMI. It is now normal to cite the namespaces of application profiles using DC and/or other metadata in XML and/or RDF formats for the purposes of machine readability and interoperability. Previously, it was questionable to what extent other systems were making wide use of some of these documents, especially in the case of application profiles, but developments in linked data have made the effort of producing them more justifiable from a technical standpoint (see further below).
An exception to this is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). As a minimum, repositories must be able to return records with metadata expressed in the unqualified DCMES 1.1 format, using the oai_dc schema. This schema is arguably itself a simple application profile based on DC metadata. The protocol is thus a low-barrier mechanism for repository interoperability: data providers are repositories that expose structured metadata via OAI-PMH; service providers then make OAI-PMH service requests to harvest that metadata. In practice, aggregation services often harvest only unqualified DC for maximum interoperability, although OAI-PMH may technically be used for qualified DC, DC-Terms and other metadata schemas. According to a survey by the Repositories Support Project (RSP), many repositories also provide qualified DC; other formats include METS, MODS, PREMIS etc. Given the requirement for repositories to expose records in DC in order to be harvested, oai_dc has been built into most repository software, thus making it easier for institutions to adopt. However, repository platforms have implemented DC in different ways. Given the large number of repositories in the UK, a form of DC is used in around 157 repositories, although in differing ways.
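The harvesting exchange described above can be illustrated with a minimal Python sketch. It builds a ListRecords request for the oai_dc format and reads the unqualified DC fields out of a response. The base URL, the sample record and the function names are hypothetical and chosen only for illustration; the namespaces are those published for OAI-PMH 2.0 and DCMES 1.1. No network request is made, and resumption tokens and error responses are ignored.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace URIs used by OAI-PMH 2.0 responses carrying oai_dc metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str) -> str:
    """Build a ListRecords request asking for unqualified DC (oai_dc)."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    return base_url + "?" + query


def parse_oai_dc(xml_text: str) -> list[dict[str, list[str]]]:
    """Return one dict per record, mapping DC element names to their values."""
    records = []
    root = ET.fromstring(xml_text)
    for dc_block in root.iterfind(".//oai_dc:dc", NS):
        fields: dict[str, list[str]] = {}
        for child in dc_block:
            name = child.tag.split("}", 1)[1]  # strip the "{namespace}" part
            fields.setdefault(name, []).append((child.text or "").strip())
        records.append(fields)
    return records


# Hypothetical response, cut down to a single record for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
          <dc:creator>Smith, A.</dc:creator>
          <dc:creator>Jones, B.</dc:creator>
          <dc:type>Thesis</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    for record in parse_oai_dc(SAMPLE_RESPONSE):
        print(record)
```

Every DC element is repeatable, so each field is kept as a list rather than a single value. This is one reason why different platforms can populate the same oai_dc record in different ways.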
It may have been assumed that a significant proportion of end users in the UK would use an aggregated service based on harvesting (eg OAIster) to cross search repositories and retrieve records. However, consultation for a DC usage survey in 2011 revealed that the majority just use Google. Many institutions use Google Analytics to identify their ‘traffic source’. It therefore appears that the need to provide oai_dc for harvesters is no longer one of the main incentives for using DC in repositories, since the percentage of people using these search services is insignificant. Parts of the research community overestimate the importance of such search services. Taking a pragmatic approach, repository managers are more interested in what Google and Google Scholar are providing.
It seems likely that most repository platforms are based around using DC because it offered a simple standardised way of describing and making resources available on the web, with the added incentive of OAI-PMH compatibility, and not as a result of OAI-PMH mandatory requirements.
The lack of use of aggregated search in the UK is most likely due to the lack of UK-based harvesting/search services. The MIMAS UK Institutional Repository Search provides a demonstrator, not a service (although it appears to function well). Harvesting is provided by RepUK, although this has so far maintained a low profile.
Another widespread use of DC metadata is within the Simple Web-service Offering Repository Deposit (SWORD) protocol based on the Atom syndication format, which offers a standardized method of deposit into remote repositories. Although there is no inherent reason why DC metadata should be chosen over any other schemas, this is almost always the case in practice, since the recipient repository needs to be able to process the metadata and documents packaged with SWORD, whether or not metadata from other schemas are also provided. As most repositories rely on DC metadata, this is the obvious choice.
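As a rough illustration of DC metadata travelling inside an Atom document, the following Python sketch builds an Atom entry carrying a few DC-Terms elements. It is loosely modelled on Atom-entry deposits and is not taken from any particular repository's SWORD implementation: the choice of elements, the function name and the sample values are assumptions, and a real deposit would also involve the HTTP request and the packaged files.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Ask ElementTree to use readable prefixes when serialising.
ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("dcterms", DCTERMS_NS)


def build_deposit_entry(title: str, creators: list[str], abstract: str) -> str:
    """Return an Atom entry (as text) with DC-Terms descriptive metadata.

    Hypothetical helper: which elements a receiving repository expects
    is an assumption made for this sketch.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}title").text = title
    for name in creators:
        # dcterms:creator is repeated once per creator.
        ET.SubElement(entry, f"{{{DCTERMS_NS}}}creator").text = name
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}abstract").text = abstract
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(build_deposit_entry(
        "An example article",
        ["Smith, A.", "Jones, B."],
        "A short abstract.",
    ))
```

Because the recipient repository has to map whatever arrives onto its own schema, a small set of widely understood DC properties is the easiest thing for both sides to agree on.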
Dublin Core is a popular component in RDF data – although mostly it follows the same patterns of use, which are quite simple.
EPrints releases in 2010 included a range of new features relating to linked data and Dublin Core Application Profiles, including the ability to establish arbitrary relations between objects or provide additional metadata in triple form. Version 3.2.1 added URIs for derived entities eg authors, events, locations as well as an extendable RDF system, by default using the BIBO Ontology. These employ DC metadata elements in the EPrints schema.
The following examples in the UK represent a sample of organisations using DC elements or terms as RDF linked data, with the vast majority using it for very similar applications (mostly defining format information, some form of title or descriptive string, and mime type information). These examples do not include the very many sites using microformats, RSS etc, which also contain DC terms (or terms inherited from DC, but not expressly stated as such).
- The BBC
- JISC Dev8d developer forum
- Data.gov.uk: general store for governmental open data in the UK, e.g. Patent Office
- Ordnance Survey
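The common pattern noted above (a title or descriptive string plus format or MIME type information) is simple enough to sketch in a few lines of Python. The example writes N-Triples by hand using the DCMES 1.1 namespace; the resource URI and function names are hypothetical, and a production system would more likely use an RDF library than string formatting.

```python
DC = "http://purl.org/dc/elements/1.1/"


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping the basics."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def describe(resource_uri: str, title: str, mime_type: str) -> list[str]:
    """Return N-Triples lines giving a resource a dc:title and dc:format."""
    return [
        f"<{resource_uri}> <{DC}title> {literal(title)} .",
        f"<{resource_uri}> <{DC}format> {literal(mime_type)} .",
    ]


if __name__ == "__main__":
    # Hypothetical resource, for illustration only.
    for line in describe("http://example.org/doc/1", "Annual report", "application/pdf"):
        print(line)
```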
The UK data.gov.uk initiative has not been prescriptive, so government departments are using different models and vocabularies for exposing their data as linked data, with the result that a large-scale ‘clean-up’ operation is likely to be required.
An initiative at the University of Southampton brings together work and research done on a number of public sector information catalogues including the UK's national catalogue at data.gov.uk. However, changes to the contents and data formats of the originating portals, particularly at data.gov.uk made this difficult to access in winter 2011-12.
The British Library started to make the British National Bibliography (BNB) available in RDF in late 2010. Sample files are being published in an iterative process for research purposes and to investigate options for structuring catalogue information as linked data. Since there are 2.9 million records, there are issues of scale in publishing all of this unusually large-scale source of high quality bibliographic data as RDF. The DC-Terms namespace is employed amongst a range of equivalent metadata formats, as well as dc:date from DCMES 1.1, using a range of vocabularies and ontologies including LCSH, MARC, MESH, SKOS and OWL. It has been used and analysed by organisations such as the BBC, Ithaka, Open Library, Phil Papers, TSO and Wikimedia Commons.
The JISC-funded OpenBib project will publish a substantial corpus of bibliographic metadata as Linked Open Data, using existing semantic web tools, standards (RDF, SPARQL), linked data patterns and accepted Open ontologies (FoaF, BIBO, DC, FRBR, etc). The data is from two distinct sources: traditional library catalogues (Cambridge University Library and the British Library) and ToCs from a scientific publisher, the International Union of Crystallography IUCr. OpenBib has also been used to analyse the BNB data. The University of Mannheim Library is also working in this area – its catalogue can be searched as a linked data service as well as the data being made available for reuse. Further examples may be found in a use case report by the W3C library linked data Incubator Group.
Talis Prism 3 is a next-generation OPAC/search and discovery interface that extracts MARC 21 records and uses DCMES 1.1, DC Terms (and, in the future, BIBO) metadata to expose them as linked data. The Resilience Knowledge Base (RKB) has been built as part of the EU-funded ReSIST project by the University of Southampton to gather data from bibliographic and other sources (covering people, projects, organisations, publications etc); the added structure allows querying by topic. RKB Explorer provides the interface to the RKB, and some of the data is gathered using OAI-PMH from OAI linked data sources around the world.
A JISC-funded project based at the Open University, LUCERO will ‘investigate and prototype the use of linked data technologies and approaches to linking and exposing data for students and researchers’. A platform has been made available which currently provides access to data from the OU institutional repository and other OU resources via a SPARQL endpoint. The OU repository is based on EPrints so already includes a function to export information as RDF using the BIBO ontology. The schemas in use include DC Terms, the W3C Media Ontology, FOAF and Media RDF. The W3C use case on collecting material related to courses at The Open University identifies a set of challenges to transferring library data to RDF.
The Locah project is making records from the Archives Hub and Copac available as linked data in order to interconnect with other data and contribute to the growth of the Semantic Web. It makes use of some common classes and properties from the DC Terms RDF vocabulary, as well as SKOS, FOAF, BIBO, ORE, Event and Time Ontologies. SPARQL endpoints will be provided to enable the data to be queried. The Archives Hub is an aggregation of archival metadata from repositories across the UK; Copac provides access to the merged catalogues of many university, specialist, and national libraries in the UK and Ireland.
Although all types of DC metadata have been used as the basis for application profiles, it is now more usual (and officially recommended) to use DC-Terms to build new application profiles, which are in any case explicitly based on the earlier elements for backwards compatibility. In other contexts, all of these variants are still in widespread use. DC metadata has been adopted in an extremely wide range of software systems and Web applications wherever there is a need to manage resources, and is by no means limited to traditional library systems and repository software, as was more often the case in the past.
There are too many APs and DCAPs to be able to provide an authoritative list, especially since many of these are developed and used within individual systems for very specific local purposes and their technical details may not even be exposed publicly where such systems are not open to general access from the Internet. There have been a number of DCAPs, many of which were funded in full or in part by the JISC (including via DCMI). These have seen varying levels of development and adoption: some are still being actively developed, whereas others are not, or were never implemented. For example, it appears that there is insufficient demand in the metadata community for the particular functionality offered by SWAP over qualified DC, since SWAP has seen little practical use. The following list contains both full DCAPs and scoping studies for possible DCAPs (with responsible organisations in brackets where known):
- SWAP - Scholarly Works Application Profile (DCMI & UKOLN, University of Bath)
- GAP - Geospatial Application Profile (EDINA)
- IAP - Images Application Profile (UKOLN, University of Bath)
- TBMAP - Time-Based Media Application Profile
- LMAP - Learning Materials Application Profile scoping study (JISC CETIS)
- SDAPSS - Scientific Data Application Profile Scoping Study (UKOLN, University of Bath)
- Dc-Ed - Dublin Core Education application profile (DCMI)
- Dc-Lib - Dublin Core Library application profile (DCMI)
- DC Accessibility Application Profile (see workshop, February 2011)
- DCCAP - Dublin Core Collections Application Profile (DCMI) based on the earlier RSLP Collection Description metadata schema (DCMI) - deactivated since 2010.
- The IFLA International Standard Bibliographic Description DCAP (IFLA)
- BLAP - British Library Application Profile for Objects (BLAP)
- DC-Government Application Profile (DCMI) - deactivated since 2009.
It is not clear, however, to what extent these and other application profiles have been adopted in practice, as it is extremely difficult to get feedback from HEIs in order to discover this information. It may be possible, for example in the case of repositories and related systems, to discover such usage information by machine where such systems are registered with a central service. In the case of repositories, the majority are registered with ROAR or OpenDOAR. Information about their use of metadata standards is not publicly available, however, and it is likely that various permutations of DC metadata are used in the vast majority of applications and services. Some of these arguably represent relatively simple, de facto local application profiles.
de facto Application Profiles
There is also a rise in the use of de facto application profiles using DC metadata, i.e. those which are not officially described as such, but combine elements of DC and other schemas. Only a few examples of significant growth areas that may be worth watching are given here, since it is impossible to trace all such usage across the Web.
A large number of UK public libraries currently offer EPUB book rentals (17 county and borough councils have signed up): this therefore translates into a lot of DC-Terms usage across these libraries.
The complex CERIF standard for Research Information Management is becoming more widespread as Current Research Information Systems (CRISs) are coming into more widespread use in UK HEIs. There are current plans at euroCRIS to map from CERIF to Dublin Core.
Adobe's Extensible Metadata Platform (XMP) is a labelling technology that allows embedding of metadata into the file itself. XMP is used in PDF, photography and photo editing applications. XMP is most commonly serialised and stored using a subset of RDF, which is in turn expressed in XML. The most common metadata tags recorded in XMP data are those from Dublin Core – using both simple DC and DC terms.
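To make the RDF/XML serialisation concrete, the following Python sketch reads the Dublin Core properties out of an XMP packet. The sample packet is hand-written for illustration and the function name is hypothetical. Locating and extracting the packet from a PDF or image file is not shown, and the parsing assumes the common layout in which DC properties appear as child elements of rdf:Description.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample packet (illustrative only).
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:format>application/pdf</dc:format>
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">Annual report</rdf:li></rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq><rdf:li>A. Author</rdf:li><rdf:li>B. Author</rdf:li></rdf:Seq>
      </dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_dc_from_xmp(packet: str) -> dict[str, list[str]]:
    """Collect DC property values from an XMP packet given as text."""
    root = ET.fromstring(packet)
    result: dict[str, list[str]] = {}
    dc_prefix = "{" + NS["dc"] + "}"
    for description in root.iterfind(".//rdf:Description", NS):
        for prop in description:
            if not prop.tag.startswith(dc_prefix):
                continue  # skip properties from other schemas
            name = prop.tag[len(dc_prefix):]
            # Values may be wrapped in rdf:Alt / rdf:Seq / rdf:Bag containers.
            items = [li.text or "" for li in prop.iterfind(".//rdf:li", NS)]
            values = items if items else [(prop.text or "").strip()]
            result.setdefault(name, []).extend(values)
    return result


if __name__ == "__main__":
    print(read_dc_from_xmp(SAMPLE_XMP))
```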
One of the early uses of Dublin Core was within web pages. DC elements can be used to add descriptive information to the page content, in order to improve retrieval. Terms are inserted within the <meta> tags in the HTML header. Search engines have used meta elements to help categorise and index web content. However, as search engine robots have become more sophisticated (and tags were misused for unscrupulous marketing purposes), use of meta elements has decreased dramatically and Dublin Core’s importance in this context is now unclear. A legacy tool, DC-dot (UKOLN, 2000), is still in use to automatically generate DCMES 1.1 <meta> tags, although this may be largely as a teaching tool.
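The kind of header markup described above can be generated with a short Python sketch. It follows the widely used "DC." naming convention for <meta> names together with a schema.DC link declaring the namespace. The function name and sample values are hypothetical, and this is not a description of how DC-dot itself works.

```python
from html import escape


def dc_meta_tags(fields: dict[str, str]) -> str:
    """Return HTML header lines embedding simple DC elements as <meta> tags."""
    # Declare which namespace the "DC." prefix refers to.
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in fields.items():
        # escape() protects against &, <, > and quotes in attribute values.
        lines.append(
            f'<meta name="DC.{escape(element)}" content="{escape(value)}">'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(dc_meta_tags({
        "title": "Annual report",
        "creator": "A. Author",
        "date": "2011-06-01",
    }))
```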
DC terms are becoming a standard part of RDFa markup, a W3C Recommendation that adds a set of attribute level extensions to XHTML for embedding rich metadata within Web documents. A set of attributes are provided that can be used to carry metadata in an XML language. A number of UK government departments are exploring how RDFa can be used, including the Central Office of Information.
Major UK Organisations
JISC is a member of the Dublin Core Metadata Initiative on behalf of the UK, and is represented on the DCMI Oversight Committee. Several UKOLN staff sit on the Dublin Core Advisory Board (and take part in its email list), moderate a number of DCMI Communities and Task Groups, and participate in DC conferences. UKOLN owns all the DC lists (currently 41), which are hosted at JISCMAIL: the largest of these are DC-GENERAL and DC-LIBRARIES, with 860 and 307 members worldwide respectively. It is not possible to find numbers or percentages of UK members.
A number of other UK organisations are not directly involved with DCMI itself, but produce materials and organise events in the UK to support the use of metadata including Dublin Core; this results from their supporting roles in their communities. Examples include the Repositories Support Project and InfoNet (InfoKits), the Welsh Repositories Network (workshops and a basic guide to metadata in repositories) and CETIS (metadata for learning resources).
A large number of JISC projects have either used DC in the past or are currently using the standard. However there is no easy way to obtain statistics or to search for and identify current projects via the JISC website, since completed projects are also retrieved via the standard search. Listed below are some examples of current, or recently completed projects, in addition to the many projects that have already been mentioned above.
- The BRII project (University of Oxford, completed April 2010), focused on the use of semantic web technologies to share information about researchers, interests, projects and related research activity from a wide range of sources including the repository. It used various vocabularies and ontologies including simple DC, qualified DC and DC-Terms.
- The Electronic Theses Online Service (EThOS) project was funded by partner institutions, JISC and Research Libraries UK (RLUK). The live service is now managed by the British Library. It offers a single point of access where researchers worldwide can access all theses produced within UK Higher Education, which are automatically harvested from institutional repositories and digitised where necessary. It developed UKETD_DC, a qualified DC schema (or application profile) for UK e-theses.
- The Transatlantic Archaeology Gateway (TAG) project was funded jointly by JISC and the National Endowment for the Humanities in the US. It aimed to develop tools for transatlantic cross-searching and semantic interoperability between the Archaeology Data Service (ADS) and the Digital Archaeological Record (tDAR). It first created an infrastructure to enable basic cross-search of DC-compatible metadata records for digital resources, and later a much deeper and richer level of cross-searching for faunal data. ADS uses qualified DC; tDAR digital repository services are based on the OAIS reference model, and resource-level metadata is disseminated as OAI-compliant DC metadata.
- 15 projects in the UKOER programme were identified in a CETIS blog post in March 2010 which used Dublin Core, some at least in part via OAI-PMH, others using DC-Terms or considering implementation.
- The Zetoc service provides Z39.50-compliant search access to the British Library's Electronic Table of Contents (ETOC), based on DC. JISC has funded an update to this service in 2011, which is managed by MIMAS. Zetoc is free of charge to UK HE and FE institutions sponsored by JISC and to the UK Research Councils.