Perhaps the best known metadata initiative is the Dublin Core (DC), a metadata element set intended to facilitate discovery of electronic resources. It is developed and maintained by the Dublin Core Metadata Initiative (DCMI). Originally conceived for author-generated description of Web resources, a role in which it has become widely used across the Internet, it has attracted the attention of formal resource description communities such as museums, libraries, government agencies and commercial organizations. Since its inception, there have been two official versions of Dublin Core metadata and a number of syntactic iterations, largely due to developments in name spaces, the increasing use of the Resource Description Framework (RDF) and related mark-up languages, and the development of application profiles.
There are three principal types of Dublin Core metadata:
- the Dublin Core Metadata Element Set (DCMES) version 1.1
- an extended syntax based on DCMES 1.1 with the addition of Dublin Core Qualifiers
- the DCMI Metadata Terms (DC-Terms).
All of the above are still considered valid by DCMI, although DC-Terms has officially superseded both DCMES 1.1 and the subsequent addition of qualifiers to the simple elements. The two later approaches were created in response to the perceived limitations of the original element set, which emerged from a meeting in Dublin, Ohio in 1995.
The key characteristics of the Dublin Core, in all its versions, are:
- Simplicity
- Semantic interoperability
- International consensus
- Extensibility
- Modularity
'Simple' Dublin Core
DCMES 1.1 is a metadata element set intended to facilitate discovery of electronic resources. It was created as a simple alternative to complex metadata schemas such as MARC, which was designed for use in library cataloguing systems and is arguably less well suited to simpler, Web-based applications such as repositories. DCMES 1.1 is often informally referred to as "Simple Dublin Core" or "DC-Simple", although these are unofficial names. Originally conceived for author-generated description of Web resources, it has become widely used across the Internet in applications where there is a need for a relatively simple and flexible formal description of Web resources.
The basic DCMES metadata element set defines fifteen metadata elements for simple resource discovery: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage and rights. Some of the elements are provided with an encoding scheme that defines and restricts the scope and form of the metadata that may be found within the element: the language element, for example, may be encoded using ISO 639-2 or RFC 3066, and a controlled vocabulary is provided for the type element. One of the specific purposes of DC is to support cross-domain resource discovery, i.e. to serve as an intermediary between the numerous community-specific formats being developed. Originally, the recommended usage was the dotted syntax, e.g. dc.title, dc.subject etc., which is still officially valid, although the name space syntax (dc:title etc.) is now preferred.
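As a minimal sketch (the resource and values are invented, and the enclosing metadata element is simply an arbitrary container), a simple DC description expressed in XML might read:

  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>An Introduction to Metadata</dc:title>
    <dc:creator>Smith, Jane</dc:creator>
    <dc:subject>metadata</dc:subject>
    <dc:date>2013-05-02</dc:date>
    <dc:type>Text</dc:type>
    <dc:language>eng</dc:language>
    <dc:identifier>http://example.org/resources/1234</dc:identifier>
  </metadata>

Here the dc:type value is drawn from the DCMI Type Vocabulary and dc:language uses an ISO 639-2 code, illustrating the encoding schemes mentioned above.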
'Qualified' Dublin Core
Subsequently, the potential limitations of the basic DCMES elements were recognised by DCMI. An early solution called Dublin Core Qualifiers, now superseded, was to introduce qualifiers to the basic elements, e.g. dc.date.issued v. dc.date.available and dc.coverage.spatial v. dc.coverage.temporal, where previously only the broader dc.date and dc.coverage were available. This general approach, which allows for a broader range of elements than DCMES, is often known informally as "Qualified Dublin Core". It effectively created new metadata elements by allowing the disambiguation of different usages of the same parent element, while the original unqualified elements also remained available for use. In addition, new parent elements were added: audience, provenance, rights holder, accrual method, accrual periodicity and accrual policy. Effectively, these represent add-ons to DCMES 1.1.
The original dotted syntax also remains valid in these contexts, but is now discouraged in favour of name spaces, which place a colon after the name space prefix, as required by the conventions of RDF. In the case of simple or qualified Dublin Core, this gives examples such as dc:date and dc:date.issued respectively, which are only marginally different from the earlier dotted syntax.
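For illustration (the dates and places are invented), qualification allows a record to make distinctions such as:

  dc:date.issued        2012-06-01
  dc:date.available     2012-07-15
  dc:coverage.spatial   Ohio, United States
  dc:coverage.temporal  1995-2013

where simple DC could only have offered undifferentiated dc:date and dc:coverage values.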
'Modern' Dublin Core
This approach has been superseded by the DCMI Metadata Terms (DC-Terms), which are officially recommended by DCMI, although the earlier approaches are still considered valid uses of Dublin Core metadata, in part for the purposes of backwards compatibility with legacy systems that may still be in use across the Internet. However, DC-Terms introduces a wider range of simple elements without the need for the qualifier construction, e.g. dcterms:available in place of the earlier dc:date.available element. It could be argued that this development comes full circle back towards the original approach, but with a wider range of available elements. It also represents an admission that, at least in some applications on the Web, the original range of elements was insufficient - or, alternatively, has become so with time, as technological demands on the scope of DC metadata have increased.
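A sketch of the same information expressed with DC-Terms, using the conventional dcterms: prefix (values again invented):

  <metadata xmlns:dcterms="http://purl.org/dc/terms/">
    <dcterms:issued>2012-06-01</dcterms:issued>
    <dcterms:available>2012-07-15</dcterms:available>
    <dcterms:spatial>Ohio, United States</dcterms:spatial>
    <dcterms:temporal>1995-2013</dcterms:temporal>
  </metadata>

Each term is a property in its own right within the DC-Terms name space, rather than a qualified form of a parent element.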
Linked Data
‘Modern’ Dublin Core usage (DCMI Terms) is in line with the linked data approach, allowing semantic applications using RDF and providing ‘formal interoperability’ for machine-processing in linked data environments. DCMI maintains semantics and provides guidelines in RDF, XML and HTML.
Representatives of the Dublin Core effort have helped to develop resource discovery using the Resource Description Framework (RDF), an infrastructure that supports the coexistence of complementary, independently maintained metadata packages.
In the RDF world, Dublin Core and Dublin Core Terms are often termed 'vocabularies', along with SKOS, the Bibliographic Ontology etc. As usual, different communities have different understandings of the same terms. This has been discussed by the W3C Library Linked Data Incubator Group, which provides some explanations and is now using the term 'value vocabulary'.
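As a minimal sketch of the linked data style (the URIs are invented), a DC-Terms description in RDF/XML might read:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dcterms="http://purl.org/dc/terms/">
    <rdf:Description rdf:about="http://example.org/doc/1">
      <dcterms:title>An Introduction to Metadata</dcterms:title>
      <dcterms:creator rdf:resource="http://example.org/person/jane-smith"/>
    </rdf:Description>
  </rdf:RDF>

Note that dcterms:creator points to another resource rather than holding a literal string: linking descriptions together in this way is the essence of the approach.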
Application Profiles
Work on formalising DC metadata led to the development of the Dublin Core Abstract Model (DCAM), which defines the information model for DC metadata. This was followed by the Singapore Framework, a framework for designing metadata applications, and the derived Description Set Profile (DSP) mark-up language, which is most often used to describe Dublin Core Application Profiles (DCAPs), a subset of application profiles (APs) based either predominantly or solely on DC metadata elements.
Application profiles are in essence metadata vocabularies built from terms drawn from pre-existing metadata schemas and adapted for the needs of a specific application or resource type; they may also define a structure by which information about a resource is organised. Common uses include managing scholarly publications, audio and video resources, or images on the Web. Some application profiles, for example DC-Ed for learning resources, may cover many different types of resources, whereas others, for example SWAP for scholarly publications, tend to cover only a restricted set of similar types of resources.
Not all APs are by definition DCAPs, although in practice a very considerable majority are largely based on elements from the Dublin Core metadata schemas. Originally, this was the broad definition of a DCAP, and this older, more basic definition is still encountered. However, the definition of a DCAP as set out by DCMI now requires it to be built on the basis of the DCAM and the Singapore Framework, which means that it should include a description set using the DSP mark-up language.
In common with other APs, DCAPs provide encoding schemes, constraints and structural relationships for the metadata model, i.e. to define more closely the syntax expected within metadata elements; they also provide a formally defined entity model. The metadata elements may come from one or more metadata schemas, referred to in RDF terms as name spaces, including DCMES 1.1, DC-Terms and non-DC metadata elements. An example of a DCAP that uses non-DC elements is the Scholarly Works Application Profile (SWAP), which adapts elements from both FOAF and MARC as well as using purpose-built elements from its own name space, the EPrints Terms metadata element set. SWAP has been influential in the development of many subsequent DCAPs and other application profiles, although use of SWAP itself has never become widespread.
Some de facto application profiles may not be described as such. For example, the Common European Research Information Format (CERIF) is effectively an application profile and the ePub file packaging format effectively contains within it a simple application profile extending DCMES 1.1 for the specific purposes of ebook readers.
Dublin Core metadata has attracted the attention of formal resource description communities such as museums, libraries, government agencies and commercial organizations across the world. It is widely used across the Internet, although Dublin Core metadata schemas are far from the only ones available for resource description in such contexts. The choice of metadata schema depends on a complex array of factors in any given application but, broadly speaking, Dublin Core is often a good choice where there is a need for a relatively simple and flexible formal description of Web resources.
The Dublin Core is positioned as a simple information resource description format. However, it also aims to provide a basis for semantic interoperability between other, often more complicated, formats. The Dublin Core benefits from active participation and promotion in over 50 countries across the world.
Much Dublin Core use is hidden behind the scenes (as it should be, certainly from a user point of view), so it is by definition difficult to discover. Often only small parts of the standard are used; this is particularly true in the linked data world. Mashups have become commonplace and DCMI itself is often unaware of certain uses. However, it is of course desirable that standards are taken up and used together with other technologies to create new services; creators of open standards cannot dictate how they are used.
Repositories
Many repository managers use the 'out of the box' solutions provided with their chosen repository software and do not customise the schemas and templates designed and supplied by the developers. Consequently, the majority of users of DSpace and EPrints, which represent 63% of UK repositories, use DC metadata. Most other repository platforms do likewise, although some may support mappings to schemas such as METS and MARC and to bibliographic models such as BIBO.
Although DC was originally designed for 'document-like objects', it is also used for other resource types; the EPrints model, for example, offers a wide range of templates for describing them. Some repository software platforms allow schemas other than DC to be used for more specialist resource description.
Libraries
In general, simple Dublin Core is not able to provide the level of granularity that libraries need to make the fine-grained but important distinctions between resources in their collections, especially national libraries such as the British Library, National Library of Wales and the Bodleian. Schemas such as MARC21 and METS are still used instead. However, it still has certain specific applications. Two examples of current uses of DC metadata in the British Library are provided below:
- The UK Web Archive: the Web Curator Tool schedules crawls to capture snapshots of websites. The title field is currently populated automatically, whereas the other fields need to be populated manually. This is a fairly basic use of DC metadata.
- Search Our Catalogue: a resource discovery interface implemented using Ex Libris' Primo system, which uses a form of qualified DC metadata as a means of aggregating metadata from different sources.
Machine Interoperability
The unqualified and qualified variants of DC metadata are still widely used, despite the latter having been superseded by DC-Terms according to DCMI. It is now normal to cite the name spaces of application profiles using DC and/or other metadata in XML and/or RDF formats for the purposes of machine readability and interoperability. Previously, it was questionable to what extent other systems made wide use of some of these documents, especially in the case of application profiles, but developments in linked data have made the effort of producing them more justifiable from a technical standpoint (see further below).
Harvesting
A notable exception is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). As a minimum, repositories must be able to return records with metadata expressed in the unqualified DCMES 1.1 format, using the oai_dc schema. This schema is arguably itself a simple application profile based on DC metadata. The protocol is thus a low-barrier mechanism for repository interoperability: data providers are repositories that expose structured metadata via OAI-PMH; service providers then make OAI-PMH service requests to harvest that metadata. In practice, aggregation services often harvest only unqualified DC for maximum interoperability, although OAI-PMH may technically be used for qualified DC, DC-Terms and other metadata schemas. According to a survey by the Repositories Support Project (RSP), many repositories also provide qualified DC; other formats include METS, MODS, PREMIS etc. Given the requirement for repositories to expose records in DC in order to be harvested, oai_dc has been built into most repository software, making it easier for institutions to adopt. Repository platforms have, however, implemented DC in different ways. Given the large number of repositories in the UK, a form of DC is in use in around 157 of them, albeit in differing ways.
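For illustration (the record content is invented), the metadata portion of a harvested record in the oai_dc format takes a form like this:

  <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
                                 http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
    <dc:title>An Introduction to Metadata</dc:title>
    <dc:creator>Smith, Jane</dc:creator>
    <dc:date>2012-06-01</dc:date>
    <dc:identifier>http://repository.example.ac.uk/1234</dc:identifier>
  </oai_dc:dc>

Only the fifteen unqualified elements are permitted within oai_dc, which is why richer internal schemas must be flattened before harvesting.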
It may have been assumed that a significant proportion of end users in the UK would use an aggregated service based on harvesting (e.g. OAIster) to cross-search repositories and retrieve records. However, consultation for a DC usage survey in 2011 revealed that the majority just use Google; many institutions use Google Analytics to discover 'traffic source'. It therefore appears that the necessity of providing oai_dc for harvesters is no longer one of the main incentives for using DC in repositories, since the percentage of people using these search services is insignificant. Parts of the research community overestimate the importance of such search services. Taking a pragmatic approach, repository managers are more interested in what Google, and Google Scholar, are providing.
It seems likely that most repository platforms are based around using DC because it offered a simple standardised way of describing and making resources available on the web, with the added incentive of OAI-PMH compatibility, and not as a result of OAI-PMH mandatory requirements.
The lack of use of aggregated search in the UK is most likely due to the lack of UK-based harvesting/search services. The MIMAS UK Institutional Repository Search provides a demonstrator, not a service (although it appears to function well). Harvesting is provided by RepUK, although this has so far maintained a low profile.
Deposit
Another widespread use of DC metadata is within the Simple Web-service Offering Repository Deposit (SWORD) protocol based on the Atom syndication format, which offers a standardized method of deposit into remote repositories. Although there is no inherent reason why DC metadata should be chosen over any other schemas, this is almost always the case in practice, since the recipient repository needs to be able to process the metadata and documents packaged with SWORD, whether or not metadata from other schemas are also provided. As most repositories rely on DC metadata, this is the obvious choice.
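In SWORD v2, for example, descriptive metadata can be embedded as DC-Terms elements directly within the Atom entry document; a hedged sketch (content invented) might look like:

  <entry xmlns="http://www.w3.org/2005/Atom"
         xmlns:dcterms="http://purl.org/dc/terms/">
    <title>An Introduction to Metadata</title>
    <dcterms:abstract>A short overview of metadata practice.</dcterms:abstract>
    <dcterms:issued>2012-06-01</dcterms:issued>
  </entry>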
Linked Data
Dublin Core is a popular component in RDF data, although it mostly follows the same quite simple patterns of use.
EPrints releases in 2010 included a range of new features relating to linked data and Dublin Core Application Profiles, including the ability to establish arbitrary relations between objects or provide additional metadata in triple form. Version 3.2.1 added URIs for derived entities (e.g. authors, events, locations) as well as an extendable RDF system, by default using the BIBO ontology. These employ DC metadata elements in the EPrints schema.
The following examples in the UK represent a sample of organisations using DC elements or terms as RDF linked data, with the vast majority using it for very similar applications (mostly defining format information, some form of title or descriptive string, and mime type information). These examples do not include the very many sites using microformats, RSS etc, which also contain DC terms (or terms inherited from DC, but not expressly stated as such).
- The BBC
- JISC Dev8d developer forum
- Data.gov.uk: general store for governmental open data in the UK, e.g. Patent Office
- Ordnance Survey
The UK data.gov.uk initiative has not been prescriptive, so government departments are using different models and vocabularies for exposing their data as linked data, with the result that a large-scale ‘clean-up’ operation is likely to be required.
An initiative at the University of Southampton brings together work and research done on a number of public sector information catalogues including the UK's national catalogue at data.gov.uk. However, changes to the contents and data formats of the originating portals, particularly at data.gov.uk made this difficult to access in winter 2011-12.
The British Library started to make the British National Bibliography (BNB) available in RDF in late 2010. Sample files are being published in an iterative process for research purposes and to investigate options for structuring catalogue information as linked data. Since there are 2.9 million records, there are issues of scale in publishing all of this unusually large source of high-quality bibliographic data as RDF. The DC-Terms name space is employed amongst a range of equivalent metadata formats, as well as dc:date from DCMES 1.1, using a range of vocabularies and ontologies including LCSH, MARC, MeSH, SKOS and OWL. It has been used and analysed by organisations such as the BBC, Ithaka, Open Library, PhilPapers, TSO and Wikimedia Commons.
The JISC-funded OpenBib project will publish a substantial corpus of bibliographic metadata as Linked Open Data, using existing semantic web tools, standards (RDF, SPARQL), linked data patterns and accepted open ontologies (FOAF, BIBO, DC, FRBR, etc.). The data comes from two distinct sources: traditional library catalogues (Cambridge University Library and the British Library) and tables of contents from a scientific publisher, the International Union of Crystallography (IUCr). OpenBib has also been used to analyse the BNB data. The University of Mannheim Library is also working in this area: its catalogue can be searched as a linked data service and the data is made available for reuse. Further examples may be found in a use case report by the W3C Library Linked Data Incubator Group.
Talis Prism 3 is a next-generation OPAC/search and discovery interface extracting MARC 21 records and using DCMES 1.1 and DC-Terms (and, in the future, BIBO) metadata to expose them as linked data.
The Resilience Knowledge Base (RKB) has been built as part of the EU-funded ReSIST project by the University of Southampton to gather data from bibliographic and other sources (covering people, projects, organisations, publications etc.), with added structure that allows querying by topic. RKB Explorer provides the interface to the RKB, with some of the data gathered using OAI-PMH from sources around the world.
LUCERO, a JISC-funded project based at the Open University, will 'investigate and prototype the use of linked data technologies and approaches to linking and exposing data for students and researchers'. A platform has been made available which currently provides access to data from the OU institutional repository and other OU resources via a SPARQL endpoint. The OU repository is based on EPrints, so it already includes a function to export information as RDF using the BIBO ontology. The schemas in use include DC-Terms, the W3C Media Ontology, FOAF and Media RDF. The W3C use case on collecting material related to courses at The Open University identifies a set of challenges in transferring library data to RDF.
The Locah project is making records from the Archives Hub and Copac available as linked data in order to interconnect with other data and contribute to the growth of the Semantic Web. It makes use of some common classes and properties from the DC Terms RDF vocabulary, as well as SKOS, FOAF, BIBO, ORE, Event and Time Ontologies. SPARQL endpoints will be provided to enable the data to be queried. The Archives Hub is an aggregation of archival metadata from repositories across the UK; Copac provides access to the merged catalogues of many university, specialist, and national libraries in the UK and Ireland.
Application Profiles
Although all types of DC metadata have been used as the basis for application profiles, it is now more usual (and officially recommended) to use DC-Terms to build new application profiles, which are in any case explicitly based on the earlier elements for backwards compatibility. In other contexts, all of these variants are still in widespread use. DC metadata has been adopted in an extremely wide range of software systems and Web applications wherever there is a need to manage resources, and is by no means limited to traditional library systems and repository software, as was more often the case in the past.
There are too many APs and DCAPs to be able to provide an authoritative list, especially since many of these are developed and used within individual systems for very specific local purposes and their technical details may not even be exposed publicly where such systems are not open to general access from the Internet. There have been a number of DCAPs, many of which were funded in full or in part by the JISC (including via DCMI). These have seen varying levels of development and adoption: some are still being actively developed, whereas others are not or were never implemented. For example, it appears that there is insufficient demand in the metadata community for the particular functionality offered by SWAP over qualified DC, since SWAP has seen little practical use. The following list contains both full DCAPs and scoping studies for possible DCAPs (with responsible organisations in brackets where known):
- SWAP - Scholarly Works Application Profile (DCMI & UKOLN, University of Bath)
- GAP - Geospatial Application Profile (EDINA)
- IAP - Images Application Profile (UKOLN, University of Bath)
- TBMAP - Time-Based Media Application Profile
- LMAP - Learning Materials Application Profile scoping study (JISC CETIS)
- SDAPSS - Scientific Data Application Profile Scoping Study (UKOLN, University of Bath)
- DC-Ed - Dublin Core Education application profile (DCMI)
- DC-Lib - Dublin Core Library application profile (DCMI)
- DC Accessibility Application Profile (see workshop, February 2011)
- DCCAP - Dublin Core Collections Application Profile (DCMI), based on the earlier RSLP Collection Description metadata schema - deactivated since 2010.
- The IFLA International Standard Bibliographic Description DCAP (IFLA)
- BLAP - British Library Application Profile for Objects (British Library)
- DC-Government Application Profile (DCMI) - deactivated since 2009.
It is not clear, however, to what extent these and other application profiles have been adopted in practice, as it is extremely difficult to get feedback from HEIs in order to discover this information. It may be possible, for example in the case of repositories and related systems, to discover such usage information by machine where such systems are registered with a central service. In the case of repositories, the majority are registered with ROAR or OpenDOAR. Information about their use of metadata standards is not publicly available, however, and it is likely that various permutations of DC metadata are used in the vast majority of applications and services. Some of these arguably represent relatively simple, de facto local application profiles.
de facto Application Profiles
There is also a rise in the use of de facto application profiles based on DC metadata, i.e. those which are not officially described as application profiles but combine elements of DC and other schemas. Only a few examples of significant growth areas that may be worth watching are given here, since it is impossible to trace all such usage across the Web.
A large number of UK public libraries currently offer EPUB book rentals (17 county and borough councils have signed up): this therefore translates into a lot of DC-Terms usage across these libraries.
The complex CERIF standard for Research Information Management is becoming more widespread as Current Research Information Systems (CRISs) are coming into more widespread use in UK HEIs. There are current plans at euroCRIS to map from CERIF to Dublin Core.
Adobe's Extensible Metadata Platform (XMP) is a labelling technology that allows embedding of metadata into the file itself. XMP is used in PDF, photography and photo editing applications. XMP is most commonly serialised and stored using a subset of RDF, which is in turn expressed in XML. The most common metadata tags recorded in XMP data are those from Dublin Core – using both simple DC and DC terms.
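As an illustrative sketch (the title is invented), an XMP packet embedding DC metadata within a PDF typically takes a form like the following:

  <x:xmpmeta xmlns:x="adobe:ns:meta/">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
                       xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:format>application/pdf</dc:format>
        <dc:title>
          <rdf:Alt>
            <rdf:li xml:lang="x-default">An Introduction to Metadata</rdf:li>
          </rdf:Alt>
        </dc:title>
      </rdf:Description>
    </rdf:RDF>
  </x:xmpmeta>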
Web Pages
One of the early uses of Dublin Core was within web pages. DC elements can be used to add descriptive information to the page content, in order to improve retrieval. Terms are inserted within the <meta> tags in the HTML header. Search engines have used meta elements to help categorise and index web content. However, as search engine robots have become more sophisticated (and tags were misused for unscrupulous marketing purposes), use of meta elements has decreased dramatically and Dublin Core's importance in this context is now unclear. A legacy tool, DC-dot (UKOLN, 2000), is still in use to automatically generate DCMES 1.1 <meta> tags, although this may be largely as a teaching tool.
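For illustration (content invented), the conventional pattern places DC elements in <meta> tags with a <link> declaring the name space:

  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <meta name="DC.title" content="An Introduction to Metadata" />
  <meta name="DC.creator" content="Smith, Jane" />
  <meta name="DC.subject" content="metadata" />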
DC terms are becoming a standard part of RDFa markup, a W3C Recommendation that adds a set of attribute-level extensions to XHTML for embedding rich metadata within Web documents. A number of UK government departments are exploring how RDFa can be used, including the Central Office of Information.
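A hedged sketch of DC terms carried as RDFa within an XHTML fragment (the URI and values are invented):

  <div xmlns:dcterms="http://purl.org/dc/terms/"
       about="http://www.example.gov.uk/reports/42">
    <h2 property="dcterms:title">Annual Report</h2>
    <span property="dcterms:issued">2011-02-01</span>
  </div>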
Major UK Organisations
JISC is a member of the Dublin Core Metadata Initiative on behalf of the UK, and is represented on the DCMI Oversight Committee. Several UKOLN staff serve on the Dublin Core Advisory Board (including on its email list), moderate a number of DCMI Communities and Task Groups, and participate in DC conferences. UKOLN owns all the DC lists (currently 41) hosted at JISCMAIL: the largest of these are DC-GENERAL and DC-LIBRARIES, with 860 and 307 members worldwide respectively. It is not possible to find numbers or percentages of UK members.
A number of other UK organisations are not directly involved with DCMI itself, but produce materials and organise events in the UK to support the use of metadata, including Dublin Core, as part of their supporting roles in their communities. Examples include the Repositories Support Project and InfoNet (InfoKits), the Welsh Repositories Network (workshops and a basic guide to metadata in repositories) and CETIS (metadata for learning resources).
JISC Projects
A large number of JISC projects have either used DC in the past or are currently using the standard. However there is no easy way to obtain statistics or to search for and identify current projects via the JISC website, since completed projects are also retrieved via the standard search. Listed below are some examples of current, or recently completed projects, in addition to the many projects that have already been mentioned above.
- The BRII project (University of Oxford, completed April 2010) focused on the use of semantic web technologies to share information about researchers, interests, projects and related research activity from a wide range of sources including the repository. It used various vocabularies and ontologies including simple DC, qualified DC and DC-Terms.
- The Electronic Theses Online Service (EThOS) project was funded by partner institutions, JISC and Research Libraries UK (RLUK). The live service is now managed by the British Library. It offers a single point of access where researchers worldwide can access all theses produced within UK Higher Education, which are automatically harvested from institutional repositories and digitised where necessary. It developed UKETD_DC, a qualified DC schema (or application profile) for UK e-theses.
- The Transatlantic Archaeology Gateway (TAG) project was funded jointly by JISC and the National Endowment for the Humanities in the US. It aimed to develop tools for transatlantic cross-searching and semantic interoperability between the Archaeology Data Service (ADS) and the Digital Archaeological Record (tDAR). It first created an infrastructure to enable basic cross-search of DC-compatible metadata records for digital resources, and later a much deeper and richer level of cross-searching for faunal data. ADS uses qualified DC; tDAR digital repository services are based on the OAIS reference model and resource-level metadata is disseminated in OAI-compliant DC metadata.
- A CETIS blog post in March 2010 identified 15 projects in the UKOER programme that used Dublin Core, some at least in part via OAI-PMH, others using DC-Terms or considering implementation.
- The Zetoc service provides Z39.50-compliant search access to the British Library's Electronic Table of Contents (ETOC), based on DC. JISC has funded an update to this service in 2011, which is managed by MIMAS. Zetoc is free of charge to UK HE and FE institutions sponsored by JISC and to the UK Research Councils.
The original purpose of DC metadata was to provide a simple metadata element set compared to more complex standards such as MARC. Over time, however, the progression towards Qualified Dublin Core and subsequently DC-Terms has tended to introduce more complexity, which can increase further when building application profiles, most of which are based on Dublin Core metadata. The positive aspect of these changing approaches, however, is that DC offers a wide range of flexible alternatives for building metadata solutions for specific applications rather than adopting a prescriptive approach: the latter is a criticism often levelled, for example, at MARC, which has not made a similar transition to wider use on the Web beyond traditional library and information systems.
Legacy Approaches
Therefore the original 15 elements of ‘Simple Dublin Core’ are now regarded by DCMI as a ‘legacy’ use of the standard. However many members of the DCMI community disagree with the term ‘legacy’ and prefer the term ‘classic’ instead.
Simple Dublin Core does not enable volume, issue and pagination metadata to be separately identified. Japan's National Institute of Informatics (NII) guidelines for metadata usage in IRs separate enumeration and pagination using the following elements:
dc.identifier.volume
dc.identifier.issue
dc.identifier.spage [start page]
dc.identifier.epage [end page]
The University of Stirling has adopted this approach. At Stirling the depositor enters the citation in a single field (dc.relation), and then repository staff break this down into the component parts during workflow. This usage has been criticised as local customisation which is not interoperable. However practitioners argue that the alternative is to have separate metadata elements mixed together in one field.
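For illustration (the citation details are invented), an article record under the NII convention carries its citation parts separately:

  dc.identifier.volume  12
  dc.identifier.issue   3
  dc.identifier.spage   45
  dc.identifier.epage   67

in place of a single combined string such as '12(3), pp. 45-67' entered into dc.relation.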
Some repository managers have a limited understanding of the underlying metadata models within their repository systems. In DSpace, for example, it is nonetheless straightforward to select additional qualified DC elements to add to the default set. Decisions to customise or enhance may depend on where priorities lie: adding local enhancements to produce richer metadata creates more potential for interoperability issues, and sensible mapping may also be more difficult.
Some minor issues of practice arise where there is more than one way to use DC metadata to describe a particular element, e.g. in the choice of dc:contributor.author or dc:author, but here it may generally be best to follow the OAI-PMH usage in case of doubt. In most cases, practice is informed by the default usage supplied with the software package in question.
Linked Data
Many organisations are using simple Dublin Core because it fits their requirements and is likely to continue to fit with their service objectives for the foreseeable future. They may have little incentive to adopt a linked data approach, particularly as staff resources are squeezed in the current UK financial crisis. Indeed they may have no choice but to maintain their current approach. These organisations should not therefore feel pressurised to move to linked data at the current time, as some within the DCMI community appear to advocate.
One incentive for moving towards linked data is to keep up with new developments, so that information services are able to take advantage of new advanced services as they become available. For organisations in a position to do so, this is clearly a desirable approach. However, linked data adoption consists of a number of steps. Key advice in the linked data tutorials at DC-2010 was to 'start simple': the first step should be to map current data to the 15 simple DC elements. The next step is to agree (non-trivial) and adopt vocabularies, and only then should RDF be addressed and URIs assigned. Throughout all the stages, data quality is of the utmost importance. It could therefore be argued that consistent, high-quality simple (or enhanced) Dublin Core records underpin all the more sophisticated services built on top. Since many organisations do not have this basis (metadata is low quality, sparse, inconsistent or badly mapped), the basics should be addressed first.
Some of the above advice has been formalised in the DCMI ‘Interoperability Levels for Dublin Core Metadata’, which describes four levels of interoperability, building on each other in turn. As Harper suggests: ‘The real value proposition of DC lies in its commitment to interoperability, as well as in applicability of the organization’s guidelines and recommendations to any metadata scheme, element set, or implementation syntax.’
Harper also points out that ‘One very significant value of the DCMI is its ongoing work to make tools and principles like those developed in the W3C relevant in more traditional metadata spaces, including libraries. The DCMI serves as a bridge between the linked data community and other areas of practice.’
Linked data was an unofficial key theme at the DC-2010 conference in Pittsburgh, recurring in many presentations and workshops. It is clearly seen by many as the way forward for Dublin Core usage. Many others would argue that this is only part of the overall Dublin Core picture.
However RDF was described by Weibel at DC-2010 as ‘an aspirational technology – it has a lot of promise, but is not yet proven’. In addition there is no ‘killer app’ so it is difficult to visualise the full potential of linked data (Karen Coyle).
Vocabularies are vital for the linked data world to function as it should (i.e. in supporting semantic interoperability across communities) and it has been recognised that Dublin Core has a key role to play. According to Bergman, there is currently no true interoperability across subject domains because reference vocabularies are not available: dc:subject could be pivotal.
A frequent criticism of currently available linked data is the incorrect use of the OWL sameAs relation (instead of some form of 'isRelatedTo' relation), with the result that concepts which are not the same are incorrectly linked, and incorrect inferences may be made further down the line. Again, the Dublin Core community is in a good position to provide best practice solutions for these types of issues.
In February 2011 DCMI and the FOAF Project agreed to tighten the alignment between DCMI Metadata Terms and the FOAF (Friend of a Friend) Vocabulary. The specifications are often used together in applications, especially in Linked Data. This is likely to include aligning overlapping terms, documenting common usage patterns, promoting best-practice principles of vocabulary maintenance and supporting third-party extensions and companion vocabularies. In addition the DC Advisory Board is discussing how DCMI could collaborate with a wider group of vocabulary maintainers, in order to improve cross-vocabulary cooperation and help create a more coherent landscape in Linked Data.
Also in February 2011, Dublin Core appeared in second place in a snapshot of the ‘Top 100 most popular RDF namespace prefixes’ (FOAF was first). Despite the noted caveats, this is still a clear indication of the importance of DC Terms in the linked data world.
Harmonising Practice
The DRIVER guidelines (and supplementary OpenAIRE guidelines) were produced as part of the EU-funded DRIVER project with the intention of standardising record quality in harvested IRs by describing how to map from an internal format to unqualified DC in order to support harvesting. Implementation has not occurred in the UK as it has in other European countries such as Spain, Portugal and Ireland. It is unclear to what extent other metadata guidelines produced in the UK, such as the IRIScotland 2007/8 metadata agreement for institutional repositories and the Welsh Repositories Network metadata overview, have been implemented, but practice continues to vary significantly across the sector.
Harvesting
As noted above, the necessity of providing oai_dc for harvesters appears no longer to be one of the main incentives for repository managers to use DC, since the percentage of users relying on harvesting-based search services is insignificant; repository managers are, pragmatically, more interested in what Google provides.
In practice, unqualified DC is used in oai_dc, despite it being technically possible to use qualified DC or DC-Terms, since these would reduce compatibility with other systems. Interoperability is thus defined by a baseline standard of the original 15 unqualified DC metadata elements. This can, however, occasionally create considerable scope for ambiguity, e.g. in the case of scholarly publications: where dc:description and dc:description.abstract are both harvested and the qualifier is discarded, the difference between the two becomes opaque to machine processing and appears merely to be a duplicated element with inexplicably different content.
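To illustrate the effect (the text is invented), a repository record holding

  dc:description           A 200-page report surveying metadata practice in UK repositories.
  dc:description.abstract  Surveys current metadata practice and makes recommendations.

is exposed through oai_dc as two plain dc:description elements, leaving a harvester with no way to tell which value was intended as the abstract.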
OAI-PMH is relatively simple to implement within a repository software package and to use. Disadvantages include rudimentary query facilities and variations in the format of the returned data. Repositories also need to be registered with OAI in order to be discovered by harvesters. However the adage that ‘just because your repository can be harvested doesn't mean it's interoperable’ applies – there are many different interpretations of field use and many varied vocabularies.
The EThOS project suggested in 2007 that the number of OAI service providers had not kept pace with repository development. It could be argued that the UK is not capitalising on the significant benefit of oai_dc harvesting and search, which is generally seen as a key return on investment in Dublin Core.
A US study in 2008 found, surprisingly, that results based on aggregated records had not improved as institutions gained experience of using DC and mapping. This was due to mapping errors and to misunderstanding and misuse of Dublin Core fields, where mapping is often based on the semantic meanings of metadata fields rather than on value strings; correct mapping could improve metadata quality significantly.
However feedback in the interviews conducted for the Linking UK Repositories study suggested that simple Dublin Core is not sufficient for cross-repository retrieval, and that a richer metadata format with greater potential for subject classification may be better suited. Qualified DC can be used, but since qualifiers are not necessarily standard (unless laid down through an application profile) this approach can have the drawback of reducing interoperability rather than raising it.
Issues related to funding metadata provide a useful example of search and retrieval issues. The requirement to include funding information is currently cited by many institutions. While funder is a qualifier for dc.contributor, this would not be available via oai_dc (which is limited to simple DC); whereas if funding information were contained as text within a resource itself, it is likely to be retrievable via spidering.
One option to address this type of issue would be to seek agreement on an additional export filter for repository platforms, to include required qualified Dublin Core such as funder metadata. Again, it would be useful to know user demand for a resulting service.
The areas of aggregated repository search via OAI-PMH and Web search via spidering require further investigation, in order to determine which approach offers maximum benefit for research users, as well as optimum use of resource-intensive services. User demand, requirements and current practice all need to be examined (taking into account MIMAS UK Institutional Repository Search, RepUK, OpenAIRE/DRIVER, CERIF et al).
Unique Identifiers
Unique identification and disambiguation of author names is a significant issue for institutional repositories. DC metadata does not offer any way to distinguish a controlled name from an uncontrolled one, much less to harmonise variants. DC also does not provide for the capture of an author’s institutional affiliation, which complicates disambiguation attempts. Considerable standardisation effort is currently going on internationally in this area.
Internal Organisation
There is also the issue at DCMI (common to many similar organisations) of continuing development relying on the effort of volunteer members. Sustaining momentum on the production of technical specifications such as the Abstract Model and related syntax guidelines is an ongoing problem as a result.
Current Focus of DCMI
Criticism has been made of the Dublin Core Abstract Model (DCAM) for its complexity, in stark opposition to the simplicity of the earlier approaches taken by DCMI to metadata development; similarly, the Description Set Profile (DSP) language has been criticised for the lack of explicit advantages that it offers over conventional RDF: unlike RDF, DSP has not emerged as a mainstream web technology in widespread use. There remains some doubt at present whether or not it offers significant interoperability benefits, and for what purpose, within software applications in production systems and services on the Web. However, both the DCAM and the DSP have many proponents, particularly in the linked data community.
DC metadata is therefore being used in different ways by different communities depending on requirements. The focus of DCMI as an organisation and its development effort may no longer be on the 15 elements of simple DC, but for many they remain an essential component of their services. As yet there is also little implementation of the DCAM: the Scholarly Works Application Profile (SWAP) is the only known implementation at the time of writing (and SWAP itself is not used in any publicly exposed production services). In fact there is ongoing discussion in DCMI about whether the DCAM is needed at all, now that the RDF data model has become more familiar with the increased adoption of linked data.