
Repositories

Repositories, also known as Digital Repositories, are essentially a kind of web application for publishing many types of documents and resources on the Web. A digital repository can be considered a specialised content management system with a number of additional services to aid discovery and re-use of the content. Not all repositories are described as such: other descriptions include digital archives, digital libraries and similar terms, some of which may relate specifically to the particular type(s) and scope of content held. Repositories in this context should not be confused with software repositories, a separate use of the term for the stores used by software developers in revision control or software installation systems.

Author: Steph Taylor
Last updated on 2 May 2013 - 3:30pm

Types of Repository

The most common type of repository is the institutional repository (IR), typically operated by a higher education institution. The term has most frequently been used to refer to repositories whose content consists wholly or primarily of academic publications, largely because of the high profile of the steadily growing Open Access movement, a closely connected parallel development. This subset still forms the vast majority of institutional repositories. However, a rapidly growing number of these services focus instead on content such as teaching and learning resources, images, newspapers, audiovisual materials such as radio, film and television archives, and other time-based media such as music. Other outputs such as software may also be held, although this is not what is meant by the term “software repository” (see above).

Publications repositories may be operated by any organisation, but the vast majority are owned by higher education institutions: consequently they concentrate mainly on published academic articles, or scholarly publications, in the form of authors’ pre-prints and peer-reviewed post-prints. In addition, they may contain related resources such as book chapters, book reviews, conference papers and posters. Some may also include theses and dissertations, although these can alternatively be located in separate, dedicated repositories, often on a national basis.

There are also a number of well-known Subject Repositories, which, like most IRs, are aimed at academic publications but are not limited to a single institution. Some have a national focus; many accept deposits from scholars in the discipline worldwide. There are also national repositories covering all subjects. In some countries these are an alternative to institution-based repositories; another approach is a national portal that harvests metadata and/or content from institutional repositories via OAI-PMH. Elsewhere, national repositories may be intended for institutions that lack the resources to run their own repository, or that have not yet set one up. Finally, there are consortial repositories operated jointly by several institutions, often on a regional basis.
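Portal-style harvesting of this kind uses OAI-PMH’s standard request verbs. As a minimal sketch, assuming Python’s standard library and a repository that exposes unqualified Dublin Core (`oai_dc`), a harvester repeatedly issues `ListRecords` requests and follows each `resumptionToken` until the list is complete. The `harvest` function and its injected `fetch` parameter are hypothetical names for illustration; a production harvester would also need to handle OAI-PMH error responses such as `noRecordsMatch`, as well as HTTP failures and retries.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, Optional, Tuple
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def harvest(
    fetch: Callable[[str], str],
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Yield (identifier, title) pairs from an OAI-PMH ListRecords harvest.

    `fetch` (hypothetical) takes a request URL and returns the response body
    as text; it is injected so the sketch can run without network access.
    """
    params: Dict[str, str] = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for record in root.iterfind(".//oai:record", NS):
            ident = record.findtext("oai:header/oai:identifier", namespaces=NS)
            title = record.findtext(".//dc:title", namespaces=NS)
            yield ident, title
        token = root.find(".//oai:resumptionToken", NS)
        if token is None or not (token.text or "").strip():
            break  # an absent or empty token means the list is complete
        # resumptionToken is an exclusive argument: send only verb + token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

In use, `fetch` could read and decode the response from `urllib.request.urlopen`, and `base_url` would be the repository’s OAI-PMH endpoint. Passing `from_date` on later runs lets a portal re-harvest incrementally instead of fetching every record again.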

Although most publications repositories are intended to pursue the aims of the Open Access movement, making published academic articles available free of charge to the public, this is by no means true of all higher education institutions. At one extreme, some require full-text documents or equivalent files to be deposited along with the metadata records. At the other extreme are repositories that act essentially as publications management systems, containing only metadata records that point to the published articles on publishers’ web sites, which are usually accessible only by institutional subscription. Most lie between these extremes.

Repository Software

The majority of repositories use open source software platforms such as DSpace, EPrints and Fedora; the last of these has an extended distribution, eSciDoc, which is more commonly found in mainland Europe. There are also successful commercial products, most notably Digital Commons by Bepress, and numerous hosting companies, such as EPrints Services, @Mire and Open Repository, offer paid repository services. Finally, repository services can be based on locally developed custom software that is not commercially available, or built on the framework of more generic content management systems, including commercial products such as Microsoft’s SharePoint and open source products such as Drupal and Joomla.

Repository software typically provides a browse interface, unique identifiers for each item or resource, metadata describing these resources and a machine-readable harvesting interface called OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting). The browse interface may or may not employ a Knowledge Organization System (KOS) similar or identical to the classification schemes and subject headings used in libraries, of which Library of Congress Subject Headings are the most commonly used. Some repository software may use OAI-ORE (Object Reuse and Exchange) to group disparate existing resources together to create new resources, for example building a teaching and learning resource out of existing course materials, images and the articles on a reading list. Additionally, some repositories automatically generate publications lists and/or academic home pages for authors at their institution, which in some cases may be configured to the particular preferences of the academic.
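To make the OAI-ORE idea concrete, the sketch below builds the core statements of a resource map: the map `ore:describes` an aggregation, and the aggregation `ore:aggregates` each existing resource by URI, so nothing is copied. It serialises them as N-Triples using only the standard library. The function names and all example URIs are hypothetical; a real resource map would normally also carry metadata about itself (such as its creator and modification date) and would usually be produced with an RDF library in one of the serialisations the ORE specifications describe, such as RDF/XML or Atom.

```python
from typing import List

ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS_TITLE = "http://purl.org/dc/terms/title"


def uri(value: str) -> str:
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Format a plain string literal, escaping characters N-Triples requires."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def resource_map(rem: str, aggregation: str, title: str, parts: List[str]) -> List[str]:
    """Return N-Triples lines for a minimal OAI-ORE resource map."""
    lines = [
        f"{uri(rem)} {uri(RDF_TYPE)} {uri(ORE + 'ResourceMap')} .",
        f"{uri(rem)} {uri(ORE + 'describes')} {uri(aggregation)} .",
        f"{uri(aggregation)} {uri(RDF_TYPE)} {uri(ORE + 'Aggregation')} .",
        f"{uri(aggregation)} {uri(DCTERMS_TITLE)} {literal(title)} .",
    ]
    # Existing resources are referenced by URI; the aggregation only points at them.
    for part in parts:
        lines.append(f"{uri(aggregation)} {uri(ORE + 'aggregates')} {uri(part)} .")
    return lines


if __name__ == "__main__":
    # Hypothetical teaching resource assembled from existing repository items.
    for line in resource_map(
        "https://repo.example.org/rem/course-101",
        "https://repo.example.org/aggregation/course-101",
        "Course 101 reading pack",
        [
            "https://repo.example.org/items/lecture-notes.pdf",
            "https://repo.example.org/items/diagram.png",
            "https://repo.example.org/items/article-42",
        ],
    ):
        print(line)
```

Because the aggregation only references its parts, the same article or image can appear in many aggregations without being duplicated in the repository.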

An increasing area of overlap with the role of repositories in publications management is research information management. Current Research Information Systems (CRISs) manage every aspect of research information, from project creation and grant applications through to project outputs, of which publications are only one type. Commercial software vendors such as Symplectic provide publications management systems such as Symplectic Elements (formerly Symplectic Publications) that sit somewhere in the middle of the spectrum of functionality between repositories and full CRISs, although in practice their scope is converging with that of CRISs. In some cases, institutions use the CRIS to manage internal research information but expose publications and other outputs via a repository; in others, the CRIS software includes functionality that removes the need for a separate repository interface.

Trends in the current usage of repositories fall into two broad themes: (1) integration and embedding within the institution; and (2) interaction with external systems and organisations.

Embedding the IR within the Institution

Within the institution, the institutional repository is increasingly seen as one of a suite of systems that perform complementary functions. Other systems in this suite include Research Information Management (RIM) systems, research data repositories, learning and teaching repositories and library management systems. There are also many varied links between institutional repositories and individual systems that are relevant within particular institutions and may not be reflected across the whole sector. In addition, some third-party commercial systems designed to manage areas of scholarly communications, such as citation management, are also included in the embedding process.

The main aims of embedding the repository within the institution are:

  • Encouraging researchers to deposit content in the IR
  • Facilitating discovery and access to the content of the IR
  • Enhancing the user experience of institutional systems as a whole, including the IR
  • Using the IR to support research within the institution
  • Integrating the IR within the workflows of researchers, research information staff, senior management and finance staff
  • Supporting teaching and learning within the institution by integrating the IR with systems such as VLEs and learning object repositories

In recent years, a related area of development, digital archiving, has also begun to be integrated into the IR. In most cases the original IR was not designed to be a digital archive. Over time, however, as deposits have grown and institutions have realised that the content they manage is increasingly in digital formats, the need for digital curation has become apparent. Many funding bodies now also ask for digital curation plans as a condition of research funding, which has added to the imperative to manage digital curation effectively. The repository can itself be adapted to become a digital archive, or can be integrated with a separate digital archive system, so that the institution can support seamless curation management of its research outputs and data.

Within the UK, the drivers to embed the IR within institutions can come from within the institution itself, and have also been supported and encouraged by external funding such as the recent JISC Innovations Programme.

Integration with External Systems and Organisations

The integration of repositories with external systems covers four broad areas:

  • Supporting consortial collaboration between institutions
  • Integration of deposits into a regional and/or national repository
  • Integration of deposits into a subject-focussed repository
  • Integration of deposits into services run by external organisations

The repository can support collaborative work between institutions, such as consortial agreements, where research is shared and pooled either via a shared IR or via links between individual IRs. White Rose Research Online, for example, is run by and for the universities of Leeds, Sheffield and York, which operate as a consortium in a number of different areas of academia. For them, the joint repository is one facet of their collaborative work and sharing of resources across the three member institutions.

An IR can feed into, or link up with, a much larger repository that collects and presents deposits on a regional or national scale. For example, RIAN, in the Republic of Ireland, showcases the research of all the HE institutions in the country.

Subject repositories, such as arXiv.org for physics, typically operate on an international scale, either accepting deposits from individual researchers or harvesting contributions on specific subjects from the IRs of contributing institutions.

Services run by external organisations include the EThOS service run by the British Library, which collects English theses from HEIs into a central thesis repository and makes them available online. Similar services also exist in Scotland and Wales, run by their respective national libraries. Also in the UK, OpenDepot.org is run by EDINA as an Open Access publications repository for higher education institutions that do not have a repository of their own, and Jorum collects open educational resources on a national scale.

Current issues for digital repositories fall into five broad categories:

  • Open Access
  • Rights management
  • Policies - to mandate or not to mandate?
  • What kind of content?
      ◦ Pre-prints v. post-prints
      ◦ Author’s final formatted post-print v. publisher’s final formatted post-print
      ◦ Full text or not?
      ◦ Copyright or license to publish?
  • Content formats

Open Access

The role of Open Access (OA) in a repository, especially an institutional repository (IR), is an ongoing debate within the higher education community. There is a general consensus among the IR community and a growing number of researchers that OA is a good thing: it supports scholarly communications and, more broadly, makes research outputs available to a wide audience that extends well beyond the HE community to the general public. Indeed, some funders now stipulate that project outputs should be made available under Open Access terms, a role eminently suited to the IR or the subject repository.

However, on the other side of the debate, some see OA as a negative force within the repository field. In practical terms, many IR managers encounter hostility to OA among academic researchers themselves. Researchers spend much of their careers building up their publications records, working closely with the editors of academic publishers and aiming, ultimately, to have their research published in the leading journals in their field. Although not all publishers are hostile to OA, many are still deeply suspicious of a model they see as offering ‘free content’, which they perceive as a threat to their journal sales. Unwilling to jeopardise relationships built up over many years, researchers become wary of OA if they think it would upset a publisher they deal with regularly. To manage this initial hostility, some IR managers have chosen to focus on content first and OA second: they encourage deposits without insisting that every deposit be OA. Whether this compromise is a positive or negative contribution to the IR community is a matter for debate.

Whether all repositories should be 100% open access is a difficult question in practical terms too. Some research material may not be suitable for OA for a number of reasons: sensitive material (political, religious, legal cases, etc.), research funded by or undertaken with a commercial partner where business confidentiality must be maintained, and research whose publication could breach the Data Protection Act. In such cases, should the IR support the deposit of all research outputs, even those that can never be OA, or should it accept only OA deposits?

Rights Management

Managing the rights of copyright and/or IP owners, and of those who use a repository, is an increasing issue for repository managers as repositories develop and grow. Balancing both sides of this equation means checking and managing licenses on the one hand and, at the same time, managing the permissions of repository users, so that they can see the content their licenses permit but cannot access material that the license does not permit them to view or use.

This balance has become more important as repositories expand their remit and become more deeply embedded in the institutional framework. For example, many repositories now handle teaching and learning objects, such as images, course material of various types and potentially other electronic material, where rights are negotiated via licenses with content creators/providers. The result can be a mix of open access content, paid-for licensed content and free content available only to a subset of repository users.

Rights management is often linked to the authentication and authorisation of users, and much work is currently going on in the repository community in this area. By managing access for individual users and/or groups of users, a repository can make restricted content available only to the users entitled to it, meeting the requirements of content providers who use licenses to control access to their content.
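The user- and group-based access model described above can be sketched as a simple authorisation check. This is a minimal illustration, assuming each item carries a license record listing the individual users and groups it may be shown to, and that authentication has already identified the user (or found none). The names `License`, `User` and `can_view` are hypothetical; real repositories typically delegate authentication to an institutional identity system and record far richer license terms.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class License:
    """Hypothetical access terms attached to one repository item."""
    open_access: bool = False
    allowed_users: FrozenSet[str] = frozenset()
    allowed_groups: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class User:
    """An authenticated user and the groups (e.g. course cohorts) they belong to."""
    user_id: str
    groups: FrozenSet[str] = frozenset()


def can_view(user: Optional[User], terms: License) -> bool:
    """Return True if the license terms permit this user to see the item."""
    if terms.open_access:
        return True  # open access content is visible to everyone
    if user is None:
        return False  # unauthenticated visitors only see open access content
    if user.user_id in terms.allowed_users:
        return True  # licensed to this individual
    # Otherwise, licensed to at least one group the user belongs to.
    return not user.groups.isdisjoint(terms.allowed_groups)
```

For example, course material licensed to a hypothetical `HIST101` group would be visible to a student in that group, hidden from a student who is not, and hidden from anonymous visitors, while open access items remain visible to all.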

Policies - To Mandate Or Not To Mandate?

Initially, getting full-text content into institutional repositories presented a challenge. Researchers were encouraged to deposit their work, but this was not always enough. In some institutions, the carrot of encouragement has been supplemented with the stick of an institution-wide mandate. Mandates are usually tailored in their details to the specific institution, but they broadly require all academic research of a certain standard (usually defined as published or accepted for publication, or some other mark of quality assurance such as approval by the head of department) to be deposited in the repository. The penalties for not depositing research also vary, but share common themes across institutions. Research not in the repository may not be counted as part of an individual researcher’s output during annual assessment, jeopardising promotion and/or further research funding as well as damaging the researcher’s reputation. The ultimate penalty is that, once a mandate is in place, individuals who do not deposit their work are often, technically, in breach of their contract of employment.

The drivers for institutions to adopt a mandate are usually:

  • a need for research outputs to be gathered in one place for easy assessment
  • seeing the IR as a showcase of their research output and wanting to show all research coherently to the outside world
  • an external requirement such as the UK-wide Research Excellence Framework (REF), where research output needs to be logged as part of a wider requirement for further funding, etc.

There is much debate within the repository community about the effectiveness of mandates. On the plus side, some see a mandate as a seal of approval for the repository from the institution’s senior management. They argue that such visible backing, together with potentially serious penalties, makes researchers take depositing seriously and so encourages deposits. In the opposing camp, some feel that such a ‘heavy-handed’ approach alienates researchers and makes them hostile to the repository, actually decreasing the number of deposits. They also argue that penalties are rarely applied when a researcher fails to deposit. Both sides agree that getting researchers to make any deposits at all requires a huge cultural shift. Some within the repository community see a third way, combining carrot and stick: researchers face penalties but also have positive reasons to deposit, such as inclusion in subject or national research repositories and portals via their IR submissions, a safe place to store their research outputs, and a quick and easy way to showcase their work.

This is an ongoing debate across the IR community.


