Classification schemes represent each concept in their hierarchy by a notation rather than a natural language term. This provides the potential for interoperable search and browsing access to multilingual databases, provided the databases use the same classification scheme. However, if the KOS used in the databases differ in structure, domain, language, or granularity, they will need to be transformed, mapped, or merged. Moreover, multilingual KOS mapping is complex because it involves translating concepts, not terms, and there is often significant variation between languages. Different cultural perspectives also need to be integrated (e.g. the concept space of education in one country can be rather different to that in a neighbouring country). On the one hand, communities develop KOS specific to their concepts, terminology, and needs; on the other hand, searchers want to use a single search to find resources in databases that serve different domains and are accessed through different KOS, across which there may be no consensus regarding concepts, terminology, and knowledge organisation.
Apart from semantic interoperability, there also needs to be interoperability with applications: KOS should work with search engines, Content Management Systems, Web publishing software, etc. To achieve this, they need to be made available in existing formats and protocols for data exchange, such as SKOS, which provides a simple way of representing KOS in RDF, and URIs, which uniquely identify the KOS, its concepts and its terms. SKOS and URIs will allow KOS to become Linked Data. While early adopters exist, there is a long way to go before the potential of these approaches is fully explored and implemented in practice.
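As a rough illustration of what publishing a KOS concept in SKOS with URIs involves, the following minimal sketch serialises a single concept as Turtle using only the Python standard library. The base URI, the notations and the labels are hypothetical and chosen for illustration only; the `skos:` properties are those of the SKOS vocabulary. A production system would more likely use an RDF library and would escape special characters in labels, which this sketch does not do.

```python
# Minimal sketch: serialise one KOS concept as SKOS in Turtle syntax.
# BASE, the notations and the labels below are hypothetical examples.
BASE = "http://example.org/kos/"
PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def concept_to_turtle(notation, labels, broader=None):
    """Return a Turtle snippet describing one concept.

    notation -- the scheme's notation, also used to build the concept URI
    labels   -- mapping of language tag to preferred label
    broader  -- notation of the parent concept, if any
    """
    lines = [
        f"<{BASE}{notation}> a skos:Concept ;",
        f'    skos:notation "{notation}" ;',
    ]
    # One language-tagged preferred label per language.
    for lang, label in sorted(labels.items()):
        lines.append(f'    skos:prefLabel "{label}"@{lang} ;')
    if broader is not None:
        lines.append(f"    skos:broader <{BASE}{broader}> ;")
    # Turn the final ";" into the "." that terminates the statement.
    lines[-1] = lines[-1][:-1] + "."
    return "\n".join(lines)


if __name__ == "__main__":
    print(PREFIX)
    print(concept_to_turtle("37", {"en": "Education", "de": "Bildung"}, broader="3"))
```

Because the concept is identified by a URI built from its notation, the labels in different languages all attach to the same identifier, which is what makes the notation usable as a language-independent access point.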
Alternatives to manual KOS-based subject indexing and classification
Although it is very unlikely that any approach will entirely replace the creation of quality subject metadata by humans, current research and practice include two major attempts to supplement the subject metadata created by trained specialists: social tagging that uses KOS as a basis, and automated or semi-automated means. Both approaches warrant further research:
1. Social tagging involves adapting KOS for end-user tagging: it needs to be determined which modifications are most likely to make KOS more useful in this context. The changes may include more definitions, better displays and algorithms that provide good automated suggestions. The motivation of end users for tagging also needs to be explored further.
2. Although today's researchers and commercial software vendors emphasise the high potential of automated tools for subject metadata generation, real evidence of their success is so far lacking. Software tools may be useful, but only in very constrained subject domains; they are unlikely to improve with research because the task is essentially a "hard" artificial intelligence problem. The gap between the high performance reported and the reality is due in part to the way these tools are evaluated: their output is compared against existing or ad hoc metadata that serves as the gold standard. This has two inherent problems. First, the correct interpretation of a document’s subject matter is subjective. Second, the evaluation, in which the most commonly used measures are precision and recall, is carried out in a laboratory-like environment rather than in a real operational system. Although this issue has been discussed widely in the literature, mainstream research has not paid much attention to it, and published results are widely accepted nonetheless. However, existing human-assigned metadata cannot be used as a gold standard. For example, a class assigned by an algorithm but not by a human indexer might be wrong; alternatively, it might be right but have been mistakenly omitted during human indexing. Subject metadata creation involves determining the subject terms or classes under which a document should be found: this goes beyond simply capturing what the document is about and extends to what the document could be used for. Algorithms might find such terms, given a good training set, but human indexers who are not well trained might miss them.
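The dependence of precision and recall on the chosen gold standard can be illustrated with a minimal sketch. The class notations below are hypothetical and do not refer to any particular scheme; the function simply computes set-based precision and recall for one document.

```python
def precision_recall(assigned, gold):
    """Set-based precision and recall of assigned classes against a gold standard."""
    assigned, gold = set(assigned), set(gold)
    hits = assigned & gold
    precision = len(hits) / len(assigned) if assigned else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical classes for a single document.
algorithm_classes = {"370", "371.3", "004"}
human_classes = {"370", "371.3"}  # existing human-assigned metadata

# Scored against the human metadata, "004" counts as an error.
print(precision_recall(algorithm_classes, human_classes))

# If a reviewer judges "004" to be a valid class that the indexer omitted,
# the same algorithm output scores differently against the revised standard.
revised_gold = human_classes | {"004"}
print(precision_recall(algorithm_classes, revised_gold))
```

In the first comparison the algorithm is penalised for the extra class (precision of two thirds, recall of 1.0); in the second, the identical output reaches a precision and recall of 1.0. The scores therefore measure agreement with one particular set of indexing decisions rather than correctness as such.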
Improvements to KOS
There are a number of areas in which existing KOS could be improved. One approach is to simplify complex KOS that were designed primarily for use by librarians and trained end users in a paper environment, so that they benefit non-specialists and can be used on the Web. This should also include hierarchy browsing at different levels, hyperlinks for relationships, searching for compounds containing any combination of elemental concepts, and adjustments for social tagging applications. Replacing the complex built-in concepts that are present in some KOS with a structure based on facets would allow greater flexibility in building new specific concepts at the time of searching, as required by the end user, and would at the same time reduce the size of the KOS.
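The idea of building a specific concept from facets at search time, rather than enumerating every compound in the KOS, can be sketched as follows. The facet names, values and record identifiers are hypothetical, and each record is assumed to carry at most one value per facet, which is a simplification.

```python
# Hypothetical records, each indexed by facet -> elemental concept.
RECORDS = {
    "doc1": {"topic": "education", "place": "Sweden", "period": "20th century"},
    "doc2": {"topic": "education", "place": "Norway", "period": "20th century"},
    "doc3": {"topic": "agriculture", "place": "Sweden", "period": "19th century"},
}


def search(records, **facets):
    """Return ids of records matching every facet value given.

    Any combination of facets may be supplied, so the searcher composes
    the compound concept at query time.
    """
    return sorted(
        doc_id
        for doc_id, index in records.items()
        if all(index.get(facet) == value for facet, value in facets.items())
    )


if __name__ == "__main__":
    print(search(RECORDS, topic="education", place="Sweden"))
    print(search(RECORDS, place="Sweden"))
```

With three facets of a few values each, the KOS only needs to list the elemental values; the compounds ("education in Sweden", "agriculture in the 19th century", and so on) never have to be stored as separate entries.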
Another approach is to enrich one type of KOS with the benefits of other types. For example, enriching typical thesauri with hierarchical structure would enable their use both for searching and for browsing. Moreover, far more powerful and complex KOS are required in order to empower end users searching ever-larger collections, with performance far exceeding plain free-text searching, and to develop systems that not only find but also process information. Enriching thesauri with the characteristics of ontologies would therefore be highly beneficial in such applications.
The slow maintenance and updating of some KOS is a problem for end users, who cannot find new concepts and terms, or cannot work out how to use them, because of outdated structures, hierarchies and the like. A major reason why updating has been slow is that it would require re-indexing and re-classification of existing collections, which implies expensive re-shelving in libraries. Changing the structure would also cause problems for end users, who would have to learn the new structures when browsing either online or in a physical collection.
KOS do not simply represent information; they also construct it. For example, while existing classification schemes are intended to be universal, they are actually culturally specific (e.g. the Chinese Library Classification, BBK in the former Soviet Union). In the Dewey Decimal Classification, the most widespread classification system in the world, regional variants had to be introduced as a compromise. A historical bias persists in KOS on the basis of gender, sexuality, race, age, ability, ethnicity, language and religion, which limits the representation of diversity and effective library service for diverse populations. Now that they are used globally and in interoperable systems, KOS should be restructured to address these issues in a modern context. This once again implies re-classification and re-indexing efforts, which are expensive in themselves, and requires end users to re-learn the KOS they have been used to.