Ontology Prototype
We (John G. Breslin and Guangyuan Piao, Unit for Social Semantics, Insight Centre for Data Analytics, NUI Galway) have created a prototype ontology for web archives based on two existing ontologies: Semantically-Interlinked Online Communities (SIOC) and the Common Data Model (CDM).
Figure 1: Initial Prototype of Web Archive Ontology, Linking to SIOC and CDM
In Figure 1, we give an initial prototype for a general web archive ontology, linked to concepts in the CDM, but allowing flexibility in terms of archiving works, media, web pages, etc. through the “Item” concept. Items are versioned and linked to each other, as well as to concepts appearing in the archived items themselves.
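The versioning of items can be illustrated with a minimal sketch. It assumes each item records at most one immediate predecessor, in the spirit of the SIOC `previous_version` term. The item identifiers and the `version_history` helper are hypothetical and are not part of the prototype.

```python
from typing import Dict, List

# Hypothetical item identifiers. Each key points to its immediate
# predecessor, mirroring the idea behind sioc:previous_version.
PREVIOUS_VERSION: Dict[str, str] = {
    "ex:page-2016-03": "ex:page-2015-11",
    "ex:page-2015-11": "ex:page-2015-06",
}


def version_history(item: str, previous: Dict[str, str]) -> List[str]:
    """Return the item followed by all of its earlier versions, newest first."""
    history = [item]
    seen = {item}
    while history[-1] in previous:
        older = previous[history[-1]]
        if older in seen:  # guard against a malformed, cyclic chain
            raise ValueError(f"cycle in version chain at {older}")
        history.append(older)
        seen.add(older)
    return history
```

Calling `version_history("ex:page-2016-03", PREVIOUS_VERSION)` walks back through the two earlier snapshots. An archive would more likely store these links as RDF statements than in a dictionary.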
For ease of display, we show only some of the more commonly used CDM concepts rather than the full model. We can also map to the other vocabulary terms listed in the last column of Table 1 below; some of these mappings and reused terms are shown in Figure 1.
Essentially, the top part of the model describes the archiving and storage context of an item held in an area (Container) of a website (Site): where it originally came from, who made it, when it was created or modified, when it was archived, its content stream, and so on. The bottom part describes what the item actually is (for example, in CDM terms, the single exemplar of the manifestation of an expression of a work).
Also, the agent who makes the item may differ from the agent who makes the work (e.g. a bot may generate an HTML copy of a PDF publication written by Ms. Smith).
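The two layers of the model, and the bot example, can be sketched as a handful of subject-predicate-object statements. This is only an illustration under stated assumptions: the class names follow the SIOC and CDM terms listed in Table 1, but the `ex:` identifiers and every `wa:` property (`wa:inContainer`, `wa:onSite`, `wa:madeBy`, `wa:exemplarOf`, `wa:manifests`, `wa:realises`) are hypothetical placeholders, not terms of the prototype.

```python
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def example_graph() -> List[Triple]:
    """Build a toy description of one archived item (all wa:/ex: names are assumed)."""
    return [
        # Top part: where the archived item lives and who produced it.
        Triple("ex:site", "rdf:type", "sioc:Site"),
        Triple("ex:publications", "rdf:type", "sioc:Container"),
        Triple("ex:publications", "wa:onSite", "ex:site"),
        Triple("ex:report-html", "rdf:type", "sioc:Item"),
        Triple("ex:report-html", "wa:inContainer", "ex:publications"),
        Triple("ex:report-html", "wa:madeBy", "ex:conversion-bot"),
        # Bottom part: what the item is, following the CDM chain
        # item -> manifestation -> expression -> work.
        Triple("ex:report-html", "wa:exemplarOf", "ex:report-manifestation"),
        Triple("ex:report-manifestation", "rdf:type", "cdm:Manifestation"),
        Triple("ex:report-manifestation", "wa:manifests", "ex:report-expression"),
        Triple("ex:report-expression", "rdf:type", "cdm:Expression"),
        Triple("ex:report-expression", "wa:realises", "ex:report-work"),
        Triple("ex:report-work", "rdf:type", "cdm:Work"),
        Triple("ex:report-work", "wa:madeBy", "ex:ms-smith"),
    ]


def objects(graph: List[Triple], subject: str, predicate: str) -> Set[str]:
    """Return every object linked from the subject by the predicate."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}
```

Querying `objects(example_graph(), "ex:report-html", "wa:madeBy")` yields the bot, while the same query on `ex:report-work` yields Ms. Smith, which is the distinction the model is meant to preserve.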
Relevant Public Ontologies
In Table 1, we list some relevant public ontologies and terms of interest. Some terms can be reused, and others can be mapped to for interoperability purposes.
Ontology Name | Overview | Why relevant? | What terms are useful? |
FRBR | For describing functional requirements for bibliographic records. | To describe bibliographic records. | Expression, Work |
FRBRoo | Expresses the conceptualisation of FRBR with an object-oriented methodology, as an alternative to the entity-relationship methodology. | In general, FRBRoo inherits all concepts of CIDOC-CRM and harmonises with it. | ClassicalWork, LegalWork, ScholarlyWork, Publication, Expression |
BIBFrame | For describing bibliographic descriptions, both on the Web and in the broader networked world. | To represent and exchange bibliographic data. | Work, Instance, Annotation, Authority |
EDM | The Europeana Data Model models data in and supports functionality for Europeana, an internet portal that acts as an interface to millions of books, paintings, films, museum objects and archival records that have been digitised throughout Europe. | Complements FRBRoo with additional properties and classes. | incorporate, isDerivativeOf, WebResource, TimeSpan, Agent, Place, PhysicalThing |
CIDOC-CRM | For describing the implicit and explicit concepts and relationships used in the cultural heritage domain. | To describe cultural heritage information. | EndofExistence, Creation, Time-Span |
EAC-CPF | Encoded Archival Context for Corporate Bodies, Persons and Families is used for encoding the names of creators of archival materials and related information. | Used closely in association with EAD to provide a formal method for recording the descriptions of record creators. | lastDateTimeVerified, Control, Identity |
EU PO CDM | Ontology based on the FRBR model, for describing the relationships between resource types managed by the EU Publications Office and their views. | To describe records. | Expression, Work, Manifestation, Agent, Subject, Item |
OAI-ORE | Defines standards for the description and exchange of aggregations of Web resources. | To describe relationships among resources (also used in EDM). | aggregates, Aggregation, ResourceMap |
EAD | Standard used for hierarchical descriptions of archival records. | Terms are designed to describe archival records. | audience, abbreviation, certainty, repositorycode, AcquisitionInformation, ArchivalDescription |
WGS84 Geo | For describing information about spatially located things. | Terms can be used with the Places ontology for describing place information. | lat, long |
Media | For describing media resources on the Web. | To describe media contents for web archiving. | compression, format, MediaType |
Places | For describing places of geographic interest. | To describe place information for events, etc. | City, Country, Continent |
Event | For describing events. | To describe specific events in content. Can also be used for representing events at an administrative level. | agent, product, place, Agent, Event |
SKOS | A common data model for sharing and linking knowledge organisation systems. | To capture similarities among ontologies and make the relationships explicit. | broader, related, semanticRelation, relatedMatch, Concept, Collection |
SIOC | For describing social content. | Terms are general enough to be used for web archiving. | previous_version, next_version, earlier_version, later_version, latest_version, Item, Container, Site, embed_knowledge |
Dublin Core | Provides a metadata vocabulary of “core” properties that can give basic descriptive information about any kind of resource. | Fundamental terms used with other ontologies. | creator, date, description, identifier, language, publisher |
LOC METS Profile | The Metadata Encoding and Transmission Standard (METS) is a metadata standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library. The METS profile expresses the requirements that a METS document must satisfy. | To describe and organise the components of a digital object. | controlled_vocabularies, external_schema |
DCAT and DCAT-AP | A specification based on the Data Catalogue vocabulary (DCAT) for describing public sector datasets in Europe. Its basic use case is to enable cross-portal search for datasets and to make public sector data more easily searchable across borders and sectors. | Enables the exchange of description metadata between data portals. | downloadURL, accessURL, Distribution, Dataset, CatalogRecord |
Formex | A format for the exchange of data between the Publications Office and its contractors. In particular (but not only), it defines the logical markup for documents that are published in the different series of the Official Journal of the European Union. | Useful for annotating archived items as well as for exchange purposes. | Archived, Annotation, FT, Note |
ODP | Ontology describing the metadata vocabulary for the Open Data Portal of the European Union. | To describe dataset portals. | datasetType, datasetStatus, accrualPeriodicity, DatasetDocumentation |
LOC PREMIS | Used to describe preservation metadata. | Applicable to archives. | ContentLocation, CreatingApplication, Dependency |
VIAF | Virtual International Authority File is an international service designed to provide convenient access to the world’s major name authority files (lists of names of people, organisations, places, etc. used by libraries). Enables switching of the displayed form of names to the preferred language of a web user. | Useful for linking to name authority files and helping to serve different language communities in Europe. | AuthorityAgency, NameAuthority, NameAuthorityCluster |
Table 1: Relevant Ontologies and Terms
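As a rough illustration of how candidates for cross-vocabulary mapping might be spotted, the sketch below takes a small subset of the terms in Table 1 and reports the term names that occur in more than one vocabulary. The `shared_terms` helper is hypothetical. Matching on identical names is only a heuristic assumed here for illustration; any real mapping (for example, one expressed with SKOS `relatedMatch`) would need review against each vocabulary's definitions.

```python
from collections import defaultdict
from typing import Dict, List

# A subset of Table 1: vocabulary name -> terms of interest.
TERMS_BY_VOCABULARY: Dict[str, List[str]] = {
    "FRBR": ["Expression", "Work"],
    "BIBFrame": ["Work", "Instance", "Annotation", "Authority"],
    "EU PO CDM": ["Expression", "Work", "Manifestation", "Agent", "Subject", "Item"],
    "SIOC": ["Item", "Container", "Site"],
}


def shared_terms(terms_by_vocabulary: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each term name used by two or more vocabularies to those vocabularies."""
    vocabularies_by_term: Dict[str, List[str]] = defaultdict(list)
    for vocabulary, terms in terms_by_vocabulary.items():
        for term in terms:
            vocabularies_by_term[term].append(vocabulary)
    return {
        term: sorted(vocabularies)
        for term, vocabularies in vocabularies_by_term.items()
        if len(vocabularies) > 1
    }
```

With this subset, `shared_terms(TERMS_BY_VOCABULARY)` flags `Work`, `Expression` and `Item` as names shared across vocabularies, which matches the points where the prototype links SIOC to the CDM.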