Web Archive Ontology (SIOC+CDM)

Ontology Prototype

We (John G. Breslin and Guangyuan Piao, Unit for Social Semantics, Insight Centre for Data Analytics, NUI Galway) have created a prototype ontology for web archives based on two existing ontologies: Semantically-Interlinked Online Communities (SIOC) and the Common Data Model (CDM).


Figure 1: Initial Prototype of Web Archive Ontology, Linking to SIOC and CDM

In Figure 1, we give an initial prototype for a general web archive ontology, linked to concepts in the CDM, but allowing flexibility in terms of archiving works, media, web pages, etc. through the “Item” concept. Items are versioned and linked to each other, as well as to concepts appearing in the archived items themselves.
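As a rough illustration of how versioned items might be linked, the sketch below builds plain subject-predicate-object triples using the SIOC version properties listed in Table 1. The snapshot URIs and the `version_triples` helper are hypothetical and not part of the prototype; only the SIOC property names and namespace are taken from the SIOC vocabulary.

```python
from typing import List, Set, Tuple

SIOC = "http://rdfs.org/sioc/ns#"  # SIOC Core namespace

Triple = Tuple[str, str, str]


def version_triples(snapshots: List[str]) -> Set[Triple]:
    """Link an ordered list of snapshot URIs (oldest first) with SIOC version properties.

    Hypothetical helper for illustration only.
    """
    triples: Set[Triple] = set()
    if not snapshots:
        return triples
    latest = snapshots[-1]
    for i, item in enumerate(snapshots):
        if i > 0:
            # Direct link to the immediately preceding version
            triples.add((item, SIOC + "previous_version", snapshots[i - 1]))
        if i < len(snapshots) - 1:
            # Direct link to the immediately following version
            triples.add((item, SIOC + "next_version", snapshots[i + 1]))
            # Every non-latest version points at the newest one
            triples.add((item, SIOC + "latest_version", latest))
        # Links to all older and all newer versions
        for older in snapshots[:i]:
            triples.add((item, SIOC + "earlier_version", older))
        for newer in snapshots[i + 1:]:
            triples.add((item, SIOC + "later_version", newer))
    return triples


# Hypothetical snapshot URIs of one archived page, oldest first
snapshots = [
    "http://archive.example.org/item/home/v1",
    "http://archive.example.org/item/home/v2",
    "http://archive.example.org/item/home/v3",
]
triples = version_triples(snapshots)
for s, p, o in sorted(triples):
    print(s, p, o)
```

Materialising the earlier / later links explicitly is only one option; a triple store with reasoning support could derive them from the previous / next links instead.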

For ease of display, we do not show the full CDM in this document, only some of its more commonly used concepts. We can also map to other vocabulary terms, listed in the last column of Table 1 below; some mappings and reused terms are shown in Figure 1.

Essentially, the top part of the model describes the archive / storage mechanism for an item in an area (Container) on a website (Site), i.e. where it originally came from, who made it, when it was created / modified, when it was archived, the content stream, etc. The bottom part describes what the item actually is (for example, in terms of CDM, the single exemplar of the manifestation of an expression of a work).

Also, the agents who make the item and the work may differ (e.g. a bot may generate an HTML copy of a PDF publication written by Ms. Smith).
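The split between "where the item was archived" and "what the item is" can be sketched with a few plain data classes. This is a minimal sketch under our own assumptions, not the ontology itself: the class and field names, the URLs, the dates and the bot name are all hypothetical, and only the Site / Container / Item and Work / Expression / Manifestation concepts come from the model described above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Agent:
    name: str
    is_bot: bool = False


# Bottom part of the model: what the item actually is (CDM / FRBR chain)
@dataclass(frozen=True)
class Work:
    title: str
    creator: Agent


@dataclass(frozen=True)
class Expression:
    work: Work
    language: str


@dataclass(frozen=True)
class Manifestation:
    expression: Expression
    media_format: str


# Top part of the model: where the item was stored and archived
@dataclass(frozen=True)
class Site:
    url: str


@dataclass(frozen=True)
class Container:
    site: Site
    path: str


@dataclass(frozen=True)
class ArchivedItem:
    container: Container
    maker: Agent                 # who made this particular item
    created: datetime
    archived: datetime
    exemplifies: Manifestation   # the item is an exemplar of this manifestation

    def work_creator(self) -> Agent:
        """Walk item -> manifestation -> expression -> work to find the author."""
        return self.exemplifies.expression.work.creator


# Hypothetical example: a bot generates an HTML copy of Ms. Smith's publication
smith = Agent("Ms. Smith")
bot = Agent("html-converter-bot", is_bot=True)

work = Work("A PDF publication", smith)
expression = Expression(work, "en")
html_copy = Manifestation(expression, "text/html")

container = Container(Site("http://www.example.org/"), "/publications/")
item = ArchivedItem(
    container=container,
    maker=bot,
    created=datetime(2015, 1, 10),
    archived=datetime(2015, 2, 1),
    exemplifies=html_copy,
)

print("Item made by:", item.maker.name)
print("Work created by:", item.work_creator().name)
```

Keeping the maker on the item and the creator on the work is what lets the two agents differ without either record contradicting the other.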

Relevant Public Ontologies

In Table 1, we list some relevant public ontologies and terms of interest. Some terms can be reused, and others can be mapped to for interoperability purposes.

| Ontology Name | Overview | Why relevant? | What terms are useful? |
| --- | --- | --- | --- |
| FRBR | For describing functional requirements for bibliographic records. | To describe bibliographic records. | Expression, Work |
| FRBRoo | Expresses the conceptualisation of FRBR with an object-oriented methodology instead of the entity-relationship methodology, as an alternative. | In general, FRBRoo “inherits” all concepts of CIDOC-CRM and harmonises with it. | ClassicalWork, LegalWork, ScholarlyWork, Publication, Expression |
| BIBFrame | For describing bibliographic descriptions, both on the Web and in the broader networked world. | To represent and exchange bibliographic data. | Work, Instance, Annotation, Authority |
| EDM | The Europeana Data Model models data in and supports functionality for Europeana, an internet portal that acts as an interface to millions of books, paintings, films, museum objects and archival records that have been digitised throughout Europe. | Complements FRBRoo with additional properties and classes. | incorporate, isDerivativeOf, WebResource, TimeSpan, Agent, Place, PhysicalThing |
| CIDOC-CRM | For describing the implicit and explicit concepts and relationships used in the cultural heritage domain. | To describe cultural heritage information. | EndofExistence, Creation, Time-Span |
| EAC-CPF | Encoded Archival Context for Corporate Bodies, Persons and Families is used for encoding the names of creators of archival materials and related information. | Used closely in association with EAD to provide a formal method for recording the descriptions of record creators. | lastDateTimeVerified, Control, Identity |
| EU PO CDM | Ontology based on the FRBR model, for describing the relationships between resource types managed by the EU Publications Office and their views. | To describe records. | Expression, Work, Manifestation, Agent, Subject, Item |
| OAI-ORE | Defines standards for the description and exchange of aggregations of Web resources. | To describe relationships among resources (also used in EDM). | aggregates, Aggregation, ResourceMap |
| EAD | Standard used for hierarchical descriptions of archival records. | Terms are designed to describe archival records. | audience, abbreviation, certainty, repositorycode, AcquisitionInformation, ArchivalDescription |
| WGS84 Geo | For describing information about spatially located things. | Terms can be used with the Place ontology for describing place information. | lat, long |
| Media | For describing media resources on the Web. | To describe media contents for web archiving. | compression, format, MediaType |
| Places | For describing places of geographic interest. | To describe place information for events, etc. | City, Country, Continent |
| Event | For describing events. | To describe specific events in content. Can also be used for representing events at an administrative level. | agent, product, place, Agent, Event |
| SKOS | A common data model for sharing and linking knowledge organisation systems. | To capture similarities among ontologies and make the relationships explicit. | broader, related, semanticRelation, relatedMatch, Concept, Collection |
| SIOC | For describing social content. | Terms are general enough to be used for web archiving. | previous_version, next_version, earlier_version, later_version, latest_version, Item, Container, Site, embed_knowledge |
| Dublin Core | Provides a metadata vocabulary of “core” properties that is able to provide basic descriptive information about any kind of resource. | Fundamental terms used with other ontologies. | creator, date, description, identifier, language, publisher |
| LOC METS Profile | The Metadata Encoding and Transmission Standard (METS) is a metadata standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library. The METS profile expresses the requirements that a METS document must satisfy. | To describe and organise the components of a digital object. | controlled_vocabularies, external_schema |
| DCAT and DCAT-AP | A specification based on the Data Catalogue vocabulary (DCAT) for describing public sector datasets in Europe. Its basic use case is to enable a cross-data portal search for data sets and make public sector data better searchable across borders and sectors. | Enables the exchange of description metadata between data portals. | downloadURL, accessURL, Distribution, Dataset, CatalogRecord |
| Formex | A format for the exchange of data between the Publications Office and its contractors. In particular (but not only), it defines the logical markup for documents, which are published in the different series of the Official Journal of the European Union. | Useful for annotating archived items as well as for exchange purposes. | Archived, Annotation, FT, Note |
| ODP | Ontology describing the metadata vocabulary for the Open Data Portal of the European Union. | To describe dataset portals. | datasetType, datasetStatus, accrualPeriodicity, DatasetDocumentation |
| LOC PREMIS | Used to describe preservation metadata. | Applicable to archives. | ContentLocation, CreatingApplication, Dependency |
| VIAF | Virtual International Authority File is an international service designed to provide convenient access to the world’s major name authority files (lists of names of people, organisations, places, etc. used by libraries). Enables switching of the displayed form of names to the preferred language of a web user. | Useful for linking to name authority files and helping to serve different language communities in Europe. | AuthorityAgency, NameAuthority, NameAuthorityCluster |

Table 1: Relevant Ontologies and Terms
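To make the idea of mapping terms for interoperability more concrete, the sketch below records a few term-to-term links with the SKOS `relatedMatch` property and builds a lookup index from them. The specific pairs and the prefixed names (`frbr:`, `bibframe:`, `edm:`, `cdm:`) are illustrative assumptions chosen from terms that appear in Table 1; they are not mappings asserted by the prototype or by the vocabularies themselves.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

SKOS_RELATED_MATCH = "http://www.w3.org/2004/02/skos/core#relatedMatch"

Triple = Tuple[str, str, str]

# Illustrative mappings only (assumed for this example, not taken from the prototype)
MAPPINGS = [
    ("frbr:Work", SKOS_RELATED_MATCH, "cdm:Work"),
    ("bibframe:Work", SKOS_RELATED_MATCH, "cdm:Work"),
    ("frbr:Expression", SKOS_RELATED_MATCH, "cdm:Expression"),
    ("edm:Agent", SKOS_RELATED_MATCH, "cdm:Agent"),
]


def build_index(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Index relatedMatch links in both directions, since the property is symmetric."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == SKOS_RELATED_MATCH:
            index[subject].add(obj)
            index[obj].add(subject)
    return dict(index)


index = build_index(MAPPINGS)

# Which external terms were linked to the (assumed) CDM Work term?
print(sorted(index["cdm:Work"]))
```

A stricter mapping could use a closer SKOS relation or OWL equivalence where two terms really do mean the same thing; `relatedMatch` is used here because it makes the weakest claim.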
