We don't know whether the information we find on the Web is accurate or not. The Dublin Core model describes a resource for the purpose of discovery. The W3C PROV model describes the entities and processes involved in producing and delivering that resource. This article introduces the mapping between the two models.
Rationale for Mapping DC Terms to PROV:
- This mapping gives insight into the different characteristics of the two data models (in particular, it explains PROV from a Dublin Core point of view).
- This mapping can be used to extract PROV data from the large amount of Dublin Core data available on the Web today.
- This mapping can translate PROV data to Dublin Core and make it accessible for applications that understand Dublin Core.
- This mapping can lower the barrier to entry for PROV adoption. Simple Dublin Core statements can be used as a starting point for generating PROV data.
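The extraction idea in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the mapping itself: the three property pairs, the `ex:` identifiers, and the function name `dc_to_prov` are assumptions chosen for the example, and statements are represented as plain (subject, predicate, object) tuples rather than with an RDF library.

```python
# Illustrative only: these property pairs are assumptions made for this
# sketch, not the authoritative Dublin Core to PROV mapping.
ASSUMED_DC_TO_PROV = {
    "dct:creator": "prov:wasAttributedTo",
    "dct:created": "prov:generatedAtTime",
    "dct:source": "prov:wasDerivedFrom",
}


def dc_to_prov(statements):
    """Translate (subject, predicate, object) triples into PROV triples.

    Statements whose predicate has no entry in ASSUMED_DC_TO_PROV are
    skipped, because they carry no provenance information in this sketch.
    """
    return [
        (subject, ASSUMED_DC_TO_PROV[predicate], obj)
        for (subject, predicate, obj) in statements
        if predicate in ASSUMED_DC_TO_PROV
    ]


# Hypothetical Dublin Core description of a single resource.
dc_statements = [
    ("ex:report", "dct:title", "Annual Report"),
    ("ex:report", "dct:creator", "ex:alice"),
    ("ex:report", "dct:created", "2012-03-01"),
]

for triple in dc_to_prov(dc_statements):
    print(triple)
```

In this sketch the descriptive `dct:title` statement is dropped, while the creator and creation date are carried over as PROV statements. A real translation would follow the property pairs defined by the mapping itself.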
The Dublin Core model describes a (1) common set of terms that can be used to (2) describe a resource for the (3) purpose of discovery.
The Dublin Core Metadata Element Set contains 15 metadata terms and is endorsed by multiple standards agencies. The full set of Dublin Core metadata terms can be found on the Dublin Core Metadata Initiative (DCMI) website.
DCMI traces its roots to Chicago at the 2nd International World Wide Web Conference, October 1994. A hallway conversation on semantics and the Web revolved around the difficulty of finding resources (difficult even then, with only about 500,000 addressable objects on the Web).
That initial brainstorming led NCSA and OCLC to hold a joint workshop on metadata semantics in Dublin, Ohio, in March 1995. At this event, called simply the "OCLC/NCSA Metadata Workshop", more than 50 people discussed how a core set of semantics for Web-based resources would be extremely useful for categorizing the Web for easier search and retrieval. They dubbed the result "Dublin Core metadata" after the location of the workshop.
Work on the core elements began in 1995. Dublin Core became an ISO standard (ISO 15836) in 2003.
Provenance of a resource is a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource. Provenance provides a critical foundation for assessing authenticity, enabling trust, and allowing reproducibility. Provenance assertions are a form of contextual metadata and can themselves become important records with their own provenance.
"At the toolbar (menu, whatever) associated with a document there is a button marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to the Web, 'so how do I know I can trust this information?'. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons."
Tim Berners-Lee, W3C Chair, Web Design Issues, September 1997
"The problem is - and this is true of books and every other medium - we don't know whether the information we find [on the Web] is accurate or not. We don't necessarily know what its provenance is. So we have to teach people how to assess what they've found. [...] there's so much juxtaposition of the good stuff and not-so-good stuff and flat-out-wrong stuff or deliberate misinformation or plain ignorance."
Vinton Cerf, Internet pioneer, in Smithsonian's "40 Things you need to know about the next 40 years" issue, July, 2010
The W3C Provenance Working Group started in 2011. Completion of its work is scheduled for 2013.
Provenance in DC Terms
The DC Terms set contains 25 elements that can be broadly classified as “provenance”-related. These provenance-related terms can be further divided into three sub-categories:
- Terms that answer who effected a change (Who? The agent)
- Terms that answer when a change was effected (When? The time)
- Terms that answer how a change was effected (How? The derivation)
The complete categorization is given below.
Categorization of the Dublin Core Terms
||Descriptive metadata (non-provenance)||abstract, accessRights, accrualMethod, accrualPeriodicity, accrualPolicy, alternative, audience, bibliographicCitation, conformsTo, coverage, description, educationLevel, extent, hasPart, isPartOf, format, identifier, instructionalMethod, isRequiredBy, language, mediator, medium, relation, requires, spatial, subject, tableOfContents, temporal, title, type||
||Who (agent)||contributor, creator, publisher, rightsHolder||
||When (time)||available, created, date, dateAccepted, dateCopyrighted, dateSubmitted, issued, modified, valid||
||How (derivation)||isVersionOf, hasVersion, isFormatOf, hasFormat, license, references, isReferencedBy, replaces, isReplacedBy, rights, source||
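The categorization above can be captured as a small lookup table. The sketch below is one possible encoding, assuming the four rows stand for the descriptive, who, when, and how groups in that order; the category keys and the names `DC_TERM_CATEGORIES`, `CATEGORY_OF`, and `categorize` are hypothetical and exist only for this example.

```python
# Term lists copied from the categorization table; the category keys
# ("descriptive", "who", "when", "how") are labels assumed for this sketch.
DC_TERM_CATEGORIES = {
    "descriptive": [
        "abstract", "accessRights", "accrualMethod", "accrualPeriodicity",
        "accrualPolicy", "alternative", "audience", "bibliographicCitation",
        "conformsTo", "coverage", "description", "educationLevel", "extent",
        "hasPart", "isPartOf", "format", "identifier", "instructionalMethod",
        "isRequiredBy", "language", "mediator", "medium", "relation",
        "requires", "spatial", "subject", "tableOfContents", "temporal",
        "title", "type",
    ],
    "who": ["contributor", "creator", "publisher", "rightsHolder"],
    "when": [
        "available", "created", "date", "dateAccepted", "dateCopyrighted",
        "dateSubmitted", "issued", "modified", "valid",
    ],
    "how": [
        "isVersionOf", "hasVersion", "isFormatOf", "hasFormat", "license",
        "references", "isReferencedBy", "replaces", "isReplacedBy", "rights",
        "source",
    ],
}

# Invert the table so that each term points at its category.
CATEGORY_OF = {
    term: category
    for category, terms in DC_TERM_CATEGORIES.items()
    for term in terms
}


def categorize(term):
    """Return the category of a DC term, or "uncategorized" if not listed."""
    return CATEGORY_OF.get(term, "uncategorized")


# Count the terms in the three provenance-related groups.
provenance_related = sum(
    len(DC_TERM_CATEGORIES[category]) for category in ("who", "when", "how")
)

print(categorize("creator"))   # who
print(categorize("modified"))  # when
print(provenance_related)      # 24
```

The three provenance-related groups hold 24 terms between them. The broad `provenance` term is not listed in the table, so this sketch reports it as uncategorized; counting it alongside the 24 would give the 25 provenance-related elements mentioned earlier, though that reading is an inference rather than something the table states.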
DC Terms also contains the broad term “provenance”, which overlaps with many of the existing terms and has a formal definition that corresponds to the formal definition of provenance for artworks.
In the accompanying chart, the yellow slices correspond to the provenance-related terms. The large blue slice corresponds to the descriptive (non-provenance) terms.
Mapping DC Terms to PROV
The mapping is best viewed on the “Dublin Core to PROV Mapping” page.