W3C Provenance Model Primer
Craig Trim
PROV is a core data model for provenance, used to build representations of the entities, people, and processes involved in producing a piece of data or a thing in the world.
The provenance of digital objects represents their origins. Provenance records contain descriptions of the entities and activities involved in producing and delivering (and otherwise influencing) a given object. PROV is meant to describe how these objects were created or delivered.
By knowing the provenance of an object, we can make determinations about how to use it.
Provenance can be used for many purposes, and PROV accommodates these different uses.
One perspective might focus on agent-centered provenance, that is, which people or organizations were involved in generating or manipulating the information in question. For example, in the provenance of this blog post, we could capture the author who wrote it, the person who edited it, and the organization that published it (developerWorks).
A second perspective might focus on object-centered provenance, tracing the origins of portions of a document to other documents. An example would be how this blog post assembles content from the W3C PROV Model Primer, which in turn assembled content from other sources, quotes from interviews with experts, and charts that visualize the PROV data model.
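As a rough illustration of object-centered provenance, the sketch below follows derivation links from one document back to all of its sources. The identifiers and the dictionary layout are assumptions made for this example, not part of PROV; only the idea that one entity can be derived from another comes from the model.

```python
# Hypothetical identifiers: each key "was derived from" the listed sources.
was_derived_from = {
    "blog_post": ["prov_primer"],
    "prov_primer": ["expert_interviews", "data_model_charts"],
}


def trace_origins(entity, derived=was_derived_from):
    """Return every source that `entity` derives from, directly or indirectly."""
    origins = []
    stack = list(derived.get(entity, []))
    while stack:
        source = stack.pop()
        if source not in origins:
            origins.append(source)
            # Keep walking: a source may itself have sources.
            stack.extend(derived.get(source, []))
    return origins


print(sorted(trace_origins("blog_post")))
```

Walking the links transitively is what lets a reader of the blog post discover the interviews and charts, even though the post only cites the primer directly.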
A third perspective one might take is on process-centered provenance, capturing the actions and steps taken to generate the information in question. For example, a chart may have been generated by invoking a service to retrieve data from a database, then extracting certain statistics from the data using some statistics package, and finally processing these results with a graphing tool.
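The chart example can be sketched as a chain of activities, each recording what it consumed and what it produced. This is a minimal illustration under assumed names (the `Activity` class, the identifiers, and `steps_leading_to` are all hypothetical), not a PROV implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    used: tuple       # entities the step consumed
    generated: tuple  # entities the step produced


# Hypothetical identifiers for the chart example, deliberately out of order.
activities = [
    Activity("graph_results", used=("statistics",), generated=("chart",)),
    Activity("retrieve_data", used=("database",), generated=("raw_data",)),
    Activity("extract_statistics", used=("raw_data",), generated=("statistics",)),
]


def steps_leading_to(entity, activities):
    """Walk backwards from `entity`; for a linear chain like this one,
    reversing the walk gives the order in which the steps ran."""
    producer = {out: act for act in activities for out in act.generated}
    steps, pending = [], [entity]
    while pending:
        act = producer.get(pending.pop())
        if act is not None and act not in steps:
            steps.append(act)
            pending.extend(act.used)
    return [act.name for act in reversed(steps)]


print(steps_leading_to("chart", activities))
# ['retrieve_data', 'extract_statistics', 'graph_results']
```

The database has no recorded producer, so the walk stops there; that is where this particular provenance record ends.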
High Level Overview
The following diagram provides a high-level overview of the structure of PROV records, limited to some key PROV concepts discussed in the primer.
In PROV, physical, digital, conceptual, or other kinds of things are called entities. Provenance records can describe the provenance of entities, and an entity's provenance may refer to other entities.
Activities are how entities come into existence, and how their attributes change to become new entities, often making use of previously existing entities to achieve this.
Use and Generation
Activities generate new entities. Writing a document brings the document into existence. Revising the document brings a new version into existence. Activities make use of entities. Revising a document to correct factual errors makes use of the original document as well as a list of corrections.
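The document-revision example could be written down as a handful of statements. The relation names `used` and `wasGeneratedBy` come from the PROV vocabulary; the tuple layout, the identifiers, and the `inputs_of` helper are assumptions made for this sketch.

```python
# Each statement is (relation, subject, object).
# For "wasGeneratedBy" the subject is an entity and the object an activity;
# for "used" the subject is an activity and the object an entity.
statements = [
    ("wasGeneratedBy", "document_v1", "write"),
    ("used", "revise", "document_v1"),
    ("used", "revise", "corrections_list"),
    ("wasGeneratedBy", "document_v2", "revise"),
]


def inputs_of(entity, statements):
    """Entities used by the activity (or activities) that generated `entity`."""
    generators = {act for rel, ent, act in statements
                  if rel == "wasGeneratedBy" and ent == entity}
    return sorted(obj for rel, act, obj in statements
                  if rel == "used" and act in generators)


print(inputs_of("document_v2", statements))  # ['corrections_list', 'document_v1']
print(inputs_of("document_v1", statements))  # []
```

The first version has no recorded inputs because the writing activity is not stated to have used anything.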
Agents and Responsibility
An agent has some degree of responsibility for the activity taking place. An agent is any entity that may be ascribed responsibility. When an agent has responsibility for an activity, PROV says the agent was associated with the activity. Several agents may be associated with an activity.
An agent may be acting on behalf of others, e.g., an employee on behalf of their organization, and we can express such chains of responsibility in the provenance. To represent the provenance of the above chart, we would state that the person who created the chart (Craig Trim) was involved in its creation, and the software used to create the chart (yEd) was also an agent involved in that activity.
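A small sketch of association and delegation for the chart example follows. The dictionary names mirror the PROV relations wasAssociatedWith and actedOnBehalfOf, but the layout is illustrative, and `example_org` is a hypothetical organization added only to show a chain of responsibility; the source does not name one.

```python
# Agents associated with each activity (both agents come from the chart example).
was_associated_with = {
    "create_chart": ["craig_trim", "yed"],
}

# Delegation links: key acted on behalf of value.
# "example_org" is a hypothetical organization, assumed for illustration.
acted_on_behalf_of = {
    "craig_trim": "example_org",
}


def responsibility_chain(agent, delegation=acted_on_behalf_of):
    """Follow delegation links upward from `agent`, stopping on a cycle."""
    chain = [agent]
    while chain[-1] in delegation and delegation[chain[-1]] not in chain:
        chain.append(delegation[chain[-1]])
    return chain


for agent in was_associated_with["create_chart"]:
    print(" -> ".join(responsibility_chain(agent)))
# craig_trim -> example_org
# yed
```

Several agents can be associated with the same activity, and each one may or may not have a delegation chain above it.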
References & Further Reading