Toward a Basic Profile for Linked Data

A collection of best practices and a simple approach for a Linked Data architecture

W3C defines a wide range of standards for the Semantic Web and Linked Data suitable for many possible use cases. While using Linked Data as an application integration technology in the Application Lifecycle Management (ALM) domain, IBM has found that there are often several possible ways of applying the existing standards, yet little guidance is provided on how to combine them. This article explains motivating background information and a proposal for a Basic Profile for Linked Data.

Martin Nally (nally@us.ibm.com), Chief Technical Officer, IBM Rational Software, IBM

Martin Nally, an IBM Fellow, is Vice President and the Chief Technology Officer for the Rational software division of IBM. Martin joined IBM in 1990 with 10 years of prior industry experience. He has held several architecture and development positions in IBM, including lead architect and developer for IBM VisualAge/Smalltalk and VisualAge/Java. Martin was one of a team of three that launched the IBM project that later became the Eclipse framework. He then led the architecture, design, and development of WebSphere Studio, which evolved into Rational Application Developer. More recently, he has been one of the champions behind moving the Rational portfolio to a web-based architecture and was instrumental in creating Open Services for Lifecycle Collaboration, an integration architecture, and Jazz technology, a set of common services used to combine IBM and non-IBM tools to create an integrated system.



Steve Speicher (sspeiche@us.ibm.com), IBM Senior Technical Staff Member, OSLC Lead Architect, IBM

Steve Speicher is an IBM Senior Technical Staff Member who focuses on Rational change management solutions and integrations. He is the lead for the Open Services for Lifecycle Collaboration (OSLC) Core and Change Management topic areas, which deliver open HTTP REST and Linked Data specifications, as well as implementations within the Rational change management products. Steve formerly worked in emerging standardization efforts in healthcare and compound documents (W3C).



06 December 2011

Update
In March 2012, IBM and its partners submitted the Linked Data Basic Profile specification to W3C.

Motivation

There is interest in using Linked Data technologies for more than one purpose. We have seen interest in it to expose information -- public records, for example -- on the Internet in a machine-readable format. We have also seen interest in using it for inferring new information from existing information, for example in pharmaceutical applications or IBM Watson™ (see the Resources section for links to more information). The IBM® Rational® team has been using Linked Data as an architectural model and implementation technology for application integration.

Rational software is a vendor of software development tools, particularly those that support the general software development process, such as bug tracking, requirements management, and test management tools. Like many vendors that sell multiple applications, we have seen strong customer demand for better support of more complete business processes (in our case, software development processes) that span the roles, tasks, and data addressed by multiple tools. This demand has existed for many years, and our industry has tried several different architectural approaches to address the problem. Here are a few:

  • Implement some sort of application programming interface (API) for each application, and then, in each application, implement "glue code" that exploits the APIs of other applications to link them together.
  • Design a single database to store the data of multiple applications, and implement each of the applications against this database. In the software development tools business, these databases are often called "repositories."
  • Implement a central "hub" or "bus" that orchestrates the broader business process by exploiting the APIs described previously.

A discussion of the failings of each of these approaches is beyond the scope of this article, but it is fair to say that, although each one of those approaches has its adherents and can point to some successes, none of them is wholly satisfactory. So, as an alternative, over the last five years we have been exploring the use of Linked Data as an application integration technology. We have shipped a number of products using this technology and are generally pleased with the result. We have more products in development that use these technologies, and we are also seeing a strong interest in this approach in other parts of our company.

Although we are pleased -- even passionate -- about the results that we have seen using Linked Data as an integration technology, we have found successful adoption to be difficult. It has taken us several years of experimentation to achieve the level of understanding that we have today. We have made some costly mistakes along the way, and we see no immediate end to the challenges and learning that lie before us.

As far as we can tell, there are not many people who are trying to use Linked Data technologies in the ways that we are using them, and the little information that is available on best practices and pitfalls is widely dispersed. We believe that Linked Data has the potential to solve some important problems that have frustrated the IT industry for many years, or at least to make significant advances in that direction. But this potential will be realized only if we can establish and communicate a much richer body of knowledge about how to exploit these technologies. In some cases, there also are gaps in the Linked Data standards that need to be addressed.

To help with this process, we would like to share information about how we are using these technologies, the best practices and anti-patterns that we have identified, and the specification gaps that we have had to fill. These best practices and anti-patterns can be classified according to (but are not limited to) the following categories:

Resources
A summary of the HTTP and RDF standard techniques and best practices that you should use, and anti-patterns you should avoid, when constructing clients and servers that read and write Linked Data
 
Containers
Defines resources that allow new resources to be created using HTTP POST and existing resources to be found using HTTP GET
 
Paging
Defines a mechanism for splitting the information in large resources into pages that can be fetched incrementally
 
Validation
Defines a simple mechanism for describing the properties that a particular type of resource must or may have
 

The following sections provide details regarding this proposal for a Basic Profile for Linked Data.


Related work

The intention of this article is to promote ideas and motivate specification efforts in, potentially, numerous communities. These efforts are related to this proposal:

W3C Linked Enterprise Data Patterns Workshop
This proposal is intended to elaborate on what is seen as missing or needed, as discussed in an IBM position paper presented at the workshop.

Open Services for Lifecycle Collaboration (OSLC)
The OSLC Core v2 specification defines some of these patterns and anti-patterns, although perhaps not in an ideal way. This proposal can provide the basis for a simpler and more standards-aligned way for future OSLC specifications.


Terminology

These definitions are based on W3C's Architecture of the World Wide Web and Hypertext Transfer Protocol, HTTP/1.1 (see Resources).

Link
A relationship between two resources when one resource (representation) refers to the other resource by means of a URI. (reference: WWWArch)
 
Linked Data
Defined by Tim Berners-Lee as four rules:
  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
  4. Include links to other URIs so that they can discover more things. (reference: LinkedData)
 
Specification
An act of describing or identifying something precisely or of stating a precise requirement
 
Basic Profile
A specification that identifies the components needed from other specifications and adds clarifications and patterns for using them together
 
Client
A program that establishes connections for the purpose of sending requests (reference: HTTP)
 
Basic Profile Client
A client that adheres to the rules defined in the Basic Profile
 
Server
An application program that accepts connections in order to service requests by sending back responses

Note: Any given program can be capable of being both a client and a server. Our use of these terms refers only to the role being performed by the program for a particular connection, rather than to the program's capabilities in general. Likewise, any server can act as an origin server, proxy, gateway, or tunnel, switching behavior based on the nature of each request (reference: HTTP).
 
Basic Profile Server
A server that adheres to the rules defined in the Basic Profile
 

Basic Profile Resources

Basic Profile Resources are HTTP Linked Data resources that conform to simple patterns and conventions. Most Basic Profile Resources are domain-specific resources that contain data for an entity in a domain, and that domain can be commercial, governmental, scientific, religious, or another type. A few Basic Profile Resources are defined by the Basic Profile specifications and are cross-domain. All Basic Profile Resources follow the rules of Linked Data previously cited in the Terminology section:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
  4. Include links to other URIs so that people can discover more things.

Basic Profile adds a few rules. Some of these rules could be thought of as clarification of the basic Linked Data rules.

  1. Basic Profile Resources are HTTP resources that can be created, modified, deleted and read using standard HTTP methods.
    (Clarification or extension of Linked Data Rule 2.) Basic Profile Resources are created by HTTP POST (or PUT) to an existing resource, deleted by HTTP DELETE, updated by HTTP PUT or PATCH, and "fetched" using HTTP GET. Additionally, Basic Profile Resources can be created, updated, and deleted by using SPARQL Update.
  2. Basic Profile Resources use RDF to define their states.
    (Clarification of Linked Data Rule 3.) The state of a Basic Profile Resource (in the sense of state used in the REST architecture) is defined by a set of RDF triples. Binary resources and text resources are not Basic Profile Resources since their states cannot be easily or fully represented in RDF. XML resources might or might not be suitable as Basic Profile Resources. Some XML resources are really data-oriented resources encoded in XML that can be easily represented in RDF. Other XML documents are essentially marked up text documents that are not easily represented in RDF. Basic Profile Resources can be mixed with other resources in the same application.
  3. You can request an RDF/XML representation of any Basic Profile Resource.
    (Clarification of Linked Data Rule 3.) The resource might have other representations, as well. These could be other RDF formats, such as Turtle, N3, or NTriples, but non-RDF formats such as HTML and JSON would also be popular additions, and Basic Profile sets no limits.
  4. Basic Profile clients use Optimistic Collision Detection during update.
    (Clarification of Linked Data Rule 2.) Because the update process involves getting a resource first, and then modifying it and later putting it back on the server, there is the possibility of a conflict (for example, another client might have updated the resource since the GET action). To mitigate this problem, Basic Profile implementations should use the HTTP If-Match header and HTTP ETags to detect collisions.
  5. Basic Profile Resources use standard media types.
    (Clarification of Linked Data Rule 3.) Basic Profile does not require and does not encourage the definition of any new media types. A Basic Profile goal is that any standards-based RDF or Linked Data client be able to read and write Basic Profile data, and defining new media types would prevent that in most cases.
  6. Basic Profile Resources use standard vocabularies.
    Basic Profile Resources use common vocabularies (classes, properties, and so forth) for common concepts. Many websites define their own vocabularies for common concepts such as resource type, label, description, creator, last modification time, priority, enumeration of priority values, and so on. This is usually viewed as a good feature by users who want their data to match their local terminology and processes, but it makes it much harder for organizations to subsequently integrate information in a larger view. Basic Profile requires all resources to expose common concepts using a common vocabulary for properties. Sites can choose to additionally expose the same values under their own private property names in the same resources. In general, Basic Profile avoids inventing property names where possible. Instead, it uses ones from popular RDF-based standards, such as the RDF standards themselves, Dublin Core, and so on. Basic Profile invents property URLs where no match is found in popular standard vocabularies. Note: A number of recommended standard properties for use in Basic Profile Resources are listed below.
  7. Basic Profile Resources set rdf:type explicitly.
    A resource's membership in a class extent can be indicated explicitly by a triple in the resource representation that uses the rdf:type predicate and the URL of the class, or it can be derived implicitly. In RDF, there is no requirement to place an rdf:type triple in each resource, but this is a good practice, because it makes a query more useful in cases where inferencing is not supported. Remember also that a single resource can have multiple values for rdf:type. For example, the DBpedia entry for Barack Obama has dozens of rdf:types. Basic Profile sets no limits to the number of types a resource can have.
  8. Basic Profile Resources use a restricted number of standard data types.
    RDF does not define data types to be used for property values, so Basic Profile lists a set of standard data types:
    • Boolean: A Boolean type as specified by XSD Boolean, http://www.w3.org/2001/XMLSchema#boolean (reference: XSD Datatypes)
    • DateTime: A date and time type as specified by XSD dateTime, http://www.w3.org/2001/XMLSchema#dateTime (reference: XSD Datatypes)
    • Decimal: A decimal number type as specified by XSD Decimal, http://www.w3.org/2001/XMLSchema#decimal (reference: XSD Datatypes)
    • Double: A double floating point number type as specified by XSD Double, http://www.w3.org/2001/XMLSchema#double (reference: XSD Datatypes)
    • Float: A floating point number type as specified by XSD Float, http://www.w3.org/2001/XMLSchema#float (reference: XSD Datatypes)
    • Integer: An integer number type as specified by XSD Integer, http://www.w3.org/2001/XMLSchema#integer (reference: XSD Datatypes)
    • String: A string type as specified by XSD String, http://www.w3.org/2001/XMLSchema#string (reference: XSD Datatypes)
    • XMLLiteral: A literal XML value, http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral
  9. Basic Profile clients expect to encounter unknown properties and content.
    Basic Profile provides mechanisms for clients to discover lists of expected properties for resources for particular purposes, but it also assumes that any given resource might have many more properties than those listed. Some servers will support only a fixed set of properties for a particular type of resource. Clients should always assume that the set of properties for a resource of a particular type at an arbitrary server might be open, in the sense that different resources of the same type might not all have the same properties, and the set of properties that are used in the state of a resource is not limited to any predefined set. However, when dealing with Basic Profile Resources, clients should assume that a Basic Profile server might discard triples for properties of which it has no prior knowledge. In other words, servers can restrict themselves to a known set of properties, but clients cannot. When doing an update using HTTP PUT, a Basic Profile client must preserve all property values retrieved by using HTTP GET. This includes all property values that it doesn't change or understand. (Use of HTTP PATCH or SPARQL Update rather than HTTP PUT for updates avoids this burden for clients.)
  10. Basic Profile clients do not assume the type of a resource at the end of a link.
    Many specifications and most traditional applications have a "closed model," by which we mean that any reference from a resource in the specification or application necessarily identifies a resource in the same specification (or a referenced specification) or application. In contrast, the HTML anchor tag can point to any resource addressable by an HTTP URI, not just other HTML resources. Basic Profile works like HTML in this sense. An HTTP URI reference in one Basic Profile Resource can, in general, point to any resource, not just a Basic Profile Resource. There are numerous reasons to maintain an open model like HTML's. One is that it allows data that has not yet been defined to be incorporated in the web in the future. Another reason is that it allows individual applications and sites to evolve over time. If clients assume that they know what will be at the other end of a link, then the data formats of all resources across the transitive closure of all links must be kept stable for version upgrade.

    A consequence of this independence is that client implementations that traverse HTTP URI links from one resource to another should always code defensively and be prepared for any resource at the end of the link. Defensive coding by client implementers is necessary to allow sets of applications that communicate through Basic Profile to be independently upgraded and flexibly extended.
  11. Basic Profile servers implement simple validations for Create and Update.
    Basic Profile servers should try to make it easy for programmatic clients to create and update resources. If Basic Profile implementations associate a lot of very complex validation rules that need to be satisfied for an update or creation to be accepted, it becomes difficult or impossible for a client to use the protocol without extensive additional information specific to the server that needs to be communicated outside of the Basic Profile specifications. The recommended approach is for servers to allow creation and updates based on the sort of simple validations that can be communicated programmatically through a Shape (see the Constraints section). Additional checks that are required to implement more complex policies and constraints should result in the resource being flagged as requiring more attention, but should not cause the basic Create or Update action to fail.

    It is possible that some applications or sites will have very strict requirements for complex constraints for data and that they are unable or unwilling to even temporarily allow the creation of resources that do not satisfy all of those constraints. Those applications or sites need to be aware that, as a consequence, they might be making it difficult or impossible for external software to use their interfaces without extensive customization.
  12. Basic Profile Resources always use simple RDF predicates to represent links.
    By always representing links as simple predicate values, Basic Profile makes it very simple to know how links will appear in representations and also makes it very simple to query them. When there is a need to express properties on a link, Basic Profile adds an RDF statement with the same subject, object, and predicate as the original link, which is retained, plus any additional "link properties." Basic Profile Resources do not use "inverse links" to support navigation of a relationship in the opposite direction, because this creates a data synchronization problem and complicates a query. Instead, Basic Profile assumes that clients can use queries to navigate relationships in the opposite direction from the direction supported by the underlying link.
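
As an illustration of rules 4 and 9, the following minimal Python sketch simulates the GET, modify, PUT cycle against an in-memory stand-in for a server. The InMemoryServer class, the update_property helper, the hash-based ETag scheme, and the representation of resource state as (predicate, object) pairs are all assumptions made for this example; none of them is defined by Basic Profile. A real client would send the If-Match header over HTTP and would expect a 412 Precondition Failed response when a collision is detected.

```python
import hashlib


class InMemoryServer:
    """Toy stand-in for a Basic Profile server; not a real HTTP stack."""

    def __init__(self):
        self.resources = {}  # URL -> frozenset of (predicate, object) pairs

    @staticmethod
    def etag_for(state):
        # Derive the ETag from the sorted state so equal states share an ETag.
        digest = hashlib.sha256(repr(sorted(state)).encode("utf-8")).hexdigest()
        return '"' + digest[:16] + '"'

    def get(self, url):
        state = self.resources[url]
        return 200, {"ETag": self.etag_for(state)}, set(state)

    def put(self, url, state, if_match):
        current = self.resources[url]
        if if_match != self.etag_for(current):
            return 412, {}  # Precondition Failed: the resource has changed
        self.resources[url] = frozenset(state)
        return 204, {"ETag": self.etag_for(state)}


def update_property(server, url, predicate, new_value):
    """GET, replace one property, then PUT everything back with If-Match."""
    _status, headers, state = server.get(url)
    # Keep every pair we did not change, including ones we do not understand.
    updated = {(p, o) for (p, o) in state if p != predicate}
    updated.add((predicate, new_value))
    put_status, _headers = server.put(url, updated, if_match=headers["ETag"])
    return put_status
```

In this sketch, a PUT that carries a stale ETag is rejected with status 412, at which point a client would typically repeat the GET and reapply its change. Because update_property copies every pair it did not touch, properties unknown to the client survive the update.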

Common properties

The tables that follow list properties from well-known RDF vocabularies that are recommended for use in Basic Profile Resources. Basic Profile requires none of them, but a specification based on Basic Profile might require one or more of these properties for a particular type of resource.

Commonly used namespace prefixes
Prefix | Namespace URI
dcterms | http://purl.org/dc/terms/
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs | http://www.w3.org/2000/01/rdf-schema#
bp | http://open-services.net/ns/basicProfile#
xsd | http://www.w3.org/2001/XMLSchema#

From Dublin Core

URI: http://purl.org/dc/terms/

Property | Range | Comment
dcterms:contributor | dcterms:Agent | The identifier of a resource (or blank node) that is a contributor of information. This resource can be a person or group of people or, possibly, an automated system.
dcterms:creator | dcterms:Agent | The identifier of a resource (or blank node) that is the original creator of the resource. This resource can be a person or group of people or, possibly, an automated system.
dcterms:created | xsd:dateTime | The creation timestamp.
dcterms:description | rdf:XMLLiteral | Descriptive text about the resource represented as rich text in XHTML format. Should include only content that is valid and suitable inside an XHTML <div> element.
dcterms:identifier | rdfs:Literal | A unique identifier for the resource. Typically read-only and assigned by the service provider when a resource is created. Not typically intended for end-user display.
dcterms:modified | xsd:dateTime | Date on which the resource was changed.
dcterms:relation | rdfs:Resource | The URI of a related resource. This is the predicate to use when you do not know what else to use. If you know what kind of relationship it is, use a more specific predicate.
dcterms:subject | rdfs:Resource | Should be a URI (see dbpedia.org). From Dublin Core: "Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the Coverage element."
dcterms:title | rdf:XMLLiteral | A name given to the resource. Represented as rich text in XHTML format. Should include only content that is valid inside an XHTML <span> element.

From RDF

URI: http://www.w3.org/1999/02/22-rdf-syntax-ns#

Property | Range | Comment
rdf:type | rdfs:Class | The type or types of the resource. Basic Profile recommends that the rdf:type(s) of a resource be set explicitly in resource representations to facilitate query with non-inferencing query engines.

From RDF Schema

URI: http://www.w3.org/2000/01/rdf-schema#

Property | Range | Comment
rdfs:member | rdfs:Resource | The URI (or blank node identifier) of a member of a Container.
rdfs:label | rdfs:Literal | "Provides a human-readable version of a resource name." (From RDFS)
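
The prefixed names used in these tables are shorthand for full property URIs. As a small illustration, the following Python sketch expands a prefixed name by using the namespace prefixes listed earlier in this section. The PREFIXES dictionary and the expand function are names chosen for this example only.

```python
# Namespace prefixes as listed in "Commonly used namespace prefixes".
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "bp": "http://open-services.net/ns/basicProfile#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(name):
    """Turn a prefixed name such as 'dcterms:title' into a full URI."""
    prefix, separator, local_name = name.partition(":")
    if not separator or prefix not in PREFIXES:
        raise ValueError("unknown or missing prefix in " + repr(name))
    return PREFIXES[prefix] + local_name
```

For example, expand("dcterms:title") yields http://purl.org/dc/terms/title, which is the URI that a query engine or another site would compare against, regardless of the prefix chosen locally.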

Basic Profile Container

Many HTTP applications and sites have organizing concepts that partition the overall space of resources into smaller Containers. Blog posts are grouped into blogs, wiki pages are grouped into wikis, and products are grouped into catalogs. Each resource created in the application or site is created within an instance of one of these Container-like entities, and users can list the existing artifacts within one. There is no agreement across applications or sites, even within a particular domain, on what these grouping concepts should be called, but they commonly exist and are important. Containers answer two basic questions:

  1. To which URLs can I POST to create new resources?
  2. Where can I GET a list of existing resources?

In the XML world, Atom Publishing Protocol (APP) has become popular as a standard for answering these questions. APP is not a good match for Linked Data, but this Basic Profile shows how the same problems that are solved by APP for XML-centric designs can be solved by a simple Linked Data usage pattern with simple conventions for posting to RDF Containers. We call these RDF Containers that you can POST to "Basic Profile Containers." Here are some of their characteristics:

  • Clients can retrieve the list of existing resources in a Basic Profile Container.
  • New resources are created in Basic Profile Containers by POSTing to them.
  • Any resource can be POSTed to a Basic Profile Container. A resource does not have to be a Basic Profile Resource with an RDF representation to be POSTed to a Basic Profile Container.
  • After POSTing a new resource to a Container, the new resource will appear as a member of the Container until it is deleted. A Container can also contain resources that were added through other means, for example through the user interface of the site that implements the Container.
  • The same resource can appear in multiple Containers. This happens commonly if one Container is a "view" onto a larger Container.
  • Clients can get partial information about a Basic Profile Container without retrieving a full representation of all of its contents.

The representation of a Basic Profile Container is a standard RDF Container representation that uses the rdfs:member predicate. For example, if you have a Container with the URL http://example.org/BasicProfile/container1, it might have the representation shown in Listing 1.

Listing 1. Representation of a Basic Profile Container
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
<http://example.org/BasicProfile/container1>
        a rdfs:Container;
        rdfs:member <http://acme.com/members/000000000>;
        # … 999999998 more triples here …
        rdfs:member <http://acme.com/members/999999999>.

The Basic Profile does not recognize or recommend the use of other forms of an RDF Container, such as Bag and Seq, because they are not friendly to query. This follows standard Linked Data guidance for RDF use.
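
To make the Container behavior concrete, here is a minimal in-memory Python sketch of a Container that supports creation by POST, deletion, and a membership listing that uses rdfs:member. The Container class, its method names, and the pattern used to mint member URLs are assumptions made for this illustration; Basic Profile does not prescribe how a server assigns URLs to new resources.

```python
RDFS_MEMBER = "http://www.w3.org/2000/01/rdf-schema#member"


class Container:
    """Toy model of a Basic Profile Container held in memory."""

    def __init__(self, url):
        self.url = url
        self.members = []  # member URLs, in creation order
        self.bodies = {}   # member URL -> posted content (any type)
        self._next_id = 1

    def post(self, body):
        """Create a new member from the posted body and return its URL."""
        # Hypothetical URL scheme; a real server chooses its own.
        member_url = f"{self.url}/members/{self._next_id}"
        self._next_id += 1
        self.members.append(member_url)
        self.bodies[member_url] = body
        return member_url

    def delete(self, member_url):
        """Remove a member; it no longer appears in the Container."""
        self.members.remove(member_url)
        del self.bodies[member_url]

    def triples(self):
        """Membership triples with the Container as subject."""
        return [(self.url, RDFS_MEMBER, m) for m in self.members]

    def to_turtle(self):
        """Serialize in the same style as Listing 1."""
        lines = ["a rdfs:Container"]
        lines += [f"rdfs:member <{m}>" for m in self.members]
        body = ";\n".join("        " + line for line in lines)
        return (
            "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.\n"
            f"<{self.url}>\n{body}.\n"
        )
```

Posting stores the body without inspecting it, which mirrors the point that any resource, not only one with an RDF representation, can be POSTed to a Container.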

The Basic Profile recommends the use of a set of standard Dublin Core properties with Containers. The subject of triples using these properties is the Container itself.

rdfs:Container domain properties
Property | Occurs | Range | Comment
dcterms:title | zero or one | rdf:XMLLiteral | A name given to the resource. Represented as rich text in XHTML format. Should include only content that is valid inside an XHTML <span> element.
dcterms:description | zero or one | rdf:XMLLiteral | Descriptive text about the resource represented as rich text in XHTML format. Should include only content that is valid and suitable inside an XHTML <div> element.
dcterms:publisher | zero or one | dcterms:Agent | An entity responsible for making the Basic Profile Container and its members available.
bp:containerPredicate | exactly one | rdfs:Property | The predicate of the triples whose objects define the contents of the Container.
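
The Occurs column above can be checked mechanically. The following Python sketch counts how often each listed property appears on a Container and reports any count outside the stated bounds. The OCCURS table, the check_occurs function, and the representation of a Container's state as (predicate, object) pairs are assumptions made for this example; properties that are not listed are deliberately ignored, in keeping with the open set of properties that a resource may carry.

```python
from collections import Counter

# (minimum, maximum) occurrences, transcribed from the table above.
OCCURS = {
    "dcterms:title": (0, 1),
    "dcterms:description": (0, 1),
    "dcterms:publisher": (0, 1),
    "bp:containerPredicate": (1, 1),
}


def check_occurs(pairs):
    """Return a list of human-readable problems; empty if none are found."""
    counts = Counter(predicate for predicate, _ in pairs)
    problems = []
    for predicate, (low, high) in OCCURS.items():
        found = counts[predicate]
        if not low <= found <= high:
            problems.append(f"{predicate}: found {found}, expected {low}..{high}")
    return problems
```

A Container with one bp:containerPredicate and at most one of each Dublin Core property produces an empty list, while a missing bp:containerPredicate or a second dcterms:title is reported.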

Retrieving non-member properties

The representation of a Container that has many members will be large. When we looked at our use cases, we saw that there were several important cases where clients needed to access only the non-member properties of the Container. (The dcterms properties listed on this page might not seem important enough to warrant addressing this problem, but we have use cases that add other predicates to Containers, for example to provide validation information and to associate SPARQL endpoints.) Because retrieving the whole Container representation to get this information is onerous, we were motivated to define a way to retrieve only the non-member property values. We do this by defining a corresponding resource for each Basic Profile Container, called the "non-member resource," which has a state that is a subset of the state of the Container. The non-member resource's HTTP URI can be derived in the following way:

If the HTTP URI of the Container is {url}, then the HTTP URI of the related non-member resource is {url}?non-member-properties. The representation of {url}?non-member-properties is identical to the representation of {url}, except that the membership triples are missing. The subjects of the triples will still be {url} (or whatever they were in the representation of {url}), not {url}?non-member-properties. Any server that does not support non-member resources should return an HTTP 404 Not Found error when a non-member resource is requested.

This approach is analogous to using HTTP HEAD rather than HTTP GET: HEAD fetches only the response headers for a resource, whereas GET requests its entire representation. Listings 2 and 3 show an example.

Listing 2. HTTP GET example, request
GET /container1?non-member-properties HTTP/1.1
Host: example.org
Accept: text/turtle

Listing 3. HTTP GET example, response
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix dcterms: <http://purl.org/dc/terms/>. 
@prefix bp: <http://open-services.net/ns/basicProfile#>.
<http://example.org/container1>
        a rdfs:Container;
        dcterms:title "A Basic Profile Container of Acme Resources";
        bp:containerPredicate rdfs:member;
        dcterms:publisher <http://acme.com/>.
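
The derivation of the non-member resource can be sketched in a few lines of Python. The function names and the representation of triples as (subject, predicate, object) tuples are assumptions made for this example, and the URL helper assumes that the Container URL carries no query string of its own.

```python
def non_member_url(container_url):
    """Apply the {url}?non-member-properties pattern."""
    return container_url + "?non-member-properties"


def non_member_triples(triples, container_url, container_predicate):
    """Drop membership triples; every other triple is kept unchanged.

    Subjects are not rewritten, so they still name the Container itself.
    """
    return [
        (s, p, o)
        for (s, p, o) in triples
        if not (s == container_url and p == container_predicate)
    ]
```

Applied to a Container like the one in Listing 3 with rdfs:member as its container predicate, non_member_triples would keep the title, publisher, and bp:containerPredicate triples and omit every rdfs:member triple.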

Design motivation and background

The concept of non-member resources has not been especially controversial, but using the URL pattern {url}?non-member-properties to identify them has been. Some people feel it's an unacceptable intrusion into the URL space that is owned and controlled by the server that defines {url}. A more practical objection is that servers respond unpredictably to URLs that they do not understand, especially those that have a ? character in them. For example, some servers will return the resource identified by the portion of the URL that precedes the ? and simply ignore the rest.

This problem could perhaps be mitigated by using a character other than ? in the URL pattern. An alternative design that was discussed uses a header field in the response header of {url} to allow the server to control and communicate the URL of the corresponding non-member-resource. Presence or absence of the header field would let clients know whether the non-member resource is supported by the server.

  • The advantages of this approach are that it does not impinge on the server's URL space and that it works predictably for servers that do not understand the concept of a non-member-resource.
  • The disadvantages are that it requires two server round-trips, a HEAD and a GET, to retrieve the non-member resources, and it requires the definition of a custom HTTP header, which, to some people at least, seems comparatively heavyweight.
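
The header-based alternative could look roughly like the following on the client side. This is only a sketch of the idea: no header name was agreed on, so X-Non-Member-Resource is a hypothetical placeholder, and the function name is likewise invented for this example.

```python
# Hypothetical header name; the design discussion did not settle on one.
HYPOTHETICAL_HEADER = "X-Non-Member-Resource"


def non_member_url_from_headers(headers):
    """Return the advertised non-member resource URL, or None.

    `headers` is a mapping of response header names to values, as obtained
    from a HEAD request on the Container. Header names are compared
    case-insensitively, as HTTP requires.
    """
    wanted = HYPOTHETICAL_HEADER.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    return None
```

A result of None tells the client that the server does not support the non-member resource, so the client would fall back to fetching the full Container representation.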

Additional considerations

Basic Profile Containers should provide guidance in these situations:

  • Whether dcterms:modified, the ETag, or both change when Container membership changes, so that Containers can be cached effectively
  • Whether there are membership limitations (typically, a resource will only be part of a single Container, although there might be exceptions)

Basic Profile validation and constraints

Basic Profile Resources are RDF resources, and RDF has the happy characteristic that "it can say anything about anything." This means that, in principle, any resource can have any property and there is no requirement that any two resources have the same set of properties, even if they have the same type or types. In practice, though, the properties that are set on resources usually follow regular patterns that are dictated by the uses of those resources. Although a particular resource might have arbitrary properties, when viewed from the perspective of a particular application or use case, the set of properties and property values that are appropriate for that resource in that application will often be predictable and constrained. For example, if a server has resources that represent software products and bugs, a client might want to know what properties software products and bugs have on that server, whether for displaying information in tabular formats, creating and updating resources, or other purposes. The Basic Profile Validation and Constraints specification aims to capture information about those properties and constraints.

The distinction between the resource and the use cases that it participates in is important to us. Traditional technologies such as relational databases constrain the total set of properties that an entity can have. In the Basic Profile, we aim only to define the properties that a resource can have when viewed through the lens of a particular application or use case, while retaining the ability of the same resource to have an arbitrary set of properties to support other applications and use cases.

The set of properties that a resource can or will have is not necessarily linked to its type, but exploiting the pattern where resources of the same type have the same properties is a very traditional approach that supports the development of many useful applications. Sometimes, knowledge of types and properties for the application is hard-coded in software, but there are many cases where it is desirable to represent this knowledge in data. The Basic Profile provides resource types called Shape and PropertyConstraint to represent this data.

Note on the relationship of Shape to other standards:
Although the model in which the type constrains the valid properties is very familiar from relational databases and object-oriented programming, it is not the "natural" model of RDF, nor is it the model of the natural world. The familiar model says that if you are of type X, you will have these properties that will have values of certain types. RDF and, to a large degree, the natural world work the other way around; if you have these properties, you must be of type X. We are not aware of any OWL or RDFS construct that lets you say "from the perspective of application X, resources with an RDF type of Y will have the list of properties Z," nor of one that lets you constrain the types of the values of these properties.

Class: bp:PropertyConstraint

URI: http://open-services.net/ns/basicProfile#PropertyConstraint

bp:PropertyConstraint domain properties

| Property | Occurs | Range | Comment |
| --- | --- | --- | --- |
| rdfs:label | zero or one | rdfs:Literal | A human-readable name for the subject. (from rdfs) |
| rdfs:comment | zero or one | rdfs:Literal | A description of the subject resource. (from rdfs) |
| bp:constrainedProperty | exactly one | rdf:Property | The URI of the predicate being constrained. |
| bp:rangeShape | zero or one | bp:Shape | A bp:Shape that describes the rdfs:Class that is the range of the property. |
| bp:allowedValue | zero or many | range of the subject | A value allowed for the property. If there are both bp:allowedValue elements and a bp:AllowedValues resource, then the full set of allowed values is the union of both. |
| bp:AllowedValues | zero or many | bp:AllowedValues | A resource with allowed values for the property being defined. |
| bp:defaultValue | zero or one | range of the object | A default value for the property. |
| bp:occurs | exactly one | rdfs:Resource | Must be one of these four: http://open-services.net/ns/basicProfile#Exactly-one, http://open-services.net/ns/basicProfile#Zero-or-one, http://open-services.net/ns/basicProfile#Zero-or-many, or http://open-services.net/ns/basicProfile#One-or-many |
| bp:readOnly | zero or one | Boolean | true if the property is read-only. If not set or set to false, then the property is writable. Providers should declare a property read-only when changes to the value of that property will not be accepted on PUT. Consumers should note that the converse does not apply: Providers may reject a change to the value of a writable property. |
| bp:maxSize | zero or one | Integer | For String properties only, specifies the maximum number of characters allowed. If not set, then there is no maximum or the maximum is specified elsewhere. |
| bp:valueType | zero or one | rdfs:Resource | For literals, see XSD Datatypes. |

It is debatable whether we should have a separate bp:PropertyConstraint class with a property on it called bp:constrainedProperty, or whether it would be better to use rdf:Property and simply define new predicates with rdf:Property as the domain.

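Important: Whichever approach is chosen, it is important not to use rdfs:range, because the semantics are different. rdfs:range lets a reasoner infer the type of a property's value; it does not restrict which values are valid.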

Class: bp:AllowedValues

URI: http://open-services.net/ns/basicProfile#AllowedValues

bp:AllowedValues domain properties

| Property | Occurs | Range | Comment |
| --- | --- | --- | --- |
| bp:allowedValue | zero or many | same as range of owning property | Allowed value |

Class: bp:Shape

URI: http://open-services.net/ns/basicProfile#Shape

bp:Shape domain properties

| Property | Occurs | Range | Comment |
| --- | --- | --- | --- |
| dcterms:title | zero or one | rdf:XMLLiteral | Title |
| bp:describedClass | exactly one | rdfs:Class | Class described |
| bp:propertyConstraints | zero or one | rdf:List | The list of PropertyConstraints for properties of this Shape. The domains of the PropertyConstraints must be compatible with the describedClass. |

Validation semantics

Validation semantics are defined by mapping the property and class definitions onto SPARQL ASK queries. This keeps the constraints declarative, expressed in RDF, while reusing the existing SPARQL ASK specification.
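This article does not spell out the SPARQL ASK mapping, so the following is only a minimal Python sketch of the intent, not the normative semantics. It checks the bp:occurs cardinality and bp:maxSize of a resource's properties. The PropertyConstraint dataclass, the validate function, and the dictionary representation of a resource are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

BP = "http://open-services.net/ns/basicProfile#"

# (minimum, maximum) number of values per bp:occurs value; None means unbounded.
OCCURS_BOUNDS = {
    BP + "Exactly-one": (1, 1),
    BP + "Zero-or-one": (0, 1),
    BP + "Zero-or-many": (0, None),
    BP + "One-or-many": (1, None),
}


@dataclass
class PropertyConstraint:
    """Hypothetical in-memory form of a bp:PropertyConstraint."""
    constrained_property: str        # bp:constrainedProperty
    occurs: str                      # bp:occurs
    max_size: Optional[int] = None   # bp:maxSize (string values only)


def validate(resource: Dict[str, List[str]],
             constraints: List[PropertyConstraint]) -> List[str]:
    """Return violation messages; an empty list means the resource conforms.

    `resource` maps predicate URIs to the values found for one subject.
    Properties that no constraint mentions are ignored, not rejected.
    """
    problems = []
    for c in constraints:
        values = resource.get(c.constrained_property, [])
        low, high = OCCURS_BOUNDS[c.occurs]
        if len(values) < low or (high is not None and len(values) > high):
            problems.append(
                f"{c.constrained_property}: found {len(values)} value(s)")
        if c.max_size is not None:
            for value in values:
                if len(value) > c.max_size:
                    problems.append(
                        f"{c.constrained_property}: value exceeds {c.max_size} characters")
    return problems
```

Because unconstrained properties are skipped, the same resource can still carry arbitrary additional properties for other applications, which is the behavior the text argues for.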

Associating Shapes and Containers

It is useful to be able to specify for a Container what types of members it will return and accept, plus what properties it expects to be used with resources of those types. To enable this, the Basic Profile defines two new Container properties, which are shown in Table 9.

Table 9. rdfs:Container domain properties

| Property | Occurs | Range | Comment |
| --- | --- | --- | --- |
| bp:createShape | zero or many | bp:Shape | One or more Shapes that provide information on the expected data formats of resources that can be POSTed to the Container to create new members. |
| bp:readShape | zero or many | bp:Shape | One or more Shapes that provide information on the expected data formats of resources that can be found as members of the Container. |

Containers often add properties of their own to POSTed and PUT resources (creation date, modification date, creator), and it's useful for clients to know what these might be.

Basic Profile paging

It sometimes happens that a resource is too large to reasonably transmit its representation in a single HTTP response. A client might anticipate that a resource will be too large (for example, a client tool that accesses defects might assume that an individual defect will usually be of sufficiently constrained size that it makes sense to request all of it at once, but that the list of all the defects ever created will typically be too big). Alternatively, a server might recognize that a resource that has been requested is too big to return in a single message.

To address this problem, Basic Profile Resources can support a technique called paging that enables clients to retrieve representations of resources one page at a time. For every resource with a URL of {url}, a Basic Profile implementation might define a companion resource with a URL of {url}?firstPage. The meaning of this resource is: the first page of {url}. Clients that anticipate that a particular resource will be too large might, instead, fetch this alternative resource. Servers that determine that a requested resource is too large might respond with a 302 redirect message, directing the client to the firstPage resource.
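As a rough illustration of the server-side choice described above, the sketch below decides whether to answer a GET directly or to redirect to the companion firstPage resource. The function names and the max_triples threshold are assumptions made for this example; the Basic Profile does not define when a resource counts as too large. The sketch also assumes that {url} carries no query string of its own.

```python
from typing import Optional, Tuple


def first_page_url(url: str) -> str:
    """Companion resource described in the text: {url}?firstPage."""
    return url + "?firstPage"


def respond(url: str, triple_count: int,
            max_triples: int = 1000) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header value or None) for a GET on `url`.

    `max_triples` is a hypothetical server setting, not part of the profile.
    """
    if triple_count > max_triples:
        # Too large for one response: send the client to the first page.
        return 302, first_page_url(url)
    return 200, None
```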

The representation of {url}?firstPage will contain a subset of the triples that define the state of the resource with a URL of {url}. The triples are unmodified, so the subject of the triples will be whatever it was in the representation of {url}, typically {url}, not {url}?firstPage. In addition, the representation of {url}?firstPage will include a few triples with a subject of {url}?firstPage. Examples are triples with predicates of bp:nextPage, dcterms:description, and so on.

For example, if you have a Basic Profile Container with the URL of http://acme.com/BasicProfile/container/1, it might have the following representation (in Turtle notation):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
<http://acme.com/BasicProfile/container/1>
        rdfs:member <http://acme.com/BasicProfile/resource/000000000>;
        # ... 999999998 more triples here …
        rdfs:member <http://acme.com/BasicProfile/resource/999999999>.

This representation has a billion triples and over 90 billion characters, which might be a bit big. Assuming that the implementation that backs this resource supports paging, a client can choose instead to GET the related resource: http://acme.com/BasicProfile/container/1?firstPage. The representation of this latter resource would look like this:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix bp: <http://open-services.net/ns/basicProfile#>.
<http://acme.com/BasicProfile/container/1>
        rdfs:member <http://acme.com/BasicProfile/resource/000000000>;
        # ... 98 more triples here …
        rdfs:member <http://acme.com/BasicProfile/resource/000000099>.

# pay attention to the subject URL of the following triple
<http://acme.com/BasicProfile/container/1?firstPage>
        bp:nextPage <http://acme.com/xxxxxxxxx/page2>.

As you can see, the representation of this smaller firstPage resource contains the first 100 triples that you would have had in the representation of the large resource in exactly the same form -- the same subject, predicate, and object -- as in the representation of the large resource. In addition, it contains another triple with a subject that is the firstPage resource itself, not the bigger resource, that provides the URL of a third resource that will contain the following page of triples from the bigger resource. The format of the URLs of the second and subsequent pages (if they exist) is not defined by the Basic Profile; a Basic Profile implementation can use whichever URL it pleases. Note that, although this example shows the triples in a precise order for purposes of simplicity and clarity of the example, there is no concept of ordering of triples in RDF, so the triples can be in any order, both within and across pages. An obvious restriction is that all triples that reference the same blank node, either as subject or object, need to be in the same page (this is simply an observation on how RDF works, not a Basic Profile policy or limitation).
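The blank-node restriction is the one part of page construction that takes some care. The following is a minimal sketch of one way a server might split triples into pages while keeping every triple that touches the same blank node together. It is not an algorithm prescribed by the Basic Profile. Triples are modeled as plain string tuples, blank nodes are recognized by a "_:" prefix, and the paginate function and page_size parameter are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def paginate(triples: List[Triple], page_size: int) -> List[List[Triple]]:
    """Split triples into pages without separating triples that share a blank node."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Two blank nodes in one triple belong to the same connected group.
    for s, _, o in triples:
        if is_blank(s) and is_blank(o):
            parent[find(s)] = find(o)

    # One group per connected set of blank nodes; every triple without
    # blank nodes forms a group of its own (keyed by its position).
    groups: Dict[object, List[Triple]] = {}
    for index, triple in enumerate(triples):
        s, _, o = triple
        blanks = [term for term in (s, o) if is_blank(term)]
        key = find(blanks[0]) if blanks else index
        groups.setdefault(key, []).append(triple)

    # Pack groups greedily. A group is never split, so a page can exceed
    # page_size when a single group is larger than the target.
    pages: List[List[Triple]] = [[]]
    for group in groups.values():
        if pages[-1] and len(pages[-1]) + len(group) > page_size:
            pages.append([])
        pages[-1].extend(group)
    return pages
```

Treating page_size as a target rather than a hard limit is a design choice of this sketch: it keeps the blank-node rule intact at the cost of an occasional oversized page.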

As illustrated above, when a page is returned, it will include the triple:
<url of current page> bp:nextPage <url of next page>

You can tell that you are on the last page when the <url of next page> is bp:nilPage.

By the time a client follows a bp:nextPage link, there might no longer be a next page. The Basic Profile server implementation in this case must respond with an empty page with bp:nextPage set to bp:nilPage.
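From the client's side, the paging rules above amount to a simple loop. The sketch below follows bp:nextPage links until it reaches bp:nilPage and collects the triples of the underlying resource. The fetch callback, which would perform the HTTP GET and parse the response into (subject, predicate, object) tuples, is a hypothetical stand-in, as is the read_all_pages name. Treating a page with no bp:nextPage triple as the last page is a defensive assumption of this sketch, not something the profile states.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]

BP = "http://open-services.net/ns/basicProfile#"
NEXT_PAGE = BP + "nextPage"
NIL_PAGE = BP + "nilPage"


def read_all_pages(first_page_url: str,
                   fetch: Callable[[str], List[Triple]]) -> Set[Triple]:
    """Follow bp:nextPage links and return the paged resource's triples."""
    collected: Set[Triple] = set()   # a set drops triples repeated across pages
    visited: Set[str] = set()
    page_url = first_page_url
    while page_url != NIL_PAGE:
        if page_url in visited:
            raise ValueError(f"paging loop detected at {page_url}")
        visited.add(page_url)
        next_url = NIL_PAGE
        for s, p, o in fetch(page_url):
            if s == page_url:
                # Triples about the page itself are paging metadata,
                # not part of the state of the paged resource.
                if p == NEXT_PAGE:
                    next_url = o
            else:
                collected.add((s, p, o))
        page_url = next_url
    return collected
```

An empty final page, which a server returns when the next page has disappeared, needs no special handling here: it contributes no triples and its bp:nextPage of bp:nilPage ends the loop.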

The Basic Profile permits {url}?pageSize={n} as an alias for {url}?firstPage. Because it is just an alias, it has exactly the same meaning and behavior. A Basic Profile server implementation can (but is not obliged to) adjust the number of triples on the first and subsequent pages based on the value of n.

Note that pagination is defined only for resources with states that can be expressed in RDF as a set of RDF triples. Pagination is undefined for resources with states that cannot be represented in RDF. Pure binary resources, encrypted resources, or digitally signed resources might be examples. The representation of a page is defined by, first, paginating the underlying triples that express the state of the resource being paginated, and then performing whatever standard mapping is used to map from each page of triples to the requested representation. In other words, we do not paginate the representations; we paginate the RDF resource state itself and then create the representations of each page in whatever media type is requested. This provides a general specification for pages for both RDF and non-RDF representations. Examples of non-RDF representations are HTML and JSON.

Instability of paging

Because HTTP is a stateless protocol and Basic Profile servers manage resources that can change frequently, Basic Profile clients should assume that resources can change as they page through them using the bp:nextPage mechanism. Nevertheless, each triple of the resource that exists when the first page is returned and is not subsequently deleted during the paging interaction must be included on at least one page. (Including the same triple more than once is permissible -- identical triples are always discarded in RDF -- but servers need to ensure that the same triple is not returned multiple times with different object values.) Triples that are added after the first page is returned might or might not be included in subsequent pages by a server.

Class: bp:Page

URI: http://open-services.net/ns/basicProfile#Page

Table 10. bp:Page properties

| Property | Occurs | Range | Comment |
| --- | --- | --- | --- |
| bp:nextPage | exactly one | bp:Page | The next page, or bp:nilPage if there are no more pages |

Conclusion

We believe that getting to a simple Basic Profile will enable broader adoption of Linked Data principles for application integration. Additional development of some of the concepts will be necessary to complete such a profile. The intention of this article is to initiate the much-needed development of specifications that will fill this gap.


Acknowledgements

Thanks to Arthur Ryman, Arnaud Le Hors, John Arwe, and others for review, feedback, and some of this content.


ArticleTitle=Toward a Basic Profile for Linked Data
publish-date=12062011