Research institutions and universities create knowledge. This knowledge is created through research and can be formatted as data sets, video recordings, or measurements of experiments. Often it is re-packaged in other formats such as books, journal articles, or reports. These research outputs are typically shared between researchers in each discipline to further knowledge in that area.
To gain the maximum benefit from this scholarly communication, the research outputs must be made as easily and widely available as possible. The almost ubiquitous presence of the web in research institutions makes it an ideal way to share such knowledge. Because this knowledge is the product created by the researchers, these products need to be managed, measured, and preserved by the institutions where they were created.
It was from these requirements that digital repositories for research outputs were born. In the early part of the 21st century, many projects were started to create these repositories. The most well-known and prevalent open source digital repository platforms include:
- DSpace, a collaboration between MIT and Hewlett Packard
- EPrints, developed at the University of Southampton School of Electronics and Computer Science
- Fedora, created by Cornell University's Digital Library Research Group
There are many other open source and commercial options for the provision of repositories for research outputs. See Resources for links to more information about the repositories listed here.
The first interoperability challenge for digital repositories is to expose their contents in a standardized way so external systems can harvest and make them available in other systems and searchable by larger federated search services. For this reason, most repository platforms implement the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). This protocol allows the contents of repositories and the metadata for each item in the repository to be harvested. Metadata is often harvested using the Dublin Core encoding, a simple format for exchanging metadata (see Resources).
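As a minimal sketch of what a harvesting request looks like, the following builds an OAI-PMH `ListRecords` URL that asks for Dublin Core (`oai_dc`) metadata. The endpoint address is a hypothetical placeholder, not a real repository.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for the given endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical endpoint, used only for illustration.
url = oai_list_records_url("http://example.com/oai/request")
print(url)
```

A harvester would issue an HTTP GET against this URL and parse the XML records that come back.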
As repositories grew over time, a need for standardized protocols not just for harvesting the contents of repositories, but also for depositing new content into repositories became apparent. The SWORD protocol was developed as a result of this need.
During 2005 and 2006, the open repository communities started to talk about the need for a standardized deposit interface, and some working groups were set up to discuss such a deposit API. In 2007 a small consortium of UK universities and interested parties gathered to develop this idea into a working solution.
Backed by some funding from the UK's Joint Information Systems Committee (JISC) (see Resources), the consortium undertook a process that quickly led to the development of the SWORD protocol. The first task was to document some of the use cases for a deposit API.
There are many use cases for wanting to deposit items into a repository using an API or web service rather than a native user interface. Some use cases are:
- Integrated desktop client
- Rather than having to use the repository's native web interface to perform deposits, a native desktop tool such as a word processor may have a SWORD interface built in to allow the deposit to be made directly from the tool where the content is being created.
- Multiple deposit tool
- Some resources may be eligible to be deposited into multiple repositories. One example of this is a research output that was funded by a research grant. The provider of the grant may expect the research output to be deposited in their repository, the research institution where the work was undertaken may expect the same, and the researcher may wish to deposit the item in a subject-based community repository. Rather than having to perform the deposit three times, a multiple deposit tool would deposit the content automatically into each repository.
- Automated data deposit by laboratory equipment
- Automated and integrated laboratory equipment can often store the results that they have collected. A researcher may wish for the laboratory equipment to automatically deposit these collected data sets into a repository.
- Repository to repository deposit
- There are two methods to transfer resources between repositories: pull or push. Repositories are able to pull content from another using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), or they may wish to push content into another.
- Publisher to repository deposit
- Many journal publishers have copyright agreements that allow the author to deposit a copy of their article in an institutional repository. Some publishers say they wish to perform this deposit automatically on behalf of the author.
After gathering use cases, the consortium performed a survey to see if there were any existing potential solutions that could be used to fulfill the needs of a deposit API. Several candidates were identified, but these were often very specific to a use case, such as uploading photographs into a photo sharing website.
Eventually the candidate solutions were reduced to just two that seemed to provide a good fit. The first was the Atom Publishing Protocol (AtomPub), an application-level protocol for publishing and editing web resources using HTTP and XML (see Resources).
The alternative option was to develop a new custom protocol especially for depositing resources into repositories.
In deciding between these two options, the consortium had to weigh a new protocol tailored to this domain, with its associated development costs, against adopting an existing standard with lower development costs but a potential lack of some features required by the repository deposit use cases.
One of the key benefits of AtomPub is that it was designed to be extensible for specific specialized purposes. This ability was one of the main factors in adopting AtomPub for the basis of the solution. The standard offered much of the functionality required by the use cases, but not all. The consortium dropped some of the requirements that weren't needed. In other cases, AtomPub could be extended to provide functionality to fulfill requirements from the use cases that AtomPub didn't natively support.
A brief introduction to AtomPub
AtomPub is a protocol that sits on top of the HyperText Transfer Protocol (HTTP). It makes use of the core HTTP verbs to enable a client to interact with content on a web server:
- POST: Create a new resource on the server
- GET: Retrieve an existing resource from the server
- PUT: Update an existing resource on the server
- DELETE: Remove an existing resource from the server
These verbs allow the traditional CRUD (Create, Retrieve, Update, Delete) functions to take place. Interactions are based on the use of Atom Syndication Format: Atom is an XML-based document format that describes lists of related information known as feeds. Feeds are composed of a number of items, known as entries, each with an extensible set of attached metadata.
In addition to entries, AtomPub defines services and collections. Services are the interactions with the AtomPub repository using the verbs described previously. Items are stored in groups called collections, and each repository can have one or multiple collections. Within any AtomPub repository, often a user will only have the authorizations to deposit resources into specific collections. An anonymous user may only have the ability to deposit an item into a single collection, or none at all. Authentication is typically performed using HTTP Basic authentication, although the specification allows other methods.
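The feed and entry structure described above can be read with a standard XML parser. The following is a minimal sketch using Python's standard library; the feed content is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Atom Syndication Format, in ElementTree's {uri}tag notation.
ATOM = "{http://www.w3.org/2005/Atom}"

# An invented two-entry feed, used only for illustration.
FEED = b"""<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>First item</title>
    <id>id:1</id>
  </entry>
  <entry>
    <title>Second item</title>
    <id>id:2</id>
  </entry>
</feed>"""

feed = ET.fromstring(FEED)

# Each entry carries its own metadata; here only the title is read.
titles = [entry.findtext(ATOM + "title") for entry in feed.findall(ATOM + "entry")]
print(titles)
```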
The acronym SWORD stands for Simple Web-service Offering Repository Deposit. The name was chosen because it captures the core feature of the protocol: it allows the deposit of resources into repositories using a web service. It was described as simple because it is based on a well-known, pre-existing standard.
The SWORD specification has passed through several versions, with the latest version being 1.3 (see Resources).
A typical interaction with SWORD takes place in two stages. The first stage requires the user to interrogate the repository to find out how to deposit resources. The second stage is to perform that deposit.
With AtomPub, and therefore SWORD, a user can retrieve a service document. The service document describes the capabilities of the repository in terms of accepted file formats, the names of the collections, and the deposit URL for each collection. Since each user may deposit items only into the collections for which they are authorized, the service document is generated for the specific user that requests it.
A service document is structured around named workspaces. Within each workspace is a set of collections into which deposits may be made. This simple but effective structure is shown in Listing 1. A typical AtomPub service document might just describe the title of the collection and the file formats that may be deposited into that collection. A repository for research outputs is likely to require much more specific descriptions of itself, the resources it will accept, and what it will do with any deposits.
Listing 1. A sample SWORD service document
<?xml version="1.0" encoding='utf-8'?>
<service>
<workspace>
<collection>
...
</collection>
<collection>
...
</collection>
</workspace>
<workspace>
<collection>
...
</collection>
<collection>
...
</collection>
</workspace>
</service>
Listing 2 shows a typical service document from a
repository. It describes a single workspace that includes two different
collections that a user may deposit resources into. The ability to extend
AtomPub is evident in Listing 2 as additional XML elements from different
schemas are used. In addition to the atom and
app namespaces, elements from the
sword and dcterms
namespaces are included. The SWORD namespace is used to describe
attributes of the collection related to SWORD, while the
dcterms namespace allows a fuller description
of the collections using the Dublin Core Metadata Initiative (DCMI)
Metadata Terms (see Resources).
Listing 2. A sample SWORD service document
<?xml version="1.0" encoding='utf-8'?>
<service xmlns="http://www.w3.org/2007/app"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sword="http://purl.org/net/sword/"
xmlns:dcterms="http://purl.org/dc/terms/">
<sword:version>1.3</sword:version>
<workspace>
<atom:title>My Open Repository</atom:title>
<collection href="http://example.com/sword/data-sets">
<atom:title>Data set collection</atom:title>
<accept>application/zip</accept>
<sword:collectionPolicy>
This collection is for data sets created at this
institution. Data sets in this collection will be made
publicly available via the repository.
</sword:collectionPolicy>
<sword:mediation>true</sword:mediation>
<dcterms:abstract>
Data created at this institution for public access.
</dcterms:abstract>
<sword:acceptPackaging q="1.0">
http://purl.org/net/sword-types/METSDSpaceSIP
</sword:acceptPackaging>
<sword:acceptPackaging q="0.8">
http://purl.org/net/sword-types/bagit
</sword:acceptPackaging>
</collection>
<collection href="http://example.com/sword/articles">
<atom:title>Journal article collection</atom:title>
<accept>application/zip</accept>
<sword:collectionPolicy>
This collection is for journal articles created at this
institution. Articles in this collection will be made
publicly available via the repository.
</sword:collectionPolicy>
<dcterms:abstract>
Articles created at this institution for public access.
</dcterms:abstract>
<sword:acceptPackaging q="1.0">
http://purl.org/net/sword-types/METSDSpaceSIP
</sword:acceptPackaging>
<sword:acceptPackaging q="0.8">
http://purl.org/net/sword-types/bagit
</sword:acceptPackaging>
</collection>
</workspace>
</service>
In order to fulfill use cases where deposits are made by one entity on
behalf of another, SWORD introduced a new HTTP header that allows one user
to authenticate, but to specify that they are acting on behalf of another
user. The repository therefore has to identify that the authenticating
user is valid, but also that they may act in this way. The
X-On-Behalf-Of header is used for this purpose.
If a particular collection allows mediated deposits to be made, then it
will include an element specifying this. The first collection in Listing 2
shows this by including
<sword:mediation>true</sword:mediation>.
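A mediated service document request can be sketched as follows. The request is only constructed, not sent; the service document URL, user names, and password are hypothetical placeholders, and HTTP Basic authentication is assumed.

```python
import base64
from urllib.request import Request


def service_document_request(url, user, password, on_behalf_of=None):
    """Build (but do not send) a GET request for a SWORD service document."""
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {credentials}"}
    if on_behalf_of:
        # The authenticated user states that they act for another user.
        headers["X-On-Behalf-Of"] = on_behalf_of
    return Request(url, headers=headers, method="GET")


# Hypothetical URL and credentials, for illustration only.
req = service_document_request(
    "http://example.com/sword/servicedocument",
    "depositor",
    "secret",
    on_behalf_of="researcher",
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the repository then decides whether the authenticated user may act for the named user.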
Another optional extension added by SWORD is the
sword:collectionPolicy element as shown in Listing 2 which describes the policy for the deposit
of materials for this particular collection. This can either be the
human-readable description or a URL of the policy.
In defining what items can be added to a repository collection, as well as
the collectionPolicy, a repository can state
which formats of packages it will accept, along with a quality value of 0
to 1 (where 0 is unacceptable and 1 is preferred). Packaging formats are
described further below.
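To illustrate how a client might use these values, the sketch below parses a cut-down service document modeled on Listing 2 and orders each collection's packaging formats by their quality value. Treating a missing `q` attribute as 1.0 is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
    "sword": "http://purl.org/net/sword/",
}

# A shortened version of the service document in Listing 2.
SERVICE_DOCUMENT = b"""<?xml version="1.0" encoding="utf-8"?>
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom"
         xmlns:sword="http://purl.org/net/sword/">
  <workspace>
    <atom:title>My Open Repository</atom:title>
    <collection href="http://example.com/sword/data-sets">
      <atom:title>Data set collection</atom:title>
      <accept>application/zip</accept>
      <sword:acceptPackaging q="0.8">
        http://purl.org/net/sword-types/bagit
      </sword:acceptPackaging>
      <sword:acceptPackaging q="1.0">
        http://purl.org/net/sword-types/METSDSpaceSIP
      </sword:acceptPackaging>
    </collection>
  </workspace>
</service>"""


def describe_collections(xml_bytes):
    """Return one dict per collection, packaging formats ordered by preference."""
    root = ET.fromstring(xml_bytes)
    collections = []
    for coll in root.findall("app:workspace/app:collection", NS):
        ranked = sorted(
            (
                (float(p.get("q", "1.0")), (p.text or "").strip())
                for p in coll.findall("sword:acceptPackaging", NS)
            ),
            reverse=True,  # highest quality value first
        )
        collections.append(
            {
                "href": coll.get("href"),
                "title": coll.findtext("atom:title", default="", namespaces=NS),
                "accept": [(a.text or "").strip() for a in coll.findall("app:accept", NS)],
                "packaging": [uri for _, uri in ranked],
            }
        )
    return collections


collections = describe_collections(SERVICE_DOCUMENT)
print(collections[0]["packaging"][0])
```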
Some repositories wish to impose limits on the size of packages that may be
deposited. This can be described in the service document with the
sword:maxUploadSize element that provides a
value in kilobytes.
The varied structure of repositories is such that some may contain only a
few collections, while others contain many thousands. An AtomPub service
document usually lists all collections. For a repository containing many
collections, this may produce a very large XML document that will take
time to compile, transmit, and parse. To help this situation, SWORD
introduces a new element,
<sword:service>, to describe nested
service documents. When the
<sword:service> element appears within a
<collection> element, the
<collection> element describes a
set of collections instead of a single collection. The
<sword:service> element contains the URL
used to retrieve the nested service document (for example:
<sword:service>http://example.com/sword/data-sets-service-document</sword:service>).
The nested service document describes each collection in the set. Note
that the nested service document can contain
<sword:service> elements as well,
allowing for a hierarchy of sets of collections.
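A client can follow nested service documents recursively. The sketch below assumes the nesting element appears inside the same `collection` elements shown in Listing 2, and it takes a `fetch` function (a hypothetical stand-in for an authenticated HTTP GET) so that the traversal can be shown without a live repository.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Set

NS = {
    "app": "http://www.w3.org/2007/app",
    "sword": "http://purl.org/net/sword/",
}


def iter_collections(
    url: str, fetch: Callable[[str], bytes], seen: Optional[Set[str]] = None
) -> Iterator[str]:
    """Yield deposit URLs, descending into nested service documents."""
    seen = set() if seen is None else seen
    if url in seen:  # guard against documents that reference each other
        return
    seen.add(url)
    root = ET.fromstring(fetch(url))
    for coll in root.findall("app:workspace/app:collection", NS):
        nested = (coll.findtext("sword:service", default="", namespaces=NS) or "").strip()
        if nested:
            yield from iter_collections(nested, fetch, seen)
        else:
            yield coll.get("href")


# In-memory stand-ins for two service documents (invented for illustration).
DOCUMENTS = {
    "http://example.com/sword/servicedocument": b"""<service
    xmlns="http://www.w3.org/2007/app" xmlns:sword="http://purl.org/net/sword/">
  <workspace>
    <collection href="http://example.com/sword/articles"/>
    <collection href="http://example.com/sword/data-sets">
      <sword:service>http://example.com/sword/data-sets-service-document</sword:service>
    </collection>
  </workspace>
</service>""",
    "http://example.com/sword/data-sets-service-document": b"""<service
    xmlns="http://www.w3.org/2007/app">
  <workspace>
    <collection href="http://example.com/sword/data-sets/physics"/>
    <collection href="http://example.com/sword/data-sets/biology"/>
  </workspace>
</service>""",
}

found = list(iter_collections("http://example.com/sword/servicedocument", DOCUMENTS.__getitem__))
print(found)
```

Because the traversal is lazy, a client only pays for the nested documents it actually walks into.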
After a user has found the collection into which they wish to make a
deposit, they can POST a package along with
some HTTP headers to the URL of the collection. The headers that are sent
with the deposit are designed to allow the repository to understand what
is being deposited, to authenticate the user, and to ensure that the
deposited package has been transmitted correctly. Listing
3 shows a typical set of headers used in a SWORD deposit.
Listing 3. Sample headers for a SWORD deposit
POST /sword/data-sets HTTP/1.1
Host: example.com
Content-Type: application/zip
Authorization: Basic FTT48DhRzq0l82iUxe==
Content-Length: nnn
Content-MD5: md5-digest
Content-Disposition: filename=package-file-name.zip
X-Packaging: http://purl.org/net/sword-types/mets/dspace
User-Agent: Example SWORD client user-agent string
The headers are used to indicate the following:
- Content-Type: The MIME type of the package being deposited. In this case, it is a zip archive file
- Authorization: The HTTP basic authorization hash used to authenticate the user
- Content-Length: The size of the package being deposited. This is used to ensure that the package received by the server is the same size as what the client sent
- Content-MD5: An MD5 checksum hash of the package, used to ensure that it has transmitted successfully by calculating the checksum of the received file and comparing it to the value in this header
- Content-Disposition: Used to send the filename of the deposit package to the server
- X-Packaging: Used to describe the format of the package (see more about packaging formats below)
- User-Agent: The name of the SWORD client (optional)
There are three optional headers defined in the SWORD specification:
- X-On-Behalf-Of: As with service document requests, this header is used to indicate that the deposit is being made on behalf of another user.
- X-No-Op: A mimic of the traditional noop assembly instruction, used to indicate that this is a trial or test deposit, and that the user does not wish for the deposit to actually take place. This header can be useful for developers.
- X-Verbose: This requests the server to provide a more verbose response by populating a sword:verboseDescription element.
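The headers above can be assembled programmatically. The following is a sketch under stated assumptions: the MD5 checksum is sent as a hex digest and the optional flags are sent as the string `true`. Check what a given server expects before relying on either choice. The function name and credentials are invented for illustration.

```python
import base64
import hashlib


def deposit_headers(package, filename, user, password, packaging,
                    on_behalf_of=None, no_op=False, verbose=False):
    """Assemble the HTTP headers for a SWORD deposit of `package` (bytes)."""
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/zip",
        "Authorization": f"Basic {credentials}",
        "Content-Length": str(len(package)),
        # Assumption: hex-encoded MD5 digest of the package bytes.
        "Content-MD5": hashlib.md5(package).hexdigest(),
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": packaging,
    }
    # Optional headers are added only when requested.
    if on_behalf_of:
        headers["X-On-Behalf-Of"] = on_behalf_of
    if no_op:
        headers["X-No-Op"] = "true"  # assumption: boolean sent as "true"
    if verbose:
        headers["X-Verbose"] = "true"  # assumption: boolean sent as "true"
    return headers


payload = b"example package bytes"  # stands in for the real zip file contents
headers = deposit_headers(
    payload,
    "package-file-name.zip",
    "depositor",
    "secret",
    "http://purl.org/net/sword-types/METSDSpaceSIP",
    no_op=True,
)
for name, value in headers.items():
    print(f"{name}: {value}")
```

Using `no_op=True` while developing a client lets the request path be exercised without creating items in the repository.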
In response to a deposit, the server will return an Atom document. A sample
response is shown in Listing 4. An Atom document
response will contain an ID, a title, a link to the deposited item within
the repository, and optionally links to the original deposit file and a
URL where updates can be PUT. As well as
returning an Atom document, the server will respond with a status of
either 200 OK if the
X-No-Op header was used,
201 Created if the deposit resulted in the
creation of a new item in the repository, or
202 Accepted if the deposited package has been
accepted but is awaiting action from the repository manager.
Listing 4. An Atom document response to a deposit
<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
xmlns:sword="http://purl.org/net/sword/">
<title>Item title</title>
<id>id:123</id>
<updated>2008-08-18T14:27:08Z</updated>
<author><name>Laboratory equipment X</name></author>
<summary type="text">A summary</summary>
<content type="text/html" src="http://example.com/data-sets/123"/>
<link rel="edit-media" href="http://example.com/data-sets/123/deposit.zip"/>
<link rel="edit" href="http://example.com/sword/data-sets.atom" />
<sword:userAgent>SWORD Repository Server 1.0</sword:userAgent>
<generator uri="http://repository.example.com/" version="1.0"/>
</entry>
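A client typically pulls a few fields out of this response. The sketch below parses a shortened copy of Listing 4 with Python's standard library; the function name and the returned dictionary layout are choices made for this example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A shortened copy of the response in Listing 4.
RESPONSE = b"""<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:sword="http://purl.org/net/sword/">
  <title>Item title</title>
  <id>id:123</id>
  <content type="text/html" src="http://example.com/data-sets/123"/>
  <link rel="edit-media" href="http://example.com/data-sets/123/deposit.zip"/>
</entry>"""


def summarize_deposit(xml_bytes):
    """Extract the ID, title, item URL, and links from a deposit response."""
    entry = ET.fromstring(xml_bytes)
    content = entry.find(ATOM + "content")
    return {
        "id": entry.findtext(ATOM + "id"),
        "title": entry.findtext(ATOM + "title"),
        # The content element's src attribute points at the deposited item.
        "item_url": content.get("src") if content is not None else None,
        # Map each link relation (for example edit-media) to its URL.
        "links": {link.get("rel"): link.get("href") for link in entry.findall(ATOM + "link")},
    }


summary = summarize_deposit(RESPONSE)
print(summary["item_url"])
```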
Sometimes a deposit may fail or be rejected by the server. This can happen
for many reasons: the package may have failed to transmit properly, the
user may not have sufficient authorization to make the deposit, the
package type may be unsupported, or the file may be too large. If a problem
occurs, the server returns an error document. The error document contains
a URL that defines the type of failure, such as
http://purl.org/net/sword/error/MediationNotAllowed.
In addition, an HTTP status code is returned, such as
412 Precondition Failed. Together, these can be
used to provide the SWORD client with an explanation of why the deposit
failed.
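A client can turn the status codes described in this section into messages for the user. This is a minimal sketch; the function name and the wording of the messages are invented for illustration.

```python
def describe_deposit_status(status: int) -> str:
    """Translate the HTTP status of a deposit response into a short message."""
    outcomes = {
        200: "test deposit processed; nothing was stored",
        201: "deposit created a new item in the repository",
        202: "deposit accepted and awaiting action by the repository manager",
    }
    if status in outcomes:
        return outcomes[status]
    if 400 <= status < 600:
        # The body should be an error document naming the failure type.
        return f"deposit failed (HTTP {status}); inspect the error document"
    return f"unexpected response (HTTP {status})"


for code in (200, 201, 202, 412):
    print(code, describe_deposit_status(code))
```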
When a deposit is made to a repository, the repository usually requires two resource types:
- The digital file(s) to be deposited (for example, a research journal article)
- Metadata describing the digital file(s) being deposited
SWORD requires a single file to be deposited, therefore both of these elements must be combined together as a package. Most often this involves putting them into a zip file, and depositing the zip file. The repository then unzips the file, extracts the file(s) for archiving, and looks for a file containing the metadata.
DSpace, EPrints, and Fedora all support a packaging format where the metadata is encoded within a METS (Metadata Encoding and Transmission Standard) document. The METS file also contains a list of files included in the package, which are then ingested into the repository. Other packaging formats are popular in different domains. For example, the e-learning domain uses the IMS-CP packaging format (see Resources).
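Building such a package amounts to writing the metadata file and the content files into one zip archive. The sketch below does this in memory. The metadata file name `mets.xml` and the `<mets/>` body are placeholders chosen for this example, not a valid METS document, and each repository defines the file name it actually looks for.

```python
import io
import zipfile


def build_package(files, metadata_xml, metadata_name="mets.xml"):
    """Return the bytes of a zip package holding metadata plus content files.

    `files` maps file names to their bytes. `metadata_name` is an assumed
    default; use whatever name the target repository expects.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(metadata_name, metadata_xml)
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Placeholder content, for illustration only.
package = build_package({"article.pdf": b"%PDF-1.4 placeholder"}, b"<mets/>")

with zipfile.ZipFile(io.BytesIO(package)) as archive:
    names = archive.namelist()
print(names)
```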
It is normal to make use of a SWORD client when using SWORD. Just as browsing the web requires a web browser and sending email requires a mail client, interacting with SWORD requires a client tool. This shields the user from the low-level protocol and its request and response formats. In addition, SWORD clients can be used to collect metadata and files from a user in order to create a package ready for deposit.
Two SWORD API libraries already exist that can be used to easily create SWORD clients. These are written in Java and PHP. As an example, Listing 5 shows how a deposit can be made from a PHP application by using the PHP SWORD client library.
Listing 5. How to use the PHP SWORD client library
// Import the library
require('swordappclient.php');
// Create an instance of the client
$sac = new SWORDAPPClient();
// Request a service document
$sdr = $sac->servicedocument($url, $user, $password, $onbehalfof);
// Import the packager library
require('packager_mets_swap.php');
// Create a new package with the root and directory of the input files, and the
// root and directory of the package
$package = new PackagerMetsSwap($rootin, $dirin, $rootout, $fileout);
// Add metadata to the package
$package->setType($test_type);
$package->setTitle($title);
$package->setAbstract($abstract);
foreach ($creators as $creator) {
$package->addCreator($creator);
}
// Add a file to the package
$package->addFile($filename, $mimetype);
// Now deposit the package
$dr = $sac->deposit($depositurl, $username, $password, $onbehalfof, $filename,
$packageformat, $packagecontenttype);
The SWORD website contains a list of SWORD clients. Here are some examples of the wide range of clients that can be created (see Resources).
- SWORD Facebook application: The SWORD Facebook application allows you to select a repository, enter a user name and password for the repository, upload a file into Facebook, and describe the file by entering its title, author, and abstract. The application will then create a package and deposit it into the repository. A status update is posted in the user's profile with a link to the deposited item.
- Deposit from within Microsoft® Word: Microsoft created an authoring add-on that allows documents to be deposited from within Word 2007. Because the .docx file format is a zip file of XML files, it acts as a packaging format and is able to carry embedded descriptive metadata about the document to the repository.
- EasyDeposit SWORD client creation toolkit: EasyDeposit is a PHP-based toolkit that allows custom SWORD clients to be created by selecting from a list of possible options, configuring the options, and adding a local look-and-feel. The toolkit makes it easy for users to create their own web-based SWORD clients without having to write any code.
- Deposit by email: SWORD can be wrapped inside other protocols such as email. For example a script can be written that reads an email mailbox and looks for unread emails. The emails can be processed by extracting metadata from them: where the author is the email sender, the title is the subject of the email, the abstract is the main message text, and files are attachments of the email. SWORD can then be used to deposit the contents of the email into the repository.
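The email mapping in the last item can be sketched with Python's standard `email` package. The message below is invented for illustration, and the returned dictionary is a hypothetical intermediate form that a client would then package and deposit.

```python
from email import message_from_string, policy

# An invented message: plain-text body plus one attachment.
RAW_EMAIL = """From: Researcher <researcher@example.com>
To: deposit@example.com
Subject: Results of experiment 42
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="utf-8"

Measurements collected during the experiment.
--XYZ
Content-Type: text/csv; charset="utf-8"
Content-Disposition: attachment; filename="results.csv"

a,b
1,2
--XYZ--
"""


def email_to_deposit(raw):
    """Map an email onto deposit metadata as described in the text."""
    msg = message_from_string(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "author": str(msg["From"]),      # the sender becomes the author
        "title": str(msg["Subject"]),    # the subject becomes the title
        "abstract": body.get_content().strip() if body is not None else "",
        "files": [part.get_filename() for part in msg.iter_attachments()],
    }


deposit = email_to_deposit(RAW_EMAIL)
print(deposit)
```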
The SWORD protocol has been successful within the world of open repositories. However, it can be developed further, and in late 2010 JISC gave further funding to develop a second version of SWORD. Part of this will involve looking at what other systems could make use of SWORD, such as enterprise content management systems or other document management platforms.
One of the main limitations of SWORD version 1 is that it only supports the deposit of items into repositories. However, AtomPub also allows for items to be updated and deleted. SWORD version 2 will investigate the best ways to implement this functionality for repositories. By doing so, external systems that integrate with repositories will be able to use SWORD not only for the initial deposit of resources, but for their ongoing management as they proceed through their natural life cycles.
This article presented an overview of SWORD, why it was developed, and how it works. The use of SWORD continues to grow in the world of open repositories, but the scope of the protocol could be much wider. The protocol itself is about to develop further with the SWORD version 2 initiative, allowing other systems to use it in order to interact with repositories in a much richer fashion. Clients and programming toolkits are available to assist in using SWORD and there is an associated development community.
Learn
- Joint Information Systems Committee (JISC) inspires UK colleges and universities in the innovative use of digital technologies, helping to maintain the UK's position as a global leader in education.
- SWORD: Simple Web-service Offering Repository Deposit, Ariadne, Issue 54, January 2008. This article offers an introduction to the JISC-funded SWORD Project, which ran for eight months in mid-2007.
- Nature Publishing Group's Manuscript Deposition Service: In July 2008, NPG launched the first phase of its Manuscript Deposition Service. The free service helps authors fulfill funding and institutional mandates.
- Open archival information system - Reference model. ISO 14721:2003 specifies a reference model for an open archival information system (OAIS). The purpose of this ISO 14721:2003 is to establish a system for archiving information, both digitalized and physical.
- Dublin Core metadata initiative is a commonly used standard for descriptive metadata.
- EPrints is an open source repository platform.
- DSpace open source software enables open sharing of content that spans organizations, continents and time.
- Fedora is another major player in the open source repository platform space.
- BioMed Central is an STM (Science, Technology and Medicine) publisher which has pioneered the open access publishing model.
- The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) provides an application-independent interoperability framework based on metadata harvesting.
- The Atom Publishing Protocol (AtomPub), RFC 5023, is an application-level protocol for publishing and editing web resources.
- Simple Web-service Offering Repository Deposit (SWORD): Learn more about SWORD.
- SWORD AtomPub Profile version 1.3 is the latest version of the SWORD specification.
- SWORD v1 Clients and Demonstrator Repositories: See demonstration clients for use in testing SWORD implementations.
- SWORD technical email list: Get help for working with SWORD.
- HTTP Authentication: Basic and Digest Access Authentication, RFC 2617, specifies an Internet standards track protocol for the Internet community.
- DCMI Metadata Terms: Learn about the Dublin Core Metadata Initiative.
- IMS Content Packaging v1.2 Public Draft v2.0 specification describes data structures that can be used to exchange data between systems that wish to import, export, aggregate, and disaggregate packages of content.
- Metadata Encoding and Transmission Standard: The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium.
- IBM developerWorks Industries: See this site for all the latest industry-specific technical resources for developers.
- developerWorks podcasts: Listen to interesting interviews and discussions for software developers.
- developerWorks technical events and webcasts: Stay current with developerWorks technical events and webcasts.
Get products and technologies
- SWORD client: download code from the SWORD website.
- IBM trial software: Evaluate IBM software products in the method that suits you best. From trial downloads to cloud-hosted products, developerWorks features software especially for developers.
Discuss
- developerWorks blogs: Get involved in the developerWorks community.

Stuart Lewis has worked with open repositories in various roles over the past six years. Currently, he is the Digital Development Manager at The University of Auckland Library in New Zealand. Also, he is the Community Manager of the SWORD project, which continues to develop the SWORD repository deposit standard. Stuart is one of the core developers and committers for the DSpace open source repository platform. He maintains the EasyDeposit SWORD client creation toolkit, and the Repository66 mashup map of open repositories. Prior to working in Auckland, Stuart worked in a UK university where he led a technical team that undertook funded research into open repositories, including open access and data repositories. He was a key player in the creation of the UK's Repository Support Project (RSP), a support and guidance service to higher education institutions with respect to open repositories. Stuart blogs at http://blog.stuartlewis.com




