URI formats in the index

The uniform resource identifier (URI) of each document in the index indicates the type of crawler that added the document to the collection.

You can specify URIs or URI patterns when you configure categories, scopes, and quick links for a collection. You also specify the URI when you need to remove documents from the index or view detailed status information about a specific URI.

Search the collection to determine the URIs or URI patterns for a document. You can click the URIs in the search results to retrieve documents that you are interested in. You can copy the URI from the search results to use the URI in the administration console. For example, you can specify a URI pattern to automatically associate documents that match that URI pattern with a quick link.

When you specify a URI or URI pattern, you must specify the URI in URL-encoded format and ensure that it contains only characters from the US-ASCII coded character set. For details, see RFC 1738, the Internet standard for URLs.

In the following example, you cannot specify the first URI, which contains Hebrew characters. You can, however, specify the second URI, which is the URL-encoded format of the first URI.
Incorrect URI
file:///c:/shared/hebrew/עברית
Correct URI
file:///c:/shared/hebrew/%D7%A2%D7%91%D7%A8%D7%99%D7%AA
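
The conversion in this example can be reproduced with a standard percent-encoding routine. The following Python sketch is for illustration only: the helper name to_url_encoded is hypothetical and is not part of the product, and the sketch assumes that non-ASCII characters are encoded as UTF-8 bytes, which matches the correct URI shown above.

```python
from urllib.parse import quote


def to_url_encoded(uri: str) -> str:
    """Percent-encode a URI so that it contains only US-ASCII characters.

    Hypothetical helper for illustration. The ':' and '/' characters are
    kept as-is so that the scheme and path separators survive; characters
    outside the unreserved set are encoded as UTF-8 bytes.
    """
    return quote(uri, safe=":/")


original = "file:///c:/shared/hebrew/עברית"
encoded = to_url_encoded(original)

print(original.isascii())  # False: this URI cannot be specified as-is
print(encoded)             # file:///c:/shared/hebrew/%D7%A2%D7%91%D7%A8%D7%99%D7%AA
print(encoded.isascii())   # True
```

Do not pass a URI that is already encoded to such a helper, because the existing % characters would be encoded a second time.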

Archive files

The URI format for documents that are extracted from an archive file (such as a .zip or .tar file) and then crawled is:
Original_URI(?|&)ArchiveEntry=Entry_Name(&ArchiveEntry=Entry_Name)
Parameters
Original_URI
The location of the archive file on the data source.
Entry_Name
The URL-encoded name of the archive entry in the archive file.
Examples
file:///d:/Archive1.zip
  file:///d:/Archive1.zip?ArchiveEntry=Folder1/PowerPoint.ppt
  file:///d:/Archive1.zip?ArchiveEntry=Folder2/Text.txt
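
As a rough illustration of the (?|&) rule in the format, the following Python sketch appends an ArchiveEntry parameter to a URI. The function name archive_entry_uri is hypothetical. The sketch assumes that the / character in an entry name is left unencoded, as in the examples above; how other characters such as spaces are encoded in entry names is not shown in this section.

```python
from urllib.parse import quote


def archive_entry_uri(original_uri: str, entry_name: str) -> str:
    """Append an ArchiveEntry parameter to the URI of an archive file.

    Hypothetical helper. '?' introduces the first parameter; '&' is used
    when the URI already contains a query string (for example, an entry
    inside a nested archive).
    """
    separator = "&" if "?" in original_uri else "?"
    # Assumption: '/' stays unencoded in entry names, as in the examples.
    return f"{original_uri}{separator}ArchiveEntry={quote(entry_name, safe='/')}"


print(archive_entry_uri("file:///d:/Archive1.zip", "Folder1/PowerPoint.ppt"))
# file:///d:/Archive1.zip?ArchiveEntry=Folder1/PowerPoint.ppt
```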

Agent for Windows file systems crawlers

The URI format for documents that are crawled by an Agent for Windows file systems crawler is:
winfs://Host_Name/Drive:/Directory_Path/File_Name
Parameters
URL encoding is applied to all of the fields.
Host_Name
The host name or IP address of the server where the document is located.
Drive
The drive on the server where the document is located.
Directory_Path
The path for a shared directory in the Windows domain.
File_Name
The name of the file.
Example
winfs:////9.187.186.83/c:/temp/test/test2/Copy+%284%29+of+dumpstore_1.txt
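
In the example, spaces in the file name appear as + and parentheses as %28 and %29. This looks like form-style URL encoding, although this section does not state the exact encoding rules. Under that assumption, the following Python sketch decodes the file name from the example and encodes it again.

```python
from urllib.parse import quote_plus, unquote_plus

# File name portion of the example URI above.
encoded_name = "Copy+%284%29+of+dumpstore_1.txt"

# Assumption: '+' stands for a space (form-style URL encoding).
decoded_name = unquote_plus(encoded_name)
print(decoded_name)              # Copy (4) of dumpstore_1.txt
print(quote_plus(decoded_name))  # Copy+%284%29+of+dumpstore_1.txt
```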

BoardReader crawlers

The URI format for documents that are crawled by a BoardReader crawler is as follows:
  • Replace the protocol of the URL with boardreader://.
  • Add the parameter boardreaderid= and the BoardReader ID to the URL.
  • Add the parameter useSSL=true to the URL when the original protocol is https://.
URL encoding is applied to all of the fields.
Example 1
URL: http://www.facebook.com/1102412197/posts/10202426006027108
BoardReader ID: 17005669247
URI: boardreader://www.facebook.com/1102412197/posts/10202426006027108?boardreaderid=17005669247

Example 2
URL: http://foursoftpaws.yuku.com/reply/5369/Kjara-Tockica#reply-5369
BoardReader ID: 12702129376
URI: boardreader://foursoftpaws.yuku.com/reply/5369/Kjara-Tockica%23reply-5369?boardreaderid=12702129376
Example 3
URL: https://www.flashback.org/t2284857#p46743985
BoardReader ID: 23427574780
URI: boardreader://www.flashback.org/t2284857%23p46743985?boardreaderid=23427574780&useSSL=true
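
The three rules for BoardReader URIs can be sketched in Python as follows. The function name to_boardreader_uri is hypothetical, and the sketch is checked only against the three examples above. In particular, it assumes that everything after the protocol is percent-encoded except / and :, so a # becomes %23. How the crawler handles a URL that already contains a query string is not covered by the examples.

```python
from urllib.parse import quote


def to_boardreader_uri(url: str, boardreader_id: str) -> str:
    """Build a BoardReader-style URI from a URL and a BoardReader ID.

    Hypothetical helper that follows the three rules in this section.
    """
    scheme, separator, rest = url.partition("://")
    if not separator:
        raise ValueError(f"URL has no protocol: {url!r}")
    # Rule 1: replace the protocol with boardreader://
    # Assumption: '/' and ':' stay unencoded; '#' and others are encoded.
    uri = "boardreader://" + quote(rest, safe="/:")
    # Rule 2: add the boardreaderid parameter.
    uri += f"?boardreaderid={boardreader_id}"
    # Rule 3: add useSSL=true when the original protocol is https.
    if scheme.lower() == "https":
        uri += "&useSSL=true"
    return uri


print(to_boardreader_uri("https://www.flashback.org/t2284857#p46743985", "23427574780"))
# boardreader://www.flashback.org/t2284857%23p46743985?boardreaderid=23427574780&useSSL=true
```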

Case Manager crawlers

The URI format for documents that are crawled by a Case Manager crawler is:
p8ce://host_name:port/object_store/version_series_id/hash_code[/element_number]?protocol=http
Parameters
URL encoding is applied to all of the fields.
host_name
The host name of the server on which the IBM® FileNet® Content Engine runs.
port
The port number on which the Content Engine Web Service runs.
object_store
The name of the object store in which the document is stored.
version_series_id
A unique document identifier. The version series ID is used because the document ID changes as the document is versioned while the version series ID does not change.
hash_code
To make a distinction between folders, a hash code is added to the path for the object URI. In the following example, 7584373 is the hash code of folder path /ObjectStore/CaseSolution/../CaseFolder/SubFolders:
p8ce://9.39.44.204:9080/wsi/FNCEWS40MTOM/ATOSAIX2/{2D09F43F-3392-485E-B338-E67D68F04FA6}.7584373?protocol=http
element_number
An index of content elements. This variable is appended only when a URI points to a document that contains multiple content elements.
protocol
A protocol for accessing the Web Service. Valid values are http or https.

Content Integrator crawlers

The URI format for documents that are crawled by a Content Integrator crawler in server access mode is:
vbr://Server_Name/Repository_System_ID/Repository_Persistent_ID
     /Item_ID/Version_ID
     /Item_Type/?[Page=Page_Number&] JNDI_properties
The URI format for documents that are crawled by a Content Integrator crawler in direct access mode is:
vbr:///Repository_System_ID/Repository_Persistent_ID
     /Item_ID/Version_ID
     /Item_Type/[?Page=Page_Number]
Parameters
URL encoding is applied to all of the fields.
Server_Name
The name of the IBM Content Integrator server.
Repository_System_ID
The system ID for the repository.
Repository_Persistent_ID
The persistent ID for the repository.
Item_ID
The ID for the item.
Version_ID
The ID for the version. If the version ID is blank, this value indicates the latest version of the document.
Item_Type
The type of the item (CONTENT or FOLDER).
Page_Number
The page number.
JNDI_properties
The JNDI properties for the J2EE application client. There are two types of properties:
java.naming.factory.initial
The name of the class for the application server that is used to create the EJB handle.
java.naming.provider.url
The URL to the naming service for the application server that is used to request the EJB handle.
Examples
Documentum:
vbr://vbrsrv.ibm.com/Documentum/c06b/094e827780000302//CONTENT/?
java.naming.provider.url=iiop%3A%2F%2Fmyvbr.ibm.com%3A2809&
java.naming.factory.initial=com.ibm.websphere.naming.WsnInitContextFactory
FileNet PanagonCS:
vbr://vbrsrv.ibm.com/PanagonCS/4a4c/003671066//CONTENT/?Page=1&
java.naming.provider.url=iiop%3A%2F%2Fmyvbr.ibm.com%3A2809&
java.naming.factory.initial=com.ibm.websphere.naming.WsnInitContextFactory
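
To show how the fields map onto a URI, the following Python sketch splits a Content Integrator URI into its parts. The function name parse_vbr_uri and the dictionary keys are hypothetical. The sketch assumes that the line breaks in the examples above are for display only, and that no field contains an unencoded / character.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_vbr_uri(uri: str) -> dict:
    """Split a vbr:// URI into the fields described in this section."""
    parts = urlsplit(uri)
    # Path layout:
    # /Repository_System_ID/Repository_Persistent_ID/Item_ID/Version_ID/Item_Type/
    segments = parts.path.split("/")[1:6]
    if len(segments) != 5:
        raise ValueError(f"Unexpected path layout: {parts.path!r}")
    system_id, persistent_id, item_id, version_id, item_type = map(unquote, segments)
    # parse_qs decodes the percent-encoded values, such as the JNDI URL.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "server": parts.netloc or None,   # None in direct access mode
        "system_id": system_id,
        "persistent_id": persistent_id,
        "item_id": item_id,
        "version_id": version_id or None,  # blank means the latest version
        "item_type": item_type,
        "page": query.pop("Page", None),
        "jndi_properties": query,
    }


info = parse_vbr_uri(
    "vbr://vbrsrv.ibm.com/Documentum/c06b/094e827780000302//CONTENT/?"
    "java.naming.provider.url=iiop%3A%2F%2Fmyvbr.ibm.com%3A2809&"
    "java.naming.factory.initial=com.ibm.websphere.naming.WsnInitContextFactory"
)
print(info["version_id"])  # None
print(info["jndi_properties"]["java.naming.provider.url"])  # iiop://myvbr.ibm.com:2809
```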

Content Manager crawlers

The URI format for documents that are crawled by a Content Manager crawler is:
cm://Server_Name/Item_Type_Name/PID
Parameters
URL encoding is applied to the PID parameter.
Server_Name
The name of the IBM Content Manager Enterprise Edition library server.
Item_Type_Name
The name of the target item type.
PID
The Content Manager EE persistent identifier.
Example
cm://cmsrvctg/ITEMTYPE1/92+3+ICM8+icmnlsdb12+ITEMTYPE159+26+A1001001A
03F27B94411D1831718+A03F27B+94411D183171+14+1018

DB2 crawlers

The URI format for documents that are crawled by a DB2 crawler is:
db2://Database_Name/Table_Name
     /Unique_Identifier_Column_Name1/Unique_Identifier_Value1
     [/Unique_Identifier_Column_Name2/Unique_Identifier_Value2/...
     /Unique_Identifier_Column_NameN/Unique_Identifier_ValueN]
Parameters
URL encoding is applied to all of the fields.
Database_Name
The internal name of the database or the alias for the database.
Table_Name
The name of the target table, including the name of the schema.
Unique_Identifier_Column_Name1
The name of the first Unique Identifier column in the table.
Unique_Identifier_Value1
The value of the first Unique Identifier column.
Unique_Identifier_Column_NameN
The name of the nth Unique Identifier column in the table.
Unique_Identifier_ValueN
The value of the nth Unique Identifier column.
Examples
Local, cataloged database:
db2://LOCALDB/SCHEMA1.TABLE1/MODEL/ThinkPadA20
Remote, uncataloged database:
db2://myserver.mycompany.com:50001/REMOTEDB/SCHEMA2.TABLE2/NAME/DAVID
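
The repeating column name and value pairs can be assembled as in the following Python sketch. The function name db2_uri is hypothetical. The sketch covers only the local, cataloged form, because it percent-encodes every field in full; the host, port, and database portion of the remote example contains unencoded : and / characters and would need separate handling.

```python
from urllib.parse import quote


def db2_uri(database: str, table: str, key_columns: list) -> str:
    """Build a db2:// URI from unique identifier column/value pairs.

    Hypothetical helper. key_columns is a list of (column_name, value)
    tuples, in the order of the unique identifier columns.
    """
    if not key_columns:
        raise ValueError("At least one unique identifier column is required")
    fields = [database, table]
    for column_name, value in key_columns:
        fields.extend([column_name, str(value)])
    # Each field is URL-encoded on its own, then joined with '/'.
    return "db2://" + "/".join(quote(field, safe="") for field in fields)


print(db2_uri("LOCALDB", "SCHEMA1.TABLE1", [("MODEL", "ThinkPadA20")]))
# db2://LOCALDB/SCHEMA1.TABLE1/MODEL/ThinkPadA20
```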

Exchange Server crawlers

Because Watson Explorer Content Analytics cannot obtain the URL of attachments through Outlook Web App (OWA), it shows alternate URLs for attached items. Because Exchange Server 2007 supports only the Internet Explorer browser, users can access OWA of Exchange Server 2007 only with that browser.

When users click titles in the results page of the enterprise search application or content analytics miner, the corresponding Exchange Server item is shown through OWA. If the user has MailboxPermission to the mailbox that contains the search results, the user can also open the item through OWA. However, if the user has MailboxFolderPermission or Delegation to the mailbox that contains the search results, the user must access the following URL before clicking the title to access the item, where user's_primarySmtpAddress is the primary SMTP address of the mailbox that the search results originally belong to:
https://hostname/OWA/user's_primarySmtpAddress/?cmd=contents
The Exchange Server crawler generates its own URIs for crawled documents. Each URI contains an ID that is unique among items and attachments. If the document is an item, the URI is formatted as follows:
exchadp://hostname/mailbox_name/itemId=itemId&owa=owaURL
If the document is an attachment, the URI is formatted as follows:
exchadp://hostname/mailbox_name/attachmentId=attachmentId&owa=owaURL

FileNet P8 crawlers

The URI format for documents that are crawled by a FileNet P8 crawler is:
p8ce://host_name:port/object_store/object_id[/element_number]?protocol=http
Parameters
URL encoding is applied to all of the fields.
host_name
The host name of the server on which the IBM FileNet Content Engine runs.
port
The port number on which the Content Engine Web Service runs.
object_store
The name of the object store in which the document is stored.
object_id
A globally unique identifier (GUID) assigned by the Content Engine to a stored object. A character string that contains 38 characters, the GUID consists of a left curly brace, 8 hexadecimal characters, a dash, 4 hexadecimal characters, a dash, 4 hexadecimal characters, a dash, 4 hexadecimal characters, a dash, 12 hexadecimal characters, and a right curly brace. Braces are encoded by URL encoding rules. For example:

%7B1234abcd-56ef-7a89-9fe8-7d65cd43ba21%7D

element_number
An index of content elements. This variable is appended only when a URI points to a document that contains multiple content elements.
protocol
A protocol for accessing the Web Service. Valid values are http or https.
Example
p8ce://host.filenet.com:9080/STORE1/{1234abcd-56ef-7a89-9fe8-7d65cd43ba21}/2
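
The description of object_id translates directly into a pattern check followed by URL encoding. The following Python sketch is illustrative only; the names GUID_PATTERN and encode_object_id are hypothetical and are not part of any FileNet API.

```python
import re
from urllib.parse import quote

# Left brace, 8-4-4-4-12 hexadecimal characters separated by dashes,
# right brace: 38 characters in total.
GUID_PATTERN = re.compile(
    r"\{[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}"
    r"-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}\}"
)


def encode_object_id(object_id: str) -> str:
    """Validate a GUID and return it with the braces URL-encoded."""
    if not GUID_PATTERN.fullmatch(object_id):
        raise ValueError(f"Not a GUID in the expected format: {object_id!r}")
    return quote(object_id, safe="")


guid = "{1234abcd-56ef-7a89-9fe8-7d65cd43ba21}"
print(len(guid))               # 38
print(encode_object_id(guid))  # %7B1234abcd-56ef-7a89-9fe8-7d65cd43ba21%7D
```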

JDBC database crawlers

The URI format for documents that are crawled by a JDBC database crawler is:
jdbc://DB_URL/Table_Name
     /Unique_Identifier_Column_Name1/Unique_Identifier_Value1
     [/Unique_Identifier_Column_Name2/Unique_Identifier_Value2
     /.../Unique_Identifier_Column_NameN/Unique_Identifier_ValueN]
Parameters
URL encoding is applied to all of the fields.
DB_URL
The URL for the database.
Table_Name
The name of the target table, including the name of the schema.
Unique_Identifier_Column_Name1
The name of the first Unique Identifier column in the table.
Unique_Identifier_Value1
The value of the first Unique Identifier column.
Unique_Identifier_Column_NameN
The name of the nth Unique Identifier column in the table.
Unique_Identifier_ValueN
The value of the nth Unique Identifier column.
Examples
DB2 database:
jdbc:db2://host01.svl.ibm.com:50000/SAMPLE/DB2INST1.ORG/DEPTNUMB/51
Oracle database:
jdbc:oracle:thin:@/host01.svl.ibm.com:1521:ora/SCOTT.EMP/EMPNO/7934
MS SQL Server 2000 database:
jdbc:microsoft:sqlserver://host01.svl.ibm.com:1433;
DatabaseName=Northwind/dbo.Region/RegionID/100
MS SQL Server 2005 database:
jdbc:sqlserver://host01.svl.ibm.com:1433;
DatabaseName=Northwind/dbo.Region/RegionID/100

Notes crawlers

The URI format for documents that are crawled by a Notes crawler is:
domino://Server_Name[:Port_Number]/Database_Replica_ID/Database_Path_and_Name
     /[View_Universal_ID]/Document_Universal_ID
     [?AttNo=Attachment_Number&AttName=Attachment_File_Name]
Parameters
URL encoding is applied to all of the fields.
Server_Name
The name of the Lotus Notes® server.
Port_Number
The port number for the Lotus Notes server. The port number is optional.
Database_Replica_ID
The identifier for the database replica.
Database_Path_and_Name
The path and file name for the NSF database on the target Lotus Notes server.
View_Universal_ID
The View Universal ID that is defined on the target database. This ID is specified only when the document is selected from a view or folder. If you do not designate a view or folder to crawl (for example if you specify that you want to crawl all documents in a database), the View Universal ID is not specified.
Document_Universal_ID
The Document Universal ID that is defined in the document that is crawled by the crawler.
Attachment_Number
A consecutive number, starting from zero, for each attachment. The attachment number is optional.
Attachment_File_Name
The original name of the attachment file. The attachment file name is optional.
Examples
A document that was selected for crawling by view or folder:
domino://dominosvr.ibm.com/49256D3A000A20DE/Database.nsf/
8178B1C14B1E9B6B8525624F0062FE9F/0205F44FA3F45A9049256DB20042D226
A document that was not selected for crawling by view or folder:
domino://dominosvr.ibm.com/49256D3A000A20DE/Database.nsf//
0205F44FA3F45A9049256DB20042D226
A document attachment:
domino://dominosvr.ibm.com/49256D3A000A20DE/Database.nsf//
0205F44FA3F45A9049256DB20042D226?AttNo=0&AttName=AttachedFile.doc
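
The following Python sketch shows one way to take a Notes URI apart. The function name parse_domino_uri and the dictionary keys are hypothetical. The sketch assumes that the line breaks in the examples above are for display only, and it reads the path from the right so that an empty View Universal ID (the // in the second and third examples) is detected.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_domino_uri(uri: str) -> dict:
    """Split a domino:// URI into the fields described in this section."""
    parts = urlsplit(uri)
    # The path starts with '/', so segments[0] is always ''.
    segments = parts.path.split("/")
    if len(segments) < 5:
        raise ValueError(f"Unexpected path layout: {parts.path!r}")
    query = parse_qs(parts.query)
    return {
        "server": parts.hostname,
        "port": parts.port,  # None when the optional port is omitted
        "replica_id": segments[1],
        "database": unquote("/".join(segments[2:-2])),
        "view_unid": segments[-2] or None,  # None if no view or folder
        "document_unid": segments[-1],
        "attachment_number": int(query["AttNo"][0]) if "AttNo" in query else None,
        "attachment_name": query["AttName"][0] if "AttName" in query else None,
    }


attachment = parse_domino_uri(
    "domino://dominosvr.ibm.com/49256D3A000A20DE/Database.nsf//"
    "0205F44FA3F45A9049256DB20042D226?AttNo=0&AttName=AttachedFile.doc"
)
print(attachment["view_unid"])        # None
print(attachment["attachment_name"])  # AttachedFile.doc
```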

Quickr for Domino crawlers

The URI format for documents that are crawled by a Quickr for Domino crawler is:
quickplace://Server_Name:Port_Number/Database_Replica_ID/Database_Path_and_Name
/View_Universal_ID/Document_Universal_ID
/?AttNo=Attachment_Number&AttName=Attachment_File_Name     
Parameters
URL encoding is applied to all of the fields.
Server_Name
The host name of the Quickr for Domino server.
Port_Number
Optional: The port number for the Quickr for Domino server.
Database_Replica_ID
The identifier for the database replica.
Database_Path_and_Name
The path and file name for the document NSF database on the target Quickr for Domino server.
View_Universal_ID
The View Universal ID that is used to crawl documents.
Document_Universal_ID
The Document Universal ID that is defined in the crawled document.
Attachment_Number
Optional: A consecutive number, starting from zero, for each attachment.
Attachment_File_Name
Optional: The original name of the attachment file.
Examples
A document:
quickplace://ltwsvr.ibm.com/49257043000214B3/QuickPlace%5Csampleplace
%5CPageLibrary4925704300021490.nsf
/A7986FD2A9CD47090525670800167225
/2B02B1DE3A82B2CE49257043001C2498

A page attachment:

quickplace://ltwsvr.ibm.com/49257043000214B3/QuickPlace%5Csampleplace
%5CPageLibrary4925704300021490.nsf
/A7986FD2A9CD47090525670800167225
/2B02B1DE3A82B2CE49257043001C2498
?AttNo=0&AttName=QPCons3.ppt

Seed list crawlers

The URI format for documents that are crawled by a Seed list crawler is:
seedlist://Page_URL?pageID=Page_ID[&useSSL=true]
Parameters
URL encoding is applied to all of the fields.
Page_URL
The URL for the document (unique for each document).
Page_ID
The object identifier for the document.
useSSL
When the protocol is HTTPS, &useSSL=true is added to the URI. Otherwise, useSSL is omitted.
Example
HTTPS protocol:
seedlist://quickrserver.ibm.com:10035/lotus/mypoc?uri=dm:bec6090046f1cd5
2bc5cfcb06e9f4550&verb=view&pageID=NlFSZURlMkJQNjZSMDZQMUMwM1FPNjZCQzY
2SUw2SUhPNk1RQ0M2Uk80Nk9PNjVCRUM2UUs2TDFDMA==&useSSL=true

SharePoint crawlers

The SharePoint crawler does not generate its own document URI format. Instead, it uses an accessible URL as the document URI. The accessible URL can vary according to the Site and Form configuration of the SharePoint server. The crawler tries to retrieve the display form URL and append the document ID to it. If the crawler is configured to retrieve a URL from a specific field, the crawler tries to use the field value as the URI. This format is useful for crawling lists that do not generate URLs based on the primary key value. The default format is:
http://server/display_form_path?primary_key_field_name=primary_key_value
Parameters
URL encoding is applied to all of the fields.
server
display_form_path
primary_key_field_name
primary_key_value
Example
https://sharepoint.example.ibm.com:9999/rootDir/Shared%20Documents/
Forms/DispForm.aspx?ID=5

UNIX file system crawlers

The URI format for documents that are crawled by a UNIX file system crawler is:
file:///Directory_Name/File_Name
Parameters
URL encoding is applied to all of the fields.
Directory_Name
The absolute path name for the directory.
File_Name
The name of the file.
Example
file:///home/user/test.doc

Windows file system crawlers

The URI formats for documents that are crawled by a Windows file system crawler are:
file:///Directory_Name/File_Name
file:////Network_Folder_Name/Directory_Name/File_Name
Parameters
URL encoding is applied to all of the fields.
Directory_Name
The absolute path name for the directory.
File_Name
The name of the file.
Network_Folder_Name
For documents on remote servers only, the name of the shared folder on a Windows network.
Examples
Local file system:
file:///d:/directory/test.doc
Network file system:
file:////filesvr.ibm.com/directory/file.doc
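
The two formats differ in the number of slashes after file:. As a small illustration, the following Python sketch tells them apart; the function name classify_file_uri is hypothetical, and the sketch relies only on the slash count shown in the formats above.

```python
def classify_file_uri(uri: str) -> tuple:
    """Return (kind, network_folder_name, remainder) for a file URI.

    Hypothetical helper. Four slashes after 'file:' indicate a network
    folder; three slashes indicate a local path.
    """
    network_prefix = "file:////"
    local_prefix = "file:///"
    if uri.startswith(network_prefix):
        # The first segment after the four slashes is the network folder name.
        network_folder, _, remainder = uri[len(network_prefix):].partition("/")
        return ("network", network_folder, remainder)
    if uri.startswith(local_prefix):
        return ("local", None, uri[len(local_prefix):])
    raise ValueError(f"Not a file system crawler URI: {uri!r}")


print(classify_file_uri("file:///d:/directory/test.doc"))
# ('local', None, 'd:/directory/test.doc')
print(classify_file_uri("file:////filesvr.ibm.com/directory/file.doc"))
# ('network', 'filesvr.ibm.com', 'directory/file.doc')
```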