URI formats in the index
The uniform resource identifier (URI) of each document in the index indicates the type of crawler that added the document to the collection.
You can specify URIs or URI patterns when you configure categories, scopes, and quick links for a collection. You also specify the URI when you need to remove documents from the index or view detailed status information about a specific URI.
Search the collection to determine the URIs or URI patterns for a document. You can click the URIs in the search results to retrieve the documents that you are interested in, and you can copy a URI from the search results to use it in the administration console. For example, you can specify a URI pattern to automatically associate documents that match that pattern with a quick link.
When you specify a URI or URI pattern, you must specify the URL-encoded format of the URI and ensure that the URI contains only characters from the US-ASCII coded character set. For details, see RFC 1738, the Internet standard for URLs.
- Incorrect URI
file:///c:/shared/hebrew/עברית
- Correct URI
file:///c:/shared/hebrew/%D7%A2%D7%91%D7%A8%D7%99%D7%AA
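As a rough illustration, the Python sketch below produces the correct form from a path that contains non-ASCII characters. It assumes that non-ASCII characters are encoded as UTF-8 bytes, which matches the correct example above, and it keeps "/" and the drive-letter ":" literal. The function name `to_file_uri` is hypothetical.

```python
from urllib.parse import quote


def to_file_uri(path: str) -> str:
    """Build a file:/// URI whose path is percent-encoded as UTF-8 (hypothetical helper)."""
    # Keep "/" (path separator) and ":" (drive letter) literal; every other character
    # outside the unreserved set, including all non-ASCII characters, is percent-encoded.
    encoded = quote(path, safe="/:", encoding="utf-8")
    return "file:///" + encoded


uri = to_file_uri("c:/shared/hebrew/עברית")
print(uri)  # file:///c:/shared/hebrew/%D7%A2%D7%91%D7%A8%D7%99%D7%AA
```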
Archive files
Original_URI(?|&)ArchiveEntry=Entry_Name(&ArchiveEntry=Entry_Name)
- Parameters
Original_URI
- The location of the archive file on the data source.
Entry_Name
- The URL-encoded name of the archive entry in the archive file.
- Examples
file:///d:/Archive1.zip
file:///d:/Archive1.zip?ArchiveEntry=Folder1/PowerPoint.ppt
file:///d:/Archive1.zip?ArchiveEntry=Folder2/Text.txt
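The following sketch applies the Original_URI(?|&)ArchiveEntry=Entry_Name template. The helper name `archive_entry_uri` is hypothetical. Two details are assumptions inferred from the examples rather than stated rules: "/" inside an entry name is left unencoded, and when more than one entry name is given, each one is appended in order as another ArchiveEntry parameter.

```python
from urllib.parse import quote


def archive_entry_uri(original_uri: str, *entry_names: str) -> str:
    """Append one ArchiveEntry parameter per entry name (hypothetical helper)."""
    uri = original_uri
    for name in entry_names:
        # "?" starts the query string; "&" is used once a query string exists.
        separator = "&" if "?" in uri else "?"
        # URL-encode the entry name but keep "/" so folder paths match the examples.
        uri += f"{separator}ArchiveEntry={quote(name, safe='/')}"
    return uri


print(archive_entry_uri("file:///d:/Archive1.zip", "Folder1/PowerPoint.ppt"))
# file:///d:/Archive1.zip?ArchiveEntry=Folder1/PowerPoint.ppt
```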
Agent for Windows file systems crawlers
winfs://Host_Name/Drive:/Directory_Path/File_Name
- Parameters
- URL encoding is applied to all of the fields.
Host_Name
- The host name or IP address of the server where the document is located.
Drive
- The drive on the server where the document is located.
Directory_Path
- The path for a shared directory in the Windows domain.
File_Name
- The name of the file.
- Example
winfs:////9.187.186.83/c:/temp/test/test2/Copy+%284%29+of+dumpstore_1.txt
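The file name in the example, Copy+%284%29+of+dumpstore_1.txt, is consistent with form-style encoding, where spaces become "+" and parentheses become %28 and %29. The sketch below reproduces that encoding with Python's `quote_plus` and encodes each field separately so that the "/" separators survive. The helper names are hypothetical. The sketch follows the winfs://Host_Name/... template; the published example starts with winfs:////, so compare the output with a URI copied from the search results before relying on it.

```python
from urllib.parse import quote_plus


def encode_winfs_field(value: str) -> str:
    # Form-style encoding: " " -> "+", "(" -> "%28", ")" -> "%29".
    return quote_plus(value)


def winfs_uri(host: str, drive: str, directory_path: str, file_name: str) -> str:
    """Assemble winfs://Host_Name/Drive:/Directory_Path/File_Name (hypothetical helper)."""
    # Encode each directory segment on its own so the "/" separators are kept.
    segments = [encode_winfs_field(s) for s in directory_path.split("/") if s]
    path = "/".join(segments + [encode_winfs_field(file_name)])
    return f"winfs://{encode_winfs_field(host)}/{encode_winfs_field(drive)}:/{path}"
```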
BoardReader crawlers
- Replace the protocol of the URL with boardreader://
- Add the parameter boardreaderid= followed by the BoardReader ID to the URL
- Add the parameter useSSL=true to the URL when the original protocol is https://
- Example 1
- URL: http://www.facebook.com/1102412197/posts/10202426006027108
BoardReader ID: 17005669247
URI: boardreader://www.facebook.com/1102412197/posts/10202426006027108?boardreaderid=17005669247
- Example 2
- URL: http://foursoftpaws.yuku.com/reply/5369/Kjara-Tockica#reply-5369
BoardReader ID: 12702129376
URI: boardreader://foursoftpaws.yuku.com/reply/5369/Kjara-Tockica%23reply-5369?boardreaderid=12702129376
- Example 3
- URL: https://www.flashback.org/t2284857#p46743985
BoardReader ID: 23427574780
URI: boardreader://www.flashback.org/t2284857%23p46743985?boardreaderid=23427574780&useSSL=true
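The three conversion rules can be sketched as a single function. The name `boardreader_uri` is hypothetical. Encoding "#" as %23 is taken from the examples rather than from the rules. Using "&" before boardreaderid when the URL already has a query string is an assumption, because the examples show only URLs without one.

```python
from urllib.parse import urlsplit


def boardreader_uri(url: str, boardreader_id: str) -> str:
    """Apply the BoardReader conversion rules to a post URL (hypothetical helper)."""
    scheme = urlsplit(url).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"expected an http or https URL, got {url!r}")
    # Rule 1: replace the protocol with boardreader://.
    # As in the examples, the fragment marker "#" is encoded as %23.
    rest = url.split("://", 1)[1].replace("#", "%23")
    # Rule 2: add boardreaderid ("&" for an existing query string is an assumption).
    separator = "&" if "?" in rest else "?"
    uri = f"boardreader://{rest}{separator}boardreaderid={boardreader_id}"
    # Rule 3: add useSSL=true only when the original protocol was https.
    if scheme == "https":
        uri += "&useSSL=true"
    return uri
```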
Case Manager crawlers
p8ce://host_name:port/object_store/version_series_id/hash_code[/element_number]?protocol=http
- Parameters
- URL encoding is applied to all of the fields.
host_name
- The host name of the server on which the IBM® FileNet® Content Engine runs.
port
- The port number on which the Content Engine Web Service runs.
object_store
- The name of the object store in which the document is stored.
version_series_id
- A unique document identifier. The version series ID is used instead of the document ID because the document ID changes each time the document is versioned, whereas the version series ID stays the same.
hash_code
- To distinguish between folders, a hash code is added to the path for the object URI. In the following example, 7584373 is the hash code of the folder path /ObjectStore/CaseSolution/../CaseFolder/SubFolders:
p8ce://9.39.44.204:9080/wsi/FNCEWS40MTOM/ATOSAIX2/{2D09F43F-3392-485E-B338-E67D68F04FA6}.7584373?protocol=http
element_number
- An index of content elements. This variable is appended only when a URI points to a document that contains multiple content elements.
protocol
- The protocol for accessing the Web Service. Valid values are http or https.
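The sketch below splits a Case Manager URI into its parts. It is modeled on the example above rather than on the template line: in the example the hash code follows the version series ID after a dot, and extra path segments (/wsi/FNCEWS40MTOM) come before the object store, so the sketch reads the object store as the segment before the version series ID. Treating a trailing all-digit segment as element_number is an assumption, and `parse_case_manager_uri` is a hypothetical name.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_case_manager_uri(uri: str) -> dict:
    """Split a Case Manager p8ce:// URI into its parts (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.scheme != "p8ce":
        raise ValueError("not a p8ce URI")
    segments = parts.path.strip("/").split("/")
    element_number = None
    # Assumption: a trailing all-digit segment is the optional element_number.
    if len(segments) > 2 and segments[-1].isdigit():
        element_number = int(segments.pop())
    # In the example, "{GUID}.hash_code" is one segment; split on the last dot.
    version_series_id, _, hash_code = unquote(segments[-1]).rpartition(".")
    return {
        "host": parts.hostname,
        "port": parts.port,
        "object_store": unquote(segments[-2]),
        "version_series_id": version_series_id,
        "hash_code": hash_code,
        "element_number": element_number,
        "protocol": parse_qs(parts.query).get("protocol", [None])[0],
    }
```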
Content Integrator crawlers
vbr://Server_Name/Repository_System_ID/Repository_Persistent_ID
/Item_ID/Version_ID
/Item_Type/?[Page=Page_Number&]JNDI_properties
The URI format for documents that are crawled by a Content Integrator crawler in direct access mode is:
vbr:///Repository_System_ID/Repository_Persistent_ID
/Item_ID/Version_ID
/Item_Type/[?Page=Page_Number]
- Parameters
- URL encoding is applied to all of the fields.
Server_Name
- The name of the IBM Content Integrator server.
Repository_System_ID
- The system ID for the repository.
Repository_Persistent_ID
- The persistent ID for the repository.
Item_ID
- The ID for the item.
Version_ID
- The ID for the version. If the version ID is blank, this value indicates the latest version of the document.
Item_Type
- The type of the item (CONTENT or FOLDER).
Page_Number
- The page number.
JNDI_properties
- The JNDI properties for the J2EE application client. There are two properties:
- java.naming.factory.initial
- The name of the class for the application server that is used to create the EJB handle.
- java.naming.provider.url
- The URL to the naming service for the application server that is used to request the EJB handle.
- Examples
- Documentum:
vbr://vbrsrv.ibm.com/Documentum/c06b/094e827780000302//CONTENT/?java.naming.provider.url=iiop%3A%2F%2Fmyvbr.ibm.com%3A2809&java.naming.factory.initial=com.ibm.websphere.naming.WsnInitContextFactory
- FileNet PanagonCS:
vbr://vbrsrv.ibm.com/PanagonCS/4a4c/003671066//CONTENT/?Page=1&java.naming.provider.url=iiop%3A%2F%2Fmyvbr.ibm.com%3A2809&java.naming.factory.initial=com.ibm.websphere.naming.WsnInitContextFactory
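The following sketch assembles a vbr:// URI from its fields, including the URL-encoded JNDI properties, and reproduces the Documentum example. The name `vbr_uri` and its parameters are hypothetical. Passing `server=None` yields the direct access mode form (vbr:///...). An empty `version_id` leaves an empty segment (//), which denotes the latest version.

```python
from urllib.parse import quote


def vbr_uri(server, repository_system_id, repository_persistent_id, item_id,
            version_id, item_type, page=None, jndi_properties=None):
    """Build a Content Integrator vbr:// URI (hypothetical helper)."""
    fields = [repository_system_id, repository_persistent_id, item_id,
              version_id, item_type]
    # Each field is URL-encoded on its own; an empty version_id produces "//".
    path = "/".join(quote(f, safe="") for f in fields)
    uri = f"vbr://{quote(server or '', safe='')}/{path}/"
    params = []
    if page is not None:
        params.append(f"Page={page}")
    for name, value in (jndi_properties or {}).items():
        # ":" and "/" in values such as iiop://host:port become %3A and %2F.
        params.append(f"{name}={quote(value, safe='')}")
    if params:
        uri += "?" + "&".join(params)
    return uri


jndi = {
    "java.naming.provider.url": "iiop://myvbr.ibm.com:2809",
    "java.naming.factory.initial": "com.ibm.websphere.naming.WsnInitContextFactory",
}
print(vbr_uri("vbrsrv.ibm.com", "Documentum", "c06b", "094e827780000302", "",
              "CONTENT", jndi_properties=jndi))
```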
Content Manager crawlers
cm://Server_Name/Item_Type_Name/PID
- Parameters
- URL encoding is applied to the PID parameter.
Server_Name
- The name of the IBM Content Manager Enterprise Edition library server.
Item_Type_Name
- The name of the target item type.
PID
- The Content Manager EE persistent identifier.
- Example
cm://cmsrvctg/ITEMTYPE1/92+3+ICM8+icmnlsdb12+ITEMTYPE159+26+A1001001A03F27B94411D1831718+A03F27B+94411D183171+14+1018
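Because only the PID is URL-encoded, a cm:// URI can be split into its three fields and the PID decoded on its own. The sketch below assumes that the "+" characters in the example encode spaces (form-style encoding). That is an inference from the example, not a documented rule, and `parse_cm_uri` is a hypothetical name.

```python
from urllib.parse import unquote_plus, urlsplit


def parse_cm_uri(uri: str) -> tuple:
    """Return (Server_Name, Item_Type_Name, PID) from a cm:// URI (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.scheme != "cm":
        raise ValueError("not a cm URI")
    # The path is "/Item_Type_Name/PID"; split at the first "/" after the item type.
    item_type, _, encoded_pid = parts.path.lstrip("/").partition("/")
    # Assumption: "+" in the PID stands for an encoded space.
    return parts.netloc, item_type, unquote_plus(encoded_pid)
```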
DB2 crawlers
db2://Database_Name/Table_Name
/Unique_Identifier_Column_Name1/Unique_Identifier_Value1
[/Unique_Identifier_Column_Name2/Unique_Identifier_Value2/...
/Unique_Identifier_Column_NameN/Unique_Identifier_ValueN]
- Parameters:
- URL encoding is applied to all of the fields.
Database_Name
- The internal name of the database or the alias for the database.
Table_Name
- The name of the target table, including the name of the schema.
Unique_Identifier_Column_Name1
- The name of the first Unique Identifier column in the table.
Unique_Identifier_Value1
- The value of the first Unique Identifier column.
Unique_Identifier_Column_NameN
- The name of the nth Unique Identifier column in the table.
Unique_Identifier_ValueN
- The value of the nth Unique Identifier column.
- Examples
- Local, cataloged database:
db2://LOCALDB/SCHEMA1.TABLE1/MODEL/ThinkPadA20
- Remote, uncataloged database:
db2://myserver.mycompany.com:50001/REMOTEDB/SCHEMA2.TABLE2/NAME/DAVID
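The sketch below builds a db2:// URI from a database name, a schema-qualified table name, and an ordered list of unique identifier columns and values, and reproduces both examples. The name `db2_uri` and the `host_port` parameter (used for the remote, uncataloged form) are hypothetical. Each field is URL-encoded separately. Whether the crawler encodes spaces as %20, as `quote` does, is an assumption.

```python
from urllib.parse import quote


def db2_uri(database, table, key_columns, host_port=None):
    """Build a db2:// URI (hypothetical helper).

    key_columns: ordered (column_name, value) pairs for the unique identifier columns.
    host_port: "host:port" for a remote, uncataloged database; None for a local one.
    """
    fields = [database, table]
    for column, value in key_columns:
        fields += [column, str(value)]
    # URL-encode each field on its own; "." in SCHEMA.TABLE is left as is.
    path = "/".join(quote(f, safe="") for f in fields)
    prefix = f"db2://{host_port}/" if host_port else "db2://"
    return prefix + path
```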
Exchange Server crawlers
Because Watson Explorer Content Analytics cannot obtain the URL of attachments through Outlook Web App (OWA), it shows alternate URLs for attached items. Exchange Server 2007 supports only the Internet Explorer browser, so users can access OWA of Exchange Server 2007 only with Internet Explorer.
In the following URL, user's_primarySmtpAddress is the primary SMTP address that the search results originally belong to:
https://hostname/OWA/user's_primarySmtpAddress/?cmd=contents
exchadp://hostname/mailbox_name/itemId=itemId&owa=owaURL
exchadp://hostname/mailbox_name/attachmentId=attachmentId&owa=owaURL
FileNet P8 crawlers
p8ce://host_name:port/object_store/object_id[/element_number]?protocol=http
- Parameters
- URL encoding is applied to all of the fields.
host_name
- The host name of the server on which the IBM FileNet Content Engine runs.
port
- The port number on which the Content Engine Web Service runs.
object_store
- The name of the object store in which the document is stored.
object_id
- A globally unique identifier (GUID) that the Content Engine assigns to a stored object. The GUID is a 38-character string that consists of a left curly brace, 8 hexadecimal characters, a dash, 4 hexadecimal characters, a dash, 4 hexadecimal characters, a dash, 4 hexadecimal characters, a dash, 12 hexadecimal characters, and a right curly brace. In the URI, the braces are URL-encoded as %7B and %7D. For example:
%7B1234abcd-56ef-7a89-9fe8-7d65cd43ba21%7D
element_number
- An index of content elements. This variable is appended only when a URI points to a document that contains multiple content elements.
protocol
- The protocol for accessing the Web Service. Valid values are http or https.
- Example
p8ce://host.filenet.com:9080/STORE1/{1234abcd-56ef-7a89-9fe8-7d65cd43ba21}/2
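The sketch below checks the 38-character GUID layout described for object_id, URL-encodes the braces, and builds the full URI with the protocol parameter. It follows the parameter descriptions. The example above shows the braces unencoded and omits ?protocol=, so compare the output with URIs from your own search results. The name `p8ce_uri` is hypothetical.

```python
import re
from urllib.parse import quote

# "{" + 8-4-4-4-12 hexadecimal characters + "}" = 38 characters in total.
GUID_PATTERN = re.compile(
    r"\{[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}\}"
)


def p8ce_uri(host, port, object_store, object_id, element_number=None, protocol="http"):
    """Build a FileNet P8 p8ce:// URI (hypothetical helper)."""
    if not GUID_PATTERN.fullmatch(object_id):
        raise ValueError(f"not a 38-character GUID: {object_id!r}")
    if protocol not in ("http", "https"):
        raise ValueError("protocol must be http or https")
    # quote() with safe="" turns "{" and "}" into %7B and %7D.
    path = f"{quote(object_store, safe='')}/{quote(object_id, safe='')}"
    if element_number is not None:
        path += f"/{element_number}"
    return f"p8ce://{host}:{port}/{path}?protocol={protocol}"
```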
JDBC database crawlers
jdbc://DB_URL/Table_Name
/Unique_Identifier_Column_Name1/Unique_Identifier_Value1
/[Unique_Identifier_Column_Name2/Unique_Identifier_Value2
/.../Unique_Identifier_Column_NameN/Unique_Identifier_ValueN]
- Parameters
- URL encoding is applied to all of the fields.
DB_URL
- The URL for the database.
Table_Name
- The name of the target table, including the name of the schema.
Unique_Identifier_Column_Name1
- The name of the first Unique Identifier column in the table.
Unique_Identifier_Value1
- The value of the first Unique Identifier column.
Unique_Identifier_Column_NameN
- The name of the nth Unique Identifier column in the table.
Unique_Identifier_ValueN
- The value of the nth Unique Identifier column.
- Examples:
- DB2 database:
jdbc:db2://host01.svl.ibm.com:50000/SAMPLE/DB2INST1.ORG/DEPTNUMB/51
- Oracle database:
jdbc:oracle:thin:@/host01.svl.ibm.com:1521:ora/SCOTT.EMP/EMPNO/7934
- MS SQL Server 2000 database:
jdbc:microsoft:sqlserver://host01.svl.ibm.com:1433;DatabaseName=Northwind/dbo.Region/RegionID/100
- MS SQL Server 2005 database:
jdbc:sqlserver://host01.svl.ibm.com:1433;DatabaseName=Northwind/dbo.Region/RegionID/100
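In the examples, each URI is the database's own JDBC URL followed by the table and the unique identifier segments. The sketch below builds that form. It leaves DB_URL unchanged, as the examples do, and URL-encodes only the appended fields. The name `jdbc_document_uri` is hypothetical.

```python
from urllib.parse import quote


def jdbc_document_uri(db_url, table, key_columns):
    """Append the table and unique identifier columns to a JDBC URL (hypothetical helper).

    key_columns: ordered (column_name, value) pairs.
    """
    fields = [table]
    for column, value in key_columns:
        fields += [column, str(value)]
    # Only the appended fields are encoded; db_url is used exactly as given.
    return db_url + "/" + "/".join(quote(f, safe="") for f in fields)
```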
Notes crawlers
domino://Server_Name[:Port_Number]/Database_Replica_ID/Database_Path_and_Name
/[View_Universal_ID]/Document_Universal_ID
[?AttNo=Attachment_Number&AttName=Attachment_File_Name]
- Parameters
- URL encoding is applied to all of the fields.
Server_Name
- The name of the Lotus Notes® server.
Port_Number
- The port number for the Lotus Notes server. The port number is optional.
Database_Replica_ID
- The identifier for the database replica.
Database_Path_and_Name
- The path and file name for the NSF database on the target Lotus Notes server.
View_Universal_ID
- The View Universal ID that is defined on the target database. This ID is specified only when the document is selected from a view or folder. If you do not designate a view or folder to crawl (for example, if you crawl all documents in a database), the View Universal ID is omitted and its segment in the URI is left empty (//).
Document_Universal_ID
- The Document Universal ID that is defined in the document that is crawled by the crawler.
Attachment_Number
- A consecutive number, starting from zero, for each attachment. The attachment number is optional.
Attachment_File_Name
- The original name of the attachment file. The attachment file name is optional.
- Examples
- A document that was selected for crawling by view or folder:
domino://dominosvr.ibm.com/49256D3A000A20DE/Database.nsf/8178B1C14B1E9B6B8525624F0062FE9F/0205F44FA3F45A9049256DB20042D226
- A document that was not selected for crawling by view or folder:
domino://dominosvr.ibm.com/49256D3A000A20DE/Database.nsf//0205F44FA3F45A9049256DB20042D226
- A document attachment:
domino://dominosvr.ibm.com/49256D3A000A20DE/Database.nsf//0205F44FA3F45A9049256DB20042D226?AttNo=0&AttName=AttachedFile.doc
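The sketch below builds a domino:// URI and reproduces the three examples. The function name and its parameters are hypothetical. An empty `view_unid` leaves the empty // segment shown for documents that were not selected by view or folder. Encoding a database path with `quote(..., safe="")`, which turns "\" into %5C and "/" into %2F, is an assumption.

```python
from urllib.parse import quote, urlencode


def domino_uri(server, replica_id, db_path, document_unid, view_unid="",
               port=None, attachment_number=None, attachment_name=None):
    """Build a Notes domino:// URI (hypothetical helper)."""
    host = f"{server}:{port}" if port else server
    # An empty view_unid produces "//" between the database and the document ID.
    segments = [replica_id, db_path, view_unid, document_unid]
    uri = f"domino://{host}/" + "/".join(quote(s, safe="") for s in segments)
    query = {}
    if attachment_number is not None:
        query["AttNo"] = attachment_number
    if attachment_name is not None:
        query["AttName"] = attachment_name
    if query:
        uri += "?" + urlencode(query, quote_via=quote)
    return uri
```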
Quickr for Domino crawlers
quickplace://Server_Name:Port_Number/Database_Replica_ID/Database_Path_and_Name
/View_Universal_ID/Document_Universal_ID
/?AttNo=Attachment_Number&AttName=Attachment_File_Name
- Parameters
- URL encoding is applied to all of the fields.
Server_Name
- The host name of the Quickr for Domino server.
Port_Number
- Optional: The port number for the Quickr for Domino server.
Database_Replica_ID
- The identifier for the database replica.
Database_Path_and_Name
- The path and file name for the document NSF database on the target Quickr for Domino server.
View_Universal_ID
- The View Universal ID that is used to crawl documents.
Document_Universal_ID
- The Document Universal ID that is defined in the crawled document.
Attachment_Number
- Optional: A consecutive number, starting from zero, for each attachment.
Attachment_File_Name
- Optional: The original name of the attachment file.
- Examples
- A document:
quickplace://ltwsvr.ibm.com/49257043000214B3/QuickPlace%5Csampleplace%5CPageLibrary4925704300021490.nsf/A7986FD2A9CD47090525670800167225/2B02B1DE3A82B2CE49257043001C2498
- A page attachment:
quickplace://ltwsvr.ibm.com/49257043000214B3/QuickPlace%5Csampleplace%5CPageLibrary4925704300021490.nsf/A7986FD2A9CD47090525670800167225/2B02B1DE3A82B2CE49257043001C2498?AttNo=0&AttName=QPCons3.ppt
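In the Quickr examples, the backslashes in the database path are URL-encoded as %5C. The sketch below splits a quickplace:// URI into its fields and decodes each one. The path is split on "/" before decoding so that encoded characters cannot create extra segments. The name `parse_quickplace_uri` is hypothetical, and the sketch expects the four path segments shown in the format.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_quickplace_uri(uri):
    """Split a quickplace:// URI into its fields (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.scheme != "quickplace":
        raise ValueError("not a quickplace URI")
    # Split first, then decode, so that %5C (backslash) stays inside its segment.
    replica_id, db_path, view_unid, document_unid = (
        unquote(s) for s in parts.path.strip("/").split("/")
    )
    query = parse_qs(parts.query)
    return {
        "server": parts.hostname,
        "port": parts.port,  # None when the URI has no port
        "replica_id": replica_id,
        "db_path": db_path,
        "view_unid": view_unid,
        "document_unid": document_unid,
        "attachment_number": query.get("AttNo", [None])[0],
        "attachment_name": query.get("AttName", [None])[0],
    }
```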
Seed list crawlers
seedlist://Page_URL?pageID=Page_ID[&useSSL=true]
- Parameters
- URL encoding is applied to all of the fields.
Page_URL
- The URL for the document (unique for each document).
Page_ID
- The object identifier for the document.
useSSL
- When the protocol is HTTPS, &useSSL=true is added to the URI. Otherwise, useSSL is omitted.
- Example
- HTTPS protocol:
seedlist://quickrserver.ibm.com:10035/lotus/mypoc?uri=dm:bec6090046f1cd52bc5cfcb06e9f4550&verb=view&pageID=NlFSZURlMkJQNjZSMDZQMUMwM1FPNjZCQzY2SUw2SUhPNk1RQ0M2Uk80Nk9PNjVCRUM2UUs2TDFDMA==&useSSL=true
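The sketch below derives a seedlist:// URI from a page URL and page ID. In the example, the page URL already has a query string, so pageID is joined with "&" rather than "?". The page ID's "=" padding also appears unencoded there, so the sketch appends `page_id` exactly as given. The name `seedlist_uri` is hypothetical.

```python
def seedlist_uri(page_url: str, page_id: str) -> str:
    """Turn a page URL and page ID into a seedlist:// URI (hypothetical helper)."""
    scheme, sep, rest = page_url.partition("://")
    if not sep:
        raise ValueError("page_url must include a protocol")
    # pageID starts the query string, or extends it when the page URL already has one.
    separator = "&" if "?" in rest else "?"
    uri = f"seedlist://{rest}{separator}pageID={page_id}"
    # useSSL=true is added only for HTTPS; otherwise it is omitted.
    if scheme.lower() == "https":
        uri += "&useSSL=true"
    return uri
```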
SharePoint crawlers
http://server/display_form_path?primary_key_field name=primary_key_value
- Parameters
- URL encoding is applied to all of the fields.
server
display_form_path
primary_key_field name
primary_key_value
- Example
https://sharepoint.example.ibm.com:9999/rootDir/Shared%20Documents/Forms/DispForm.aspx?ID=5
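The template uses http://, but the example uses https://, so the sketch below takes the scheme as a parameter. It percent-encodes the display form path (the space in "Shared Documents" becomes %20) and appends the primary key as a query parameter. The name `sharepoint_item_url` and its parameters are hypothetical.

```python
from urllib.parse import quote, urlencode


def sharepoint_item_url(server, display_form_path, primary_key_field,
                        primary_key_value, scheme="https"):
    """Build a SharePoint display-form URL (hypothetical helper)."""
    # quote() keeps "/" and percent-encodes spaces as %20.
    path = quote(display_form_path.strip("/"))
    query = urlencode({primary_key_field: primary_key_value}, quote_via=quote)
    return f"{scheme}://{server}/{path}?{query}"
```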
UNIX file system crawlers
file:///Directory_Name/File_Name
- Parameters
- URL encoding is applied to all of the fields.
Directory_Name
- The absolute path name for the directory.
File_Name
- The name of the file.
- Example
file:///home/user/test.doc
Windows file system crawlers
file:///Directory_Name/File_Name
file:////Network_Folder_Name/Directory_Name/File_Name
- Parameters
- URL encoding is applied to all of the fields.
Directory_Name
- The absolute path name for the directory.
File_Name
- The name of the file.
Network_Folder_Name
- For documents on remote servers only, the name of the shared folder on a Windows network.
- Examples
- Local file system:
file:///d:/directory/test.doc
- Network file system:
file:////filesvr.ibm.com/directory/file.doc
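The sketch below maps a local drive path or a UNC network path to the two Windows file system URI forms shown above. It converts backslashes to "/" and uses the four-slash prefix for network folders. The name `windows_file_uri` is hypothetical. Encoding with UTF-8 `quote` is assumed to match the crawler's URL encoding, as in the Hebrew path example at the top of this topic.

```python
from urllib.parse import quote


def windows_file_uri(path: str) -> str:
    r"""Map d:\dir\file or \\server\dir\file to a file URI (hypothetical helper)."""
    normalized = path.replace("\\", "/")
    if normalized.startswith("//"):
        # Network folder: file:////Network_Folder_Name/Directory_Name/File_Name
        return "file:////" + quote(normalized.lstrip("/"), safe="/")
    # Local drive: file:///Directory_Name/File_Name, keeping the drive colon literal.
    return "file:///" + quote(normalized, safe="/:")
```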