Creating a Search Collection

Search collections are created with the search-collection-create function. When you create a new collection, it is best to base it on an existing, pre-configured collection in which most of the basic search collection configuration has already been done. This minimizes the amount of configuration you must apply dynamically to the new collection. You can create such a base collection in the Watson Explorer Engine administration tool, which is usually much more convenient than creating and fully configuring a collection through the API. After creating a search collection, make sure that the appropriate services for that collection are running, as described in "Starting, Stopping, Managing, and Monitoring Collection-Specific Services".

The default-push and default-broker-push collections are provided as examples of collections to which data is pushed rather than retrieved by crawling. The default-push collection is the default base for collections created through the Watson Explorer Engine API: it is used automatically if you do not specify a value for the based-on parameter.

Tip: If you intend to use a search collection with the Collection Broker, make sure that you use the default-collection-broker collection as a template for the new collection. This collection provides enhanced defaults for interaction with the Collection Broker.
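When your code creates collections for different purposes, the choice of base collection described above can be captured in one small helper. This is a minimal sketch only: BaseCollectionChooser and chooseBasedOn are hypothetical names that are not part of the Watson Explorer Engine API, and the collection names are taken from the text above.

```java
/**
 * Hypothetical helper (not part of the Watson Explorer Engine API) that
 * picks the value to pass as the based-on parameter.
 */
final class BaseCollectionChooser {
    static final String DEFAULT_PUSH = "default-push";
    static final String DEFAULT_COLLECTION_BROKER = "default-collection-broker";

    private BaseCollectionChooser() {}

    static String chooseBasedOn(String customBase, boolean usesCollectionBroker) {
        if (customBase != null && !customBase.isEmpty()) {
            return customBase; // an explicitly chosen base collection wins
        }
        // Otherwise follow the tip: Collection Broker collections use the broker
        // template; everything else uses default-push (also the server default).
        return usesCollectionBroker ? DEFAULT_COLLECTION_BROKER : DEFAULT_PUSH;
    }

    public static void main(String[] args) {
        System.out.println(chooseBasedOn(null, false)); // prints default-push
        System.out.println(chooseBasedOn(null, true));  // prints default-collection-broker
        System.out.println(chooseBasedOn("my-base-collection", false)); // prints my-base-collection
    }
}
```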

Search collection names follow the standard rules for an XML NMTOKEN: they can be any non-empty combination of letters, digits, combining characters, and extenders (based on the Unicode definitions of these terms), plus the characters '.', '-', '_', and ':'. In practice, search collection names typically consist of letters, digits, and those four punctuation characters.
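When collection names come from user input, a client-side check can reject obviously invalid names before search-collection-create is called. The sketch below is an approximation under stated assumptions, not the server's own validation: the hypothetical CollectionNames.isPracticalName accepts only Unicode letters and decimal digits plus '.', '-', '_', and ':', so it also rejects some valid NMTOKENs (those containing combining characters or extenders).

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of the Watson Explorer Engine API.
 * Approximates the "practical" collection-name rule; it is stricter than
 * the full XML NMTOKEN rule.
 */
final class CollectionNames {
    // \p{L} = any Unicode letter, \p{Nd} = any Unicode decimal digit;
    // '.', '_', ':' and a trailing '-' are literal inside the class.
    private static final Pattern PRACTICAL_NAME =
            Pattern.compile("[\\p{L}\\p{Nd}._:-]+");

    private CollectionNames() {}

    static boolean isPracticalName(String name) {
        // '+' requires at least one character, so "" is rejected.
        return name != null && PRACTICAL_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = {"my-new-collection", "docs_v2.1", "ns:coll", "bad name", ""};
        for (String n : names) {
            System.out.println("'" + n + "' -> " + isPracticalName(n));
        }
    }
}
```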

When a new collection is created, a source with the same name is also created unless a source with that name already exists. If such a source does exist, no exception is generated, but the new collection is not associated with the pre-existing source. (If you want them to be associated, you can configure the connection manually by modifying the source programmatically or in the Watson Explorer Engine administration tool.) When a collection is deleted, Watson Explorer Engine also attempts to delete the corresponding source; if that source does not exist, no exception is generated.
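To make the create and delete behavior concrete, the following sketch models these rules in memory. CollectionSourceModel is hypothetical and does not call Watson Explorer Engine; it also assumes that a collection's "corresponding" source is the source with the same name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory model of the collection/source rules described above.
 * It is not the Watson Explorer Engine API; it only mimics the documented behavior.
 */
final class CollectionSourceModel {
    private final Set<String> sources = new HashSet<>();
    // Maps each collection to its associated source, or to null if it has none.
    private final Map<String, String> collections = new HashMap<>();

    void addSource(String name) {
        sources.add(name);
    }

    void createCollection(String name) {
        // Set.add returns false if a same-named source already exists; that
        // source is kept but is NOT associated with the new collection.
        boolean createdNewSource = sources.add(name);
        collections.put(name, createdNewSource ? name : null);
    }

    void deleteCollection(String name) {
        collections.remove(name);
        // Assumption: the corresponding source is the same-named one.
        // Removing a missing element is a no-op, so nothing is thrown.
        sources.remove(name);
    }

    String associatedSource(String collection) {
        return collections.get(collection);
    }

    boolean hasSource(String name) {
        return sources.contains(name);
    }

    public static void main(String[] args) {
        CollectionSourceModel m = new CollectionSourceModel();
        m.addSource("docs");        // a source that already exists
        m.createCollection("docs"); // no error, but no association either
        m.createCollection("news"); // creates and associates source "news"
        System.out.println(m.associatedSource("docs")); // prints null
        System.out.println(m.associatedSource("news")); // prints news
    }
}
```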

Unlike inheritance for projects and other configuration objects, inheritance for collection configurations happens at creation time and is not dynamic. In other words, changes to the configuration of a base collection do not propagate to the collections that were created from it.
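The difference amounts to copying the base configuration once instead of referencing it. In this sketch, a Map<String, String> stands in for a collection configuration, and max-hops is a hypothetical setting name; neither reflects how Watson Explorer Engine actually stores configuration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of creation-time (copy-once) inheritance. */
final class CopyOnCreate {
    private CopyOnCreate() {}

    static Map<String, String> createFrom(Map<String, String> base) {
        // Snapshot copy: later edits to base are not visible in the result.
        return new HashMap<>(base);
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        base.put("max-hops", "0");                 // hypothetical setting
        Map<String, String> derived = createFrom(base);
        base.put("max-hops", "3");                 // change the base afterwards
        System.out.println(derived.get("max-hops")); // prints 0
    }
}
```

Running main prints 0, the value copied when the derived configuration was created, even though the base was changed to 3 afterwards.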

XML message:

    <SearchCollectionCreate xmlns="urn:/velocity/types">
      <collection>my-new-collection</collection>
      <based-on>my-base-collection</based-on>
    </SearchCollectionCreate>
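If you need to generate this XML message yourself rather than through the generated C# or Java classes, the JDK's standard DOM and transformer APIs are sufficient. This is a minimal sketch: SearchCollectionCreateXml and build are hypothetical names, not part of the Watson Explorer Engine API, and the code only produces the message text; sending it to the server is not shown.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper that builds (but does not send) a SearchCollectionCreate message. */
final class SearchCollectionCreateXml {
    private static final String NS = "urn:/velocity/types";

    private SearchCollectionCreateXml() {}

    static String build(String collection, String basedOn) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();

            // Root element in the urn:/velocity/types namespace.
            Element root = doc.createElementNS(NS, "SearchCollectionCreate");
            doc.appendChild(root);
            appendText(doc, root, "collection", collection);
            if (basedOn != null) {
                // Leaving out based-on makes the server use default-push.
                appendText(doc, root, "based-on", basedOn);
            }

            // Serialize the DOM tree to a string without an XML declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not build SearchCollectionCreate message", e);
        }
    }

    // Adds <name>text</name> in the same namespace as the root element.
    private static void appendText(Document doc, Element parent, String name, String text) {
        Element child = doc.createElementNS(NS, name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        System.out.println(build("my-new-collection", "my-base-collection"));
    }
}
```

The result should be equivalent to the message above, although whitespace and the exact placement of namespace declarations depend on the JDK's serializer.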

In C#:

    // COLLECTION and BASED_ON hold the new and base collection names;
    // port is the Watson Explorer Engine web service port.
    SearchCollectionCreate scc = new SearchCollectionCreate();
    scc.collection = COLLECTION;
    scc.basedon = BASED_ON;

    // Optional: override where the new collection keeps its live crawl,
    // staging crawl, and cache data (e:\parent-dir is an example path).
    string dir = "e:\\parent-dir\\" + COLLECTION;
    scc.collectionmeta = new SearchCollectionCreateCollectionmeta();
    scc.collectionmeta.vsemeta = new vsemeta();
    scc.collectionmeta.vsemeta.vsemetainfo = new vsemetainfo[1];
    scc.collectionmeta.vsemeta.vsemetainfo[0] = new vsemetainfo();
    scc.collectionmeta.vsemeta.vsemetainfo[0].livecrawldir = dir + "\\crawl0";
    scc.collectionmeta.vsemeta.vsemetainfo[0].stagingcrawldir = dir + "\\crawl1";
    scc.collectionmeta.vsemeta.vsemetainfo[0].cachedir = dir + "\\cache";

    // Send the request to create the collection.
    port.SearchCollectionCreate(scc);

In Java:

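    // COLLECTION and BASED_ON hold the new and base collection names;
    // port is the Watson Explorer Engine web service port.
    SearchCollectionCreate scc = new SearchCollectionCreate();
    scc.setCollection(COLLECTION);
    scc.setBasedOn(BASED_ON); // if not set, default-push is used as the base
    port.searchCollectionCreate(scc);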

Warning: If you create a push application in Watson Explorer Engine with a custom base collection, make sure that the custom base collection was itself based on the default-push collection provided with Watson Explorer Engine, or on the default-collection-broker collection if you use the Collection Broker. These collections are pre-configured with many options that are useful when pushing content; in particular, they allow URLs by default and set the maximum number of hops to 0.