Breaking Down a Solr Query
Querying the index directly is a great way to troubleshoot search-related issues. I thought it would be worthwhile to put together a quick post to help explain the basics of a Solr query.
Here is a quick overview of the three segments of a simple Solr request. Each segment is described in greater detail in its own section. Looking at the simplest query:
(A) Server Configurations:
Server configuration is highlighted in red and is composed of the search server hostname and its port. At runtime these configurations are extracted from the wc-search.xml.
(B) Solr cores:
The Solr core, or index, is highlighted in purple. Cores map a request to a Solr index, as there can be multiple indexes on a search server. Each master catalog, language and index type combination has its own index, and all of them are listed in the solr.xml file.
(C) Query Parameters:
The query, which is highlighted in green, is used to search the index. It is made up of query parameters, each a name/value pair, that control the search against the index. A few of the common query parameters are 'q' (the query itself), 'fq' (filter query) and 'fl' (field list, which limits the fields returned).
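To make the three segments concrete, here is a minimal Python sketch that splits a request URL into its server, core and query parameter parts. The hostname, port and core name in the example URL are placeholders chosen for illustration, not values from a real environment.

```python
from urllib.parse import urlsplit, parse_qs

def split_solr_url(url):
    """Split a Solr request URL into (server, core, query parameters)."""
    parts = urlsplit(url)
    # (A) Server configuration: scheme, hostname and (optional) port.
    server = f"{parts.scheme}://{parts.netloc}"
    # (B) Core: the path segment between /solr/ and the request handler.
    segments = [s for s in parts.path.split("/") if s]
    core = segments[1] if len(segments) > 2 else None
    # (C) Query parameters: the name/value pairs after the '?'.
    params = parse_qs(parts.query)
    return server, core, params

# Hypothetical URL: host, port and core name are assumptions.
example = ("http://searchhost.example.com:3737/solr/"
           "MC_10001_CatalogEntry_en_US/select?q=name:coffee&rows=5")
server, core, params = split_solr_url(example)
print(server)
print(core)
print(params)
```

Each value in `params` is a list because a parameter such as 'fq' may appear more than once in a request.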
This is the easiest segment. It maps to the following: http://<hostname>:<port>/solr/. The hostname of a search server is determined by the configuration found in the wc-search.xml. For FEP 7+ users, this file can be found on your search server; otherwise, it is located within the Commerce EAR.
Location: <AppServer install dir>/profiles/<instance name_solr>/installedApps/<instance>_search_cell/Search_<instance>.ear/xml/config/com.ibm.commerce.catalog-fep/wc-search.xml
For FEP 7+ users, the query is formed by the search server itself, so it uses the basic configuration in the wc-search.xml, which looks like the following:
The search server hostname that Commerce uses to make a REST request can be found in the Commerce WAS admin console under the following parameter: com.ibm.commerce.foundation.server.services.search.hostname. The parameters are described in detail here: http://www.ibm.com/support/knowledgecenter/en/SSZLC2_7.0.0/com.ibm.commerce.developer.doc/tasks/tsdsearchconfignsbindings.htm
For users on prior feature packs, the Solr query is formed on the Commerce server, so the server configurations are defined within the wc-search.xml like so:
The core configuration is mapped to the server configuration to determine which core (masterCatalog_id, language and indextype) is mapped to which solr server. In FEP7, all cores are configured to map to the Basic Configuration like so:
<_config:core catalog="10001" indexName="CatalogEntry"
In the earlier FEPs, each core maps to its own server configuration, as there can be many. Here is an example:
<_config:core catalog="10001" indexName="CatalogEntry" language="en_US"
Note: In the development environment, where Solr runs embedded on the Commerce server, there is no port. This part of the query simply becomes: http://localhost/solr
A search server can have multiple indexes. Each master catalog has its own group of indexes based on index type and language. Currently there are three index types for each master catalog: CatalogEntry, UnstructuredContent, and CatalogGroup. The URL segment maps to:
So if we have two master catalogs, each with two languages, the search server could potentially have 12 indexes (2 catalogs x 2 languages x 3 index types). This list would look something like the following:
The list of indexes on a server can be found in the solr.xml file, located at: <solr home dir>/solr.xml. The Solr home directory is wherever the customer chose to put it. If Solr is on the same machine as Commerce, it is likely in the following directory: <CommerceServer70 installDir>/instances/<instance name>/search/solr/home
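The count of 12 indexes for two master catalogs with two languages each can be checked with a short Python sketch. The catalog IDs, the languages and the MC_<catalog>_<index type>_<language> naming pattern used here are assumptions for illustration; the authoritative list for a given server is whatever its solr.xml contains.

```python
from itertools import product

# The three index types named in the post.
INDEX_TYPES = ["CatalogEntry", "UnstructuredContent", "CatalogGroup"]

def list_cores(master_catalogs, languages):
    """Return one core name per (catalog, index type, language) combination."""
    # The name format below is an assumed convention, used only to
    # make each combination readable.
    return [
        f"MC_{catalog}_{index_type}_{language}"
        for catalog, index_type, language
        in product(master_catalogs, INDEX_TYPES, languages)
    ]

# Hypothetical catalog IDs and languages.
cores = list_cores(["10001", "10051"], ["en_US", "fr_FR"])
for name in cores:
    print(name)
print(len(cores))  # 2 catalogs x 3 index types x 2 languages = 12
```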
This is the most important segment as it is the actual query. We will work through a few examples to help understand how the query is formed:
Example 1: select?q=name:coffee (the query field: q)
This returns all documents from an index where the field 'name' contains the value 'coffee'. Relating this Solr jargon to a database:
Index = A single table
Document = Row
Field = Column
If this were a DB2 query, it would read roughly: SELECT * FROM CATALOGENTRY WHERE CATALOGENTRY.NAME = 'COFFEE'
The results of the query are returned in XML format. If you are querying from a browser, only the first 10 documents are returned unless otherwise specified. The number of rows can be changed by adding the &rows=<value> parameter to the query. Here is the response header and the first document returned. The <doc></doc> tags contain a single document:
The field name is in red and the field value is in blue. As you can see, this document was returned because the field 'name' had the value 'Coffee and Espresso Bar'.
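For readers who would rather inspect the XML response in a script than in a browser, the following is a minimal Python sketch that pulls the fields out of each <doc> element. The sample response is hand-written to mimic the general shape of a Solr XML response; apart from the 'name' value quoted above, its values are made up, and multi-valued fields are not handled.

```python
import xml.etree.ElementTree as ET

# Hand-written sample response (an assumption, not real output).
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="catentry_id">10001</str>
      <str name="name">Coffee and Espresso Bar</str>
    </doc>
  </result>
</response>"""

def parse_docs(xml_text):
    """Return one {field name: field value} dict per <doc> element."""
    root = ET.fromstring(xml_text)
    docs = []
    for doc in root.iter("doc"):
        # Each child of <doc> is a field; its 'name' attribute is the
        # field name and its text is the field value.
        docs.append({field.get("name"): field.text for field in doc})
    return docs

docs = parse_docs(SAMPLE_RESPONSE)
for doc in docs:
    print(doc["name"])
```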
Example 2: select?q=name:coffee&fl=catentry_id+price_USD+catenttype_id_ntk_cs (field list: fl)
Here we just added the query parameter 'fl'. This is the equivalent of the column list in the SELECT part of an SQL query. The database query would read:
SELECT CATENTRY_ID, PRICE_USD, CATENTTYPE_ID_NTK_CS FROM CATALOGENTRY WHERE CATALOGENTRY.NAME = 'COFFEE'
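The plus signs in the 'fl' value are URL-encoded spaces separating the field names. A small Python sketch of building the same request string from a list of fields follows; the helper name build_select is made up for this example.

```python
from urllib.parse import urlencode

def build_select(q, fields=None, rows=None):
    """Build the 'select?...' part of a Solr request (illustrative helper)."""
    params = {"q": q}
    if fields:
        # Field names are space-separated; urlencode turns spaces into '+'.
        params["fl"] = " ".join(fields)
    if rows is not None:
        params["rows"] = rows
    # Keep ':' unescaped so the output reads like the examples in this post.
    return "select?" + urlencode(params, safe=":")

query = build_select(
    "name:coffee",
    fields=["catentry_id", "price_USD", "catenttype_id_ntk_cs"],
)
print(query)
# select?q=name:coffee&fl=catentry_id+price_USD+catenttype_id_ntk_cs
```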
Results returned (showing a single doc as an example):
Example 3: select?q=name:coffee&fl=catentry_id+price_USD+catenttype_id_ntk_cs&fq=-catenttype_id_ntk_cs:ProductBean (filter query field: fq)
Here we added the query parameter 'fq', which stands for 'filter query'. From a database perspective, this is another way of changing your WHERE clause. The filter can have a minus sign or a plus sign in front of it.
Minus sign (in the example above): Removes all documents that have the value 'ProductBean' for the catenttype_id_ntk_cs field. It works like an exclude. A database query may read:
SELECT CATENTRY_ID, PRICE_USD, CATENTTYPE_ID_NTK_CS FROM CATALOGENTRY WHERE CATALOGENTRY.NAME = 'COFFEE' AND CATALOGENTRY.TYPE NOT IN ('ProductBean');
Plus sign (changing the above example to ...&fq=+catenttype_id_ntk_cs:ProductBean): Includes only the documents that have the value 'ProductBean' for the catenttype_id_ntk_cs field. It works like an include. A database query may read:
SELECT CATENTRY_ID, PRICE_USD, CATENTTYPE_ID_NTK_CS FROM CATALOGENTRY WHERE CATALOGENTRY.NAME = 'COFFEE' AND CATALOGENTRY.TYPE IN ('ProductBean');
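The include and exclude behaviour can be mimicked on an in-memory list of documents. This Python sketch is only an analogy for a single 'field:value' filter and is not how Solr evaluates filter queries internally; the documents and the 'ItemBean' value are made-up sample data.

```python
def apply_fq(docs, fq):
    """Apply one 'field:value' filter, optionally prefixed with '+' or '-'."""
    exclude = fq.startswith("-")
    field, value = fq.lstrip("+-").split(":", 1)
    # Keep matching documents for '+', non-matching documents for '-'.
    return [d for d in docs if (d.get(field) == value) != exclude]

# Made-up sample documents.
docs = [
    {"catentry_id": "1", "catenttype_id_ntk_cs": "ProductBean"},
    {"catentry_id": "2", "catenttype_id_ntk_cs": "ItemBean"},
    {"catentry_id": "3", "catenttype_id_ntk_cs": "ProductBean"},
]

excluded = apply_fq(docs, "-catenttype_id_ntk_cs:ProductBean")
included = apply_fq(docs, "+catenttype_id_ntk_cs:ProductBean")
print([d["catentry_id"] for d in excluded])  # ['2']
print([d["catentry_id"] for d in included])  # ['1', '3']
```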
If the minus sign were replaced with a plus sign, the results returned would be: