2 replies. Latest post 2012-10-11T15:46:37Z by SystemAdmin

SystemAdmin
Pinned topic Non web crawler plugin error

2012-10-11T11:45:17Z
Hello:

We are trying to develop a non-web crawler plugin for ICA V3.0 installed on a Windows 7 platform. Our goal is to use that crawler to ingest content from CSV-format files. This is the current code of the plugin:

package com.caixa.bigdata;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.StringTokenizer;

import com.ibm.es.crawler.plugin.AbstractCrawlerPlugin;
import com.ibm.es.crawler.plugin.Content;
import com.ibm.es.crawler.plugin.CrawledData;
import com.ibm.es.crawler.plugin.CrawlerPluginException;
import com.ibm.es.crawler.plugin.FieldMetadata;

public class CaixaInterestCrawlerPlugin extends AbstractCrawlerPlugin {

    /**
     * Default constructor.
     */
    public CaixaInterestCrawlerPlugin() {
        super();
    }

    /**
     * Initializes this object.
     * This plug-in has nothing to do in this method.
     *
     * @see com.ibm.es.crawler.plugin.AbstractCrawlerPlugin#init()
     */
    public void init() throws CrawlerPluginException {
    }

    /**
     * Returns the boolean value for metadata usage.
     * This plug-in returns true.
     *
     * @see com.ibm.es.crawler.plugin.AbstractCrawlerPlugin#isMetadataUsed()
     */
    public boolean isMetadataUsed() {
        return true;
    }

    /**
     * Terminates this object.
     * This plug-in has nothing to do in this method.
     *
     * @see com.ibm.es.crawler.plugin.AbstractCrawlerPlugin#term()
     */
    public void term() throws CrawlerPluginException {
    }

    /**
     * Updates crawled data: parses the first line of the CSV content and
     * adds date, author, and body metadata.
     *
     * @see com.ibm.es.crawler.plugin.AbstractCrawlerPlugin#updateDocument(com.ibm.es.crawler.plugin.CrawledData)
     */
    public CrawledData updateDocument(CrawledData crawledData) throws CrawlerPluginException {
        List metadataList = crawledData.getMetadataList();
        if (metadataList == null) {
            metadataList = new ArrayList();
        }

        /*
         * Tip: if this plug-in should reject some crawled data,
         * add the check here and return null.
         */

        // Update the content (available since version 8.3).
        Content content = crawledData.getOriginalContent();

        java.io.InputStream in = null;
        try {
            if (content == null) {
                // If the original crawled content is null, create new content.
                content = crawledData.createNewContent();
            } else {
                // If the original crawled content exists, get an InputStream
                // object to access it and read the first line.
                in = content.getInputStream();
                BufferedReader br = new BufferedReader(new InputStreamReader(in));
                String text = br.readLine();
                StringTokenizer st = new StringTokenizer(text, ";");

                // Build the FieldMetadata for the entry date.
                String fechaString = st.nextToken();
                long fechaLong = Long.valueOf(fechaString);
                Date fechaDate = new Date(fechaLong);
                SimpleDateFormat format = new SimpleDateFormat("dd/MM/yyyy hh:mm:ss");
                String formatedDate = format.format(fechaDate);
                FieldMetadata newFieldMetaData1 = new FieldMetadata("date", formatedDate);
                metadataList.add(newFieldMetaData1);

                // Skip the next field, then read the author field and
                // build the FieldMetadata for the entry author.
                st.nextToken();
                String autor = st.nextToken();
                FieldMetadata newFieldMetaData2 = new FieldMetadata("author", autor);
                metadataList.add(newFieldMetaData2);

                // Skip the next field, then read the message field and
                // build the FieldMetadata for the entry body.
                st.nextToken();
                String mensaje = st.nextToken();
                FieldMetadata newFieldMetaData3 = new FieldMetadata("body", mensaje);
                metadataList.add(newFieldMetaData3);

                // Set the metadata list.
                crawledData.setMetadataList(metadataList);
                in.close();
            }
        } catch (IOException ioe) {
            throw new CrawlerPluginException(ioe);
        }

        // Set information on the content.
        content.setCodepage("UTF-8");
        content.setCodepageAutoDetection(true);
        content.setMimeType("text/plain");
        crawledData.submitContent(content);
        return crawledData;
    }

    /* (non-Javadoc)
     * @see com.ibm.es.crawler.plugin.AbstractCrawlerPlugin#isContentUsed()
     */
    public boolean isContentUsed() {
        return true;
    }

}
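To check the parsing logic outside the crawler, we use a small standalone sketch. The `ParseCheck` class name and the sample line are illustrative assumptions; the field layout (epoch milliseconds, an ignored field, author, another ignored field, message, separated by semicolons) mirrors the tokenizer logic in `updateDocument` above:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;

public class ParseCheck {

    // Parses one hypothetical CSV line laid out as:
    // epoch-millis;ignored;author;ignored;message
    // Returns { formatted date, author, message }.
    static String[] parse(String line) {
        StringTokenizer st = new StringTokenizer(line, ";");
        long millis = Long.valueOf(st.nextToken());
        String date = new SimpleDateFormat("dd/MM/yyyy hh:mm:ss").format(new Date(millis));
        st.nextToken();                   // skip the second field
        String autor = st.nextToken();    // author field
        st.nextToken();                   // skip the fourth field
        String mensaje = st.nextToken();  // message field
        return new String[] { date, autor, mensaje };
    }

    public static void main(String[] args) {
        String[] fields = parse("1349952000000;x;jdoe;y;hello world");
        System.out.println(fields[1] + "|" + fields[2]);
    }
}
```

Note that a line with fewer than five semicolon-separated fields would make `nextToken()` throw a `NoSuchElementException`, which the plugin's `catch (IOException)` block would not catch.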

We have generated a jar file named CrawlerPlugin.jar for the Java project that contains this code and copied it to a specific folder (c:\temp\plugins). We have defined the crawler and entered "com.caixa.bigdata.CaixaInterestCrawlerPlugin" as the plugin class name and "C:\temp\plugins\CrawlerPlugin.jar" as the plugin class path. The crawl space contains only the folder where the CSV files to be crawled are located.

When we start the crawler we get the following error:

Severity: Error
Component: crawler
Session:
Message: FFQD3129E A CrawlerPluginException was thrown by the crawler plug-in.
Line number: 663
Function name: com.ibm.es.crawler.publish.AbstractFormatter.format
File name: AbstractFormatter.java

Can somebody please help us to solve this problem?

Thanks in advance and best regards.
Manuel Romo
Updated on 2012-10-11T15:46:37Z by SystemAdmin
  • bfoyle

    Re: Non web crawler plugin error

    2012-10-11T14:59:24Z, in response to SystemAdmin
    I'm sorry I can't help with this problem, but I do want to point out that the CSV import function was designed specifically to handle importing CSV files from a directory. Have you considered using that approach?

    bf
    • SystemAdmin

      Re: Non web crawler plugin error

      2012-10-11T15:46:37Z, in response to bfoyle
      Hello Bob:

      We are aware of the import function for CSV files. We want to leverage functionality that is available in the crawler but not (correct me if I am wrong) in the import process, such as the ability to schedule crawls.

      Anyway, I suppose that the error is independent of the file type and/or format.

      Best regards.
      Manuel Romo