SystemAdmin
197 Posts

Problem with using method setLinks of PostparsePlugin

‏2012-11-19T10:39:48Z |
Hi,

I want to use "setLinks()" to add URLs that are not extracted by default (links generated by calling a JavaScript method).

This is the code I use:

public boolean processDocument(PostparsePluginArg[] args) {
    PostparsePluginArg1 arg = (PostparsePluginArg1) args[0];

    if (arg.getContent() != null && arg.getContent().length > 0) {
        debugLog("PostparsePluginArg1.getURL():" + arg.getURL());

        ArrayList<String> list = new ArrayList<String>();
        list.add("url1");
        list.add("url2");
        arg.setLinks(list);

        arg.setSave(true);
    }

    return true;
} // end processDocument

The methods that read information ("getURL()", "getLinks()") work fine.
I tried to use the setLinks() method to add URLs (url1 and url2), but they are not crawled, even though both URLs are within the allowed domain.
There are no errors in the log.
Does anybody have an idea what I'm doing wrong?

Thanks a lot,
Greetings,
Matthew
Updated on 2012-12-01T12:24:22Z at 2012-12-01T12:24:22Z by SystemAdmin
  • TeruoKoyanagi
    8 Posts

    Re: Problem with using method setLinks of PostparsePlugin

    ‏2012-11-27T05:34:38Z  
    Hi,

    It may depend on the URL. For example, the web crawler does not crawl URLs that include a user name.

    The following is your code with my test URLs. It works in my environment:

    public boolean processDocument(PostparsePluginArg[] args) {
        PostparsePluginArg1 arg = (PostparsePluginArg1) args[0];

        if (arg.getContent() != null && arg.getContent().length > 0) {
            debugLog("PostparsePluginArg1.getURL():" + arg.getURL());

            ArrayList<String> list = new ArrayList<String>();
            list.add("http://teruok7/plugintest/page1.html");
            list.add("http://teruok7/plugintest/page2.html");
            arg.setLinks(list);

            arg.setSave(true);
        }

        return true;
    } // end processDocument
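
    To illustrate the user-name point, here is a minimal sketch that drops URLs containing a user-info part (such as "user@host") before the list is handed to setLinks(). LinkFilter and its two methods are hypothetical helpers, not part of the plug-in API; the sketch relies only on java.net.URI. That the crawler skips exactly these URLs is an assumption based on the remark above, so treat it as a way to check your own list rather than a description of the crawler's rules.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the crawler plug-in API.
final class LinkFilter {

    /** Returns true if the URL has a user-info part such as "user@" or "user:pw@". */
    static boolean hasUserInfo(String url) {
        try {
            return new URI(url).getRawUserInfo() != null;
        } catch (URISyntaxException e) {
            // Assumption: a URL that cannot be parsed is not worth passing on either.
            return true;
        }
    }

    /** Returns a new list holding only the URLs without a user-info part. */
    static List<String> withoutUserInfo(List<String> urls) {
        List<String> kept = new ArrayList<String>();
        for (String url : urls) {
            if (!hasUserInfo(url)) {
                kept.add(url);
            }
        }
        return kept;
    }
}
```

    Inside processDocument, the call could then look like arg.setLinks(LinkFilter.withoutUserInfo(list)), and logging the dropped URLs with debugLog would show which ones were filtered out.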

    Thanks,
    Teruo
  • SystemAdmin
    197 Posts

    Re: Problem with using method setLinks of PostparsePlugin

    ‏2012-12-01T12:24:22Z  
    Hi Teruo,

    Thanks for your reply.

    Since you said the code works in your environment, I deleted the crawler and re-created it.
    Now it seems to work. It looks like some information is kept in the crawler even when I do a full recrawl.
    Does this mean that I must re-create the crawler after each modification of the plugin?

    Regards
    Matthew