Topic
  • 1 reply
  • Latest Post - 2013-01-03T16:30:51Z by jlerm
jlerm

Pinned topic JAQL example to run text analytics extractor on cluster

2013-01-03T01:22:45Z
Does anyone have an example of a JAQL script to execute text analytics extractors in parallel on a BigInsights 2.0 cluster?

I can see the different pieces of the puzzle in different places (including the documentation for "annotateDocument"), but I could not find a single example putting everything together.

I'm looking for something along these lines:

jaql> ls ("/path/to/files/*.txt").path
->batch(10)
->arrayRead()
->read(...)
->annotateDocument(...)
...

I understand that AQL expects the entire content of each document in a field named "text".
However, if I read files using "read(del(...))", each line ends up as a separate element of an array.

Thanks,

Julius
Updated on 2013-01-03T16:30:51Z by jlerm
  • jlerm

    Re: JAQL example to run text analytics extractor on cluster

    ‏2013-01-03T16:30:51Z  
    I think I found the answer by publishing an application and inspecting the artifacts it generated.

    This is the code segment for reading a plain text file (it uses "strJoin" to join the lines back together into a single "text" field):

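    // Read a plain text file and return a one-record array:
    // "text" holds the whole document, "label" holds the file path.
    readtext = fn (path2txt) (
      [{
        // read(lines(...)) yields one string per line; strJoin re-joins them with newlines
        text: strJoin(read(lines(path2txt)), "\n"),
        label: path2txt
      }]
    );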
    Julius
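
    A minimal usage sketch, assuming the "readtext" definition above has already been evaluated in the same Jaql session. The file path is hypothetical and only illustrates the shape of the result: a one-element array whose record carries the whole document in "text" and the path in "label". Passing that array on to "annotateDocument" is not shown, because its arguments are not given anywhere in this thread.

    ```
    // Hypothetical path, for illustration only.
    doc = readtext("/path/to/files/example.txt");

    // The whole file as a single string (lines joined with "\n").
    doc[0].text;

    // The path that was passed in.
    doc[0].label;
    ```

    Because the function returns an array rather than a bare record, its output can be piped with "->" like any other Jaql array.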