July 20, 2020 By Jason Zhou 2 min read

IBM Watson™ Discovery makes it possible to rapidly build cognitive, cloud-based exploration applications that unlock actionable insights hidden in unstructured data.

This blog post describes a scenario in which a user encounters the error "An unexpected error occurred while processing your document" when uploading a document to the Watson Discovery service, and explains how to resolve the issue.

Problem description

When using IBM Watson™ Discovery to upload a few JSON documents, a user receives the error message shown in the screenshot below:

Further investigation shows that errors occurred during the document conversion phase:

Error: Illegal unquoted character ((CTRL-CHAR, code 9)): has to be escaped using backslash to be included in string value at [Source: (org.apache.commons.io.input.CloseShieldInputStream); line: 11, column: 2995]

Cause of the problem

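As the error message above indicates, a string value in the JSON document contains an illegal unescaped control character, which causes the conversion error. Character code 9 is the horizontal tab, which the JSON standard does not allow inside a string value unless it is escaped (for example, as \t). The message also reports where the character was found: line 11, column 2995.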
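The difference between a raw and an escaped tab can be reproduced locally. The following is a minimal sketch using Python's standard json module, not the parser that Watson Discovery uses; the helper name is_valid_json and the sample strings are illustrative assumptions.

```python
import json

# Two hypothetical one-line documents. In Python source, '\t' becomes a real
# tab character, while '\\t' stays as the two characters backslash and t.
raw_tab = '{"text": "a\tb"}'       # raw tab (code 9) inside the string value
escaped_tab = '{"text": "a\\tb"}'  # tab written as the JSON escape \t


def is_valid_json(text):
    """Return True if text parses as strict JSON, otherwise False."""
    try:
        json.loads(text)  # strict mode (the default) rejects raw control characters
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(raw_tab))      # False
print(is_valid_json(escaped_tab))  # True
```

Both documents look almost identical in an editor, which is why this kind of error is easy to miss by eye.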

Solution

To resolve the error, the JSON documents need to be updated to conform to the JSON standard. Free online JSON validator tools can be used to verify the content of a JSON document. For example, this website can be used for this purpose at the time of writing.
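When a document cannot be pasted into an online tool, a similar check can be run locally. The sketch below assumes Python's built-in json module as the validator; the function name find_json_error and the sample text are hypothetical, and the message wording will differ from the one Watson Discovery reports.

```python
import json


def find_json_error(text):
    """Return (line, column, message) for the first syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based positions of the offending character
        return err.lineno, err.colno, err.msg
    return None


# Hypothetical two-line document with a raw tab inside the string on line 2.
sample = '{\n"text": "a\tb"}'
print(find_json_error(sample))
```

For a file on disk, the same function can be called on the file's contents, for example find_json_error(open("file.json", encoding="utf-8").read()).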

Sample output with the problematic JSON document using the above online validator:

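The user may take advantage of existing Perl/Python scripts to find and replace the illegal characters in the JSON document. (One useful post from an online forum.)

The following Perl script reads the whole file, scans each quoted string, and prints the offset, the string, and the names of any control characters it contains: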

perl -Mcharnames=:full -C -l -0777 -ne '
  while (/"(?:\\.|[^"])*"/g) {
    my $offset = $-[0];
    my $string = $&;
    @ctrl = map {charnames::viacode(ord($_))} $string =~ /\p{PosixCntrl}/g;
    if (@ctrl) {
       print "Offset: $offset, String: $string, Ctrl: ". join "+", @ctrl
    }
  }' file.json
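The Perl script above only reports the offending strings. As a complement, here is a hedged Python sketch that escapes raw control characters inside string literals so that the document parses. The function name escape_control_chars is an assumption for illustration, and the regular expression approach assumes the quotes in the document are otherwise balanced.

```python
import json
import re

# Matches one JSON string literal, honouring backslash escapes such as \".
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"', re.DOTALL)
# Control characters U+0000 to U+001F must be escaped inside JSON strings.
CONTROL_CHAR = re.compile(r'[\x00-\x1f]')


def escape_control_chars(text):
    """Escape raw control characters that appear inside JSON string literals."""
    def fix_literal(match):
        # json.dumps("\t") gives '"\\t"'; dropping the quotes leaves the escape.
        return CONTROL_CHAR.sub(
            lambda c: json.dumps(c.group())[1:-1], match.group()
        )
    # Whitespace between tokens (outside string literals) is left untouched.
    return STRING_LITERAL.sub(fix_literal, text)


if __name__ == "__main__":
    broken = '{"text": "a\tb"}'           # raw tab inside the string value
    fixed = escape_control_chars(broken)
    print(fixed)
    print(json.loads(fixed))              # parses once the tab is escaped
```

It is worth reviewing the result before uploading, because escaping preserves the control character in the parsed value rather than removing it.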

Once the JSON document has been updated and passes the JSON validator, the user may try to upload the document to the Watson Discovery collection again. The same error should no longer occur.

Summary

The IBM Watson Discovery service supports various document formats. Supported document types for Smart Document Understanding:

  • Lite plans: PDF, Word, PowerPoint, Excel, JSON*, HTML*
  • Advanced plans: PDF, Word, PowerPoint, Excel, PNG**, TIFF**, JPG**, JSON*, HTML*

The link to the Watson Discovery supported document types is here.

In this particular scenario, the JSON document contained illegal control characters, which caused the conversion issue after the document was uploaded to the Watson Discovery service.
