July 20, 2020 By Jason Zhou 2 min read

IBM Watson™ Discovery makes it possible to rapidly build cognitive, cloud-based exploration applications that unlock actionable insights hidden in unstructured data.

This blog post describes a scenario where a user encounters the error "An unexpected error occurred while processing your document" when uploading a document to the Watson Discovery service, and explains how to resolve the issue.

Problem description

When uploading a few JSON documents to IBM Watson™ Discovery, a user receives the following error message: An unexpected error occurred while processing your document.

Further investigation shows that an error occurred during the document conversion phase:

Error: Illegal unquoted character ((CTRL-CHAR, code 9)): has to be escaped using backslash to be included in string value at [Source: (org.apache.commons.io.input.CloseShieldInputStream); line: 11, column: 2995]

Cause of the problem

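As the error message indicates, a string value in the document contains an illegal character: an unescaped control character with code 9, which is a horizontal tab. The JSON standard requires control characters inside a string to be escaped (for example, a tab written as \t), so the conversion fails.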
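The cause can be reproduced locally. The following is a minimal sketch, assuming Python's standard json module in its default strict mode; the helper name first_json_error and the two sample documents are hypothetical and only illustrate the difference between a raw tab and an escaped tab inside a string value. Watson Discovery uses a different parser, so the exact message and position it reports will differ.

```python
import json
from typing import Optional, Tuple


def first_json_error(text: str) -> Optional[Tuple[str, int, int]]:
    """Return (message, line, column) for the first parse error, or None if the text is valid JSON."""
    try:
        json.loads(text)  # strict mode (the default) rejects raw control characters in strings
    except json.JSONDecodeError as err:
        return err.msg, err.lineno, err.colno
    return None


# A literal tab character inside the string value (invalid JSON).
RAW_TAB = '{"text": "col1\tcol2"}'

# The same value with the tab escaped as backslash + t (valid JSON).
ESCAPED_TAB = '{"text": "col1\\tcol2"}'

if __name__ == "__main__":
    print("raw tab:    ", first_json_error(RAW_TAB))
    print("escaped tab:", first_json_error(ESCAPED_TAB))
```

Running the same helper over the content of a real document gives a line and column to inspect, similar to the location reported in the conversion error.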

Solution

To resolve the error, the JSON documents need to be updated to conform to the JSON standard. The user can use a free online JSON validator to verify the content of each JSON document. For example, this website can be used for this purpose at the time of writing this blog.

Sample output with the problematic JSON document using the above online validator:

The user can then use existing Perl or Python scripts to find and replace the illegal characters in the JSON document. (One useful post from an online forum.)

The script:

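perl -Mcharnames=:full -C -l -0777 -ne '
  # Scan every double-quoted string literal in the file
  while (/"(?:\\.|[^"])*"/g) {
    my $offset = $-[0];   # offset at which the string literal starts
    my $string = $&;
    # Collect the Unicode names of any control characters in the string
    my @ctrl = map {charnames::viacode(ord($_))} $string =~ /\p{PosixCntrl}/g;
    if (@ctrl) {
       print "Offset: $offset, String: $string, Ctrl: ". join "+", @ctrl
    }
  }' file.json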
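The Perl script above only reports the offending strings. As a complement, here is a rough Python sketch of the replace step. It assumes the only defect in the document is unescaped control characters inside string literals; the function name escape_control_chars and the sample document are hypothetical. It rewrites each control character as its JSON escape and leaves whitespace between tokens untouched.

```python
import json
import re

# A JSON string literal: opening quote, then escaped pairs or ordinary characters, then closing quote.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"', re.DOTALL)

# Control characters that JSON does not allow unescaped inside a string (U+0000 to U+001F).
CONTROL_CHAR = re.compile(r"[\x00-\x1f]")

# Short escapes defined by JSON; everything else falls back to \uXXXX.
SHORT_ESCAPES = {"\b": "\\b", "\f": "\\f", "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def _escape(match: "re.Match[str]") -> str:
    ch = match.group(0)
    return SHORT_ESCAPES.get(ch, "\\u%04x" % ord(ch))


def escape_control_chars(text: str) -> str:
    """Escape raw control characters found inside JSON string literals only."""
    return STRING_LITERAL.sub(
        lambda literal: CONTROL_CHAR.sub(_escape, literal.group(0)), text
    )


if __name__ == "__main__":
    # Hypothetical broken document: a raw tab and a raw newline inside string values.
    broken = '{"a": "x\ty",\n\t"b": "line1\nline2"}'
    fixed = escape_control_chars(broken)
    print(fixed)
    print(json.loads(fixed))  # parses once the control characters are escaped
```

This is a sketch rather than a general-purpose repair tool: a document with other defects, such as unbalanced quotes, would need to be fixed by hand, and it is worth keeping a backup of the original file before rewriting it.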

Once the JSON document has been updated and passes the JSON validator, the user can upload the document to the Watson Discovery collection again. The same error should no longer occur.

Summary

The IBM Watson Discovery service supports various document formats. Supported document types for Smart Document Understanding:

  • Lite plans: PDF, Word, PowerPoint, Excel, JSON*, HTML*
  • Advanced plans: PDF, Word, PowerPoint, Excel, PNG**, TIFF**, JPG**, JSON*, HTML*

The link to the Watson Discovery supported document types is here.

In this particular scenario, the JSON document contains illegal control characters, which cause the conversion to fail after the document is uploaded to the Watson Discovery service.
