July 20, 2020 By Jason Zhou 2 min read

IBM Watson™ Discovery makes it possible to rapidly build cognitive, cloud-based exploration applications that unlock actionable insights hidden in unstructured data.

This blog post describes a scenario in which a user encounters the error "An unexpected error occurred while processing your document" when uploading a document to the Watson Discovery service, and explains how to resolve the issue.

Problem description

When using IBM Watson™ Discovery to upload a few JSON documents, a user receives the error message shown in the screenshot below:

Further investigation shows that the error occurred during the document conversion phase:

Error: Illegal unquoted character ((CTRL-CHAR, code 9)): has to be escaped using backslash to be included in string value at [Source: (org.apache.commons.io.input.CloseShieldInputStream); line: 11, column: 2995]

Cause of the problem

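As the error message indicates, a string value in the document contains an illegal, unescaped control character. Character code 9 is the horizontal tab, and the JSON standard requires control characters such as tabs to be escaped (for example, as \t) when they appear inside a string. The raw tab at line 11, column 2995 therefore causes the conversion error.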
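The difference between a raw and an escaped tab can be reproduced locally. The following is a minimal sketch that uses Python's built-in json module as a stand-in for the parser used by the Discovery service (an assumption; the service's parser may differ in other details). The helper name parses_as_json and the sample documents are made up for illustration.

```python
import json


def parses_as_json(text: str) -> bool:
    """Return True if text is accepted by Python's strict JSON parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# A raw tab (character code 9) inside a string value: not valid JSON.
raw_tab = '{"title": "col1' + chr(9) + 'col2"}'

# The same value with the tab written as the two-character escape \t: valid JSON.
escaped_tab = '{"title": "col1\\tcol2"}'

print(parses_as_json(raw_tab))      # False
print(parses_as_json(escaped_tab))  # True
```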

Solution

To resolve the error, update the JSON documents so that they conform to the JSON standard. Free online JSON validator tools can be used to verify the content of a JSON document. For example, this website can be used for this purpose at the time of writing this blog.
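If an online tool is not an option, a similar check can be run locally. The sketch below again assumes Python's built-in json module as the validator. The function name find_json_error and the sample document are hypothetical. It returns the line, column and message of the first syntax error, or None when the text parses cleanly.

```python
import json
from typing import Optional, Tuple


def find_json_error(text: str) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, message) for the first syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno, err.msg
    return None


# Hypothetical sample: the third line holds a raw tab inside a string value.
sample = '{\n  "id": 1,\n  "text": "a' + chr(9) + 'b"\n}'
print(find_json_error(sample))
```

The reported line should match the one named in the Discovery error message, although the exact column may differ between parsers.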

Sample output from the above online validator for the problematic JSON document:

Existing Perl or Python scripts can help find and replace the illegal characters in the JSON document. (One useful post from an online forum.)

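The script below reports the offset and content of each string that contains a control character, along with the names of the control characters found: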

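perl -Mcharnames=:full -C -l -0777 -ne '
  # Walk through every double-quoted string in the slurped file.
  while (/"(?:\\.|[^"])*"/g) {
    my $offset = $-[0];   # start position of the matched string
    my $string = $&;      # the matched string itself
    # Collect the Unicode names of any control characters in the string.
    my @ctrl = map {charnames::viacode(ord($_))} $string =~ /\p{PosixCntrl}/g;
    if (@ctrl) {
       print "Offset: $offset, String: $string, Ctrl: ". join "+", @ctrl
    }
  }' file.json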
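The Perl script only locates the offending strings. For the replace step, one possible approach in Python is to parse the document leniently and write it back out, which turns raw control characters into standard escapes. This is a sketch under stated assumptions, not the method from the forum post: the function name escape_control_characters is hypothetical, and it relies on the strict=False option of json.loads, which tolerates raw control characters (codes 0 to 31) inside strings.

```python
import json


def escape_control_characters(text: str) -> str:
    """Re-serialize JSON so raw control characters in strings become escapes."""
    # strict=False lets the parser accept raw control characters in strings.
    data = json.loads(text, strict=False)
    # json.dumps always writes control characters as escape sequences.
    return json.dumps(data, ensure_ascii=False, indent=2)


# Hypothetical broken document with a raw tab inside a string value.
broken = '{"text": "col1' + chr(9) + 'col2"}'
fixed = escape_control_characters(broken)
print(fixed)
```

Note that this rewrites the whole document, so whitespace and indentation change even though the data stays the same. Any other syntax problem still raises an error and has to be fixed by hand.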

Once the JSON document has been updated and passes the JSON validator, the user can upload it to the Watson Discovery collection again. The same error should no longer occur.

Summary

The IBM Watson Discovery service supports various document formats. Supported document types for Smart Document Understanding:

  • Lite plans: PDF, Word, PowerPoint, Excel, JSON*, HTML*
  • Advanced plans: PDF, Word, PowerPoint, Excel, PNG**, TIFF**, JPG**, JSON*, HTML*

The link to the Watson Discovery supported document types is here.

In this particular scenario, the JSON document contained illegal control characters, which caused the conversion error after the document was uploaded to the Watson Discovery service.
