IBM Watson™ Discovery makes it possible to rapidly build cognitive, cloud-based exploration applications that unlock actionable insights hidden in unstructured data.

This blog post describes a scenario where a user encounters the error "An unexpected error occurred while processing your document" when uploading a document to the Watson Discovery service, and explains how to resolve the issue.

Problem description

When using IBM Watson™ Discovery to upload a few JSON documents, a user receives the error message shown in the screenshot below:


Further investigation shows that errors occurred during the document conversion phase:

Error: Illegal unquoted character ((CTRL-CHAR, code 9)): has to be escaped using backslash to be included in string value at [Source: (org.apache.commons.io.input.CloseShieldInputStream); line: 11, column: 2995]

Cause of the problem

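As the error message indicates, a string value in the document contains an illegal character: an unescaped control character with code 9, which is the horizontal tab. The JSON standard requires control characters inside string values to be escaped, so the conversion fails.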
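The cause can be reproduced locally. The following is a minimal sketch using Python's standard `json` module, which, like the parser behind the error above, rejects raw control characters inside string values. The sample documents and the helper name `parses_as_json` are made up for illustration and are not part of Watson Discovery.

```python
import json

# A JSON document whose string value contains a raw (unescaped) tab,
# i.e. the character with code 9 named in the error message.
raw_tab_doc = '{"text": "first\tsecond"}'

# The same document with the tab written as the two-character escape \t.
escaped_tab_doc = '{"text": "first\\tsecond"}'


def parses_as_json(document: str) -> bool:
    """Return True if Python's strict JSON parser accepts the document."""
    try:
        json.loads(document)
        return True
    except json.JSONDecodeError:
        return False


print(parses_as_json(raw_tab_doc))      # the raw control character is rejected
print(parses_as_json(escaped_tab_doc))  # the escaped form is accepted
```

Both documents describe the same text; only the second one is valid JSON.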

Solution

To resolve the error, the JSON documents need to be updated to conform to the JSON standard. The user may use a free online JSON validator to verify the content of each JSON document. For example, this website can be used for this purpose at the time of writing this blog.
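If the documents cannot be pasted into an online tool, a similar check can be run locally. The sketch below assumes Python's standard `json` module is an acceptable stand-in for an online validator; the function name `find_json_error` is hypothetical. It returns the line and column of the first problem, much like the location given in the conversion error.

```python
import json
from typing import Optional, Tuple


def find_json_error(text: str) -> Optional[Tuple[int, int, str]]:
    """Return None for valid JSON, else (line, column, message) of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return (err.lineno, err.colno, err.msg)
    return None


# Illustrative input: the second line holds a raw tab inside a string value.
sample = '{\n"text": "first\tsecond"}'
problem = find_json_error(sample)
if problem is None:
    print("Document is valid JSON")
else:
    line, column, message = problem
    print(f"Line {line}, column {column}: {message}")
```

To check a real file, read it first, for example with `open("file.json", encoding="utf-8").read()`, and pass the text to the function.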

Sample output with the problematic JSON document using the above online validator:


The user may also use an existing Perl or Python script to find and replace the illegal characters in the JSON document. (One useful post from an online forum.)

The script:

perl -Mcharnames=:full -C -l -0777 -ne '
  while (/"(?:\\.|[^"])*"/g) {
    my $offset = $-[0];
    my $string = $&;
    @ctrl = map {charnames::viacode(ord($_))} $string =~ /\p{PosixCntrl}/g;
    if (@ctrl) {
       print "Offset: $offset, String: $string, Ctrl: ". join "+", @ctrl
    }
  }' file.json
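The Perl script above only reports the offending strings. As a complement, here is a rough Python sketch that rewrites the text so that raw control characters inside string values become valid JSON escapes. It is an illustration rather than an official tool: the function name `escape_control_chars` is hypothetical, and the approach assumes the document is otherwise well formed (in particular, that all quotes are balanced).

```python
import json

# Control characters that have a short JSON escape; the rest use \uXXXX.
SHORT_ESCAPES = {"\b": "\\b", "\f": "\\f", "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def escape_control_chars(text: str) -> str:
    """Escape raw control characters (codes 0-31) found inside JSON strings."""
    out = []
    in_string = False  # True while scanning between an opening and closing quote
    escaped = False    # True when the previous character was a backslash
    for ch in text:
        if not in_string:
            if ch == '"':
                in_string = True
            out.append(ch)  # whitespace outside strings is left untouched
        elif escaped:
            escaped = False
            out.append(ch)
        elif ch == "\\":
            escaped = True
            out.append(ch)
        elif ch == '"':
            in_string = False
            out.append(ch)
        elif ord(ch) < 0x20:
            out.append(SHORT_ESCAPES.get(ch, f"\\u{ord(ch):04x}"))
        else:
            out.append(ch)
    return "".join(out)


broken = '{"text": "first\tsecond",\n "count": 1}'
fixed = escape_control_chars(broken)
print(json.loads(fixed))  # parses once the tab has been escaped
```

It is worth running the result through a JSON validator before uploading, since the sketch does not repair other kinds of syntax errors.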

Once the JSON document has been updated and passes the JSON validator, the user may try to upload the document to the Watson Discovery collection again. The same error should no longer occur.

Summary

The IBM Watson Discovery service supports various document formats. Supported document types for Smart Document Understanding:

  • Lite plans: PDF, Word, PowerPoint, Excel, JSON*, HTML*
  • Advanced plans: PDF, Word, PowerPoint, Excel, PNG**, TIFF**, JPG**, JSON*, HTML*

The link to the Watson Discovery supported document types is here.

In this particular scenario, the JSON document contained illegal control characters, which caused the conversion error after the document was uploaded to the Watson Discovery service.
