Use binary callback

This advanced action passes an unprocessed, single or multivalued binary field to a memory hook that extracts textual information and passes it back to the decision plan as textual fields. You can invoke a C function or a Java™ class.

Configuring the action in Classification Workbench

To configure the Use binary callback action in the Add Action window:
  1. In the Hook name field, specify the name of the hook and whether it is a C function or a Java class.
  2. In the Context string area, specify the context string if you want to supply metadata to describe the buffers. You can specify a content field, temporary variable, or other data source. If a context string is not required, enter a placeholder value.
  3. Specify the binary input field. The data type of this field must be defined as binary and contain one or more binary buffers.

Creating and setting up the hook

To create a C hook called abc, compile a DLL file called abc.dll with a static RunBinary function.

Important: On Windows, you must compile the DLL file in release mode. Do not compile the DLL file in debug mode.

On AIX®, Linux, and Solaris, compile a library called abc.so.

Store the DLL file (or, on AIX, Linux, and Solaris, the abc.so library) in the Classification_Home\AddOn folder that is created during installation (for example, C:\IBM\ContentClassification\AddOn on Windows).

To invoke a Java hook called xyz, create a Java class called xyz with a static RunBinary method and store it in the AddOn folder as a class or jar file.

Hooks are loaded globally for each process. When you create a new hook and add it to the Classification_Home\AddOn folder, you must restart the IBM® Content Classification server for your changes to take effect.

Important: If you install the Content Classification server on a different computer than Classification Workbench, ensure that you save your DLL, class, or jar file in the Classification_Home\AddOn folder on the computer on which the server is installed.

You can include optional Init() and Finish() functions. Use these functions to load resources before processing begins and to unload resources on shutdown. The Init() function is invoked one time, the first time the hook is used. It receives the path to the AddOn folder where it can read any required configuration files into static variables. The Finish() function is also invoked one time when the program exits (for example, when you close Classification Workbench or stop the decision plan on the Content Classification server).

Tip: The binary hook cannot be tested in Classification Workbench directly. You can test the hook as follows:
  1. Create a Run method in the hook, as described in Use callback.
  2. In Classification Workbench, add a Use callback advanced action. Use any method to send a file path to this hook from within the decision plan.
  3. In the Run method, retrieve the buffer from the external file and send it to the RunBinary method.
You can also test the hook outside of Classification Workbench by sending a buffer value to the RunBinary method.

If necessary, the hook can pass the buffer to an external conversion program. The context string can provide an extension or MIME type that indicates how the conversion program should process the buffer.
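
As an illustration of how a hook might interpret such a context string, the following minimal sketch maps an extension or MIME type to a conversion route. The Converter enumeration, the pick_converter() helper, and the list of recognized values are assumptions made for this example; they are not part of the product.

```cpp
#include <cassert>
#include <cwchar>
#include <string>

// Hypothetical conversion routes; the product does not define these names.
enum Converter { CONVERT_PDF, CONVERT_HTML, CONVERT_PLAIN, CONVERT_UNKNOWN };

// Maps a context string such as L"pdf", L".PDF", or L"application/pdf"
// to a conversion route. Matching ignores the case of ASCII letters.
static Converter pick_converter(const wchar_t* context)
{
    if (context == NULL) return CONVERT_UNKNOWN;

    std::wstring key(context);
    for (size_t i = 0; i < key.size(); ++i)
        if (key[i] >= L'A' && key[i] <= L'Z')
            key[i] = static_cast<wchar_t>(key[i] - L'A' + L'a');
    if (!key.empty() && key[0] == L'.') key.erase(0, 1);  // ".pdf" -> "pdf"

    if (key == L"pdf" || key == L"application/pdf") return CONVERT_PDF;
    if (key == L"html" || key == L"htm" || key == L"text/html") return CONVERT_HTML;
    if (key == L"txt" || key == L"text/plain") return CONVERT_PLAIN;
    return CONVERT_UNKNOWN;  // includes a placeholder context value
}
```

A RunBinary() implementation could call pick_converter(context) first and return an error string for CONVERT_UNKNOWN instead of guessing a format.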

Output content fields are generated by the hook. If a generated output content field already exists in the content set, the content of the existing field is overwritten. If a generated output content field returns an empty value for a content item, the existing content field is deleted.
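
The overwrite and delete rule can be modeled in a few lines. This is only a sketch of the behavior described above, not the server's code; FieldMap and apply_hook_output() are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of a content item: field name -> field value.
typedef std::map<std::wstring, std::wstring> FieldMap;

// Applies the fields generated by a hook to a content item:
// a non-empty value creates or overwrites the field,
// an empty value deletes the existing field.
static void apply_hook_output(FieldMap& item, const FieldMap& generated)
{
    for (FieldMap::const_iterator it = generated.begin();
         it != generated.end(); ++it)
    {
        if (it->second.empty())
            item.erase(it->first);
        else
            item[it->first] = it->second;
    }
}
```

For example, if a content item holds the fields Title and Body and the hook returns a new Title and an empty Body, the item ends up with the new Title and no Body field; fields that the hook does not return are left unchanged.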

Tip: If your hook will generate new content fields, you can add them to a list of external content fields. This ensures that the generated fields are included in rule window options, decision plan reports, and lists of modified fields. In the Project Explorer panel, double-click External content fields and click "+" to add each content field that will be generated by the hook.

C and Java callback examples are provided in the following sections.

C binary callback usage

This section describes how C callback functions are written.

Prototype for the Init() function
wchar_t* Init(wchar_t* dirpath);
dirpath
The path to the AddOn folder. You can store configuration files in this folder and load them in Init().
Output
An error string allocated with malloc(), or NULL if no error occurred.
Prototype for the RunBinary() function
struct RexBuffer { size_t size; const char* data; };


int RunBinary(wchar_t* context, size_t n_buffers, RexBuffer* buffers, 
              int* out_count, wchar_t*** out_key, wchar_t*** out_val,
              wchar_t** out_error);
The callback processes input fields and then generates output fields and an optional error string. RunBinary() returns a negative value if an error occurs.
context
An input string that can direct the function to perform different actions.
n_buffers
The number of buffers passed by the decision plan to the hook.
buffers
The buffers that are passed by the decision plan to the hook.
buffers[0].size is the size of the first buffer.
buffers[0].data is the beginning of the first buffer.
out_count
The size of the output field array.
out_key, out_val
Arrays of output field names and values. The C callback allocates both arrays with malloc() and fills them. Each string in the arrays is also allocated with malloc().
out_error
An error string allocated with malloc(), or NULL if no error occurred.
Prototype for the Finish() function
int Finish();
Releases resources and returns a negative value if an error occurs.
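
The optional Init() and Finish() functions could look like the following minimal sketch, which follows the prototypes above. The configuration file name abc.cfg, the g_config_path and g_ready variables, and the dup_wide() helper are assumptions made for the example. The sketch only records where a configuration file would be; it does not read it.

```cpp
#include <cassert>
#include <cstdlib>
#include <cwchar>
#include <string>

#if defined(_WIN32)
#define DO_EXPORT __declspec(dllexport)
#else
#define DO_EXPORT
#endif

// Hypothetical state that is shared by all invocations in the process.
static std::wstring g_config_path;
static bool g_ready = false;

// Copies a wide string into a malloc() buffer, as required for error strings.
static wchar_t* dup_wide(const wchar_t* s)
{
    assert(s != NULL);
    size_t n = wcslen(s) + 1;
    wchar_t* copy = (wchar_t*) malloc(n * sizeof(wchar_t));
    if (copy) wmemcpy(copy, s, n);
    return copy;
}

// Invoked one time before the hook is first used.
extern "C" DO_EXPORT wchar_t* Init(wchar_t* dirpath)
{
    if (dirpath == NULL || dirpath[0] == L'\0')
        return dup_wide(L"AddOn folder path is missing");

    // abc.cfg is a hypothetical file name; "/" is also accepted on Windows.
    g_config_path = std::wstring(dirpath) + L"/abc.cfg";
    g_ready = true;
    return NULL;  // NULL means that no error occurred
}

// Invoked one time when the program exits.
extern "C" DO_EXPORT int Finish()
{
    g_config_path.clear();
    g_ready = false;
    return 0;  // a negative value would report an error
}
```

Keeping the state in static variables matches the description of Init(), which loads resources once so that later RunBinary() calls can reuse them.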

C binary callback hook example

In this example, the buffer size is returned as a string.

#include <stdlib.h>
#include <wchar.h>

#if defined(_WIN32)
#define DO_EXPORT __declspec(dllexport)
#define WCSDUP _wcsdup
#else
#define DO_EXPORT
#define WCSDUP wcsdup
#endif

#define CN_MAX 100

struct RexBuffer { size_t size; const char* data; };

extern "C"
int  DO_EXPORT RunBinary(wchar_t* context, size_t n_buffers, RexBuffer* buffers,
		int* out_count, wchar_t*** out_key, wchar_t*** out_val,	wchar_t** out_error)
{
	if (n_buffers != 1 || buffers == 0)
	{
		*out_error = WCSDUP(L"Use one input buffer");
		*out_count = 0;
		return -1;
	}

	*out_error = NULL;	// no error so far

	// size_t can be wider than int, so format the size as unsigned long
	unsigned long buff_size = (unsigned long) buffers[0].size;
	const char* buff_data = buffers[0].data;	// not used in this example
	(void) buff_data;

	wchar_t **keys = (wchar_t**) malloc(sizeof(wchar_t*));
	wchar_t **vals = (wchar_t**) malloc(sizeof(wchar_t*));

	if (keys) keys[0] = WCSDUP(L"BufferSize");
	if (vals)
	{
		wchar_t tmp[CN_MAX];
		swprintf(tmp, CN_MAX, L"%lu", buff_size);
		vals[0] = WCSDUP(tmp);
	}

	if (keys && vals && keys[0] && vals[0])
	{
		*out_key = keys;
		*out_val = vals;
		*out_count = 1;
		return 0;
	}


	if (keys && keys[0]) free( keys[0]);
	if (vals && vals[0]) free( vals[0]);
	if (keys) free(keys);
	if (vals) free(vals);
	// if the error string cannot be allocated, the error code will suffice
	*out_count = 0;	
	*out_error = WCSDUP(L"allocation error");
	return -1;
}
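
To exercise a C hook outside Classification Workbench, a small caller can pass one buffer to the hook and read back the generated fields. The sketch below is an assumption-laden illustration: call_hook(), dup_wide(), and StandInHook() are hypothetical names, StandInHook() stands in for a real RunBinary() so that the block is self-contained, and the caller frees the output with free() on the assumption that the malloc() requirement implies this ownership.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cwchar>
#include <string>

struct RexBuffer { size_t size; const char* data; };

// Pointer type that matches the RunBinary() prototype.
typedef int (*RunBinaryFn)(wchar_t*, size_t, RexBuffer*,
                           int*, wchar_t***, wchar_t***, wchar_t**);

// Calls the hook with one buffer, copies the value stored under `key` into
// `result`, and frees everything the hook allocated. Returns the hook's code.
static int call_hook(RunBinaryFn hook, const char* data, size_t size,
                     const wchar_t* key, std::wstring& result)
{
    RexBuffer buf = { size, data };
    wchar_t context[] = L"";
    int count = 0;
    wchar_t **keys = NULL, **vals = NULL, *error = NULL;

    int rc = hook(context, 1, &buf, &count, &keys, &vals, &error);

    for (int i = 0; i < count; ++i)
    {
        if (rc >= 0 && keys[i] && vals[i] && wcscmp(keys[i], key) == 0)
            result = vals[i];
        free(keys[i]);
        free(vals[i]);
    }
    free(keys);
    free(vals);
    free(error);  // free(NULL) does nothing
    return rc;
}

// Copies a wide string into a malloc() buffer.
static wchar_t* dup_wide(const wchar_t* s)
{
    size_t n = wcslen(s) + 1;
    wchar_t* copy = (wchar_t*) malloc(n * sizeof(wchar_t));
    if (copy) wmemcpy(copy, s, n);
    return copy;
}

// Stand-in hook for this sketch: returns the buffer size as "BufferSize".
static int StandInHook(wchar_t*, size_t n_buffers, RexBuffer* buffers,
                       int* out_count, wchar_t*** out_key, wchar_t*** out_val,
                       wchar_t** out_error)
{
    *out_error = NULL;
    *out_count = 0;
    if (n_buffers != 1 || buffers == NULL)
    {
        *out_error = dup_wide(L"Use one input buffer");
        return -1;
    }
    wchar_t tmp[32];
    swprintf(tmp, 32, L"%lu", (unsigned long) buffers[0].size);

    wchar_t** keys = (wchar_t**) malloc(sizeof(wchar_t*));
    wchar_t** vals = (wchar_t**) malloc(sizeof(wchar_t*));
    if (!keys || !vals) { free(keys); free(vals); return -1; }
    keys[0] = dup_wide(L"BufferSize");
    vals[0] = dup_wide(tmp);
    if (!keys[0] || !vals[0])
    {
        free(keys[0]); free(vals[0]); free(keys); free(vals);
        return -1;
    }
    *out_key = keys;
    *out_val = vals;
    *out_count = 1;
    return 0;
}
```

Calling call_hook(StandInHook, "abcde", 5, L"BufferSize", value) stores the string 5 in value. To test a real hook, pass the address of its RunBinary() function instead of StandInHook.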

Java binary callback hook example

This example of a binary hook extracts ASCII strings from the binary buffer. The example shows how a binary hook can be run as a regular in-memory hook for testing in Classification Workbench.
  • Binary hook usage: 'use_binary_callback BinToStr' (pass one or more buffers)
  • Regular hook usage: 'use_memory_callback BinToStr' (pass one or more file paths)
import java.util.Vector;
import java.io.File;
import java.io.FileInputStream;

public class BinToStr {

// The RunBinary function assumes that each buffer contains printable 
// 8-bit character data and copies it to a string.

	public static int RunBinary(String context, Vector buffs, Vector out_key, 
                              Vector out_val, Vector err)
	{
		String outputField = "BinaryToString";

		for (int i = 0; i < buffs.size(); ++i)
		{
			byte[] arr = (byte[]) buffs.get(i);
			StringBuilder bld = new StringBuilder();
			for (int j = 0; j < arr.length; ++j)
			{
				// Mask to 0..255 so that bytes above 0x7F are not sign-extended
				char c = (char) (arr[j] & 0xFF);
				if (Character.isLetterOrDigit(c) || Character.isWhitespace(c) || 
				(c >= 32 && c <= 126)) 
					bld.append(c);
				else
					bld.append(' ');
			}
			out_key.add(outputField);
			out_val.add(bld.toString());
		}

		return 0;
	}

	// This is a thin wrapper that loads the binaries from disk 
	// and then calls the binary hook.
	// Its purpose is to enable testing of the binary hook from 
	// Classification Workbench: call use_memory_callback instead of
	// use_binary_callback, and pass in file paths instead of buffers.
	public static int Run(String context, Vector in_key, Vector in_val, 
	                      Vector out_key, Vector out_val, Vector err)
	{
		Vector buffs = new Vector();
		try {
			for (int i = 0; i < in_val.size(); ++i)
			{
				String path = (String) in_val.get(i);

				File file = new File(path);
				long len = file.length();
				int ilen = (int)len;
				if ((long)ilen != len)
					throw new Exception("Input file is too big"); 

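				byte[] b = new byte[ilen];
				FileInputStream fs = new FileInputStream(file);
				try {
					// read() can return fewer bytes than requested, so loop
					int off = 0;
					while (off < ilen)
					{
						int n = fs.read(b, off, ilen - off);
						if (n < 0)
							throw new Exception("Unexpected end of file");
						off += n;
					}
				} finally {
					fs.close();
				}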

				buffs.add(b);
			}
		}
		catch (Exception e)
		{
			err.add(e.toString());
			return -1;
		}

		return RunBinary(context, buffs, out_key, out_val, err);
	}

	public static void ThreadSafe()
	{}
}

Running multiple decide() threads in parallel

The Content Classification server can run several decide() threads in the same process, where each thread handles a different document.

Hooks do not run in parallel by default. You can override the default and boost the throughput of your hook by running parallel threads as follows:

  1. Verify that the hook is thread-safe.
  2. Add the following stub function to the hook library, or add a static stub method with the same name to the Java class:
    void ThreadSafe() {}

When the stub is found in the hook library or class, the decision plan enables the RunBinary() function of the hook to be invoked concurrently.
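
What "thread-safe" means in practice depends on the hook. As one possible sketch, assuming a C++11 compiler, the code below exports the stub and keeps its only shared state in a std::atomic counter, so concurrent calls do not race. The g_bytes_seen counter and the count_printable() helper are hypothetical and stand for work that RunBinary() would do for each buffer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

#if defined(_WIN32)
#define DO_EXPORT __declspec(dllexport)
#else
#define DO_EXPORT
#endif

// Stub whose presence allows RunBinary() to be invoked concurrently.
extern "C" DO_EXPORT void ThreadSafe() {}

// Hypothetical statistic shared by all threads. std::atomic makes the
// update safe without an explicit lock.
static std::atomic<size_t> g_bytes_seen(0);

// Hypothetical per-buffer work: counts printable ASCII bytes.
// Everything except the atomic counter lives in local variables,
// so each thread works on its own data.
static size_t count_printable(const char* data, size_t size)
{
    size_t printable = 0;
    for (size_t i = 0; i < size; ++i)
        if (data[i] >= 32 && data[i] <= 126) ++printable;

    g_bytes_seen.fetch_add(size);
    return printable;
}
```

A hook that writes to plain static variables, shared files, or a non-reentrant library from RunBinary() would need additional synchronization before the stub is added.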

Running Java hooks on AIX

On AIX operating systems, some Java hooks will not run unless you add the following directories to LIBPATH in the Classification_Home/Bin/bnsRun script:

Classification_Home/Java60/jre/lib/ppc

Classification_Home/Java60/jre/bin/j9vm
