This advanced action passes an unprocessed, single or multivalued binary field to a memory hook that extracts textual information and passes it back to the decision plan as textual fields. You can invoke a C function or a Java™ class.
To create a C hook called abc on Windows, compile a DLL called abc.dll that exports a RunBinary function.
On AIX®, Linux, and Solaris, compile a shared library called abc.so instead.
Store the library file in the Classification_Home\AddOn folder that is created during installation (for example, C:\IBM\ContentClassification\AddOn).
To create a Java hook called xyz, write a Java class called xyz with a public static RunBinary method, and store it in the AddOn folder as a class file or in a JAR file.
Hooks are loaded globally for each process. When you create a new hook and add it to the Classification_Home\AddOn folder, you must restart the IBM® Content Classification server for your changes to take effect.
You can include optional Init() and Finish() functions. Use these functions to load resources before processing begins and to unload resources on shutdown. The Init() function is invoked one time, the first time the hook is used. It receives the path to the AddOn folder where it can read any required configuration files into static variables. The Finish() function is also invoked one time when the program exits (for example, when you close Classification Workbench or stop the decision plan on the Content Classification server).
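The following sketch shows one way a C hook might use Init() and Finish() to manage static state. It builds the path of a configuration file in the AddOn folder and keeps it in a static variable until shutdown. The file name abc.cfg, the path separator handling, and the return conventions are assumptions: the sketch treats a NULL return from Init() as success and a 0 return from Finish() as success, and the product documentation above does not confirm either convention.

```c
#include <stdlib.h>
#include <wchar.h>

#ifdef _WIN32
#define PATH_SEP L"\\"
#else
#define PATH_SEP L"/"
#endif

/* Hypothetical configuration file name; the product does not define it. */
#define CONFIG_FILE_NAME L"abc.cfg"

/* State set once by Init() and read by every later RunBinary() call. */
static wchar_t* g_config_path = NULL;

/* Static, writable error text, so no allocation is needed on failure. */
static wchar_t g_init_error[] = L"Init: cannot allocate configuration path";

/* Called once, the first time the hook is used; dirpath is the AddOn folder.
   Assumption: NULL means success, a non-NULL string is an error message. */
wchar_t* Init(wchar_t* dirpath)
{
    size_t len = wcslen(dirpath) + wcslen(PATH_SEP) + wcslen(CONFIG_FILE_NAME) + 1;
    g_config_path = (wchar_t*) malloc(len * sizeof(wchar_t));
    if (g_config_path == NULL)
        return g_init_error;
    wcscpy(g_config_path, dirpath);
    wcscat(g_config_path, PATH_SEP);
    wcscat(g_config_path, CONFIG_FILE_NAME);
    /* A real hook would now open g_config_path and parse its settings
       into further static variables. */
    return NULL;
}

/* Called once when the process exits. Assumption: 0 means success. */
int Finish(void)
{
    free(g_config_path);
    g_config_path = NULL;
    return 0;
}
```

Because Init() runs only once per process, anything it loads stays in memory for every document that the hook processes. That is also why a new or changed hook requires a server restart.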
If necessary, the hook can pass the buffer to an external conversion program. The context string can provide an extension or MIME type that indicates how the conversion program should process the buffer.
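As an illustration, a hook might inspect the context string to choose a conversion path before handing the buffer on. In this sketch, the Converter enumeration and the recognized values (.pdf, application/pdf, .htm, .html, text/html) are hypothetical. The product does not define them. The comparison ignores case.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical converter identifiers chosen by this sketch. */
typedef enum { CONVERTER_NONE, CONVERTER_PDF, CONVERTER_HTML } Converter;

/* Case-insensitive comparison of two wide strings. */
static int equals_nocase(const wchar_t* a, const wchar_t* b)
{
    while (*a != L'\0' && *b != L'\0') {
        if (towlower((wint_t) *a) != towlower((wint_t) *b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Maps a context string (file extension or MIME type) to a converter. */
Converter choose_converter(const wchar_t* context)
{
    if (context == NULL)
        return CONVERTER_NONE;
    if (equals_nocase(context, L".pdf") ||
        equals_nocase(context, L"application/pdf"))
        return CONVERTER_PDF;
    if (equals_nocase(context, L".htm") || equals_nocase(context, L".html") ||
        equals_nocase(context, L"text/html"))
        return CONVERTER_HTML;
    return CONVERTER_NONE;
}
```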
The hook generates the output content fields. If a generated output field already exists in the content set, its content is overwritten. If the hook returns an empty value for an output field for a content item, the existing content field is deleted.
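The following sketch shows how a hook might fill the RunBinary output arrays with one populated field and one empty field. Under the rule above, the empty field would delete an existing field of the same name. The field names ExtractedText and Author are hypothetical. The sketch also assumes that an empty wide string (L"") is how a hook reports an empty value, and that the caller releases the arrays with free(). The C example in this section allocates its outputs with malloc, which is consistent with that assumption.

```c
#include <stdlib.h>
#include <wchar.h>

/* Duplicates a wide string with malloc so that the caller can free() it. */
static wchar_t* dup_wide(const wchar_t* s)
{
    size_t n = wcslen(s) + 1;
    wchar_t* copy = (wchar_t*) malloc(n * sizeof(wchar_t));
    if (copy)
        wmemcpy(copy, s, n);
    return copy;
}

/* Fills parallel key/value arrays: "ExtractedText" gets text, and "Author"
   is returned empty (assumed to mean: delete the existing field). */
int fill_outputs(const wchar_t* text, int* out_count,
                 wchar_t*** out_key, wchar_t*** out_val)
{
    const wchar_t* names[2] = { L"ExtractedText", L"Author" };
    const wchar_t* values[2] = { text, L"" };
    wchar_t** keys = (wchar_t**) calloc(2, sizeof(wchar_t*));
    wchar_t** vals = (wchar_t**) calloc(2, sizeof(wchar_t*));
    int ok = keys != NULL && vals != NULL;
    for (int i = 0; ok && i < 2; ++i) {
        keys[i] = dup_wide(names[i]);
        vals[i] = dup_wide(values[i]);
        ok = keys[i] != NULL && vals[i] != NULL;
    }
    if (!ok) {
        /* Release whatever was allocated; free(NULL) is a no-op. */
        for (int i = 0; i < 2; ++i) {
            if (keys) free(keys[i]);
            if (vals) free(vals[i]);
        }
        free(keys);
        free(vals);
        *out_count = 0;
        return -1;
    }
    *out_key = keys;
    *out_val = vals;
    *out_count = 2;
    return 0;
}
```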
C and Java callback examples are provided in the following sections.
A C hook implements the following functions. RunBinary is required; Init and Finish are optional.
wchar_t* Init(wchar_t* dirpath);
struct RexBuffer { size_t size; const char* data; };
int RunBinary(wchar_t* context, size_t n_buffers, RexBuffer* buffers,
int* out_count, wchar_t*** out_key, wchar_t*** out_val,
wchar_t** out_error);
int Finish();
The following C example accepts exactly one input buffer and returns the buffer size, as a string, in the output field BufferSize.
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#define DO_EXPORT __declspec(dllexport)
#define WCSDUP _wcsdup
#else
#define DO_EXPORT
#define WCSDUP wcsdup
#endif
#define CN_MAX 100
struct RexBuffer { size_t size; const char* data; };
extern "C"
int DO_EXPORT RunBinary(wchar_t* context, size_t n_buffers, RexBuffer* buffers,
    int* out_count, wchar_t*** out_key, wchar_t*** out_val, wchar_t** out_error)
{
    // This example accepts exactly one input buffer.
    if (n_buffers != 1 || buffers == 0)
    {
        *out_error = WCSDUP(L"Use one input buffer");
        *out_count = 0;
        return -1;
    }
    size_t buff_size = buffers[0].size;
    // Output arrays and their strings are allocated with malloc.
    wchar_t **keys = (wchar_t**) malloc(sizeof(wchar_t*));
    wchar_t **vals = (wchar_t**) malloc(sizeof(wchar_t*));
    if (keys) keys[0] = WCSDUP(L"BufferSize");
    if (vals)
    {
        wchar_t tmp[CN_MAX];
        // Format the size as text; the cast keeps %lu portable across compilers.
        swprintf(tmp, CN_MAX, L"%lu", (unsigned long) buff_size);
        vals[0] = WCSDUP(tmp);
    }
    if (keys && vals && keys[0] && vals[0])
    {
        *out_key = keys;
        *out_val = vals;
        *out_count = 1;
        return 0;
    }
    // Clean up after a partial allocation failure.
    if (keys && keys[0]) free(keys[0]);
    if (vals && vals[0]) free(vals[0]);
    if (keys) free(keys);
    if (vals) free(vals);
    // If the error string cannot be allocated, the error code will suffice.
    *out_count = 0;
    *out_error = WCSDUP(L"allocation error");
    return -1;
}
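The example above rejects multivalued input. The following sketch accepts any number of buffers and returns one BufferSize value per buffer. This mirrors the Java example, which adds the same output key once for each buffer. The sketch is written in plain C. When you build it as C++ in a DLL, add extern "C" and DO_EXPORT as in the example above. The cleanup path assumes that the server frees the returned arrays with free().

```c
#include <limits.h>
#include <stdlib.h>
#include <wchar.h>

typedef struct RexBuffer { size_t size; const char* data; } RexBuffer;

#define CN_MAX 32

/* Duplicates a wide string with malloc so that the caller can free() it. */
static wchar_t* dup_wide(const wchar_t* s)
{
    size_t n = wcslen(s) + 1;
    wchar_t* copy = (wchar_t*) malloc(n * sizeof(wchar_t));
    if (copy)
        wmemcpy(copy, s, n);
    return copy;
}

int RunBinary(wchar_t* context, size_t n_buffers, RexBuffer* buffers,
              int* out_count, wchar_t*** out_key, wchar_t*** out_val,
              wchar_t** out_error)
{
    (void) context; /* not used in this sketch */
    if (n_buffers == 0 || n_buffers > INT_MAX || buffers == NULL) {
        *out_count = 0;
        *out_error = dup_wide(L"Expected at least one input buffer");
        return -1;
    }
    /* calloc leaves unfilled entries NULL, which keeps cleanup simple. */
    wchar_t** keys = (wchar_t**) calloc(n_buffers, sizeof(wchar_t*));
    wchar_t** vals = (wchar_t**) calloc(n_buffers, sizeof(wchar_t*));
    int ok = keys != NULL && vals != NULL;
    for (size_t i = 0; ok && i < n_buffers; ++i) {
        wchar_t tmp[CN_MAX];
        swprintf(tmp, CN_MAX, L"%lu", (unsigned long) buffers[i].size);
        keys[i] = dup_wide(L"BufferSize"); /* same key => multivalued field */
        vals[i] = dup_wide(tmp);
        ok = keys[i] != NULL && vals[i] != NULL;
    }
    if (!ok) {
        for (size_t i = 0; i < n_buffers; ++i) {
            if (keys) free(keys[i]);
            if (vals) free(vals[i]);
        }
        free(keys);
        free(vals);
        *out_count = 0;
        *out_error = dup_wide(L"allocation error");
        return -1;
    }
    *out_key = keys;
    *out_val = vals;
    *out_count = (int) n_buffers;
    return 0;
}
```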
The following Java example, BinToStr, copies the printable characters of each input buffer to the output field BinaryToString.
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Vector;

public class BinToStr {
    // The RunBinary method assumes that each buffer contains printable
    // 8-bit character data and copies it to a string. Other bytes are
    // replaced with spaces. Each buffer adds one value to the field.
    public static int RunBinary(String context, Vector buffs, Vector out_key,
            Vector out_val, Vector err)
    {
        String outputField = "BinaryToString";
        for (int i = 0; i < buffs.size(); ++i)
        {
            byte[] arr = (byte[]) buffs.get(i);
            StringBuilder bld = new StringBuilder();
            for (int j = 0; j < arr.length; ++j)
            {
                // Mask to 0..255 so that bytes >= 0x80 are not sign-extended.
                char c = (char) (arr[j] & 0xFF);
                if (Character.isLetterOrDigit(c) || Character.isWhitespace(c) ||
                        (c >= 32 && c <= 126))
                    bld.append(c);
                else
                    bld.append(' ');
            }
            out_key.add(outputField);
            out_val.add(bld.toString());
        }
        return 0;
    }

    // This is a thin wrapper that loads the binaries from disk
    // and then calls the binary hook.
    // Its purpose is to enable testing of the binary hook from
    // Classification Workbench: call use_memory_hook instead of use_binary_hook,
    // and pass in file paths instead of buffers.
    public static int Run(String context, Vector in_key, Vector in_val,
            Vector out_key, Vector out_val, Vector err)
    {
        Vector buffs = new Vector();
        try {
            for (int i = 0; i < in_val.size(); ++i)
            {
                String path = (String) in_val.get(i);
                File file = new File(path);
                long len = file.length();
                if (len > Integer.MAX_VALUE)
                    throw new IOException("Input file is too big: " + path);
                byte[] b = new byte[(int) len];
                // readFully() keeps reading until the array is full;
                // a single read() call can return fewer bytes.
                DataInputStream in = new DataInputStream(new FileInputStream(file));
                try {
                    in.readFully(b);
                } finally {
                    in.close();
                }
                buffs.add(b);
            }
        }
        catch (Exception e)
        {
            err.add(e.toString());
            return -1;
        }
        return RunBinary(context, buffs, out_key, out_val, err);
    }

    // Empty stub: its presence allows RunBinary() to be invoked concurrently.
    public static void ThreadSafe()
    {}
}
The Content Classification server can run several decide() threads in the same process, where each thread handles a different document.
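By default, hooks do not run in parallel. If your hook is thread-safe, you can override the default and increase throughput by adding an empty ThreadSafe() stub to the hook: an exported ThreadSafe() function in the C library, or a public static void ThreadSafe() method in the Java class, as in the Java example.
When the stub is found in the hook library or class, the decision plan allows the RunBinary() function of the hook to be invoked concurrently.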
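The following sketch shows what a C ThreadSafe() stub might look like, and how shared state can stay safe once RunBinary() runs on several threads. The void, no-argument signature is an assumption that mirrors the Java stub. In a C++ build, declare it extern "C" like RunBinary so that the name is not mangled. The document counter is a hypothetical example of shared static state. It uses C11 atomics, so the sketch assumes a compiler that provides <stdatomic.h>.

```c
#include <stdatomic.h>

#ifdef _WIN32
#define DO_EXPORT __declspec(dllexport)
#else
#define DO_EXPORT
#endif

/* Empty stub: its presence tells the decision plan that RunBinary() may be
   invoked concurrently. Assumed signature, mirroring the Java stub. */
DO_EXPORT void ThreadSafe(void) {}

/* Hypothetical shared statistic. Every thread updates it, so it is atomic;
   a plain static counter here would be a data race. */
static atomic_ulong g_documents_seen = 0;

/* Call from RunBinary(); returns the running total including this call. */
unsigned long count_document(void)
{
    /* atomic_fetch_add returns the value before the increment. */
    return atomic_fetch_add(&g_documents_seen, 1UL) + 1UL;
}
```

Read-only data loaded once in Init(), such as a configuration path, needs no such protection. Only state that RunBinary() modifies does.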
On AIX operating systems, some Java hooks will not run unless you add the following directories to LIBPATH in the Classification_Home/Bin/bnsRun script:
Classification_Home/Java60/jre/lib/ppc
Classification_Home/Java60/jre/bin/j9vm