gRPC for elastic distributed inference

gRPC is available as part of the elastic distributed inference technical preview.

Important:

The code and application programming interfaces herein are technology preview information that may not be made generally available by IBM as or in a product.

You are permitted to use the information only for internal use for evaluation purposes and not for use in a production environment.

IBM provides the information without obligation of support and "as is" without warranty of any kind.

Using gRPC, a client application can directly call methods on a server application on a different machine as if it were a local object, making it easier to create distributed applications and services.

On the server side, the server implements this interface and runs a gRPC server to handle client calls. On the client side, the client has a stub (referred to as a client in some programming languages) that provides the same methods as the server. Protocol buffers are used to define the data structures exchanged between the server and client.

What is included

For an elastic distributed inference service, there are two ways to authenticate a stream connection between the client and server: through a user name and password, or through a token.

The auth() method authenticates the user by user name and password and returns an authorized token to the user. The stream_infer() method transmits data to the service and does the inferencing using the authorization token.

The entire content of stream.proto is as follows:
syntax = "proto3";
package redhare;

// The inference service definition.
service Inference {
  //auth
  rpc auth(AuthRequest) returns (AuthResponse) {}
  // create an inference stream
  rpc stream_infer (stream Request) returns (stream Response) {}
}

message AuthRequest {
  // key string {"model_name" , "user_name", "password", "token"}
  map<string, string> attributes = 1;
}

message AuthResponse {
  string url = 1;
  string token = 2;
}

message Request {
  oneof request_oneof {
      bytes data = 1;
      string token = 2;
  }
}

message Response { 
  bytes data = 1;
}

Client authentication

Client authentication verifies the user by user name and password and establishes a connection to the LBD server.
rpc auth(AuthRequest) returns (AuthResponse);
It requires that the following be provided:
  • The name of the model to be used
  • The user name and password of a user that has login access to the inference service
After authentication succeeds, the following is returned:
  • The URL of the inference service for the model specified
  • A token for further authentication
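
In Python, for example, these inputs and outputs map directly onto the generated AuthRequest and AuthResponse messages. The following is a minimal sketch, assuming a stub created from the generated code and the darkflow model name used later in this section:

request = stream_pb2.AuthRequest()
request.attributes["model_name"] = "darkflow"
request.attributes["user_name"] = "Admin"
request.attributes["password"] = "Admin"
response = stub.auth(request)
# response.url is the inference service URL; response.token authenticates the stream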

Creating an inference stream for a client

Create an inference stream to a specific model with a token, then transmit data and get the inference results.
rpc stream_infer (stream Request) returns (stream Response) {}

To create an inference stream, each Request message must provide either a token or data: a token that was received from a previous authentication, or data that is transmitted to the inference service.

First, transmit a token through this API. If the token passes authentication, a stream connection is established. Then, continuously transmit data over this connection through the same API. The server processes the data and returns the inference results.
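
Because each Request carries exactly one member of the request_oneof, a stream typically begins with a token message and continues with data messages. A minimal sketch of the two message shapes in Python follows; the token and payload names are placeholders:

# First message on the stream: the token from a previous auth() call.
auth_request = stream_pb2.Request()
auth_request.token = token

# Later messages on the same stream: raw inference data as bytes.
data_request = stream_pb2.Request()
data_request.data = payload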

Using gRPC on the client side

In this section, learn how to use the gRPC APIs on the client side through a step-by-step sample in Python.
  1. Before starting to use gRPC, make sure of the following:
    1. An inference model kernel has been published using elastic distributed inference in the cluster management console. In this example, the model used is named darkflow. To learn more about creating a kernel file, see Create a kernel file for an inference service.
    2. The LBD service has been started and the host IP and port information is found in ${DLI_SHARED_FS}/dlim/conf/dlim.conf.
      For example:
      dlim.lbd.ip = 9.3.89.250
      dlim.lbd.stream.port = 9010

      If you do not have access, contact the cluster administrator for this information.

  2. Compile the stream protocol buffers and import the generated classes.
    1. Copy the content of stream.proto into a file named stream.proto and compile it. Compiling requires the grpcio-tools Python package.
      python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. stream.proto
    2. The compilation generates two Python files, stream_pb2.py and stream_pb2_grpc.py. Import them, along with the grpc package, to use the gRPC APIs.
      import grpc
      import stream_pb2
      import stream_pb2_grpc
  3. Authenticate the client using a user name and password. For example:
    channel = grpc.insecure_channel('9.3.89.250:9010')
    stub = stream_pb2_grpc.InferenceStub(channel)
    request = stream_pb2.AuthRequest()
    request.attributes["model_name"] = "darkflow"
    request.attributes["user_name"] = "Admin"
    request.attributes["password"] = "Admin"
    try:
        response = stub.auth(request)
    except grpc.RpcError as e:
        print("authentication failed, due to: " + e.details())
        return
    else:
        sendstream(response.token, response.url, video_path, quality, fps)
    request.attributes["password"] = "Admin"
    1. Connect to the LBD service and get the stub.
    2. Put the model name, user name, and password into an AuthRequest.
    3. Call the auth() method to pass the authentication information to the server and save the result in the response.
    4. If authentication passes, the response returns the URL of the model inference service and an authorized token.
  4. Create an inference stream for the client.
    1. Connect to the model inference service, using the URL returned by auth(), and get the stub.
      channel = grpc.insecure_channel(url)
      stub = stream_pb2_grpc.InferenceStub(channel)
    2. Call the stream_infer() method to pass the token and then the data to the server, and save the results in responses. Note that stream_infer() is a bidirectional streaming RPC: in Python, the stub call takes an iterator of Request messages and returns an iterator of Response messages.
      1. Pass the token to establish a connection.
        auth_request = stream_pb2.Request()
        auth_request.token = self.token
        ...
        responses = self.stub.stream_infer(requests)  # requests is an iterator that yields auth_request first
      2. Once the connection is ready, pass inference data to the server for inferencing by yielding further Request messages from the same iterator.
        tinput['key'] = self.count
        tinput['data'] = base64.b64encode(imgData.tobytes()).decode("utf-8")
        request = stream_pb2.Request()
        request.data = json.dumps(tinput).encode("utf-8")
        ...
    3. An inference stream is now running using gRPC. A complete, minimal sketch that ties these steps together follows this list.
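
The following sketch ties the steps above into one self-contained client. It assumes the darkflow model, the Admin credentials, and the LBD address from the steps above; the frames argument (an iterable of bytes) and the payload layout with key and data fields are assumptions inferred from the fragments in this section, not a definitive implementation.

import base64
import json

import grpc

import stream_pb2
import stream_pb2_grpc

LBD_ADDRESS = '9.3.89.250:9010'  # dlim.lbd.ip and dlim.lbd.stream.port from dlim.conf

def authenticate(model_name, user_name, password):
    # Ask the LBD service for the model's inference URL and a stream token.
    channel = grpc.insecure_channel(LBD_ADDRESS)
    stub = stream_pb2_grpc.InferenceStub(channel)
    request = stream_pb2.AuthRequest()
    request.attributes["model_name"] = model_name
    request.attributes["user_name"] = user_name
    request.attributes["password"] = password
    response = stub.auth(request)
    return response.url, response.token

def request_stream(token, frames):
    # First message on the stream: the token authenticates the connection.
    auth_request = stream_pb2.Request()
    auth_request.token = token
    yield auth_request
    # Subsequent messages: one Request per piece of inference data.
    for count, frame in enumerate(frames):
        payload = {'key': count,
                   'data': base64.b64encode(frame).decode("utf-8")}
        request = stream_pb2.Request()
        request.data = json.dumps(payload).encode("utf-8")
        yield request

def run_inference(frames):
    url, token = authenticate("darkflow", "Admin", "Admin")
    channel = grpc.insecure_channel(url)
    stub = stream_pb2_grpc.InferenceStub(channel)
    # stream_infer is bidirectional: pass a request iterator, read a response iterator.
    for response in stub.stream_infer(request_stream(token, frames)):
        print(response.data)

Because the token travels as the first message of the stream itself, a single stream_infer() call covers both authentication and data transfer; there is no separate per-message call to the server.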