gRPC for elastic distributed inference
gRPC is available as part of the elastic distributed inference technical preview.
The code and application programming interfaces herein are technology preview information that may not be made generally available by IBM as or in a product.
You are permitted to use the information only for internal use for evaluation purposes and not for use in a production environment.
IBM provides the information without obligation of support and "as is" without warranty of any kind.
Using gRPC, a client application can directly call methods on a server application on a different machine as if it were a local object, making it easier to create distributed applications and services.
A gRPC service is defined as an interface of remotely callable methods. On the server side, the server implements this interface and runs a gRPC server to handle client calls. On the client side, the client has a stub (referred to as a client in some programming languages) that provides the same methods as the server. Protocol buffers are used to define the data structures exchanged between the server and client.
What is included
For an elastic distributed inference service, there are two ways to authenticate to a stream connection between client and server, either through a user name and password, or through a token.
The auth() method authenticates the user by user name and password and returns an authorization token to the user. The stream_infer() method uses the authorization token to transmit data to the service and run the inferencing.
syntax = "proto3";
package redhare;

// The inference service definition.
service Inference {
  // auth
  rpc auth(AuthRequest) returns (AuthResponse) {}
  // create an inference stream
  rpc stream_infer (stream Request) returns (stream Response) {}
}

message AuthRequest {
  // key string {"model_name" , "user_name", "password", "token"}
  map<string, string> attributes = 1;
}

message AuthResponse {
  string url = 1;
  string token = 2;
}

message Request {
  oneof request_oneof {
    bytes data = 1;
    string token = 2;
  }
}

message Response {
  bytes data = 1;
}
Client authentication
rpc auth(AuthRequest) returns (AuthResponse);
The request provides:
- The name of the model to be used
- The user name and password of a user that has login access to the inference service
The response returns:
- The URL of the inference service for the specified model
- A token for further authentication
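The request attributes described above can be assembled with a small helper. This is a minimal sketch, not part of the product: the function name build_auth_attributes is hypothetical, and the idea that a previously issued token can be sent in place of the user name and password is an assumption based on the key list in the AuthRequest comment, not confirmed server behavior.

```python
def build_auth_attributes(model_name, user_name=None, password=None, token=None):
    """Return the string map for AuthRequest.attributes (hypothetical helper)."""
    if not model_name:
        raise ValueError("model_name is required")
    attributes = {"model_name": model_name}
    if token:
        # Assumption: a previously issued token replaces user name and password.
        attributes["token"] = token
    elif user_name and password:
        attributes["user_name"] = user_name
        attributes["password"] = password
    else:
        raise ValueError("provide either a token or a user name and password")
    return attributes


# With the generated module available, the map could be copied into the message:
#   request = stream_pb2.AuthRequest()
#   for key, value in build_auth_attributes("darkflow", "Admin", "Admin").items():
#       request.attributes[key] = value
```

Validating the inputs on the client side means an incomplete request is rejected before any network call is made.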
Creating an inference stream for a client
rpc stream_infer (stream Request) returns (stream Response) {}
Each message sent on an inference stream must carry either a token or data: a token that was received from a previous authentication, or data to be transmitted to the inference service.
First, transmit a token through this API. If this token passes authentication, a stream connection will be established. Then, transmit data using this connection through this API continuously. The server will process data and return the inference result.
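The message order described above (token first, then data) can be sketched as a Python generator. The Request dataclass here is only a stand-in for the generated stream_pb2.Request class so that the example runs on its own, and the name request_stream is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Request:
    """Stand-in for stream_pb2.Request: only one of data or token is set."""
    data: Optional[bytes] = None
    token: Optional[str] = None


def request_stream(token: str, payloads: Iterable[bytes]) -> Iterator[Request]:
    """Yield the token message first, then one data message per payload."""
    yield Request(token=token)
    for payload in payloads:
        yield Request(data=payload)


messages = list(request_stream("my-token", [b"frame-0", b"frame-1"]))
print(["token" if m.token is not None else "data" for m in messages])
# prints ['token', 'data', 'data']
```

In gRPC Python, a streaming RPC on a stub takes an iterator of request messages, so with the real generated classes a generator of this shape could be passed to stub.stream_infer().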
Using gRPC on client side
- Before starting to use gRPC, make sure of the following:
- An inference model kernel has been published using elastic distributed inference in the cluster management console. In this example, the model used is named darkflow. To learn more about creating a kernel file, see Create a kernel file for an inference service.
- The LBD service has been started and the host IP and port information is found in ${DLI_SHARED_FS}/dlim/conf/dlim.conf. For example:
    dlim.lbd.ip = 9.3.89.250
    dlim.lbd.stream.port = 9010
If you do not have access, contact the cluster administrator for this information.
- Compile stream protocol buffers and define import classes.
- Copy the content of stream.proto and compile it.
python -m grpc_tools.protoc --python_out=. --grpc_python_out=. stream.proto
- Compiling the stream protocol buffers generates two Python files, stream_pb2.py and stream_pb2_grpc.py. Import them to use the gRPC APIs.
    import stream_pb2
    import stream_pb2_grpc
- Authenticate the client using a user name and password. For example:
    import grpc

    channel = grpc.insecure_channel('9.3.89.250:9010')
    stub = stream_pb2_grpc.InferenceStub(channel)
    request = stream_pb2.AuthRequest()
    request.attributes["model_name"] = "darkflow"
    request.attributes["user_name"] = "Admin"
    request.attributes["password"] = "Admin"
    try:
        response = stub.auth(request)
    except grpc.RpcError as e:
        # A failed call raises grpc.RpcError, which carries the error details
        print("authentication failed, due to: " + e.details())
    else:
        # sendstream, video_path, quality, and fps come from the example application
        sendstream(response.token, response.url, video_path, quality, fps)
- Connect to the LBD service and get the stub.
- Input the model name, user name, and password into an AuthRequest.
- Call the auth() method to pass the authentication information to the server and save the result in response.
- If authentication passes, the response contains the URL of the model inference service and an authorized token.
- Create an inference stream for the client.
- Connect to the inference service for the model, using the URL returned by authentication, and get the stub.
    channel = grpc.insecure_channel(url)
    stub = stream_pb2_grpc.InferenceStub(channel)
- Call the stream_infer method to pass the token and then the data to the server, and save the result in responses.
- Pass the token to establish a connection.
    auth_request = stream_pb2.Request()
    auth_request.token = self.token
    ...
    responses = self.stub.stream_infer(auth_request)
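- Once the connection is ready, pass inference data to the server for inferencing.
    tinput['key'] = self.count
    # b64encode emits no line breaks, so no newline stripping is needed;
    # tobytes() assumes imgData is a NumPy array
    tinput['data'] = base64.b64encode(imgData.tobytes()).decode("utf-8")
    # data is a bytes field, so the JSON text is encoded before assignment
    request.data = json.dumps(tinput).encode("utf-8")
    ...
    responses = self.stub.stream_infer(request)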
- An inference stream is now running using gRPC.
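Two pieces of plumbing from the client steps above can be sketched in plain Python: reading the LBD address from dlim.conf and packing a frame into the JSON payload carried in the data field. This is a sketch under stated assumptions: the function names are hypothetical, the configuration file is assumed to hold one "key = value" entry per line with "#" starting a comment, and decode_frame only illustrates how a receiver might reverse the encoding, not how the service actually processes it.

```python
import base64
import json


def read_lbd_target(conf_text):
    """Build 'host:port' from the dlim.lbd.ip and dlim.lbd.stream.port entries."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return "{}:{}".format(settings["dlim.lbd.ip"], settings["dlim.lbd.stream.port"])


def encode_frame(key, raw_bytes):
    """Pack a frame counter and raw image bytes into UTF-8 encoded JSON."""
    body = {"key": key, "data": base64.b64encode(raw_bytes).decode("ascii")}
    return json.dumps(body).encode("utf-8")


def decode_frame(payload):
    """Reverse encode_frame: return the counter and the raw bytes."""
    body = json.loads(payload.decode("utf-8"))
    return body["key"], base64.b64decode(body["data"])
```

The string returned by read_lbd_target could be passed to grpc.insecure_channel(), and the bytes returned by encode_frame could be assigned to the data field of a Request message.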