Skip to main content

alphaWorks  >  Forums  >  IBM Dynamic Application Virtualization  >  developerWorks

Error code: 10000 - what does it mean?

This question is answered.

Replies: 5 - Pages: 1 - Last Post: Nov 4, 2009 2:38 PM - Last Post By: _Big_Mac_
_Big_Mac_

Posts: 19
Registered: Oct 05, 2009 12:53:35 PM
Error code: 10000 - what does it mean?
Posted: Oct 21, 2009 11:07:35 AM
 
Hi
I'm trying to make DAV work with NVIDIA CUDA.
I compile a piece of CUDA C code into a shared library and hook it up as a normal DAV service, so the CUDA part should be completely encapsulated on the server side.

When I compile my code in "device emulation" mode (meaning it gets executed on the CPU), it works fine. But when I target the GPU, problems start.
I start the broker and the server, and the status output shows that my CUDA library is running:
[root@localhost bin]# ./davServiceBroker -d &
[1] 4947
[root@localhost bin]# ./dav start
[root@localhost bin]# ./dav status
Service		Port		Status
Library		12300		Running
blas		12301		Running
cudaDavApp		12303		Running


I then launch a test app on the client (properly virtualized), and it fails, throwing a std::string exception with no text (an empty string). On the server side, this error gets logged as:

[ERR] 4991-3087718096 : WB-Error in executing the request: 
<0>:SIGNAL_IN_LIBRARY:Severe error in library (cudaDavApp) fuction (vectorAdd).  Error code: 10000


After this, the process "davService cudaDavApp" hangs and isn't properly killed by ./davServiceBroker -s (I have to kill it manually).

My CUDA code is here:

#include "cudaDavApp.h"
 
__global__ void vectorAddKernel(float a[], float b[], float result[], int size)
{
	int tid = threadIdx.x + blockIdx.x * blockDim.x;
	if(tid < size)
		result[tid] = a[tid] + b[tid];
}
 
 
 
#if defined(__cplusplus)
extern "C" {
#endif
 
void vectorAdd(float a[], float b[], float output[], int size)
{
	float *da=0, *db=0, *dout=0;
	cudaMalloc((void**) &da, sizeof(float) * size);
	cudaMalloc((void**) &db, sizeof(float) * size);
	cudaMalloc((void**) &dout, sizeof(float) * size);
	//printf("%s\n",cudaGetErrorString(cudaGetLastError()));
	cudaMemcpy(da, a, sizeof(float)*size, cudaMemcpyHostToDevice);
	cudaMemcpy(db, b, sizeof(float)*size, cudaMemcpyHostToDevice);
	//printf("%s\n",cudaGetErrorString(cudaGetLastError()));
	int blockSize=64;
	int gridSize=size/blockSize;
	if(size % blockSize) gridSize++;
	vectorAddKernel<<<gridSize, blockSize>>>(da,db,dout,size);
	//printf("%s\n",cudaGetErrorString(cudaGetLastError()));
	cudaMemcpy(output, dout, sizeof(float)*size, cudaMemcpyDeviceToHost);
	//printf("%s\n",cudaGetErrorString(cudaGetLastError()));
	cudaFree(da);
	cudaFree(db);
	cudaFree(dout);
	//printf("%s\n",cudaGetErrorString(cudaGetLastError()));
}
 
#if defined(__cplusplus)
}
#endif

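The launch configuration in vectorAdd rounds the grid size up so that every element gets a thread; the if(tid < size) guard in the kernel then makes the surplus threads in the last block do nothing. A minimal host-only C++ sketch of that arithmetic, assuming size > 0 and blockSize > 0 (the helper names are hypothetical, not part of CUDA or DAV):

```cpp
// Host-only sketch of the launch arithmetic used in vectorAdd.
// gridSizeFor, gridSizeAsInPost and activeThreads are hypothetical helpers.
// Assumes size > 0 and blockSize > 0.

// Ceiling division: smallest gridSize with gridSize * blockSize >= size.
int gridSizeFor(int size, int blockSize)
{
    return (size + blockSize - 1) / blockSize;
}

// The same computation written the way vectorAdd does it.
int gridSizeAsInPost(int size, int blockSize)
{
    int gridSize = size / blockSize;
    if (size % blockSize) gridSize++;   // one extra block for the remainder
    return gridSize;
}

// Number of launched threads that pass the kernel's "if (tid < size)"
// guard, i.e. the threads that actually write an element of the result.
int activeThreads(int size, int blockSize)
{
    int total = gridSizeFor(size, blockSize) * blockSize;
    int active = 0;
    for (int tid = 0; tid < total; ++tid)
        if (tid < size)
            ++active;
    return active;
}
```

The single-expression form (size + blockSize - 1) / blockSize is a common idiom; both versions give the same grid size for positive sizes, and only the first size threads do any work.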

cudaDavApp.h is:
#if defined(__cplusplus)
extern "C" {
#endif
 
/**IBMDAV* @function vectorAdd
    @param[in] a @dimensions [size]
    @param[in] b @dimensions [size]
    @param[inout] output @dimensions [size]
    */
void vectorAdd(float a[], float b[], float output[], int size);
 
#if defined(__cplusplus)
}
#endif


When I compile this as:
nvcc cudaDavApp.cu -o libcudaDavApp.so -shared -Xcompiler '-fPIC' -deviceemu
everything works (I can call vectorAdd from my client-side app and get correct results).
When I drop the -deviceemu flag, I get the error reported above.

I confirmed that CUDA works on my system and that the code itself can run on the GPU without problems.

I realize this is not a CUDA forum and I don't expect CUDA-specific insight, but knowing what this DAV error means might point me in the right direction.
robtie

Posts: 21
Registered: May 13, 2008 12:49:42 PM
Re: Error code: 10000 - what does it mean?
Posted: Oct 30, 2009 10:59:00 AM   in response to: _Big_Mac_
 
Hi,

Thanks for all the detail.
Could you confirm the link line for the DAV-enabled client executable?
Also, could you confirm that the client code works when linked directly to libcudaDavApp.so
(i.e. not linked with the DAV-generated library)?

Please try starting the DAV service in dedicated mode, i.e. add the following line for your service
to the server-side IBM_DAV.conf:

dav.server.service.<your_service_name>.dedicated=1

Thanks,

Robert
_Big_Mac_

Posts: 19
Registered: Oct 05, 2009 12:53:35 PM
Re: Error code: 10000 - what does it mean?
Posted: Oct 31, 2009 08:00:41 PM   in response to: robtie
 
Hi
Thanks for the suggestions; here are my results.
My client application statically links to cudaDavApp_oai.lib (generated by davGen) and I also have cudaDavApp_oai.dll in the same folder as the executable.

I've tried linking libcudaDavApp.so (the version that targets the GPU, not emulation) with my app using
g++ main.cpp -o cudaDavAppSoTest -lcudaDavApp

and it ran fine (locally, of course), so the .so itself seems to work as intended; for some reason it only fails when called through libcudaDavApp_oai.so.

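A local test like this only shows that nothing crashed. One way to make it self-checking is to compare the library's output with a plain CPU computation, so that a silent failure (for example, an unchecked CUDA error that leaves the output array untouched) shows up as a mismatch. A minimal sketch; vectorAddReference, VectorAddFn and matchesReference are hypothetical helpers, and the function under test is passed in as a pointer with the signature declared in cudaDavApp.h:

```cpp
#include <cmath>
#include <vector>

// Hypothetical CPU reference for vectorAdd: output[i] = a[i] + b[i].
void vectorAddReference(const float a[], const float b[], float output[], int size)
{
    for (int i = 0; i < size; ++i)
        output[i] = a[i] + b[i];
}

// Same signature as vectorAdd in cudaDavApp.h.
typedef void (*VectorAddFn)(float a[], float b[], float output[], int size);

// Hypothetical checker: runs fn on copies of a and b (assumed to be the same
// length) and compares every element with the CPU reference within tolerance.
// Returns true only if all elements match.
bool matchesReference(VectorAddFn fn, std::vector<float> a, std::vector<float> b,
                      float tolerance = 1e-5f)
{
    const int size = static_cast<int>(a.size());
    std::vector<float> expected(size), actual(size, 0.0f);
    vectorAddReference(a.data(), b.data(), expected.data(), size);
    fn(a.data(), b.data(), actual.data(), size);
    for (int i = 0; i < size; ++i)
        if (std::fabs(actual[i] - expected[i]) > tolerance)
            return false;
    return true;
}
```

With the GPU build linked in, a call such as matchesReference(vectorAdd, {3, 3, 3, 3}, {1, 1, 1, 1}) should return true; a version that silently does nothing returns false.
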
I've also tried running the service in dedicated mode, but it didn't help.
Thanks
_Big_Mac_

Posts: 19
Registered: Oct 05, 2009 12:53:35 PM
Re: Error code: 10000 - what does it mean?
Posted: Oct 31, 2009 08:09:45 PM   in response to: _Big_Mac_
 
Oh, and for the record, here is the main.cpp I used to test the .so:
#include "cudaDavApp.h"
#include <cstdio>
 
int main()
{
	float a[4] = {3, 3, 3, 3};
	float b[4] = {1, 1, 1, 1};
	float out[4] = {0, 0, 0, 0};
	vectorAdd(a,b,out,4);
	for(int i=0 ; i<4; i++)
		printf("%f ", out[i]);
	
	return 0;
}


This is very similar to my Windows client code (the one that statically links to cudaDavApp_oai.lib), except that I added a try-catch block to find out what was happening:
#include "cudaDavApp.h"
#include <cstdio>
#include <exception>
#include <string>
 
int main()
{
	float a[4] = {3, 3, 3, 3};
	float b[4] = {1, 1, 1, 1};
	float out[4] = {0, 0, 0, 0};
	try {
		vectorAdd(a, b, out, 4);
	}
	catch (std::string& s) {
		printf("\nstd::string thrown: %s", s.c_str());
	}
	for(int i=0; i<4; i++)
		printf("%f ", out[i]);
 
	return 0;
}


Sadly, when I use the non-emulated .so on the server, the code above catches a std::string with an empty message, so it's not very informative.
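
Because the DAV client reported the failure as a std::string with an empty message, a slightly more defensive handler can label that case explicitly and also catch std::exception and anything else. A minimal sketch; callAndDescribe is a hypothetical helper, and whether DAV ever throws types other than std::string is an assumption, not something confirmed in this thread:

```cpp
#include <exception>
#include <functional>
#include <string>

// Hypothetical helper: runs call and returns a short description of any
// exception it throws, or "ok" if it returns normally.
std::string callAndDescribe(const std::function<void()>& call)
{
    try {
        call();
        return "ok";
    }
    catch (const std::string& s) {
        // The DAV client in this thread threw a std::string, here with no text.
        return s.empty() ? std::string("std::string (empty message)")
                         : "std::string: " + s;
    }
    catch (const std::exception& e) {
        // Assumption: other failures might surface as standard exceptions.
        return std::string("std::exception: ") + e.what();
    }
    catch (...) {
        return "unknown exception type";
    }
}
```

In the Windows client, the call would be wrapped as callAndDescribe([&] { vectorAdd(a, b, out, 4); }) and the returned string printed.
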
robtie

Posts: 21
Registered: May 13, 2008 12:49:42 PM
Re: Error code: 10000 - what does it mean?
Posted: Nov 03, 2009 06:39:17 AM   in response to: _Big_Mac_
 
Hi,

Could you try the following:

Instead of starting the DAV daemon using 'dav start', please run:

davService -t cudaDavApp -p 12303

You will need to add the following paths to LD_LIBRARY_PATH, e.g.:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$IBM_DAV_PATH/bin:$IBM_DAV_PATH/services/cudaDavApp

There is a known bug related to CUDA that will be fixed in a later release.
Let me know if this resolves the issue.

Thanks,

-Robert.

_Big_Mac_

Posts: 19
Registered: Oct 05, 2009 12:53:35 PM
Re: Error code: 10000 - what does it mean?
Posted: Nov 04, 2009 02:38:02 PM   in response to: robtie
 
Hi
This solved the issue; my program now works correctly. Thanks!
