My dear debuggers,
Last week I worked on another case involving a Branch ID error. Understanding how the Branch database table and the Branch cache are used may help you understand and solve such a problem.
My client reported the following problem:
When he ran a sub-process, it failed with the following error:
[16:28:37:694 EST] 00000127 wle E CWLLG2229E: An exception occurred in an EJB call. Error: Service with ID TWProcess.acfefea4-e42e-48fd-ab81-5dc04d662cbe not found.
com.lombardisoftware.core.TeamWorksException: Service with ID TWProcess.acfefea4-e42e-48fd-ab81-5dc04d662cbe not found.
From the requested traces I could see details about the sub-process:
[16:28:37:691 EST] 00000127 wle 1 com.lombardisoftware.component.subprocess.worker.SubProcessWorker doJobStartSubProcess BEGIN: SubProcessWorker.doJobStartProcessItem(), subProcess =
Later, code is invoked with a VersioningContext (VC) that has the correct snapshotId:
[16:28:37:692 EST] 00000127 wle_repocore > com.lombardisoftware.server.ejb.persistence.PersistenceServicesCore getVersionSummaryId ENTRY VersioningContext context=Snapshot.fa806295-1938-4e89-8220-7b32e107a1e9, final ID<T> id=TWProcess.acfefea4-e42e-48fd-ab81-5dc04d662cbe
And the following Branch ID:
[16:28:37:692 EST] 00000127 wle_versionin 1 com.lombardisoftware.server.ejb.persistence.versioning.BranchManager getContext branchId Branch.16fdc746-d9e4-44ef-bd5a-c056a08c9b66
[16:28:37:693 EST] 00000127 wle_versionin 1 com.lombardisoftware.server.ejb.persistence.versioning.BranchManager getContext return branchContext com.lombardisoftware.server.ejb.persistence.versioning.BranchContextImpl@e9121a74. branches.size is 400
[2/14/19 16:28:37:694 EST] 00000127 wle E CWLLG2229E: An exception occurred in an EJB call. Error: Service with ID TWProcess.acfefea4-e42e-48fd-ab81-5dc04d662cbe not found.
Generally, I start my investigation by requesting and checking the Process Application's TWX file as well as DB tables like LSW_SNAPSHOT, LSW_BRANCH, LSW_PROJECT, and LSW_PROCESS.
In our specific case, we queried the LSW_BRANCH table with: … where branch_ID = 16fdc746-d9e4-44ef-bd5a-c056a08c9b66. The main result of that check: the specific Branch_ID could not be found!
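That check can be reproduced with a minimal sketch - here using an in-memory SQLite database as a stand-in for the real BPM database (the table layout and branch name below are illustrative; the real LSW_BRANCH table has many more columns):

```python
import sqlite3

# Stand-in for the BPM LSW_BRANCH table (the real table has many more columns).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE LSW_BRANCH (BRANCH_ID TEXT PRIMARY KEY, NAME TEXT)")
# The only branch that actually exists for this snapshot:
con.execute(
    "INSERT INTO LSW_BRANCH VALUES ('df589b9c-2bcb-4a97-b14c-0e5cddb33111', 'Main')"
)

# The Branch ID the runtime was looking for, taken from the trace:
suspect_id = "16fdc746-d9e4-44ef-bd5a-c056a08c9b66"
rows = con.execute(
    "SELECT * FROM LSW_BRANCH WHERE BRANCH_ID = ?", (suspect_id,)
).fetchall()
print(rows)  # [] - the Branch ID from the trace is not in the table
```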
A Branch is a kind of 'workspace' within a project for the purpose of parallel development: Project --> Branch --> Snapshot.
A Project may have several Branches, each with one or more Snapshots. However, a deployed Snapshot should always be related to a specific Branch, reflected by a Branch ID record in the DB table.
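The hierarchy can be sketched like this (a hypothetical, simplified Python model, not actual product code - the class and field names are mine):

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of Project --> Branch --> Snapshot.
@dataclass
class Snapshot:
    snapshot_id: str
    branch_id: str  # a deployed snapshot must point at an existing branch

@dataclass
class Branch:
    branch_id: str
    snapshots: list = field(default_factory=list)

@dataclass
class Project:
    name: str
    branches: dict = field(default_factory=dict)  # branch_id -> Branch

    def branch_exists(self, branch_id: str) -> bool:
        return branch_id in self.branches

proj = Project("My Process App")
good = "df589b9c-2bcb-4a97-b14c-0e5cddb33111"
proj.branches[good] = Branch(good)

# The invariant violated in this case: the runtime asked for a branch
# that has no record in the table.
print(proj.branch_exists(good))                                    # True
print(proj.branch_exists("16fdc746-d9e4-44ef-bd5a-c056a08c9b66"))  # False
```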
But why is there no such record for the snapshot?
Well, ONE OPTION could be that the Snapshot installation failed for some reason (however, in that case you should get an error message at deployment time).
When the snapshot is deployed, all the information you need to run the Process Application is held in DB tables and in the cache. Now, it could be that the Branch record in the LSW_BRANCH table is NOT created correctly because of a transaction (TX) timeout event - this should also be visible in the log:
"WTRN0124I: When the timeout occurred the thread with which the transaction is, or was most recently, associated was Thread[Default : 3,5,main]. The stack trace of this thread when the timeout occurred was: java.net.SocketInputStream.socketRead0(Native Method)"
You can increase the timeout value by following the instructions in the BPM Knowledge Center.
A SECOND OPTION for a missing Branch record is cache corruption.
When we checked the records of the requested DB tables for Snapshot "fa806295-1938-4e89-8220-7b32e107a1e9", we found this row:
"fa806295-1938-4e89-8220-7b32e107a1e9","6e34dcbb-e086-49b3-8ad6-8c82906791c3",9,19-02-13 18:20:06.489000000,"7.1.8 Steps REL.7.1.8","7.1.8SR",<html><body><p> <font size="2">Updated dependencies </font> </p></body></html>,"df589b9c-2bcb-4a97-14c-0e5cddb33111","1c35691f-85f4-4a9c-84ed-9548b4a2c38a",256,"T","F","F","T","F","F","F",14,(BLOB),"F",19-02-08 13:12:20.561000000,19-02-13 18:23:53.059000000,"New","0","F",,19-02-14,9
we saw that the Branch ID should be "df589b9c-2bcb-4a97-b14c-0e5cddb33111" and not "16fdc746-d9e4-44ef-bd5a-c056a08c9b66" - the latter is not the one associated with the snapshot, and it does not exist in the database at all!
When we run the application, we get a Snapshot VersioningContext and the related Branch ID, which comes either from an in-memory cache or from the DB table.
Assuming the table is good (we should get back "df589b9c-2bcb-4a97-b14c-0e5cddb33111"), the failing part MUST be the cache!
As soon as a deployment is performed, the new snapshot is created with a unique Branch ID that is first loaded into the cache before it is inserted into the database table. If something goes wrong while sending the snapshot to the database, you can end up with a snapshot in the cache that does not match the database.
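A hypothetical cache-first lookup shows how such a mismatch surfaces at run time (illustrative names only; this is not the actual BranchManager implementation):

```python
# Hypothetical sketch of a cache-first branch lookup (not the real BranchManager).
branch_table = {  # the good LSW_BRANCH data in the database, keyed by snapshot ID
    "fa806295-1938-4e89-8220-7b32e107a1e9": "df589b9c-2bcb-4a97-b14c-0e5cddb33111",
}
branch_cache = {  # stale in-memory entry left over from a bad deployment
    "fa806295-1938-4e89-8220-7b32e107a1e9": "16fdc746-d9e4-44ef-bd5a-c056a08c9b66",
}

def get_branch_id(snapshot_id):
    # The cache wins whenever it has an entry, so a stale entry
    # shadows the correct database row.
    if snapshot_id in branch_cache:
        return branch_cache[snapshot_id]
    return branch_table[snapshot_id]

snap = "fa806295-1938-4e89-8220-7b32e107a1e9"
print(get_branch_id(snap))  # the stale, nonexistent Branch ID from the trace
```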
For example, in the traces we could see that the Branch cache (held in the BranchManager class) is checked for an artifact and looks in a Branch, 16fdc746-d9e4-44ef-bd5a-c056a08c9b66, that does not exist in the LSW_BRANCH table in the DB:
00000127 wle_versionin 1 com.lombardisoftware.server.ejb.persistence.versioning.BranchManager getContext branchId Branch.16fdc746-d9e4-44ef-bd5a-c056a08c9b66
In summary, we have a Branch ID being traced out of the cache that does not exist in the database tables.
A possible solution could be to restart the server (this cache is per node!). A restart clears the in-memory cache, so it refreshes itself by reloading the snapshots from the database. Since the database side is good, the cache after the restart no longer contains the bad entry.
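In the same hypothetical sketch as above (illustrative names, not product code), clearing the cache - which is effectively what a node restart does - lets the next lookup fall through to the good DB row:

```python
# Hypothetical sketch (not product code): a stale cache entry shadows the DB
# until the per-node cache is cleared, e.g. by a server restart.
branch_table = {  # the good LSW_BRANCH data, keyed by snapshot ID
    "fa806295-1938-4e89-8220-7b32e107a1e9": "df589b9c-2bcb-4a97-b14c-0e5cddb33111",
}
branch_cache = {  # stale in-memory entry left over from a bad deployment
    "fa806295-1938-4e89-8220-7b32e107a1e9": "16fdc746-d9e4-44ef-bd5a-c056a08c9b66",
}

def get_branch_id(snapshot_id):
    # Cache-first lookup with a database fallback.
    return branch_cache.get(snapshot_id) or branch_table[snapshot_id]

snap = "fa806295-1938-4e89-8220-7b32e107a1e9"
branch_cache.clear()        # what the restart effectively does
print(get_branch_id(snap))  # the correct Branch ID, now read from the DB
```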
Why could this happen?
Only a guess - if two different snapshots that reference the exact same toolkit snapshot are deployed at the same time (and the toolkit snapshot is not already installed on the system), the toolkit snapshot can be deployed twice concurrently, which can result in very odd behavior, including a bad snapshot being inserted into the cache.
I hope that some day this will also help you.
And if it does not, take two of these and call me in the morning.
Yours Dr. Debug