Dr. Debug's Advice
Dr.Debug 3100008NWU 7,191 Views
Some of you may already know me, some may not. And some might know me without being aware of it.
But don't worry, we'll surely get to know each other soon.
As I said, my name is Dr. Debug and I work in IBM Support. I have a combined age of about 1400 years and more than 600 years of experience in IBM Support.
My area of specialisation is the WebSphere Portfolio, including the Application Server, DataPower, Business Process Manager (BPM) as well as the IBM PureApplication System.
You may think: Wow, this guy is a bit of a know-it-all - and you're right. I have seen many, many things in my career. And I have always found a solution to a problem.
Put a bandaid on the wound first, then find the cause, kill the pain, and get you back on track all healthy and happy.
That's my job. And I love it.
I thought I'd share my experience with you, so you gain some insight into how I help patients - and so you may soon be able to assist me.
And I hope the hints, procedures, tips and tricks I will show you will be beneficial for your future work with our products. At least they have come in quite handy for me over the years.
So, welcome to my new blog, enjoy your stay, come again, bring some friends and if nothing helps:
Take two of these and call me in the morning....
G'day my friends,
Today I'd like to share with you an excellent piece of work that will make your debugging life so much easier.
This IBM Support TV video hosted on our youtube channel will show you various ways to record HTTP traffic using your browser:
This comes in quite handy in cases where IBM Support asks you for HTTP traffic captures or Fiddler traces, but your company policy does not allow you to install Fiddler.
I would suggest bookmarking this video. We will need it as a basis for future debugging.
And if neither this video nor Fiddler helps...
Take two of these and call me in the morning,
Good morning, Fellas!
Today, I will tell you why my last post, with the video on how to capture HTTP traffic, is so important.
You probably all know the symptoms: you run your Coach and it's loading. You check the news in a separate tab, you jump back to the Coach, still loading,
loading aaaaaaand loading. Okay, I am exaggerating a bit, but sometimes the loading times feel like they take forever.
Now, before contacting IBM Support to help you solve your performance issue, I wanted to share some steps you can use to further diagnose the problem.
And for that, we will need this *.HAR (HTTP Archive) file that is being generated.
Now, to look at this HAR file, you will need a HAR viewer (http://www.softwareishard.com/blog/har-viewer/).
Now simply drag&drop your HAR file onto the page and click on the Preview tab.
Here you will see all the requests with request/response, as well as header information, the response codes to the requests and most importantly - the timings.
Since we have long loading times for our Coach, we are particularly interested in those loading times.
A good indicator is the bar on the right hand side. If it is purple and the number given next to it is quite small (it's the actual time this request took in milliseconds), that's good.
Just some general remarks:
The purple bar indicates the wait time, i.e. the time the client waited for a response from the server. The smaller, the better (of course it depends on the size of the request, network latency, server load, etc.).
You may see a brownish bar that indicates Blocking time. This is the time the request spent waiting for a network connection.
Blue bars indicate the time the DNS lookup took, green is the time it took to create the actual TCP connection.
Red bars show the time spent on sending the HTTP request to the server, and as said earlier, purple is the waiting time for the response.
The grey bar stands for the time it took to receive and read the complete response from the server.
In some cases you might even see SSL timings, which show how long the handshake etc took.
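If you prefer scripting over the viewer, the same timing phases can be pulled straight out of the HAR file, since it is plain JSON. Here is a small sketch using only Python's standard library; the field names follow the HAR 1.2 format, and the file name in the usage comment is just a placeholder:

```python
import json

# HAR 1.2 timing phases, matching the colored bars described above:
# blocked (brown), dns (blue), connect (green), send (red),
# wait (purple), receive (grey)
PHASES = ["blocked", "dns", "connect", "send", "wait", "receive"]

def summarize(har):
    """Return (url, total_ms, timings) per request, slowest first."""
    rows = []
    for entry in har["log"]["entries"]:
        t = entry["timings"]
        # a value of -1 means "does not apply" in HAR, so treat it as 0
        timings = {p: max(t.get(p, 0), 0) for p in PHASES}
        rows.append((entry["request"]["url"], entry["time"], timings))
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Usage sketch (hypothetical file name):
#   with open("coach.har") as f:
#       for url, total, t in summarize(json.load(f))[:10]:
#           print(f"{total:8.0f} ms  blocked={t['blocked']:.0f}  "
#                 f"wait={t['wait']:.0f}  {url}")
```

This gives you the slowest requests at a glance, with their blocked and wait portions broken out, which is exactly what we will look for next.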
A good sample can be found here.
Just open the twisties at the bottom named "Cuzillion" and you can review these timings.
Or you can reload this page, grab the HAR from your browser console, and it might look something like this:
I had a patient the other day whose HAR file showed something very interesting. The actual waiting times were pretty small, all only a couple of milliseconds. Yet he still told me his Coach was ailing. There were pages and pages of requests - I stopped counting after 200. And the farther I scrolled down the list, the more the brown bars grew.
Brown bar? Yes! Blocking time! What was that again?
Exactly! The time spent waiting for a connection. I am not a plumber, but this smelled like a clogged drain. To use the medical term: the patient was probably suffering from constipation.
I asked my patient to take the longest-running call (it was a REST call - I could see that from the request details) and run it against the server from the REST API tester while capturing the HTTP traffic again. In isolation: no other requests, just this one.
The result: It finished in no time. So we knew it was not the DB or Server that had a problem with that request. This supported the drain theory.
What can cause such constipation are the connection and thread pool settings of your server. In other words: if you eat more than you can egest, at some point you will feel constipated.
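The drain metaphor is easy to demonstrate outside of BPM. The sketch below (plain Python, nothing product-specific) uses a semaphore as a stand-in for a connection pool: when the pool is too small for the number of concurrent requests, they queue up, and the queueing time grows - exactly like the brown "blocking" bars in the HAR:

```python
import threading
import time

def run_requests(pool_size, n_requests, work_seconds=0.02):
    """Simulate n_requests concurrent requests against a pool of
    pool_size connections; return the total wall-clock time."""
    pool = threading.Semaphore(pool_size)

    def request():
        with pool:                    # blocks here when the pool is exhausted
            time.sleep(work_seconds)  # pretend to talk to the server

    threads = [threading.Thread(target=request) for _ in range(n_requests)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

# A small pool forces requests to queue; a larger pool drains them in parallel.
small = run_requests(pool_size=2, n_requests=10)
large = run_requests(pool_size=10, n_requests=10)
```

With a pool of 2, the ten requests run in five sequential batches; with a pool of 10, they all run at once. The absolute numbers depend on your machine, but the ratio makes the clogged-drain effect obvious.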
I advised him to increase the pool sizes iteratively, always monitoring the performance of his Coach while doing so.
And voilà - the clogged drain was free again.
And if none of this helps, take two of these and call me in the morning.
Your Dr. D
Tutorial: How to investigate startup issues of Eclipse based products where the Java Process terminates unexpectedly during the start-up
Good morning friends of the hands-on debugging,
Some of you may have already encountered such a situation. You get your morning coffee, boot up your workstation, you start your IBM Integration Designer (IID) and bang - there it is. A crash notification. What a great start to your day.
Java was started but returned exit code=1
If you have not encountered this yet - congratulations! I would still recommend reading this post; maybe some day you will need it.
So, what's next?
Step 1: Prepare yourself for some debug fun. If your coffee is gone, I recommend getting another one before continuing with the steps below. If you still have coffee left, move right on to the next step.
Step 2: Did this installation ever work or did you just install it? If this error shows up after a new installation of the Eclipse based product (here IID), check out the installation logs to make sure it was installed without any exceptions. To do so, have a look at the IBM Installation Manager log files under <Installation_Manager_Install_Dir>/logs
In my case:
Open the index.xml file and review all logs linked in there for any exceptions or failures during the installation.
Failed or partially failed installations may cause such startup issues.
Step 3: Did anybody mess with your Eclipse startup parameters? If this is not a new installation and the exception happened all of a sudden, maybe some JVM arguments were changed and are hence causing this crash. Always a good candidate is the -Xmx value, which specifies the maximum memory allocation pool for the Java Virtual Machine (JVM). A lot of users still think: The more, the better! Unfortunately they overtune the JVM and the memory allocation is simply too large for the JVM to be successfully created. If changes were made to the JVM settings, please revert those changes back to the default values (you can for example compare them to a working or untouched system). Then try to launch your Eclipse based application again.
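For reference, the JVM arguments live in the eclipse.ini file in the product's installation directory, after the -vmargs marker. A typical excerpt might look like the following (the exact values are illustrative, not defaults I can vouch for); if someone raised -Xmx far beyond what the machine can actually back, the JVM may simply fail to start:

```ini
-vmargs
-Xms256m
-Xmx1024m
```

Comparing this section against an untouched installation is usually the quickest way to spot an overtuned value.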
Step 4: Create Javacore log files to gain more insight into the root cause of the crash. As Eclipse is still in the start-up process when the crash occurs, the JVM errors cannot be written to the Eclipse error logs under <workspace_root>/.metadata/.log
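One way to get diagnostics anyway is to ask the IBM (J9) JVM for a dump when the VM stops with a bad exit code. A sketch, again placed after -vmargs in eclipse.ini - the -Xdump option applies to the IBM JDK that ships with IID, and the exact filter syntax is an assumption you should verify against the -Xdump documentation for your JVM level:

```ini
-vmargs
-Xdump:java:events=vmstop,filter=#1
```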
This should enforce the creation of a javacore.txt file in case of a JVM crash. In our case here, you will find it in the IID root directory after the next restart.
And if nothing helps, take two of these and call me in the morning.
Your Dr. D
Tutorial: How to investigate startup issues of Eclipse based products where the Java Process hangs during the start-up
Dear fellow Debuggers,
The other day I ran into a problem where my IBM Integration Designer (IID) did not start properly. I tried to start the application, but the startup never finished and hung. Frozen - no way to stop it except killing the process. Alternatively, I could have waited for it to crash.
If it does crash during the workspace initialization phase, a log file is written to the workspace/.metadata/ directory. The file is named ".log" and is always worth a look. In my case, I saw the following exception:
!ENTRY org.eclipse.osgi 2 0 2015
!MESSAGE While loading class "org.eclipse.core.runtime.RegistryFactory", thread "Thread[Start Level Event Dispatcher,5,main]" timed out waiting (5000ms) for thread "Thread[Component Resolve Thread,5,main]" to finish starting bundle "org.eclipse.equinox.registry_3.5.0.v20100503". To avoid deadlock, thread "Thread[Start Level Event Dispatcher,5,main]" is proceeding but "org.eclipse.core.runtime.RegistryFactory" may not be fully initialized.
!STACK 0
org.osgi.framework.BundleException: State change in progress for bundle "reference:file:/C:/IBM/SDPShared/plugins/org.eclipse.equinox.registry_3.5.0.v20100503.jar" by thread "Component Resolve Thread".
at org.eclipse.osgi.framework.internal.core.AbstractBundle.beginStateChange(AbstractBundle.java:1077)
at org.eclipse.osgi.framework.internal.core.AbstractBundle.start(AbstractBundle.java:282)
at org.eclipse.osgi.framework.util.SecureAction.start(SecureAction.java:417)
org.eclipse.equinox.registry is one of the first Eclipse plugins to be started, and a failure here usually indicates that a bundle was not loaded properly.
In my case, a simple restart of my machine and then starting IID with the -clean option resolved the error.
And if nothing helps, take two of these and call me in the morning.
Yesterday I came across an interesting issue.
After upgrading BPM to fix pack 2, the Server List in my Process Center was completely empty. The servers were reachable and available, though.
When I clicked the Servers tab, I found the following exception in the log:
[3/11/15 12:16:12:430 GMT+01:00] 00000061 wle E CWLLG1274E: An exception occurred.
I looked at the LSW_SERVER table of my BPM database and the server entries were listed.
Reviewing the table definition of my LSW_SERVER table, I saw that the password column had a definition of varchar(256), although after the installation of FP2 (or JR42774) it was supposed to have varchar(1000). So something must have gone wrong during the installation of the fix pack - most likely during the post-install steps where the DB schema is upgraded.
After I updated the schema (as described in the post-install steps), I nulled the passwords in the LSW_SERVER table; the heartbeat would simply re-add them afterwards.
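For illustration only, the repair could look roughly like the following DB2-style statements. The column name and width here are assumptions on my part - always prefer the official post-install schema upgrade scripts over hand-written DDL:

```sql
-- widen the password column to the post-FP2 definition (hypothetical DDL)
ALTER TABLE LSW_SERVER ALTER COLUMN PASSWORD SET DATA TYPE VARCHAR(1000);
-- null the stored passwords; the heartbeat re-adds them afterwards
UPDATE LSW_SERVER SET PASSWORD = NULL;
```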
Good morning friends,
Today I would like to show you how you can set a specific editor to display a certain file type.
As you all may already know, the IBM Integration Designer (IID) is a development tool where you can use a Graphical User Interface to create and test your applications. For this GUI there is a set of editors available that will display your artifacts in a graphical view instead of a file view.
An example would be the mediation flow, which is opened with the "Mediation Flow Editor" by default when you open the flow.
So, you may run into situations where the editor is not available, and the mediation flow opens in the XML file view instead. The first attempt would be to open it from the context menu (BI view >> right-click on your mediation flow name >> Open With >> Mediation Flow Editor). If you are as unlucky as I am, this option is not available, and the context menu looks like this:
How can I get my Mediation Flow Editor back? Well, it's not hard if you know how. And I am going to show you how.
All you need to do is register the editor for the related file type again - in our case, the mediation flow (associated with the .mfc file type).
To do so, click on Window >> Preferences and click on the little eraser icon to remove all filters.
Then click on General >> Editors >> File Associations
You will find a list of file types where you can associate the correct editor to the according selected file type.
In our example, we will select the file type .mfc. You will now see the editors in the "Associated editors" section. With the "Add..." button we can add additional editors for this file type, and we will now add the "Mediation Flow Editor".
All we need to do now is save our changes and go back to the BI view, where we open the context menu to verify that the Mediation Flow Editor was added:
And we're there. It worked.
And if all of that does not help, take two of these and call me in the morning.
Your Doc D.
When a Coach validation error occurs in a Client-Side Human Service in BPM 8.5.5, previously updated variables remain unchanged
Dear fellow Debuggers,
Last week I came across an interesting issue and I thought I would share my findings with you.
In BPM version 8.5.5 the new Client-Side Human Service (CSHS) was introduced.
The Service looked like this:
It is important to understand that the validation step here was simply a script the user had named "Validation Script". A true validation service is indicated by a green (V) icon on the line leading to the script step. If you don't see it, it is just a regular script step, not true validation. See this Knowledge Center link.
From the Knowledge Center:
Please note the green checkmarks or Vs.
However, to configure such a validation service, you would have to set the "fire validation" property to "before" on the line to the boundary event. But the CSHS does not support this kind of validation - it is only available for Heritage Coaches in v8.5.5.
In our use case, when we clicked the button, the values of the variables should have changed when the validation error was triggered; however, the variables still showed the "old" values.
So why was that?
Well, in v8.5.5, when validation errors occur in a CSHS, only the validation error information is returned to the Coach for display. Any changed variables are NOT returned to the Coach. This explains why our variables showed the old values, and not the ones we assigned to them in our client-side script.
My dear Debug-Friends,
Today, I would like to explain a topic that arises from time to time: you need to upload large files to a PMR and email just won't work.
IBM Support lets you upload the data using FTP, and I will show you how to utilize that.
You must use the appropriate naming conventions as shown in the examples below to make sure your data makes it to your PMR:
Where the parts have the meaning:
xxxxx = PMR number
bbb = Branch Office (if known)
ccc = IBM Country Code (e.g. Germany 724)
yyy.zzz= Short description and the file type, e.g. myLogs.rar, traces.zip etc.
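The naming convention above is easy to get wrong by hand, so here is a tiny sketch that assembles the file name from its parts. The function name is mine; the convention itself is the one listed above:

```python
def pmr_filename(pmr, branch, country, description, filetype):
    """Build an upload file name of the form xxxxx.bbb.ccc.yyy.zzz."""
    return f"{pmr}.{branch}.{country}.{description}.{filetype}"

# e.g. PMR 12345, branch 000, Germany (country code 724), a zipped trace:
name = pmr_filename("12345", "000", "724", "traces", "zip")
```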
Because the login is done with an anonymous user ID, no file listing is allowed on the server. You will not be able to check which files you have already uploaded, which means you need to do this very carefully.
If this does not work for you, take two of these and call me in the morning.
Your Dr. D
Good morning my dearest debuggers,
Today, I would like to share a new IBMSupportTV video with you. It shows how to capture HTTP traffic using Fiddler and can come in quite handy if IBM Support asks you for a Fiddler trace.
You will find it on the IBMSupportTV youtube channel.
And if this does not help you gather the trace, take two of these and call me in the morning.
Dear Friends of the Debugging-Fun,
A few days ago, I was presented with an interesting issue. One of my customers experienced an unacceptably long wait while starting their BPM AppCluster.
select data, version_id
from lsw_bpd
where version_id in (
  select distinct ver.po_version_id
  from lsw_po_versions ver
  inner join lsw_snapshot snap
    on (ver.branch_id = snap.branch_id
        and ver.start_seq_num <= snap.seq_num
        and ver.end_seq_num > snap.seq_num)
  where ver.po_type = 25
    and snap.name is not null
)
Yep, yet again another Performance debugging.
As my customer mentioned that the delay started right when the cache settings were loaded, I checked the SystemOut.log for anything cache-related during the start-up.
I found the following line:
000000ae wle I CWLLG2155I: Cache settings read have been from file file:
As you can see, the thread ID for that activity is 000000ae, so I grepped for 000000ae to see the whole thread.
And I was quite surprised to see a 9 MINUTE (!) delay in the start-up. You know, in most of the performance cases we see, we are talking about improvements of a few seconds. But this one, I have to tell you, had a point. Nine minutes is an awfully long time to wait - that is about two cups of coffee. Too long, as you will most likely agree.
From the information in the logs, it seemed that the delay was related to the cache, as it occurred between the cache settings being loaded and the cache being initialized.
So, where could this be coming from? Maybe some slow DB causing this delay? Well, we do not yet have enough information to tell, hence we need some traces.
The following trace string is the one to start with for such issues:
The trace.log gives us more info.
We can see the known code from the screenshot above, "CWLLG2155I", and some cache activities, e.g. for the LRU cache, the BDAC (BusinessDataAliasesCache), etc.
[3/24/15 9:10:09:109 EET] 000000ae wle I CWLLG2155I: Cache settings read have been from file file:/opt/IBM/BPM/v85/BPM/Lombardi/process-server/lib/procsrv_resources.jar!/LombardiTeamWorksCache.xml
Here we can see the Cache size, which reminds me of a useful article on how to configure them for tuning purposes. Find the link here.
[3/24/15 9:10:17:496 EET] 000000ae wle 1 com.lombardisoftware.core.cache.LocalCache initCache BusinessDataAliasesCache initC()
I took a deeper look at the cache activities and stumbled across the following:
The BDAC cache load started at: 09:10:17
And it finished at: 09:16:26
So, what does this mean? That is 6 minutes for the BDAC cache alone - two thirds of the 9-minute load time. Aha!
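Measuring such gaps by eye is error-prone, so I like to subtract the timestamps from the two log lines directly, e.g. with a few lines of Python (the timestamps are the HH:MM:SS portions from the trace excerpts above):

```python
from datetime import datetime

def gap_seconds(start, end, fmt="%H:%M:%S"):
    """Seconds between two HH:MM:SS timestamps taken from the trace."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).seconds

# BDAC cache load from the trace above: started 09:10:17, finished 09:16:26
delay = gap_seconds("09:10:17", "09:16:26")  # 369 seconds, i.e. 6 min 9 s
```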
So, now we need some further info on the BDAC. Upon further research, I found this:
Searchable business data aliases are the columns that you can use in saved searches, the TWSearch API, and the REST Search API. If you set the <use-business-aliases-for-process-instances> property value to "false", the list of searchable business data aliases are returned based on snapshot definitions.
This APAR introduced a new property named <use-business-aliases-for-process-instances>.
TRUE means that the list of searchable business data aliases is returned only for aliases that are used in actual process instances.
If FALSE is selected, the business data aliases are created at server startup, BEFORE the first access by process instances. If there is a large number of BPDs, searching all of them can take a very long time.
The more process applications are deployed on the server, the more aliases are created - and the longer the cache load takes.
So, if this fix was introduced in an earlier release, why did I still see the behavior in 8.5.5? After all, APAR JR47818 is part of BPM v8.5.5.
Hm, what could have happened here?
As my grandmother would have said: never trust a running config (if she had known what a running config was). Hence, let's have a look at the 99local.xml file on my BPM 8.5.5 setup to verify the default configuration for that attribute.
Indeed, the value is set to "FALSE"!
Checking the customer's 99local.xml and 100Custom.xml showed the same.
Let's wrap up our findings:
1) APAR JR47818 introduced a new property, <use-business-aliases-for-process-instances>, with a default value of "TRUE".
2) BPM v8.5.5 includes APAR JR47818, but in that version the default value for the property <use-business-aliases-for-process-instances> is "FALSE".
That is quite a surprise! But at least, we know what to do next.
So I advised my customer to change the setting for that property to "TRUE", and the problem was solved.
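For reference, such an override typically goes into the 100Custom.xml file, which is merged over the 99local.xml defaults. The sketch below only illustrates the idea - copy the exact enclosing elements from the corresponding section of your own 99local.xml, as the nesting shown here is an assumption:

```xml
<properties>
  <server merge="mergeChildren">
    <!-- override the 8.5.5 default of false -->
    <use-business-aliases-for-process-instances merge="replace">true</use-business-aliases-for-process-instances>
  </server>
</properties>
```

As always with 100Custom.xml changes, restart the server and verify the effective value in the logs.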
As you can see, sometimes all it takes is a bit of time and research to tackle these kind of issues.
Today's posting will be all about Mustgather information.
The patient needs to report an urgent issue to IBM BPM Support and wants to provide valuable information that expedites the resolution right from the start.
For Dr. Debug and the IBM Support team, it is of utmost importance to have as much information as possible in order to diagnose the problem. Remember: you might know pretty much everything about the error and the environment, the steps to recreate it, and what was done prior to the error. But all of this is unknown and new to the support folks.
Should you require Support's assistance, the following steps may help you provide some basic information that should allow Support to start investigating your problem.
In general, it is quite handy when all the information Support needs is available right from the beginning. If anything is missing, do not worry - Support will let you know what they need in addition to what you have provided.
And if this article does not help, take two of these and call me in the morning.
When you want to move freshly developed code from a test system to a production system, you would usually create a snapshot on test, export it, and import it on production. Pretty much straightforward, no magic.
If you confirm this with 'OK' you may receive the following error message indicating the import failed:
Last week I stumbled across an interesting question in BPM. As it took me some time to figure out why BPM was behaving the way it did, I thought I'd share my findings with you, as they may help you some day as well.
The scenario was the following:
Here is my HS and the Document List Configuration:
As you can see, I used a custom search service. My goal was to filter the document list based on the content type of the stored documents. I wanted my filter to look up the ContentType in a local variable, so I could simply change the value of the variable, without touching my search criteria, to adapt my searches.
To do so, I added a search criterion in the Search Service by browsing the Content Filters tab and clicking the "Add Search Criterion..." button:
This adds a new criterion. As I wanted to check for the MIME type, I selected Mime Type in the first field and "is equal to" in the second field:
In the third field, as a first test, I wrote "application/pdf" (without the quotes) to see if it would work:
I ran the HS and it did work:
Now I wanted to use a variable instead of hardcoding the MIME type in my search criterion. So I created a new local input variable named "ContentType" and set its default value to "application/pdf":
And I adapted my search criterion by entering the name of my variable (tw.local.ContentType) into the third field:
I ran the HS and was surprised to see the following results:
Why was that?
Looking at the DataMapping, I found the following CMIS query:
"SELECT cmis:name, cmis:contentStreamMimeType, cmis:objectId, cmis:versionSeriesId, IBM_BPM_Document_FileNameURL, IBM_BPM_Document_FileType FROM IBM_BPM_Document WHERE (cmis:contentStreamMimeType = 'tw.local.ContentType') ORDER BY cmis:name ASC"
As you may note, the variable name I entered in the search criterion was interpreted as a string literal. No surprise this was not working - there was no document with a MIME type of "tw.local.ContentType".
So, how was I able to solve it?
It is fairly simple, just follow these three steps:
The result will look like this:
Note the different font, compared to when I wrote the variable name in the field manually:
Looking at the Data Mapping now, I saw the following CMIS query:
"SELECT cmis:name, cmis:contentStreamMimeType, cmis:objectId, cmis:versionSeriesId, IBM_BPM_Document_FileNameURL, IBM_BPM_Document_FileType FROM IBM_BPM_Document WHERE (cmis:contentStreamMimeType = '" + tw.local.ContentType + "') ORDER BY cmis:name ASC"
Now that looked way better. To confirm it was working, I simply re-ran the HS:
And it worked like a charm.
I hope this will help you some day. And if it does not help, take two of these and call me in the morning.