Topic
  • 13 replies
  • Latest Post - ‏2013-06-13T16:09:10Z by JoeMorganNTST
JoeMorganNTST
JoeMorganNTST
427 Posts

Pinned topic Why does the Probe NOT capture?

‏2013-05-22T22:50:22Z |

XI52 on 5.0.0.6.

I know one answer to this... which is if the processing is set to passthrough.  This is not my case.  Request/Response is set to JSON on an MPG.

We are trying to debug a situation where messages are being randomly "dropped".  So, I have a probe on the MPG that handles the servicing of the requests.  Here is the traffic pattern:

  1. JSON Inbound to MPG-1
  2. MPG-1 forwards the JSON request to an APP server
  3. The APP server then sends a different JSON request out to a different MPG-2
  4. MPG-2 sends the request on to another server.....

At least, that's the happy day scenario.  Every so often, DataPower is throwing an error with "Missing Input on Action".  There is no rhyme or reason to when or why.  The same exact request can go through successfully many times, then it just fails.  There are many different requests.  Same thing.  Some go through, others don't.  It isn't always the same request that fails.

Anyway... enough of that.  The service has 2 request rules, 1 response rule, and 1 error rule.  In the error rule, there is a transform that creates a JSON error response.  I have set up an XSLT parameter to give it a simple, uniquely identifying string that tells us which device (on an AO load-balanced pair of DPs) is throwing the error.  (There is also no pattern to which appliance generates the error.)

So, we start sending requests through.  Say we send 5 requests through and 2 of them fail.  We clearly see the probes on the 3 that succeed.  However, I do not see the probe on the 2 that fail, yet, we get back the JSON error transformed by the appliance!

Again, policy is JSON/JSON, no passthrough.

Does anyone have any explanation why I cannot see anything in the probes just because the request fails?
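
A test run like the one described above can be tallied off-box. The following is a minimal sketch in Python, assuming the error rule's transform emits a JSON object with an `error` key and a `device` key holding the identifying string. Both key names and the sample values are hypothetical, since the thread does not show the actual error body.

```python
import json
from collections import Counter


def tally_responses(bodies):
    """Count successes and DataPower-generated errors per device.

    Assumes (hypothetically) that the error rule returns JSON shaped like
    {"error": "...", "device": "DP-A"}; adjust the keys to the real output.
    """
    successes = 0
    failures = Counter()
    for body in bodies:
        try:
            doc = json.loads(body)
        except ValueError:
            # Not JSON at all: count it separately rather than guessing.
            failures["<unparseable>"] += 1
            continue
        if isinstance(doc, dict) and "error" in doc:
            failures[doc.get("device", "<unknown>")] += 1
        else:
            successes += 1
    return successes, failures


if __name__ == "__main__":
    # Made-up response bodies standing in for 5 requests, 2 of which failed.
    sample = [
        '{"result": "ok"}',
        '{"error": "Missing Input on Action", "device": "DP-A"}',
        '{"result": "ok"}',
        '{"error": "Missing Input on Action", "device": "DP-B"}',
        '{"result": "ok"}',
    ]
    ok, failed = tally_responses(sample)
    print("successes:", ok)
    print("failures by device:", dict(failed))
```

Comparing the per-device failure counts with what each appliance's probe shows makes it easy to confirm that a failed transaction really was handled by the appliance being watched.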

 

  • swlinn
    swlinn
    1348 Posts

    Re: Why does the Probe NOT capture?

    ‏2013-05-23T02:43:55Z  

    Hi Joe,

    So before I throw in my two cents with some speculative suggestions or questions, I will ask if you've opened a PMR on this?  This behavior does sound odd.

    Having said that, since you indicate you're using AO, any chance that the 2 that failed happened on the appliance other than the one where you have the probe?  And if the ones that fail happen on one of the pair of appliances, are the service configs the same?  I believe there is a 3 concurrent transaction threshold in the probe, so if your 5 requests come in concurrently, the probe may only show three, although it is very odd that the two that don't show up happen to be the two that fail.  What are the odds of that??

    Also, instead of a stylesheet param with the unique identifying string, you could use var://service/system/ident ... it has a node set that comes from your system settings ... something like ...

    <identification build="218804" timestamp="Tue May 14 14:56:05 2013 ">

        <product-id>92354BX</product-id>
        <product>XI50</product>
        <display-product>XI50</display-product>
        <model>DataPower XI50</model>
        <display-model>DataPower XI50</display-model>
        <device-name>unique name from system settings</device-name>
        <serial-number>xxxxxxx</serial-number>
        <firmware-version>XI50.5.0.0.3</firmware-version>
        <display-firmware-version>XI50.5.0.0.3</display-firmware-version>
        <firmware-build>218804</firmware-build>
        <firmware-timestamp>2012/10/04 20:13:07</firmware-timestamp>
        <current-date>2013-05-14</current-date>
        <current-time>14:56:05 EDT</current-time>
        <reset-date>2013-03-21</reset-date>
        <reset-time>09:41:20 EDT</reset-time>
        <login-message />
        <custom-ui-file />

    </identification>
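
To make the shape of that node set concrete, here is a small off-box sketch in Python that pulls a few fields out of a trimmed copy of the sample above. On the appliance itself the variable would be read inside the stylesheet; this only illustrates which child elements carry the device name, serial number, and firmware level. The function name and the returned dictionary keys are illustrative, not part of any DataPower API.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample node set shown above.
IDENT_XML = """\
<identification build="218804" timestamp="Tue May 14 14:56:05 2013 ">
    <product>XI50</product>
    <device-name>unique name from system settings</device-name>
    <serial-number>xxxxxxx</serial-number>
    <firmware-version>XI50.5.0.0.3</firmware-version>
</identification>
"""


def describe_device(ident_xml):
    """Return the fields most useful for telling appliances apart."""
    root = ET.fromstring(ident_xml)
    return {
        "device_name": root.findtext("device-name", default=""),
        "serial": root.findtext("serial-number", default=""),
        "firmware": root.findtext("firmware-version", default=""),
    }


if __name__ == "__main__":
    print(describe_device(IDENT_XML))
```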

    Regards,

    Steve

  • JoeMorganNTST
    JoeMorganNTST
    427 Posts

    Re: Why does the Probe NOT capture?

    ‏2013-05-23T14:59:30Z  

    I will ask if you've opened a PMR on this?

    No.  I generally post questions here first to see if I can resolve before filing a PMR.

    Having said that, since you indicate you're using AO, any chance that the 2 that failed happened on the appliance other than the one where you have the probe?

    Now you know why I put in the transform to ID the machine generating the error.  I watched BOTH probes.  The probes simply do not display the request/response when the failure occurs and the error rule is triggered.

    are the service configs the same?

    Identical.  Because they are AO, I can always export from one and import to the other without having to change anything else.  The listening address on the service is via a host alias.

    I believe there is a 3 concurrent transaction threshold in the probe, so if your 5 requests come in concurrently, the probe may only show the three, although it is very odd that the two that don't show up happen to be the two that fail.  What are the odds of that??

    I didn't know that.  That's good information to know.  However, the probes *always* show the successful requests, and never the failed ones.

    Also, instead of a stylesheet param with the unique identifying string, you could use var://service/system/ident

    Yes, I know.... However, we use the network machine name there for SNMP, and it is much easier for me to ID the machine if I can put in a user-friendly name (we're debugging against many machines now, not just the 2.  Same behavior in all of them!!!)

    One thing.  When I originally created this service, it was created as an XMLFW.  That thing never acted right.  In fact, it didn't work *unless* I had a probe turned on (a question I recently asked on this forum).  We have an MPG that's JSON->JSON, and it is working flawlessly.  So, I developed some suspicion this service may have been broken, and created a new MPG, with an XMLFW.  Instead of recreating the whole processing policy, I simply reused the policy I created for the XMLFW.  This fixed the issue of needing the probe turned on for it to work.

    Now I'm wondering if there's some kind of remnant I cannot see or easily spot in that policy, its rules, or its actions.  So... as I write this, I'm deleting everything that was in the XMLFW config and rebuilding it as an MPG from scratch.

     

     

  • HermannSW
    HermannSW
    4736 Posts

    Re: Why does the Probe NOT capture?

    ‏2013-05-24T11:02:14Z  

    No.  I generally post questions here first to see if I can resolve before filing a PMR.

    I appreciate that!

    I didn't know that.  That's good information to know.  However, the probes *always* show the successful requests, and never the failed ones.

    Always capturing the good ones sounds interesting.

    Can you enable "unbounded" Probe mode to try to capture all 5?
    You can do that if you open the MPGW via Object Screen; you will then see the "Probe Settings" tab.


     

    Hermann<myXsltBlog/> <myXsltTweets/> <myCE/>

  • swlinn
    swlinn
    1348 Posts

    Re: Why does the Probe NOT capture?

    ‏2013-05-29T01:12:52Z  

    Hi Joe,

    Like you and Hermann, I think it's just too coincidental that the 2 that fail don't show up, but since the magic number is 3 unless you select unbounded as Hermann suggested, I'd be curious if you sent 10 and 3 failed, you'd see the 7 successful requests in the probe???  My guess is the ones that fail are failing before your processing rule.

    Here's what I would try.  Set up a continuous packet capture and then set a trigger on the event code for "Missing Input on Action" that turns off the packet capture.  Hopefully then you can see what the specific request is that is failing and diagnose why.  Here's a link in the info center on how to enable an event trigger in your logging target, with the CLI to turn on/off a packet capture.  http://pic.dhe.ibm.com/infocenter/wsdatap/v5r0m0/topic/com.ibm.dp.doc/problemdetermination69.htm?path=0_5_1_8_11_0#wq72

    I'd just use SSH to enable the packet capture and then have the trigger turn it off when you get the error.  Upload the packet capture file and review it with Wireshark.

    Regards,

    Steve

     

  • JoeMorganNTST
    JoeMorganNTST
    427 Posts

    Re: Why does the Probe NOT capture?

    ‏2013-05-29T21:52:52Z  

    I'd be curious if you sent 10 and 3 failed, you'd see the 7 successful requests in the probe???

    That's exactly what we are seeing.  It is perfectly consistent.  If we send 10 and 6 fail, we see 4 successful requests in the probe.
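
Those numbers can be checked against the two explanations floated so far. The sketch below, in Python, is a deliberately simplified model: it treats the "3 concurrent transactions" threshold as a flat cap on what the probe shows and ignores timing entirely. The function names are illustrative, and the first observation is the illustrative "send 5, 2 fail" case from the opening post.

```python
def shown_if_capped(sent, failed, cap=3):
    """Hypothesis A: the probe keeps at most `cap` transactions, pass or fail."""
    return min(sent, cap)


def shown_if_successes_only(sent, failed):
    """Hypothesis B: failed transactions are never recorded by the probe."""
    return sent - failed


# (sent, failed, shown in probe) as reported in this thread.
OBSERVATIONS = [(5, 2, 3), (10, 6, 4)]


def consistent(model):
    """True if the model reproduces every reported probe count."""
    return all(model(sent, failed) == shown
               for sent, failed, shown in OBSERVATIONS)


if __name__ == "__main__":
    print("flat cap of 3 fits:", consistent(shown_if_capped))
    print("successes only fits:", consistent(shown_if_successes_only))
```

Under this simplified model the flat cap matches the 5-request case but not the 10-request case, while "failures are never recorded" matches both, which is why the threshold alone does not seem to account for the missing transactions.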

    In another strange twist, we now have probes on a WSP, and never see any request coming through, even though there is no other way for them to reach their destinations without coming through the probe.  I'm thinking something is very sick about 5.0.0.6 firmware.  It didn't happen right away, but as time goes on, these appliances are getting more sick by the day.

    My guess is the ones that fail are failing before your processing rule

    Actually, when it does fail, it is failing on the results action on the request rule. This is not universal.  About 50% of the requests make it through.  I can now control it.  It is a GET request (JSON/JSON).  The results action is sometimes failing with a "Missing input on action".  If I set the input to default, I have a 50% (more or less) failure rate.  If I explicitly set the input for the action to NULL, it never fails.

    It still doesn't explain why, when it does fail, I cannot see the probe process the error rule, which generates a very specific JSON response that only DataPower will produce.

    Upload the packet capture file and review with wireshark

    We had a lot of trouble doing this, which adds to the mystery.  We cut off all other traffic, turned on packet capture, sent the requests until one failed, then turned off packet capture.  We could not find the request in the packet capture at all!  However, within a dedicated log target logging all activity on the service (and its references), we can see the transaction in the log.  I can only guess that we're running out of space on the capture, even though I did set it to continuous and set its maximum size boundary to 500000 KB.  This happens even when the failure occurs on the first request.  I do notice the packet capture output is nowhere near 500000 KB.  So, I'm going to have to try to figure this out another way.
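
Since the dedicated log target does record the failing transactions, one way to work from it is to scan a downloaded copy of the log for the error text. This is a minimal sketch in Python; the sample log lines are made up, because the thread does not show the real log format, and the match is case-insensitive because the message appears in this thread with both capitalizations.

```python
import re

# Matches "Missing Input on Action" and "Missing input on action" alike.
ERROR_PATTERN = re.compile(r"missing input on action", re.IGNORECASE)


def find_failures(lines):
    """Return (line_number, text) for every log line mentioning the error."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_PATTERN.search(line)
    ]


if __name__ == "__main__":
    # Made-up lines; a real run would iterate over the downloaded log file.
    sample_log = [
        "request received\n",
        "results action failed: Missing input on action\n",
        "request received\n",
        "error rule completed\n",
    ]
    for number, text in find_failures(sample_log):
        print(number, text)
```

The line numbers give a starting point for reading the surrounding entries of each failed transaction.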

    Joe

     

     

     

    Updated on 2013-05-29T21:55:15Z by JoeMorganNTST
  • swlinn
    swlinn
    1348 Posts

    Re: Why does the Probe NOT capture?

    ‏2013-05-29T22:21:20Z  

    Hi Joe,

    So now you really have me scratching my head.  For the request to fail at the results action means it did execute your processing policy rule, which should therefore be in your probe.  I've seen WSPs fail as you describe when the inbound message is not SOAP or you are doing a request schema validation which fails, so your rule never gets initiated, which is why it doesn't show up in the probe.  That is what I was thinking might be your case, but if you're failing in your processing policy then I'm at a loss to explain why.  Perhaps we are getting to PMR time.

    As far as the packet capture, you can filter the packets captured based on the IP of your test generator (filter string of "host w.x.y.z") so you don't have to look at all of the traffic coming through the appliance; just send your test traffic from one place.  This will be much simpler and your pcap file can be smaller for sure.  With a capture that big, you won't see the forest for the trees.
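
The filter string follows the usual tcpdump-style expression syntax. As a minimal sketch, the Python helper below builds such a string from the test generator's address and validates the address first. Only the plain "host w.x.y.z" form comes from this thread; the optional "and tcp port N" clause is standard pcap filter syntax, but whether the appliance accepts compound expressions is an assumption to verify. The function name and the example address are illustrative.

```python
import ipaddress


def capture_filter(host, port=None):
    """Build a tcpdump-style filter limiting a capture to one test client."""
    # Raises ValueError if `host` is not a valid IPv4/IPv6 address.
    ipaddress.ip_address(host)
    expression = f"host {host}"
    if port is not None:
        if not 0 < port < 65536:
            raise ValueError(f"port out of range: {port}")
        expression += f" and tcp port {port}"
    return expression


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address used as a placeholder.
    print(capture_filter("192.0.2.10"))
    print(capture_filter("192.0.2.10", port=8443))
```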

    Regards,

    Steve

  • JoeMorganNTST
    JoeMorganNTST
    427 Posts

    Re: Why does the Probe NOT capture?

    ‏2013-05-29T22:32:59Z  

    I've seen WSPs fail as you describe when the inbound message is not SOAP or you are doing a request schema validation which fails, so your rule never gets initiated which is why it doesn't show up in the probe,

    It's the standard pattern.  MPG in the DMZ routing to other services in a trusted zone.  At some point, WebLogic makes a request out through DataPower via a WSP.  We are seeing completely successful requests go through these WSPs (as evidenced by the responses in the MPG and the WebLogic logs), and they are not captured in the probe at all!

    This is only bothering me because it is continuing the Twilight Zone kind of behavior that I cannot begin to explain.

    With all this said.  I did set the probes to "Unbounded" and it is now capturing.  I'm going to set them back to just "On", and see what happens.

    I may have to go back to try this same thing with the JSON/JSON MPG.  I may have to set the Results action back to default, and then set the probe to unbounded to see if it captures the error rule (at the very least).

     

     

     

    Updated on 2013-05-29T22:36:43Z by JoeMorganNTST
  • HermannSW
    HermannSW
    4736 Posts

    Re: Why does the Probe NOT capture?

    ‏2013-05-29T22:51:39Z  

    > With all this said.  I did set the probes to "Unbounded" and it is now capturing.
    >

    That is good.

    But I want to make another comment here:
    Do not use Probe in production!

    This can be read in InfoCenter as well (the bold text):
    http://pic.dhe.ibm.com/infocenter/wsdatap/v5r0m0/index.jsp?topic=%2Fcom.ibm.dp.doc%2Fproblemdetermination73.htm

     

    Hermann<myXsltBlog/> <myXsltTweets/> <myCE/>

     

  • JoeMorganNTST
    JoeMorganNTST
    427 Posts

    Re: Why does the Probe NOT capture?

    ‏2013-05-30T14:12:48Z  

    Do not use Probe in production!

    Duh! :-)  This isn't a production domain, but, unfortunately, it is on a production appliance in a QA domain.  Totally against my recommendations, but it is what it is.