13 replies Latest Post - ‏2013-06-13T16:09:10Z by JoeMorganNTST
JoeMorganNTST

Why does the Probe NOT capture?

2013-05-22T22:50:22Z

XI52 on 5.0.0.6.

I know one answer to this: the probe captures nothing if processing is set to passthrough.  That is not my case.  Request/Response is set to JSON on an MPG.

We are trying to debug a situation where messages are being randomly "dropped".  So, I have a probe on the MPG that handles the servicing of the requests.  Here is the traffic pattern:

  1. JSON Inbound to MPG-1
  2. MPG-1 forwards the JSON request to an APP server
  3. The APP server then sends a different JSON request out to a different MPG-2
  4. MPG-2 sends the request on to another server.....

At least, that's the happy day scenario.  Every so often, DataPower is throwing an error with "Missing Input on Action".  There is no rhyme or reason to when or why.  The same exact request can go through successfully many times, then it just fails.  There are many different requests.  Same thing.  Some go through, others don't.  It isn't always the same request that fails.

Anyway... enough of that.  The service has 2 request rules, 1 response rule, and 1 error rule.  In the error rule, there is a transform that creates a JSON error response.  I have set up an XSLT parameter that gives it a simple, uniquely identifying string to tell us which device (on an AO load balanced pair of DPs) is throwing the error.  (There is also no pattern to which appliance generates the error.)

So, we start sending requests through.  Say we send 5 requests through and 2 of them fail.  We clearly see the probes on the 3 that succeed.  However, I do not see the probe on the 2 that fail, yet, we get back the JSON error transformed by the appliance!

Again, policy is JSON/JSON, no passthrough.

Does anyone have any explanation why I cannot see anything in the probes just because the request fails?
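The comparison described above (responses received versus transactions shown in the probe) can be tallied with a small script. This is only a sketch: the field name `deviceId` is a hypothetical stand-in for whatever field the error-rule transform writes its identifying string into, and the sample bodies are made up.

```python
import json
from collections import Counter


def tally_responses(bodies, error_id_field="deviceId"):
    """Split response bodies into a success count and per-device error counts.

    `error_id_field` is a hypothetical name for the field that the error-rule
    transform fills with its identifying string; substitute the real one.
    """
    ok = 0
    errors = Counter()
    for body in bodies:
        doc = json.loads(body)
        if isinstance(doc, dict) and error_id_field in doc:
            errors[doc[error_id_field]] += 1
        else:
            ok += 1
    return ok, errors


if __name__ == "__main__":
    # Made-up sample: 5 responses, 2 of them appliance-generated errors.
    sample = [
        '{"status": "ok"}',
        '{"error": "Missing Input on Action", "deviceId": "DP-A"}',
        '{"status": "ok"}',
        '{"error": "Missing Input on Action", "deviceId": "DP-B"}',
        '{"status": "ok"}',
    ]
    ok, errors = tally_responses(sample)
    print("successes:", ok, "errors by device:", dict(errors))
```

The success count is the number to compare against the transactions listed in the probe; the per-device error counts show which appliance's probe should have captured each failure.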

 

  • swlinn

    Re: Why does the Probe NOT capture?

    ‏2013-05-23T02:43:55Z  in response to JoeMorganNTST

    Hi Joe,

    So before I throw in my two cents with some speculative suggestions or questions, I will ask if you've opened a PMR on this?  This behavior does sound odd.

    Having said that, since you indicate you're using AO, any chance that the 2 that failed happened on the appliance other than the one where you have the probe?  And if the ones that fail happen on one of the pair of appliances, are the service configs the same?  I believe there is a 3 concurrent transaction threshold in the probe, so if your 5 requests come in concurrently, the probe may only show three, although it is very odd that the two that don't show up happen to be the two that fail.  What are the odds of that??

    Also, instead of a stylesheet param with the unique identifying string, you could use var://service/system/ident ... it has a node set that comes from your system settings ... something like ...

    <identification build="218804" timestamp="Tue May 14 14:56:05 2013 ">

        <product-id>92354BX</product-id>
        <product>XI50</product>
        <display-product>XI50</display-product>
        <model>DataPower XI50</model>
        <display-model>DataPower XI50</display-model>
        <device-name>unique name from system settings</device-name>
        <serial-number>xxxxxxx</serial-number>
        <firmware-version>XI50.5.0.0.3</firmware-version>
        <display-firmware-version>XI50.5.0.0.3</display-firmware-version>
        <firmware-build>218804</firmware-build>
        <firmware-timestamp>2012/10/04 20:13:07</firmware-timestamp>
        <current-date>2013-05-14</current-date>
        <current-time>14:56:05 EDT</current-time>
        <reset-date>2013-03-21</reset-date>
        <reset-time>09:41:20 EDT</reset-time>
        <login-message />
        <custom-ui-file />

    </identification>
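If that node set ends up in a log or a response and needs to be read off-box, the identifying fields can be pulled out as below. This is a minimal sketch, not appliance code: it parses a trimmed copy of the XML quoted above, and the function name `summarize_ident` is made up for illustration. On the appliance itself the variable would be read in the stylesheet instead.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the node set quoted above; a real appliance returns its own values.
IDENT_XML = """
<identification build="218804" timestamp="Tue May 14 14:56:05 2013 ">
    <product>XI50</product>
    <device-name>unique name from system settings</device-name>
    <firmware-version>XI50.5.0.0.3</firmware-version>
</identification>
"""


def summarize_ident(xml_text: str) -> dict:
    """Return the fields most useful for telling appliances apart."""
    root = ET.fromstring(xml_text)
    return {
        "device_name": root.findtext("device-name"),
        "firmware": root.findtext("firmware-version"),
        "build": root.get("build"),
    }


if __name__ == "__main__":
    print(summarize_ident(IDENT_XML))
```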

    Regards,

    Steve

    • JoeMorganNTST

      Re: Why does the Probe NOT capture?

      ‏2013-05-23T14:59:30Z  in response to swlinn

      > I will ask if you've opened a PMR on this?

      No.  I generally post questions here first to see if I can resolve before filing a PMR.

      > Having said that, since you indicate you're using AO, any chance that the 2 that failed happened on the appliance other than the one where you have the probe?

      Now you know why I put in the transform to ID the machine generating the error.  I watched BOTH probes.  The probes simply do not display the request/response when the failure occurs and the error rule is triggered.

      > are the service configs the same?

      Identical.  Because they are AO, I can always export from one and import to the other without having to change anything else.  The listening address on the service is via a host alias.

      > I believe there is a 3 concurrent transaction threshold in the probe, so if your 5 requests come in concurrently, the probe may only show the three, although it is very odd that the two that don't show up happen to be the two that fail.  What are the odds of that??

      I didn't know that.  That's good information to know.  However, the probes *always* show the successful requests, and never the failed ones.

      > Also, instead of a stylesheet param with the unique identifying string, you could use var://service/system/ident

      Yes, I know.... However, we use the network machine name there for SNMP, and it is much easier for me to ID the machine if I can put in a user-friendly name (we're debugging against many machines now, not just the 2.  Same behavior in all of them!!!)

      One thing.  When I originally created this service, it was created as an XMLFW.  That thing never acted right.  In fact, it didn't work *unless* I had a probe turned on (a question I recently asked on this forum).  We have an MPG that's JSON->JSON, and it is working flawlessly.  So, I developed some suspicion this service may have been broken, and created a new MPG, with an XMLFW.  Instead of recreating the whole processing policy, I simply reused the policy I created for the XMLFW.  This fixed the issue of needing the probe turned on for it to work.

      Now I'm wondering if there's some kind of remnant I cannot see or cannot easily spot in that policy, its rules, or its actions.  So... as I write this, I'm deleting everything that was in the XMLFW config, and I'm going to rebuild it as an MPG from scratch.

       

       

      • HermannSW

        Re: Why does the Probe NOT capture?

        ‏2013-05-24T11:02:14Z  in response to JoeMorganNTST

        > No.  I generally post questions here first to see if I can resolve before filing a PMR.

        I appreciate that!

        > I didn't know that.  That's good information to know.  However, the probes *always* show the successful requests, and never the failed ones.

        Always capturing the good ones sounds interesting.

        Could you enable "unbounded" Probe mode to try to capture all 5?
        You can do that if you open the MPGW via the Object Screen; there you will see the "Probe Settings" tab.


         

        Hermann

      • swlinn

        Re: Why does the Probe NOT capture?

        ‏2013-05-29T01:12:52Z  in response to JoeMorganNTST

        Hi Joe,

        Like you and Hermann, I find it too coincidental that the 2 that fail don't show up.  Since the magic number is 3 unless you select unbounded as Hermann suggested, I'd be curious whether, if you sent 10 and 3 failed, you'd see the 7 successful requests in the probe.  My guess is the ones that fail are failing before your processing rule.

        Here's what I would try.  Set up a continuous packet capture, then set a trigger on the event code for "Missing Input on Action" that turns off the packet capture.  Hopefully you can then see which specific request is failing and diagnose why.  Here's a link in the info center on how to enable an event trigger in your logging target, with the CLI to turn a packet capture on/off:  http://pic.dhe.ibm.com/infocenter/wsdatap/v5r0m0/topic/com.ibm.dp.doc/problemdetermination69.htm?path=0_5_1_8_11_0#wq72  I'd just use SSH to enable the packet capture and then have the trigger turn it off when you get the error.  Upload the packet capture file and review it with Wireshark.

        Regards,

        Steve

         

        • JoeMorganNTST

          Re: Why does the Probe NOT capture?

          ‏2013-05-29T21:52:52Z  in response to swlinn

          > I'd be curious if you sent 10 and 3 failed, you'd see the 7 successful requests in the probe???

          That's exactly what we are seeing.  It is perfectly consistent.  If we send 10 and 6 fail, we see 4 successful requests in the probe.

          In another strange twist, we now have probes on a WSP, and never see any request coming through, even though there is no other way for them to reach their destinations without coming through the probe.  I'm thinking something is very sick about 5.0.0.6 firmware.  It didn't happen right away, but as time goes on, these appliances are getting more sick by the day.

          > My guess is the ones that fail are failing before your processing rule

          Actually, when it does fail, it is failing on the results action on the request rule. This is not universal.  About 50% of the requests make it through.  I can now control it.  It is a GET request (JSON/JSON).  The results action is sometimes failing with a "Missing input on action".  If I set the input to default, I have a 50% (more or less) failure rate.  If I explicitly set the input for the action to NULL, it never fails.

          It still doesn't explain why, when it does fail, I cannot see the probe process the error rule, which generates a very specific JSON response that only DataPower will produce.

          > Upload the packet capture file and review with wireshark

          We had a lot of trouble doing this, which adds to the mystery.  We cut off all other traffic, turned on packet capture, sent the requests until it failed, then turned off packet capture.  We could not find the request in the packet capture at all!  However, within a dedicated log target logging all activity on the service (and its references), we can see the transaction in the log.  I can only guess that we're running out of space on the capture, even though I did set it to continuous and set its maximum size boundary to 500000 KB.  This happens even when the failure occurs on the first request.  I do notice the packet capture output is nowhere near 500000 KB.  So, I'm going to have to try to figure this out some other way.

          Joe

           

           

           

          Updated on 2013-05-29T21:55:15Z by JoeMorganNTST
          • swlinn

            Re: Why does the Probe NOT capture?

            ‏2013-05-29T22:21:20Z  in response to JoeMorganNTST

            Hi Joe,

            So now you really have me scratching my head.  If the request fails at the results action, then your processing rule did execute, so it should be in your probe.  I've seen WSPs fail as you describe when the inbound message is not SOAP, or when a request schema validation fails; in those cases the rule never gets initiated, which is why it doesn't show up in the probe.  That is what I was thinking might be your case, but if you're failing inside your processing policy then I'm at a loss to explain why.  Perhaps we are getting to PMR time.

            As far as the packet capture goes, you can filter the packets captured based on the IP of your test generator (filter string of "host w.x.y.z") so you don't have to look at all of the traffic coming through the appliance; just send your test traffic from one place.  This will be much simpler, and your pcap file will be smaller.  With a capture that big, you won't see the forest for the trees.
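One way to confirm whether a given test request made it into the capture at all is to tag each request with a unique marker and then search the capture file for it. The sketch below makes several assumptions: the traffic is plaintext HTTP (a TLS capture would not contain the marker in readable form), the marker is short enough not to be split across packets, and the helper names are invented for illustration.

```python
import uuid


def new_marker() -> str:
    """Return a unique token to embed in one test request (for example, a query parameter)."""
    return "probe-test-" + uuid.uuid4().hex


def markers_in_capture(capture: bytes, markers):
    """Map each marker to True or False depending on whether its bytes occur in the capture."""
    return {m: m.encode("ascii") in capture for m in markers}


if __name__ == "__main__":
    sent, missing = new_marker(), new_marker()
    # Stand-in for the bytes of a downloaded capture file.
    fake_capture = b"\x00\x01GET /svc?trace=" + sent.encode("ascii") + b" HTTP/1.1\r\n"
    for marker, found in markers_in_capture(fake_capture, [sent, missing]).items():
        print(marker, "found" if found else "NOT in capture")
```

With a real capture, the bytes would come from reading the downloaded file in binary mode; a marker that appears in the log target but not in the capture narrows the problem to the capture itself.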

            Regards,

            Steve

            • JoeMorganNTST

              Re: Why does the Probe NOT capture?

              ‏2013-05-29T22:32:59Z  in response to swlinn

              > I've seen WSPs fail as you describe when the inbound message is not SOAP or you are doing a request schema validation which fails, so your rule never gets initiated which is why it doesn't show up in the probe,

              It's the standard pattern.  MPG in the DMZ routing to other services in a trusted zone.  At some point, WebLogic makes a request out through DataPower via a WSP.  We are seeing completely successful requests go through these WSPs (as evidenced by the responses in the MPG and the WebLogic logs), and the probe is not capturing them at all!

              This is only bothering me because it is continuing the Twilight Zone kind of behavior that I cannot begin to explain.

              With all this said.  I did set the probes to "Unbounded" and it is now capturing.  I'm going to set them back to just "On", and see what happens.

              I may have to go back to try this same thing with the JSON/JSON MPG.  I may have to set the Results action back to default, and then set the probe to unbounded to see if it captures the error rule (at the very least).

               

               

               

              Updated on 2013-05-29T22:36:43Z by JoeMorganNTST
              • HermannSW

                Re: Why does the Probe NOT capture?

                ‏2013-05-29T22:51:39Z  in response to JoeMorganNTST

                > With all this said.  I did set the probes to "Unbounded" and it is now capturing.
                >

                That is good.

                But I want to make another comment here:
                Do not use Probe in production!

                This can be read in InfoCenter as well (the bold text):
                http://pic.dhe.ibm.com/infocenter/wsdatap/v5r0m0/index.jsp?topic=%2Fcom.ibm.dp.doc%2Fproblemdetermination73.htm

                 

                Hermann

                 

                • JoeMorganNTST

                  Re: Why does the Probe NOT capture?

                  ‏2013-05-30T14:12:48Z  in response to HermannSW

                  > Do not use Probe in production!

                  Duh! :-)  This isn't a production domain, but, unfortunately, it is on a production appliance in a QA domain.  Totally against my recommendations, but it is what it is.

                   

                  • HermannSW

                    Re: Why does the Probe NOT capture?

                    ‏2013-05-31T07:07:21Z  in response to JoeMorganNTST

                    > ... it is on a production appliance in a QA domain.  Totally against my recommendations, but it is what it is.
                    >

                    I know that there is a Redbook stating the same very simple rule to avoid problems in production:

                    Do not run anything other than the production domains on a production box: not test/QA, and definitely not dev!
                     

                    Hermann

                    Updated on 2013-05-31T07:08:11Z by HermannSW
                  • swlinn

                    Re: Why does the Probe NOT capture?

                    ‏2013-05-31T22:02:49Z  in response to JoeMorganNTST

                    Hi Joe,

                    As I remember your environment, this is on your production DR site then, so you're testing in QA and not taking production traffic at the moment????  If so I've seen this scenario before at other clients to save the cost of additional appliances, so as long as you're not taking production traffic it should be ok to have the probe on for your QA domain.  If I'm wrong and you have a QA domain and live production domain on the same appliance, then definitely not a best practice.

                    Regards,

                    Steve

                    • JoeMorganNTST

                      Re: Why does the Probe NOT capture?

                      ‏2013-05-31T22:15:30Z  in response to swlinn

                      This is fundamentally accurate.  I've been preaching that we need new appliances for QA, and we'd only need one set (DMZ/Internal) instead of 2, but it's still a cost issue.

                      We do have live traffic on those appliances and as long as we have the QA on the same devices, we'll eventually have probes on.

                      I just learned earlier this week that you helped early on.  Between the time you helped set this up and I got here, things got pretty sloppy, and I've been reconfiguring quite a bit like moving files and logging from flash to RAID.

                      I've been very slowly winning arguments... but, I suspect this one may not be solved until production drops out from under us.

                      Meanwhile, I am going to file a PMR as we can reproduce this at will in multiple appliances, and environments, so I'm pretty sure there's something wrong.

                       

  • JoeMorganNTST

    Re: Why does the Probe NOT capture?

    ‏2013-06-13T16:09:10Z  in response to JoeMorganNTST

    This happened again.  This time, it was on a Web Service Proxy, and I think I may now know what is causing it.

    I think that if there is *any* action on a processing rule that is expecting input, but it doesn't receive input, such that it gets a "Missing Input On Action" error, the entire probe is discarded.
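As a sanity check on that idea, the counts reported earlier in this thread can be compared against the two explanations that have come up. This is a toy model only, not a description of how the probe works internally: the "cap" hypothesis assumes all requests arrive concurrently and that the limit really is 3, and the "discard" hypothesis is the unproven guess stated above.

```python
from typing import NamedTuple


class Run(NamedTuple):
    sent: int
    failed: int
    shown_in_probe: int


# Figures taken from the examples given earlier in this thread.
OBSERVED = [Run(5, 2, 3), Run(10, 6, 4)]


def predict_cap(run: Run, cap: int = 3) -> int:
    """Hypothesis 1: the probe keeps at most `cap` transactions, whatever their outcome."""
    return min(run.sent, cap)


def predict_discard_on_failure(run: Run) -> int:
    """Hypothesis 2: a transaction that hits the error is dropped from the probe entirely."""
    return run.sent - run.failed


def matches(predict) -> bool:
    """True if the hypothesis reproduces every observed probe count."""
    return all(predict(r) == r.shown_in_probe for r in OBSERVED)


if __name__ == "__main__":
    print("cap of 3 explains the data:", matches(predict_cap))
    print("discard on failure explains the data:", matches(predict_discard_on_failure))
```

Under these assumptions only the discard hypothesis fits both runs; the cap hypothesis fits the 5-request case by coincidence but predicts 3 rather than 4 for the 10-request case. That is consistent with the guess above, though it does not prove it.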

    Will update when I prove it... but I'm heads down in 3 concurrent projects, so I can't take the time away at the moment to set up a test.