14 replies Latest Post - ‏2013-02-26T20:53:12Z by SystemAdmin
SystemAdmin

Pinned topic Queue overflow: xxx event(s) lost

‏2013-02-25T21:57:26Z |
Hi, we are getting this kind of error (this is a trap captured on the SNMP server):
++++++
Timestamp: 'February 23, 2013 3:04:36 AM EST'
Agent: 'x.x.x.x'
Enterprise OID: 'a.b.c.3.3.2'
Generic Type: '6'
Specific Type: '2'
Varbinds: oid->varbind
'a.b.c.3.3.1.1.0' --> '1'
'a.b.c.3.3.1.2.0' --> '5'
'a.b.c.3.3.1.3.0' --> 'Sat Feb 23 2013 03:04:38'
'a.b.c.3.3.1.7.0' --> '0'
'a.b.c.3.3.1.4.0' --> 'Queue overflow: 4353 event(s) lost'
'a.b.c.3.3.1.8.0' --> 'default'
++++++++++++

Firmware version - 4.0.2
Datapower XI50

I looked in the forum:
(https://www.ibm.com/developerworks/forums/thread.jspa?messageID=14489305&#14489305)

One of the reasons mentioned there is that the log target stops working and the appliance needs to be reloaded:
http://www-01.ibm.com/support/docview.wss?uid=swg21446234

But that does not seem to be the case for me. Moreover, the firmware version I am using is 4.x, which should already include this fix.

Please suggest what these traps are and what should be done.
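
For anyone who wants to track these traps on the monitoring side, here is a minimal sketch that pulls the lost-event count out of the message varbind. It assumes the message text always has the form shown in the trap above; the function name `lost_events` is made up for illustration and is not part of any DataPower or SNMP tooling.

```python
import re
from typing import Optional

# Message text as it appears in the trap varbind, e.g.
# "Queue overflow: 4353 event(s) lost". The exact wording is taken
# from the trap above; other firmware levels may word it differently.
OVERFLOW_RE = re.compile(r"Queue overflow:\s*(\d+)\s+event\(s\)\s+lost")


def lost_events(message: str) -> Optional[int]:
    """Return the number of lost events, or None if the text does not match."""
    match = OVERFLOW_RE.search(message)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(lost_events("Queue overflow: 4353 event(s) lost"))
```

Summing the returned counts per appliance would show which box is losing the most events.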
Updated on 2013-02-26T20:53:12Z at 2013-02-26T20:53:12Z by SystemAdmin
  • SystemAdmin

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-25T22:00:08Z  in response to SystemAdmin
    To clarify, it seems the traps are being discarded on the SNMP side.
    DataPower might be sending a huge volume of traps to the SNMP server, more than it is able to handle.

    If that is the case, are there any settings that need to be changed?
    Is it a setting on the SNMP side or on the DataPower (DP) side?
  • SystemAdmin

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-25T22:03:34Z  in response to SystemAdmin
    Are you getting this problem on one appliance or on all the appliances?
    • SystemAdmin

      Re: Queue overflow: xxx event(s) lost

      ‏2013-02-25T22:13:44Z  in response to SystemAdmin
      We are getting it from all the boxes, but one of them is generating more alerts than the others.

      In other words, all of them are generating alerts, but NOT evenly.
      • SystemAdmin

        Re: Queue overflow: xxx event(s) lost

        ‏2013-02-25T22:15:17Z  in response to SystemAdmin
        Can you be a little more specific about the firmware version, like 4.x.x.x?

        Also, did you try to reboot/reload the appliance?
  • msiebler

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-25T22:37:56Z  in response to SystemAdmin
    You get these messages when the DataPower device is generating more log messages than the log target can consume; the log target could be something like a syslog-ng target or an NFS target.

    Check how well the target is working. Maybe the syslog server is too slow? Maybe you need to adjust the log level so that it is set higher than debug?
    Ideally you should not lose any messages.
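
To make the mechanism concrete, here is a toy model of a bounded pending queue that drops events once it is full. It is only a sketch of the general producer/consumer behaviour described above, not DataPower's actual implementation; the class name, the capacity, and the per-tick rates are all invented for illustration.

```python
from collections import deque


class BoundedLogQueue:
    """Toy model of a log target's pending-event queue (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pending = deque()
        self.processed = 0
        self.dropped = 0

    def publish(self, event: str) -> bool:
        """Queue an event, or count it as dropped when the queue is full."""
        if len(self.pending) >= self.capacity:
            self.dropped += 1
            return False
        self.pending.append(event)
        return True

    def drain(self, max_events: int) -> int:
        """Deliver up to max_events to the destination; return how many were sent."""
        sent = 0
        while self.pending and sent < max_events:
            self.pending.popleft()
            sent += 1
        self.processed += sent
        return sent


def simulate(capacity: int, produced_per_tick: int,
             consumed_per_tick: int, ticks: int) -> BoundedLogQueue:
    """Run a fixed-rate producer against a fixed-rate consumer."""
    queue = BoundedLogQueue(capacity)
    for tick in range(ticks):
        for i in range(produced_per_tick):
            queue.publish(f"event-{tick}-{i}")
        queue.drain(consumed_per_tick)
    return queue


if __name__ == "__main__":
    q = simulate(capacity=10, produced_per_tick=5, consumed_per_tick=3, ticks=10)
    print(q.processed, q.dropped, len(q.pending))
```

As long as the consumer keeps up with the producer, nothing is dropped. Once the producer is faster, the queue fills and every extra event is lost, which is why either lowering the event volume or speeding up the destination helps.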
    • SystemAdmin

      Re: Queue overflow: xxx event(s) lost

      ‏2013-02-26T17:55:44Z  in response to msiebler
      Thanks for the responses.

      Kumar, the firmware version is
      +
      Firmware version - 4.0.2
      Datapower XI50
      +
      I had mentioned it in my initial post.

      msiebler, we have multiple log targets: syslog (over UDP), SNMP, and file.
      I checked the logging target status (I should have done that earlier :) ). It shows which log targets have pending events and how many events have been dropped.
      One of our log targets captures a user-defined log category at debug level (a business requirement). We will be turning that off.

      The trap that I mentioned comes from the SNMP log target. (The SNMP server received this trap from DataPower and we got an email.)
      When I looked into the trap, it shows sourceEventType as 'UNKNOWN' and eventType as 'MOMENTARY'. Attached is the screenshot that I got from the SNMP team.
      In the log categories of DP (in all the domains), I do not see any category named 'MOMENTARY'.
      I would like to know what kind of NOTIFY event DP is generating for this. I got an email with NOTIFY, so it is clear that we are sending NOTIFY-type events to SNMP, but I cannot find the category of this event (or at least, I am not able to see it :) ).
      • SystemAdmin

        Re: Queue overflow: xxx event(s) lost

        ‏2013-02-26T18:38:53Z  in response to SystemAdmin
        > vish_pandit wrote:
        > Thanks for the responses.
        >
        > Kumar, the firmware version is
        > +
        > Firmware version - 4.0.2
        > Datapower XI50
        > +
        > I had mentioned it in my initial post.
        >

        I am not sure it is a firmware bug, because I am running the same firmware without any issue. I wonder whether the MIB files you gave to the monitoring team are corrupted. Did you reload the appliance and check whether you still get the same error?

        If an SNMP trap comes in and is classified as an "/Unknown" event type, this is because Zenoss does not know what you want to do with this event.
        http://docs.huihoo.com/zenoss/admin-guide/2.1.1/ch10s15.html
        • SystemAdmin

          Re: Queue overflow: xxx event(s) lost

          ‏2013-02-26T18:51:36Z  in response to SystemAdmin
          Thanks.

          The reason is the large number of events we are sending to syslog and SNMP. As I said, we will be disabling that, as we do not want to miss other data.

          About rebooting the appliance: we have NOT done that. Also, I think there is NO issue with any of the log targets.
          It is just that a large volume of events was being sent.
          From the DP help:
          +
          dropped
          The number of events that this logging target has dropped because there were too many pending.

          pending
          The number of events pending for this logging target, waiting to be stored at the destination.
          +
          Also, the boxes are PROD boxes and we are trying NOT to reboot them. But if the issue does not get resolved, we will restart them.

          I will also check with the SNMP team about fine-tuning the settings.
          • msiebler

            Re: Queue overflow: xxx event(s) lost

            ‏2013-02-26T18:53:25Z  in response to SystemAdmin
            Can you paste the status of all your log targets?
          • SystemAdmin

            Re: Queue overflow: xxx event(s) lost

            ‏2013-02-26T18:56:04Z  in response to SystemAdmin
            How many log targets do you have?
            • SystemAdmin

              Re: Queue overflow: xxx event(s) lost

              ‏2013-02-26T20:09:40Z  in response to SystemAdmin
              name            status  memory(kbytes)  processed(events)  dropped(events)  pending(events)  error info
              default-log     Active  14              905142             0                0                none
              mqDuration-log  Active  4               2325913            0                3                none
              snmplogtarget   Active  0               1960757            4359             3                SNMP Trap sent
              syslog-svcbus   Active  0               2136801            4554             3                Message sent

              We have 4 log targets in the default domain (as above)
              and two more (default-log and one user-defined) in the application domain, so 6 log targets in total.
              And we have 5 boxes, so 5*6. :)

              I am pretty sure the issue is due to the high volume being sent, and I need to change the event subscriptions for these log targets.
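
With 30 log targets to look at, checking the status by hand gets tedious. Here is a rough sketch that parses a status listing like the one above and reports the targets that have dropped events. It assumes the listing is pasted as whitespace-separated text with the same seven columns; `TargetStatus`, `parse_status`, and `targets_with_drops` are names invented for this example, not DataPower commands.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TargetStatus:
    name: str
    status: str
    memory_kb: int
    processed: int
    dropped: int
    pending: int
    error_info: str


# Status listing copied from this post.
SAMPLE = """\
name status memory(kbytes) processed(events) dropped(events) pending(events) error info
default-log Active 14 905142 0 0 none
mqDuration-log Active 4 2325913 0 3 none
snmplogtarget Active 0 1960757 4359 3 SNMP Trap sent
syslog-svcbus Active 0 2136801 4554 3 Message sent
"""


def parse_status(text: str) -> List[TargetStatus]:
    """Parse whitespace-separated status rows; the header row is skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[0] == "name":
            continue
        name, status, memory, processed, dropped, pending = fields[:6]
        # Everything after the sixth column is free-text error info.
        rows.append(TargetStatus(name, status, int(memory), int(processed),
                                 int(dropped), int(pending), " ".join(fields[6:])))
    return rows


def targets_with_drops(rows: List[TargetStatus]) -> List[TargetStatus]:
    """Return only the targets that have dropped at least one event."""
    return [row for row in rows if row.dropped > 0]


if __name__ == "__main__":
    for row in targets_with_drops(parse_status(SAMPLE)):
        print(row.name, row.dropped)
```

For the listing above, only the two network targets (SNMP and syslog) show drops, while the other two show none.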
              • SystemAdmin

                Re: Queue overflow: xxx event(s) lost

                ‏2013-02-26T20:17:40Z  in response to SystemAdmin
                I agree, changing the event subscriptions may do the trick. BTW, how many event subscriptions do you have?
                • SystemAdmin

                  Re: Queue overflow: xxx event(s) lost

                  ‏2013-02-26T20:28:28Z  in response to SystemAdmin
                  It differs from log target to log target.

                  But we have
                  All - Error
                  cert-monitor - warning

                  and then we have a couple of user-defined log categories, some of which we monitor at error level and some at critical.
                  • SystemAdmin

                    Re: Queue overflow: xxx event(s) lost

                    ‏2013-02-26T20:53:12Z  in response to SystemAdmin
                    For the sake of others who encounter this issue:

                    1. Check 'Logging Targets' in the Status menu.
                    2. You will see how many events are pending and how many have been dropped.
                    3. Check the event subscriptions for the log target in question. You may want to raise the priority level to 'error' or higher.
                    4. Check with the external system (SNMP or syslog) whether there is any issue at their end.

                    If there are no issues with the log target, then, as suggested earlier, restart the domain or reload the appliance.
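
To illustrate step 3, here is a small sketch of how raising the priority in an event subscription cuts the volume sent to a log target. It is a model only: the level names follow the usual syslog-style ordering, and the rule that an event is forwarded when any matching subscription (its own category or 'all') allows it is my assumption, not something taken from the DataPower documentation. The function name `is_forwarded` is hypothetical.

```python
from typing import Dict

# Syslog-style severities, most severe first (assumed ordering).
SEVERITY = ["emergency", "alert", "critical", "error",
            "warning", "notice", "information", "debug"]
RANK = {name: index for index, name in enumerate(SEVERITY)}


def is_forwarded(category: str, level: str, subscriptions: Dict[str, str]) -> bool:
    """Return True if an event would be sent to the log target.

    subscriptions maps a category name (or 'all') to the least severe
    level that is still forwarded for that category.
    """
    for key in (category, "all"):
        threshold = subscriptions.get(key)
        if threshold is not None and RANK[level] <= RANK[threshold]:
            return True
    return False


if __name__ == "__main__":
    # Subscriptions similar to the ones mentioned earlier in this thread.
    subs = {"all": "error", "cert-monitor": "warning"}
    print(is_forwarded("cert-monitor", "warning", subs))
    print(is_forwarded("mq", "warning", subs))
    print(is_forwarded("mq", "critical", subs))
```

A category subscribed at debug level forwards every event in that category, which is the kind of subscription that can flood a slow target.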