Topic
  • 14 replies
  • Latest Post - ‏2013-02-26T20:53:12Z by SystemAdmin
SystemAdmin
SystemAdmin
6772 Posts

Pinned topic Queue overflow: xxx event(s) lost

‏2013-02-25T21:57:26Z |
Hi, we are getting errors like the following (this is a trap captured on the SNMP side):
++++++
Timestamp: 'February 23, 2013 3:04:36 AM EST'
Agent: 'x.x.x.x'
Enterprise OID: 'a.b.c.3.3.2'
Generic Type: '6'
Specific Type: '2'
Varbinds: oid->varbind
'a.b.c.3.3.1.1.0' --> '1'
'a.b.c.3.3.1.2.0' --> '5'
'a.b.c.3.3.1.3.0' --> 'Sat Feb 23 2013 03:04:38'
'a.b.c.3.3.1.7.0' --> '0'
'a.b.c.3.3.1.4.0' --> 'Queue overflow: 4353 event(s) lost'
'a.b.c.3.3.1.8.0' --> 'default'
++++++++++++

Firmware version - 4.0.2
Datapower XI50

I looked through the forum:
(https://www.ibm.com/developerworks/forums/thread.jspa?messageID=14489305&#14489305)

One of the reasons mentioned is that the log target stops working and the appliance needs to be reloaded:
http://www-01.ibm.com/support/docview.wss?uid=swg21446234

But that does not seem to be the case for me. Moreover, the firmware version I am using is 4.x, which should already include this fix.

Please advise what these traps mean and what should be done.
Updated on 2013-02-26T20:53:12Z by SystemAdmin
  • SystemAdmin
    SystemAdmin
    6772 Posts

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-25T22:00:08Z  
    I mean, it seems the traps are being discarded by SNMP.
    DataPower might be sending a huge volume of traps, more than the SNMP server is able to handle.

    If that is the case, are there any settings that need to be changed?
    Is it an SNMP-side or a DataPower-side setting?
  • SystemAdmin
    SystemAdmin
    6772 Posts

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-25T22:03:34Z  
    Are you getting this problem on one appliance or on all of the appliances?
  • SystemAdmin
    SystemAdmin
    6772 Posts

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-25T22:13:44Z  
    We are getting it from all the boxes, but one of them is generating more alerts than the others.

    In other words, all of them are generating these alerts, but NOT evenly.
  • SystemAdmin
    SystemAdmin
    6772 Posts

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-25T22:15:17Z  
    Can you be a little more specific about the firmware version, e.g. 4.x.x.x?

    Also, did you try to reboot/reload the appliance?
  • msiebler
    msiebler
    140 Posts

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-25T22:37:56Z  
    You get these messages when the DataPower device is generating more log messages than the log target can consume; the log target could be something like a syslog-ng target or an NFS target.

    You may want to check how well the target is working. Maybe the syslog server is too slow? Maybe you need to adjust the log level so that it is set to a higher severity than debug?
    Ideally you should not lose any messages.
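
    To illustrate the mechanism described above, here is a minimal sketch of a bounded pending queue that drops events once the producer outpaces the consumer. It is only a toy model, not DataPower's actual implementation; the class `BoundedLogQueue`, the function `simulate`, and all the numbers are made up for illustration.

```python
from collections import deque


class BoundedLogQueue:
    """Toy model of a log target's pending queue (hypothetical, not DataPower code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = deque()
        self.dropped = 0
        self.processed = 0

    def publish(self, event):
        # Drop the event when too many events are already pending.
        if len(self.pending) >= self.capacity:
            self.dropped += 1
            return False
        self.pending.append(event)
        return True

    def drain(self, max_events):
        # Deliver at most max_events to the (possibly slow) destination.
        delivered = 0
        while self.pending and delivered < max_events:
            self.pending.popleft()
            delivered += 1
        self.processed += delivered
        return delivered


def simulate(produce_per_tick, consume_per_tick, ticks, capacity):
    # Each tick: the device emits events, then the target consumes some.
    q = BoundedLogQueue(capacity)
    for tick in range(ticks):
        for i in range(produce_per_tick):
            q.publish((tick, i))
        q.drain(consume_per_tick)
    return q


# Assumed numbers: 10 events produced per tick, only 4 consumed per tick.
q = simulate(produce_per_tick=10, consume_per_tick=4, ticks=5, capacity=8)
print(f"Queue overflow: {q.dropped} event(s) lost")
```

    In this model, drops stop as soon as the consumer keeps up with the producer, which is why either speeding up the destination or reducing the number of events sent to it addresses the problem.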
  • SystemAdmin
    SystemAdmin
    6772 Posts

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-26T17:55:44Z  
    Thanks for the responses.

    Kumar, the firmware version is
    +
    Firmware version - 4.0.2
    Datapower XI50
    +
    I had mentioned it in my initial post.

    msiebler, we have multiple log targets: syslog (over UDP), SNMP, and file.
    I checked the logging target status (should have done that earlier :) ). It shows which log targets have pending events and how many events were dropped.
    One of our log targets captures a user-defined log category at debug level (a business requirement). We will be turning that off.

    The trap that I mentioned comes from the SNMP log target. (The SNMP server received this trap from DataPower and we got an email.)
    When I looked into the trap, it showed sourceEventType as 'UNKNOWN' and eventType as 'MOMENTARY'. Attached is the screenshot that I got from the SNMP team.
    In the log categories on DP (in all the domains), I do not see any category named 'MOMENTARY'.
    I would like to know what kind of NOTIFY event DP is generating here. I got an email with NOTIFY, so it is clear we are sending NOTIFY-type events to SNMP, but I cannot find the category of this event (or at least, I am not able to see it :) ).
  • SystemAdmin
    SystemAdmin
    6772 Posts

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-26T18:38:53Z  
    > vish_pandit wrote:
    > Thanks for the responses.
    >
    > Kumar, the firmware version is
    > +
    > Firmware version - 4.0.2
    > Datapower XI50
    > +
    > I had mentioned it in my initial post.
    >

    I am not sure it is a firmware bug, because I am running the same firmware without any issue. I wonder whether the MIB files you gave to the monitoring team are corrupted. Did you reload the appliance and check whether you still get the same error?

    If an SNMP trap comes in and is classified as an "/Unknown" event type, this is because Zenoss does not know what you want to do with this event.
    http://docs.huihoo.com/zenoss/admin-guide/2.1.1/ch10s15.html
  • SystemAdmin
    SystemAdmin
    6772 Posts

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-26T18:51:36Z  
    Thanks.

    The cause is the large number of events we are sending to syslog and SNMP. As I said, we will be disabling that, as we do not want to miss other data.

    We have NOT rebooted the appliance. Also, I think there is NO issue with any of the log targets.
    It is just that a large volume of events was being sent.
    From the DP help:
    +
    dropped
    The number of events that this logging target has dropped because there were too many pending.

    pending
    The number of events pending for this logging target, waiting to be stored at the destination.
    +
    Also, these are PROD boxes and I am trying NOT to reboot them. But if the issue does not get resolved, I will restart them.

    I will also check with the SNMP team about fine-tuning the settings.
  • msiebler
    msiebler
    140 Posts

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-26T18:53:25Z  
    Can you paste the status of all your log targets?
  • SystemAdmin
    SystemAdmin
    6772 Posts

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-26T18:56:04Z  
    How many log targets do you have?
  • SystemAdmin
    SystemAdmin
    6772 Posts

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-26T20:09:40Z  
    name            status  memory(kbytes)  processed(events)  dropped(events)  pending(events)  error info
    default-log     Active  14              905142             0                0                none
    mqDuration-log  Active  4               2325913            0                3                none
    snmplogtarget   Active  0               1960757            4359             3                SNMP Trap sent
    syslog-svcbus   Active  0               2136801            4554             3                Message sent

    Well, we have 4 log targets in the default domain (as above)
    and two more (default-log and one user-defined) in the application domain, so 6 log targets in total.
    And we have 5 boxes, so 5*6. :)

    I am pretty sure the issue is due to the high volume being sent, and that I need to change the event subscriptions for these log targets.
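
    With this many targets and boxes, the check can be scripted. Below is a rough sketch that parses status text in the same column layout as the output above and lists the targets that have dropped events. It assumes the status has already been copied into a string by some means; the function names `parse_status` and `dropping_targets` are made up for illustration and are not part of any DataPower tooling.

```python
# Status rows copied from the output above (header line omitted).
STATUS = """\
default-log Active 14 905142 0 0 none
mqDuration-log Active 4 2325913 0 3 none
snmplogtarget Active 0 1960757 4359 3 SNMP Trap sent
syslog-svcbus Active 0 2136801 4554 3 Message sent
"""


def parse_status(text):
    """Turn whitespace-separated status rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The last column ("error info") may contain spaces, so split at most 6 times.
        name, status, memory, processed, dropped, pending, info = line.split(None, 6)
        rows.append({
            "name": name,
            "status": status,
            "memory_kb": int(memory),
            "processed": int(processed),
            "dropped": int(dropped),
            "pending": int(pending),
            "info": info,
        })
    return rows


def dropping_targets(rows):
    """Return (name, dropped) for every target that has lost events."""
    return [(r["name"], r["dropped"]) for r in rows if r["dropped"] > 0]


for name, dropped in dropping_targets(parse_status(STATUS)):
    print(f"{name}: {dropped} event(s) dropped")
```

    On the figures above, this flags only snmplogtarget and syslog-svcbus, the two targets that send to external systems.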
  • SystemAdmin
    SystemAdmin
    6772 Posts

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-26T20:17:40Z  
    I agree, changing the event subscriptions may do the trick. BTW, how many event subscriptions do you have?
  • SystemAdmin
    SystemAdmin
    6772 Posts

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-26T20:28:28Z  
    It differs from log target to log target.

    But we have
    all - error
    cert-monitor - warning

    and then we have a couple of user-defined log categories, some of which we monitor at error level and some at critical.
  • SystemAdmin
    SystemAdmin
    6772 Posts

    Re: Queue overflow: xxx event(s) lost

    ‏2013-02-26T20:53:12Z  
    For the sake of others who encounter this issue:

    1. Check 'Logging Targets' in the Status menu.
    2. There you will find how many events are pending and how many have been dropped for each log target.
    3. Check the event subscriptions for the log target in question. You may want to raise the minimum priority level to 'error' or higher.
    4. Check with the external system (SNMP or syslog) whether there is any issue at their end.

    If you find no issues with the log target and the problem persists, then, as suggested earlier, restart the domain or reload the appliance.
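
    To show why step 3 matters, here is a small sketch of how a subscription's minimum priority level limits the number of events forwarded to a log target. It is a simplified model with assumptions: the severity names follow the usual syslog ordering (RFC 5424) and may not match the appliance's own labels exactly, the matching logic is not taken from DataPower, and the sample events and counts are invented.

```python
# Syslog-style severities, most severe first (assumed ordering, as in RFC 5424).
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "info", "debug"]
RANK = {name: i for i, name in enumerate(SEVERITIES)}


def matches(subscription, event):
    """A subscription is (category, minimum level); an event is (category, level)."""
    category, min_level = subscription
    event_category, event_level = event
    if category != "all" and category != event_category:
        return False
    # Forward the event only if it is at least as severe as the minimum level.
    return RANK[event_level] <= RANK[min_level]


def forwarded(subscriptions, events):
    """Events that at least one subscription would forward to the target."""
    return [e for e in events if any(matches(s, e) for s in subscriptions)]


# Hypothetical traffic: mostly debug messages in a user-defined category.
events = (
    [("userDefined", "debug")] * 500
    + [("userDefined", "error")] * 3
    + [("cert-monitor", "warning")] * 2
    + [("xslt", "info")] * 40
)

with_debug = [("all", "error"), ("cert-monitor", "warning"), ("userDefined", "debug")]
without_debug = [("all", "error"), ("cert-monitor", "warning")]

print(len(forwarded(with_debug, events)), "events forwarded with the debug subscription")
print(len(forwarded(without_debug, events)), "events forwarded without it")
```

    In this made-up sample, dropping the single debug-level subscription removes nearly all of the traffic sent to the target while the error and warning events still get through.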