SystemAdmin
102 Posts

Pinned topic Health policies related queries

‏2011-12-15T06:29:36Z |
Hi,

Kindly help me with the following health policy queries.

1. For a transaction-based application, which policies would you suggest applying in addition to the default health policies?

2. In the "Default_Excessive_Memory_Usage" health policy, the health condition is triggered when the JVM heap size reaches 95%. Is this percentage calculated against the 'Max heap size' value set in each individual server's "Java Virtual Machine" settings?

3. For the StormDrain condition, which detects a drop in average response time, what is the ideal action to take when it is triggered?
  • KeithS
    17 Posts

    Re: Health policies related queries

    ‏2011-12-15T14:17:51Z  
    1) It really depends on what type of transaction-based application you have. Can you provide more details?

    2) The percentage of heap used is calculated using the max heap size setting on each server.

    3) The action depends on the application. If the app could be failing due to a DB failure, then all servers in a cluster would be failing at the same time. In this case, you want to simply notify an administrator to investigate. If the application is single-tiered (i.e. doesn't talk to a DB or other backend), then the action should be to restart the server because the failure causing the storm drain condition could only be within the JVM that is being restarted.
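
To make points 2 and 3 concrete, here is a minimal Python sketch. It is not how the product evaluates the policy internally: the function names, the action labels, and the sample sizes are all hypothetical. It only illustrates that the percentage is taken relative to each server's own max heap size, and that the suggested storm drain action depends on whether the app relies on a backend.

```python
# Illustrative only: names, labels, and values are assumptions, not a product API.

def heap_used_percent(used_bytes, max_heap_bytes):
    """Heap usage as a percentage of this server's own max heap size."""
    if max_heap_bytes <= 0:
        raise ValueError("max heap size must be positive")
    return 100.0 * used_bytes / max_heap_bytes


def exceeds_memory_threshold(used_bytes, max_heap_bytes, threshold_percent=95.0):
    """True when heap usage has reached the threshold (95% in the thread)."""
    return heap_used_percent(used_bytes, max_heap_bytes) >= threshold_percent


def storm_drain_action(has_backend_dependency):
    """Pick an action following the reasoning in the reply above.

    With a DB or other backend, the cause may lie outside the JVM, so
    only notify an administrator. For a single-tiered app, restart.
    """
    if has_backend_dependency:
        return "notify_administrator"
    return "restart_server"
```

For example, two servers that both use 980 MB of heap behave differently if one has a 1024 MB max heap and the other 2048 MB: only the first call to `exceeds_memory_threshold` returns true, because the percentage is computed per server.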
  • SystemAdmin
    102 Posts

    Re: Health policies related queries

    ‏2011-12-17T14:38:29Z  
    Hi Keith,

    Thanks for your reply.
    1. By transaction-based application, I mean that ours is a banking application. I want to know which other health policies (beyond the defaults) can be set to form a first line of defense against health-related problems.

    2. Regarding the storm drain condition: based on what you have suggested, a drop in response times is not necessarily due to server JVM issues; it could also be caused by a DB failure (for any reason), connectivity issues, etc.
    So restarting the server is not always the appropriate resolution.
  • KeithS
    17 Posts

    Re: Health policies related queries

    ‏2012-01-02T18:46:54Z  
    1) Other recommended custom health policies to consider are:
    a) Hung thread detection.
    Sample expression:
    PMIMetric_FromLastInterval$threadPoolModule$concurrentlyHungThreads > 3L
    which triggers on more than 3 hung threads.
    Recommended actions are to take thread dumps and then restart the server.
    b) Total process memory size, to detect leaks in native memory.
    Sample expression:
    PMIMetric_FromLastInterval$xdProcessModule$processTotalMemory > 2048L
    Recommended action is to restart the server.
    c) Slow DB response times, detected by monitoring the JDBC connection pool.
    Sample expression:
    PMIMetric_FromLastInterval$connectionPoolModule$avgWaitTime > 5000L
    which triggers when a request has to wait more than 5 seconds for a connection from the pool.
    Recommended action is to execute a custom action and/or simply notify an administrator of DB issues.

    2) Correct. Storm drain can be useful for certain types of app problems, but it is likely that the excessive request timeout and/or excessive response time health policies would also catch these same problems. In short, it is recommended that you use storm drain in supervise mode for each app for an evaluation period. If you get false positives, do not use storm drain for that app.
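
As a rough illustration of what the three sample expressions above test, here is a small Python sketch. It is not the product's expression engine: the parser, the function names, and the sample metric values are hypothetical, and it handles only the simple "metric > threshold" form quoted in this post.

```python
import re

# Hypothetical pattern for the "PMIMetric_FromLastInterval$module$metric > NL" form.
_EXPR = re.compile(r"^PMIMetric_FromLastInterval\$(\w+)\$(\w+)\s*>\s*(\d+)L$")


def parse_condition(expression):
    """Split an expression into (module, metric, threshold)."""
    match = _EXPR.match(expression.strip())
    if match is None:
        raise ValueError("unsupported expression: %r" % expression)
    module, metric, threshold = match.groups()
    return module, metric, int(threshold)


def is_triggered(expression, samples):
    """True when the sampled metric is strictly above the threshold."""
    module, metric, threshold = parse_condition(expression)
    return samples[module][metric] > threshold


# Made-up metric values for one sampling interval (assumption for the demo).
SAMPLES = {
    "threadPoolModule": {"concurrentlyHungThreads": 4},
    "xdProcessModule": {"processTotalMemory": 1500},
    "connectionPoolModule": {"avgWaitTime": 5000},
}

HUNG = "PMIMetric_FromLastInterval$threadPoolModule$concurrentlyHungThreads > 3L"
MEMORY = "PMIMetric_FromLastInterval$xdProcessModule$processTotalMemory > 2048L"
DB_WAIT = "PMIMetric_FromLastInterval$connectionPoolModule$avgWaitTime > 5000L"
```

With these made-up samples, only the hung thread condition fires. The wait time of exactly 5000 does not, because the comparison is strictly "greater than".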

    Keith
  • SystemAdmin
    102 Posts

    Re: Health policies related queries

    ‏2012-01-05T10:52:03Z  
    Thanks Keith,
    Regarding the custom health policies, I think the 'Hung thread detection' and 'Slow DB response times' policies will be very helpful to us.
    I will try to implement these for our application.

    Regards,
    Paresh
  • SystemAdmin
    102 Posts

    Re: Health policies related queries

    ‏2012-04-05T13:54:06Z  
    These should be implemented for better monitoring!

    I also have a question.
    In your hung thread detection sample, is "threadPoolModule" implicitly equivalent to the Web Container thread pool, or does it cover all thread pools, including Default, ORB, HAManager, and so on?