I visit customers and do health checks. Here are some of the questions I ask when looking into applications.
- When implementing new applications, or making significant changes to existing applications, do you give the system programmers enough information for them to make the application Highly Available?
- There is a new application coming
- Is the application rolled out slowly or dumped into production?
- These are the new messages/events being produced that need to be monitored, and the actions to take for each, eg for detecting problems.
- These are the new queues, and these are their queue high event depths
- So you know when there is a problem with the application
- What other monitoring do you do?
- This is the DLQ to be used
- Messages going to a DLQ usually indicate a problem - eg queue full, or bad data
- Expected change to MQ resources
- Amount of data logged
- Isolation from other applications - does it need its own Structure/SMDS?
- Expected queue depths (for example 100 messages or 1 million messages)
- Do you use the right persistence?
- Some customers have application recovery logic, and so the messages can be non persistent.
- Inquiry type messages are typically non persistent
- Use Accounting class(3) data to see what is being used.
- Small messages are better than big messages - for example do you need to send 10MB of blanks?
- If you have very small messages - under 100 bytes - can you have multiple in one message to save space?
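As an illustration of putting several small records into one message, here is a minimal sketch in Python. It is not an MQ feature: the 2-byte length prefix and the names `pack_records` and `unpack_records` are assumptions made up for this example, and the sending and receiving applications would both have to agree on the layout.

```python
import struct
from typing import Iterable, List


def pack_records(records: Iterable[bytes]) -> bytes:
    """Join small records into one payload, each prefixed by a 2-byte length."""
    # ">H" is a big-endian unsigned 16-bit length, so each record must be
    # at most 65535 bytes (struct.pack raises struct.error otherwise).
    return b"".join(struct.pack(">H", len(r)) + r for r in records)


def unpack_records(payload: bytes) -> List[bytes]:
    """Split a payload built by pack_records back into its records."""
    records = []
    offset = 0
    while offset < len(payload):
        (length,) = struct.unpack_from(">H", payload, offset)
        offset += 2
        records.append(payload[offset:offset + length])
        offset += length
    return records


if __name__ == "__main__":
    batch = pack_records([b"order=1", b"order=2", b"order=3"])
    print(len(batch), unpack_records(batch))
```

The trade-off is that the batch is put, got, and rolled back as one unit, so a bad record affects every record packed with it.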
- Do you handle poisoned messages? This is when an application gets a message, abends, and rolls back, and does this repeatedly.
- The queue should have HARDENBO, and the application should check the backout count (the BackoutCount field in the MQMD) and put the message to the queue named in BOQNAME if the count is > 2
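The backout check can be sketched as follows. This is a minimal illustration in Python and makes no real MQ API calls: `Message` and `route_message` are hypothetical stand-ins, and the threshold of 2 is simply the value from the rule above. In a real application the count comes from the message descriptor and the target queue name from the queue's BOQNAME attribute.

```python
from dataclasses import dataclass

BACKOUT_THRESHOLD = 2  # assumption: mirrors the "> 2" rule in the text


@dataclass
class Message:
    """Hypothetical stand-in for a received message and its descriptor."""
    body: bytes
    backout_count: int  # how many times this message has been rolled back


def route_message(msg: Message, threshold: int = BACKOUT_THRESHOLD) -> str:
    """Decide what to do with a message that has just been got.

    Returns "process" for a normal message, or "requeue" when the message
    has been backed out more often than the threshold and should be put to
    the backout queue instead of being processed again.
    """
    if msg.backout_count > threshold:
        return "requeue"
    return "process"


if __name__ == "__main__":
    for count in (0, 2, 3):
        print(count, route_message(Message(b"payload", count)))
```

The point of the check is that the application does it before doing any work on the message, so a message that keeps causing an abend is moved out of the way instead of being retried forever.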
- Do applications report problems?
- Unexpected shutdown - so automation can notify people
- Unexpected errors - eg message too long, queue full, data consistency problems
- Do you consider using one reply-to queue per application or per CICS.application, rather than one reply-to queue used by all applications?
- This is to provide application isolation
- Using dynamic reply-to queues - what is your process for processing 'dead' messages and deleting the queue?
- This prevents a build up of redundant queues
- Do you have one DLQ for all applications or one per business application?
- What happens if a message goes on the DLQ - and who gets notified?
- Queue events - do you use queue high (and queue low) events?
- What processing do you do?
- Is the queue high threshold low enough to give you time to fix the problem, and high enough not to give false positives?
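One way to think about the queue high threshold is as a band: above the normal peak depth, so it does not fire in normal running, and far enough below the maximum depth that the queue does not fill before someone can react. A rough sketch in Python, where the function name and every input value are hypothetical and not recommendations:

```python
from typing import Tuple


def queue_high_bounds(max_depth: int,
                      normal_peak_depth: int,
                      backlog_growth_per_sec: float,
                      reaction_time_sec: float) -> Tuple[int, int]:
    """Return (lowest, highest) sensible queue high thresholds, in messages.

    lowest:  just above the normal peak depth, to avoid false positives.
    highest: leaves enough room below max_depth for the queue to keep
             filling at backlog_growth_per_sec while people react.
    Raises ValueError if no threshold satisfies both conditions.
    """
    headroom = int(backlog_growth_per_sec * reaction_time_sec)
    lowest = normal_peak_depth + 1
    highest = max_depth - headroom
    if highest < lowest:
        raise ValueError("max depth too small for this reaction time")
    return lowest, highest


if __name__ == "__main__":
    # Assumed figures: 1 million max depth, normal peak of 5000 messages,
    # backlog growing at 500 messages/second, 10 minutes to react.
    print(queue_high_bounds(1_000_000, 5_000, 500.0, 600.0))
```

On MQ the QDEPTHHI attribute is expressed as a percentage of MAXDEPTH, so a depth chosen this way needs converting before it is set.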
- Do you use COA and COD (confirm on arrival and confirm on delivery reports) to show a message has arrived and been delivered?
- Do you understand the business application?
- Expected throughput in peak messages/hour, peak queue depth
- Average and maximum message size
- What is used in the application, eg CICS transaction name, queue, channels, remote queue managers.
- So if there is a problem with channel X you know which business transactions are impacted
- What puts to the queue
- What gets from the queue
- When should queue high event be created
- What max depth do you need to specify
- What is the cost to the business if this transaction has a problem? The higher the cost, the more attention operations needs to give it.
- Tell MQ sysprogs the expected peak hour CPU usage and max depth (for page set and CF usage)
- Any encryption
- What security - eg who can put/get to the queue
- Is the normal queue depth small - or large
- Is all this passed to the system programmers before putting this application into production?
- Trigger every
- As well as being expensive, it can issue many START requests in CICS. If CICS reaches MAXTASK or TCLASS then other work is affected.
- Consider EXEC CICS START with the PROTECT option
- Use of trigger every is OK for low throughput (1 message a second)
- A high transaction rate can cause many CICS transactions to start, and hit CICS TCLASS or MAXTASK - and so impact other work
- CICS transactions may all run on one CICS even though the trigger was generated on a different LPAR - it depends on how close the CF is
- It is better to use long running tasks that do something like the following. Please contact me if you are thinking of using this so I can explain some of the more complex points.
- Do I = 1 to 1000
    MQGET with wait for 1 second
    If no message found then leave
    Do DB2 update
    EXEC CICS SYNCPOINT
    MQINQ queue depth
    MQINQ handles open for input
    If curdepth > 100 and input handles < 20 then
      EXEC CICS START TRAN(...)
  End
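To make the control flow of this loop concrete, here is a small simulation in Python. It is only a sketch and calls no MQ or CICS APIs: a `deque` stands in for the queue, `process` for the DB2 update and syncpoint, `input_handles` for the MQINQ of handles open for input, and `start_another` for the EXEC CICS START. The limits 1000, 100, and 20 are the values from the pseudocode, and it assumes the depth check runs inside the loop after each syncpoint.

```python
from collections import deque
from typing import Callable, Deque


def long_running_task(queue: Deque[bytes],
                      input_handles: Callable[[], int],
                      process: Callable[[bytes], None],
                      start_another: Callable[[], None],
                      max_messages: int = 1000,
                      depth_limit: int = 100,
                      handle_limit: int = 20) -> int:
    """Drain up to max_messages, asking for help when a backlog builds.

    Returns the number of messages processed by this instance.
    """
    processed = 0
    for _ in range(max_messages):
        if not queue:          # stands in for MQGET timing out after 1 second
            break
        msg = queue.popleft()  # stands in for a successful MQGET
        process(msg)           # DB2 update followed by the syncpoint
        processed += 1
        # Stand-ins for the two MQINQ calls: current depth and input handles.
        if len(queue) > depth_limit and input_handles() < handle_limit:
            start_another()    # stands in for EXEC CICS START
    return processed


if __name__ == "__main__":
    backlog = deque(b"msg%d" % i for i in range(150))
    started = []
    # Assumption for the demo: each started instance opens one input handle.
    done = long_running_task(backlog,
                             input_handles=lambda: 1 + len(started),
                             process=lambda m: None,
                             start_another=lambda: started.append(1))
    print(done, len(started))
```

The handle check is what stops the number of instances growing without limit: once enough tasks have the queue open for input, no more are started even if the queue is still deep.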