We're using AIX 5300-09-02-0849 on p570 LPARs with dedicated network adapters. There are two LPARs in a cluster, r4201 and r4251, running SAP. r4201 runs the central instance and r4251 runs the application server. The /usr/sap directory for instance is mounted from node r4201 to both nodes. On r4201 it is a local FS:
/dev/isp02lv02 16777216 4270732 75% 5874 1% /isp_global/usr/sap
and on r4201 and r4251 via NFS:
r4201i1:/isp_global/usr/sap/ISP 16777216 4270732 75% 5874 1% /usr/sap/ISP
Recently we started getting this error:
NFS write error on host r4201i1: 78.
Last time it was on the other node, r4251. The strange thing is that this happens on only one node at a time, although both nodes have this filesystem mounted from the very same resource on r4201. The other remarkable thing is that it happens with only one particular file: /usr/sap/ISP/DVEBMGS00/data/PAGFIL00
Currently on host r4201:
root@r4201 [/usr/sap/ISP/DVEBMGS00/data] ls -l PAGFIL00
NFS write error on host r4201i1: 78.
-rw-rw---- 1 ispadm sapsys 2012053504 Jun 10 13:19 PAGFIL00
and on host r4251:
root@r4251 [/usr/sap/ISP/DVEBMGS00/data] ls -l PAGFIL00
-rw-rw---- 1 ispadm sapsys 2012053504 Jun 10 13:19 PAGFIL00
Do you have any other solution than to stop SAP and re-mount this NFS filesystem (which solves the problem for some time)? The customer is unhappy that he has to stop production, and we're getting tired of creating priority 1 incident reports. We opened a problem report with SAP, but they promptly closed it as an operating system problem, although it happens on ONE PARTICULAR SAP FILE every time:
thanks for the new information. There is no file attached to the message. It indeed looks like a problem with NFS. The error number 78 is
indicating a timeout. One possibility to solve the issue could be to
mount the filesystem with the options 'hard' and 'intr'. For further
information you can read the following document: 'http://www16.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixbman/prftungd/nfsclitun.htm'
Interesting for you are the sections "Performance implications of hard
or soft NFS mounts" and "Unnecessary retransmits". If this does not
solve your issue, you should consider opening a PMR with IBM to check
your network setup. Maybe there is somewhere a bottleneck which is
leading to the timeouts on the NFS mount.
I attach SAP Note #15.
Kind regards, Csaba GÖTZ
The recommendation to use "hard" mounting is not really an option for us, because it may cause problems even in everyday operation of the applications. If you know of any preventive settings or a solution that won't disrupt production, we'll be eternally grateful. For now I'm going to keep searching for information.
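
For reference, here is a rough sketch of how the mount options from the SAP reply could be checked and applied. It is a dry run that only prints the commands, so nothing gets unmounted by accident. The server name, export, and mount point come from the df output above. The helper name build_mount_cmd is made up for illustration, and that nfsstat -m lists the options in effect for this mount is my reading of the AIX documentation, not something verified on these LPARs. The remount would still need SAP stopped, so this does not remove the downtime.

```shell
#!/bin/sh
# Dry-run sketch: prints commands instead of executing them.
# Values below are taken from the df output in this post.
NFS_SERVER="r4201i1"
NFS_EXPORT="/isp_global/usr/sap/ISP"
MOUNT_POINT="/usr/sap/ISP"

# Hypothetical helper: build the NFS mount command for a
# comma-separated option list such as "hard,intr".
build_mount_cmd() {
    opts="$1"
    echo "mount -o ${opts} ${NFS_SERVER}:${NFS_EXPORT} ${MOUNT_POINT}"
}

# Step 1: show the mount options currently in effect (hard/soft, timeo, retrans).
echo "nfsstat -m"

# Step 2: unmount and remount with the options named in the SAP reply.
echo "umount ${MOUNT_POINT}"
build_mount_cmd "hard,intr"
```

The point of looking at the current options first is that error 78 being passed back to the application is the kind of behaviour one would expect from a soft mount that gives up after its retries, whereas a hard mount keeps retrying. That is an assumption about our setup until the nfsstat -m output confirms it.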