NFS write error on host r4201i1: 78.

2009-06-12T08:41:12Z

We're using AIX 5300-09-02-0849 on p570 LPARs with dedicated network adapters. There are two LPARs in a cluster, r4201 and r4251, running SAP. r4201 runs the central instance and r4251 runs the application server. The /usr/sap directory, for instance, is served from node r4201 to both nodes. On r4201 it is a local filesystem:

/dev/isp02lv02    16777216   4270732   75%     5874     1% /isp_global/usr/sap

and on r4201 and r4251 via NFS:

r4201i1:/isp_global/usr/sap/ISP    16777216   4270732   75%     5874     1% /usr/sap/ISP

Recently we started getting this error:

NFS write error on host r4201i1: 78.
Last time it was on the other node, r4251. The strange thing is that this happens on only one node at a time, although both nodes have this filesystem mounted from the very same resource on r4201. The other remarkable thing is that it happens with only one particular file: /usr/sap/ISP/DVEBMGS00/data/PAGFIL00
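For anyone scanning console output on both nodes for this message, here is a minimal sketch that pulls the host and the error number out of such a line. The function name parse_nfs_error is made up for illustration. As far as I know, 78 is ETIMEDOUT in /usr/include/sys/errno.h on AIX, which would fit a timeout, but please check the header on your own system.

```shell
# Hypothetical helper (name is an assumption, not an AIX tool):
# extract "<host> <errno>" from a line containing
# "NFS write error on host <host>: <errno>."
parse_nfs_error() {
    printf '%s\n' "$1" |
        sed -n 's/.*NFS write error on host \([^:]*\): \([0-9][0-9]*\)\..*/\1 \2/p'
}

# Example with the message from this post
parse_nfs_error "NFS write error on host r4201i1: 78."
# prints: r4201i1 78
```

Lines that do not contain the message produce no output, so the function can be applied to each line of a log without extra filtering.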

Currently on host r4201:

root@r4201 [/usr/sap/ISP/DVEBMGS00/data] ls -l PAGFIL00
NFS write error on host r4201i1: 78.
-rw-rw----    1 ispadm   sapsys   2012053504 Jun 10 13:19 PAGFIL00

and on host r4251:

root@r4251 [/usr/sap/ISP/DVEBMGS00/data] ls -l PAGFIL00
-rw-rw----    1 ispadm   sapsys   2012053504 Jun 10 13:19 PAGFIL00

Do you have any solution other than stopping SAP and remounting this NFS filesystem (which solves the problem for some time)? The customer is unhappy about having to stop production, and we're getting tired of creating priority 1 incident reports. We opened a problem with SAP, but they promptly closed it as an operating system problem, although it happens on ONE PARTICULAR SAP FILE every time:

Dear Martin,

thanks for the new information. There is no file attached to the message. It indeed looks like a problem with NFS. The error number 78 is
indicating a timeout. One possibility to solve the issue could be to
mount the filesystem with the options 'hard' and 'intr'. For further
information you can read the following document: ''
Interesting for you are the sections "Performance implications of hard
or soft NFS mounts" and "Unnecessary retransmits". If this does not
solve your issue, you should consider opening a PMR with IBM to check
your network setup. Maybe there is somewhere a bottleneck which is
leading to the timeouts on the NFS mount.
I attach SAP Note #15.

Kind regards, Csaba GÖTZ
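For reference, the suggestion in that reply would amount to roughly the following remount. This is only a dry-run sketch: it prints the commands instead of running them, because the unmount itself requires stopping SAP. The server, export and mount point are taken from the df output above, and the helper name remount_cmds is made up for illustration.

```shell
# Dry-run helper (hypothetical name): prints the remount commands for a
# given set of NFS mount options rather than executing them.
# Server, export and mount point are the ones shown in the df output.
remount_cmds() {
    opts="$1"
    echo "umount /usr/sap/ISP"
    echo "mount -o $opts r4201i1:/isp_global/usr/sap/ISP /usr/sap/ISP"
}

# The options SAP support suggested
remount_cmds hard,intr
```

Running nfsstat -m on each node before and after should show which options are actually in effect for the mount.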

The recommendation to use "hard" mounting is not ideal for us, because it may cause problems even in the everyday operation of the applications. If you know of any preventive settings or a solution that won't disrupt production, we'll be eternally grateful. In the meantime I'll keep searching for information.