
Notes from Support: Calling Support with a Domino server crash

Bret H Swedeen (swedeen@us.ibm.com), GWA IT Architect, IBM
Bret Swedeen joined Lotus in the summer of 1997 as a Principal Knowledge Architect. Bret has worked in the computer industry for nearly 8 years, doing everything from Notes phone support at Corporate Software to Notes consulting at Coopers & Lybrand. A Certified NetWare Engineer (CNE) and Certified Lotus Professional (CLP) System Administrator and Application Developer, Bret has also written articles for the Lotus Notes Advisor, Database Advisor, LAN Times, and LAN Magazine. Most recently, Bret authored the Lotus Notes 4.5 Administrator's Guide, published by Sybex.

Summary:  This month's article focuses on the information you need to gather when you encounter a server crash.

Date:  01 Jul 1999
Level:  Introductory

The From the Field column is brought to you by Lotus Support and features technical articles based on our experience helping customers find solutions in the field. This month's article focuses on the information you need to gather when you encounter a server crash.

Updated: This article has been updated with S/390 information.

There's nothing fun about a Domino server crash. Everything stops working, users get frustrated, and everyone blames you, the Domino administrator. Fortunately, most administrators can quickly troubleshoot a server crash and get their machine back online before angry users storm the data center. But what if you're not so fortunate? You try everything and still your server refuses to cooperate. That's when it's time to call Lotus Support.

Calling any technical support center takes time, the one thing most Domino administrators don't have enough of. Of course, the key to an effective support call is preparation. No one wants to spend the first half of a support call chasing after version numbers and environment details. You want to get to the heart of the matter and get that server running again. If you can identify the type of server crash, provide server details, and have the relevant configuration files and server logs readily accessible, you can make the most of your first call to Support. What you need to collect before you call is the focus of this month's From the Field article.

Where to start

Before we list everything you need before you call, let's clarify an important point: all the preparation in the world won't guarantee a quick solution to a server crash. Troubleshooting a Domino server crash takes time. The point of the preparation steps outlined in this article is to get your support call off to the right start and help you make the most of your time on the phone.

A crash is a crash, is a crash, is a crash. Right? Wrong! Most Domino server crashes fall into one of the following six categories. Start by identifying which category your crash falls into:

  • Domino server crash: The Domino server stops running, including all related processes. The crash may produce a NOTES.RIP file, a DRWTSN32.LOG file, or an application exception. Other applications, however, continue to run and the operating system remains responsive.
  • Domino server hang: The Domino server appears to be running; however, it no longer responds to Notes clients, and the server console does not accept keyboard input. A hang does not generate a NOTES.RIP file or a DRWTSN32.LOG file. As with a server crash, however, other applications continue to run and the operating system remains responsive.
  • Individual server task crash: The Domino server continues to run; however, an individual task (such as the mail router) no longer responds.
  • Domino server down: The Domino server crashed and will not restart, or it crashes immediately after starting. In either case, the server does not stay running.
  • Operating system hang: The operating system does not respond to keyboard or mouse input. There's no way to gracefully shut down or restart the operating system without using the power switch.
  • Operating system crash: Each operating system displays this state differently. Windows NT displays a blue screen with cryptic diagnostic information, OS/2 displays "trap" error information, and Novell NetWare abends (ends abnormally). Regardless of which operating system has crashed, the end result is the same: the machine no longer responds, and it takes a flip of the power switch to get it going again.
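To make the order of the questions explicit, here is a minimal Python sketch that maps a few yes/no observations to the six categories above. The Observation class, its field names, and the order of the checks are assumptions made for illustration, not part of any Lotus tool. For example, it assumes a server that stays up after a restart attempt counts as a "Domino server crash," while one that will not stay up counts as "Domino server down."

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical yes/no observations; defaults describe a healthy system."""
    os_responds: bool = True        # OS still accepts keyboard/mouse input
    os_crash_screen: bool = False   # blue screen, OS/2 trap, or NetWare abend shown
    domino_running: bool = True     # Domino server processes still present
    domino_restarts: bool = True    # server stays up after a restart attempt
    console_responds: bool = True   # server console accepts keyboard input
    clients_served: bool = True     # Notes clients can still reach the server
    task_failed: bool = False       # one task (e.g. the router) stopped responding


def classify_crash(o: Observation) -> str:
    """Return the crash category that matches the observations."""
    # Operating system failures come first: Domino cannot be judged without the OS.
    if not o.os_responds:
        return "Operating system crash" if o.os_crash_screen else "Operating system hang"
    # The server stopped; whether it comes back separates "crash" from "down".
    if not o.domino_running:
        return "Domino server crash" if o.domino_restarts else "Domino server down"
    # Server processes exist but neither clients nor the console get answers.
    if not o.console_responds and not o.clients_served:
        return "Domino server hang"
    if o.task_failed:
        return "Individual server task crash"
    return "No crash category matched"
```

For example, `classify_crash(Observation(domino_running=False, domino_restarts=False))` returns "Domino server down."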

The following graphic helps illustrate the point of failure for each crash category:


Figure 1. Domino crash categories

Note: If your server crash is due to an operating system hang or crash, start by troubleshooting the operating system first. You may need to contact the operating system manufacturer before calling Lotus Customer Support.

Once you've identified the type of server crash, collect the following general information:

  • Machine make, model, and general configuration (memory, processor, and hard disk space)
  • Network card make and model
  • Network protocols and versions (if applicable)
  • Operating system and version
  • Service packs or patches (and approximate application dates)
  • Domino server release (and when you applied the last QMR, if applicable)
  • Server tasks that run on this server
  • Notes client releases that access this server
  • General role of this server in your Domino environment (hub or spoke server)
  • Other applications that run on this machine

The support analyst might not ask for all of this information, but don't worry -- better to have too much information than not enough.
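Most of this checklist has to be filled in by hand, but the operating system details can be captured automatically. The following sketch assumes Python 3 is available on the server (Python's standard platform module describes the machine the script runs on, so run it there). The helper names general_info and as_text are hypothetical; the field labels simply mirror the list above, and every item the script cannot discover gets a TODO placeholder.

```python
import platform
from typing import Dict, Optional

# Checklist items a script cannot discover reliably; these are filled in by hand.
MANUAL_FIELDS = [
    "Machine make and model",
    "Memory, processor, and hard disk space",
    "Network card make and model",
    "Network protocols and versions",
    "Service packs or patches (and approximate dates)",
    "Domino server release (and last QMR applied)",
    "Server tasks that run on this server",
    "Notes client releases that access this server",
    "Role of this server (hub or spoke)",
    "Other applications that run on this machine",
]


def general_info(manual: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Combine OS details from the platform module with hand-entered items."""
    info = {
        # platform reports on the machine this script is running on.
        "Host name": platform.node(),
        "Operating system and version": f"{platform.system()} {platform.release()} ({platform.version()})",
        "Processor architecture": platform.machine(),
    }
    manual = manual or {}
    for field in MANUAL_FIELDS:
        # Leave an obvious placeholder for anything not supplied yet.
        info[field] = manual.get(field, "TODO")
    return info


def as_text(info: Dict[str, str]) -> str:
    """Render the record as 'Field: value' lines to keep next to the phone."""
    return "\n".join(f"{key}: {value}" for key, value in info.items())
```

Printing `as_text(general_info({"Role of this server (hub or spoke)": "hub"}))` gives one line per item, with TODO marking what you still need to look up.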


Files to collect

Now that you've classified your server crash and collected some general information, start collecting the various configuration files and server logs. Once you've collected all of these configuration files and activity logs, group them into a single ZIP file for easy transmission.
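The bundling step can be sketched with Python's standard zipfile module. This is a minimal sketch, not a Lotus utility: the function name bundle_for_support is hypothetical, and the paths you pass in depend on your installation. It skips any path it cannot find and returns those paths so you know what is still missing. Each file is stored under its base name, so rename files first if two of them share a name.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List


def bundle_for_support(files: Iterable[str], zip_path: str) -> List[str]:
    """Add every existing file to one ZIP archive; return the paths not found."""
    missing = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in files:
            path = Path(name)
            if path.is_file():
                # Store only the base name so the archive has a flat layout.
                zf.write(path, arcname=path.name)
            else:
                missing.append(name)
    return missing
```

For example, `bundle_for_support(["NOTES.INI", "LOG.NSF"], "support.zip")`, run from the Notes data directory, returns an empty list when both files were found and archived.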

The following sections identify the files to collect, grouped by operating system:

Windows NT (Intel and Alpha platforms)

Collect the following files for crashes on Windows NT/Intel or Windows NT/Alpha:

  • The Notes rip file (NOTES.RIP), if one was generated during the crash (for more information about the NOTES.RIP file, see the sidebar "Rip files and Quincy")
  • The NOTES.INI file
  • The LOG.NSF database
  • Windows NT Diagnostics file(s). You can easily collect this information in the following manner:
    1. Click the Windows NT Start button, and choose Programs - Administrative Tools - Windows NT Diagnostics.
    2. Click Print.
    3. Set the Default Level to Complete, and the Destination to File.
    4. Click OK.
    5. Enter the filename and destination directory in the next dialog box and click OK.
  • Windows NT Event Viewer log files. You can easily capture this information from the Event Viewer in the following manner:
    1. Click the Windows NT Start button, and choose Programs - Administrative Tools - Event Viewer.
    2. Choose Log - System.
    3. Choose Log - Clear All Events.
    4. When prompted if you want to save the log events, click Yes.
    5. Enter a filename and destination directory (the file is saved with an EVT extension).
    6. When prompted if you want to clear all events, click Yes.
    7. Repeat these steps for the Security and Application logs.

Along with these files, capture the list of running processes from the NT Task Manager. You can easily capture this information in the following manner:

  1. Press CTRL+ALT+DEL and click Task Manager.
  2. Click the Processes tab.
  3. With your mouse, resize the Task Manager so you can see all processes without scrolling.
  4. Press ALT+Print Screen.
  5. Open Microsoft Paint and maximize the application so it fills the screen.
  6. Choose Edit - Paste.
  7. Choose File - Save.
  8. Name the image and save it as a 16-color Windows bitmap image.

OS/2

Collect the following files for crashes on OS/2:

  • The Notes rip file (NOTES.RIP), if one was generated during the crash (for more information about the NOTES.RIP file, see the sidebar "Rip files and Quincy")
  • The NOTES.INI file
  • The LOG.NSF database
  • The CONFIG.SYS file
  • The OS/2 version and fix pack information. You can retrieve this information from the command prompt and redirect it to a text file with the following command: ver /r > ver.txt
  • OS/2 syslevel information. You can retrieve this information from the command prompt and redirect it to a text file with the following command: syslevel > syslevel.txt
  • A list of all running processes. You can retrieve this information from the command prompt and redirect it to a text file with the following command: pstat > pstat.txt
  • Trap or SYS error information and the register dump, if they are displayed on the screen.

NetWare NLM

Collect the following files for NetWare NLM crashes:

  • The Notes rip file (NOTES.RIP), if one was generated during the crash (for more information about the NOTES.RIP file, see the sidebar "Rip files and Quincy")
  • The NOTES.INI file
  • The AUTOEXEC.NCF file
  • The STARTUP.NCF file
  • Any other .NCF files that load during the server startup

Note: NetWare names the first RIP file it creates NOTES.RIP, the next one NOTES1.RIP, the one after that NOTES2.RIP, and so forth. Double-check the date-time stamp on the RIP file to make sure you have the most recent file.
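Checking the date-time stamps can be scripted if the server volume is reachable from a workstation, for example as a mapped drive. This is a minimal sketch under that assumption; the function name newest_rip is hypothetical, and the directory you pass in is wherever your server writes its RIP files. It picks the newest file by modification time rather than trusting the numeric suffix.

```python
import re
from pathlib import Path
from typing import Optional

# Matches NOTES.RIP, NOTES1.RIP, NOTES2.RIP, and so on, in any letter case.
RIP_NAME = re.compile(r"^NOTES\d*\.RIP$", re.IGNORECASE)


def newest_rip(directory: str) -> Optional[Path]:
    """Return the RIP file with the latest modification time, or None if none exist."""
    rips = [p for p in Path(directory).iterdir() if p.is_file() and RIP_NAME.match(p.name)]
    if not rips:
        return None
    # Compare date-time stamps, as the note above recommends.
    return max(rips, key=lambda p: p.stat().st_mtime)
```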

IBM AS/400 (OS/400)

Collect the following files for crashes on the IBM AS/400:

  • LOG.NSF database
  • MAIL.BOX database (SMTP crashes only)
  • Log file created by the Notes System Dump (NSD) utility, which contains the following relevant system and Domino server information:
    • Invocation stack of the failing thread
    • Current environment variables
    • Joblog of the failing job
    • NOTES.INI file
    • Status of all running Domino processes for the failing partition
    • Domino console at the time of failure

The NSD utility automatically runs when the server crashes. To find the appropriate NSD file, follow these steps:

  1. Change to the Notes data directory.
  2. Look for the NSD file named nsd_yyyymmdd_hh.mm.ss.nsd (yyyymmdd represents the year, month, and day of the crash, and hh.mm.ss represents the time of the crash).
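Because the crash time is encoded in the file name, the most recent NSD log can be picked out by parsing the names. This is a minimal sketch that assumes the naming scheme exactly as described above; the helper names nsd_timestamp and latest_nsd are hypothetical. Names that do not match the pattern, or whose digits do not form a valid date and time, are ignored.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Naming scheme described above: nsd_yyyymmdd_hh.mm.ss.nsd
NSD_NAME = re.compile(r"^nsd_(\d{8})_(\d{2})\.(\d{2})\.(\d{2})\.nsd$")


def nsd_timestamp(filename: str) -> Optional[datetime]:
    """Return the crash time encoded in an NSD file name, or None if it does not match."""
    m = NSD_NAME.match(filename)
    if m is None:
        return None
    date, hh, mm, ss = m.groups()
    try:
        return datetime.strptime(date + hh + mm + ss, "%Y%m%d%H%M%S")
    except ValueError:
        # Digits that do not form a real date or time (for example, month 13).
        return None


def latest_nsd(filenames: Iterable[str]) -> Optional[str]:
    """Pick the NSD file whose name records the most recent crash."""
    stamped = [(nsd_timestamp(n), n) for n in filenames]
    stamped = [(ts, n) for ts, n in stamped if ts is not None]
    return max(stamped)[1] if stamped else None
```

Passing the directory listing of the Notes data directory (for example, `os.listdir(".")`) to latest_nsd returns the name of the newest NSD log, or None if there is none.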

IBM S/390 (OS/390)

Before collecting any files after a Domino crash on the S/390, make sure all maintenance levels and PTFs are up to date. If a crash occurs on an up-to-date system, collect the following files:

  • LOG.NSF database
  • Log file created by the Notes System Dump (NSD) utility (previously known as killnotes for Releases 4.5x and 4.6x)
  • Any and all system dump files, such as CEEDUMP (located in the Notes data directory) and SYSDUMP
  • Any other output files that were generated, such as the Server Console Output Log, STATREP.NSF, or debug output files if specific NOTES.INI settings have been added for data collection (such as semaphore debug statements in the file named SEMDEBUG.TXT)

UNIX

Collect the following files for crashes on UNIX:

  • The LOG.NSF database
  • The log file created by the Notes System Dump (NSD) utility. This log file contains relevant system information and Domino information, including the NOTES.INI file, the contents of the Notes data directory, available disk space, patches on the system, and the status of all running Notes processes (including the core file, if generated). To run NSD, follow the steps listed below:
    1. Change to the Notes data directory.
    2. At the command prompt, type /opt/lotus/bin/nsd

All NSD results are written to the log file, and also to the screen. When NSD is finished, the name given to the log file appears on the screen.
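Before zipping, a quick checklist comparison can catch a forgotten file. This is a minimal sketch; the REQUIRED table and the missing_files helper are hypothetical and only cover fixed-name files from the lists above. Files that exist only for some crashes (NOTES.RIP, NSD logs, MAIL.BOX, system dumps) and files whose names you choose yourself (diagnostics reports, event logs, screenshots) are deliberately left out, and the OS/2 text files are the redirect targets of the ver, syslevel, and pstat commands.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Fixed-name files from the per-platform lists above (conditional files omitted).
REQUIRED: Dict[str, List[str]] = {
    "Windows NT": ["NOTES.INI", "LOG.NSF"],
    "OS/2": ["NOTES.INI", "LOG.NSF", "CONFIG.SYS", "ver.txt", "syslevel.txt", "pstat.txt"],
    "NetWare": ["NOTES.INI", "AUTOEXEC.NCF", "STARTUP.NCF"],
    "AS/400": ["LOG.NSF"],
    "S/390": ["LOG.NSF"],
    "UNIX": ["LOG.NSF"],
}


def missing_files(platform_name: str, collected: Iterable[str]) -> List[str]:
    """List the required files for a platform that are absent from `collected`."""
    # Compare base names case-insensitively (NOTES.INI and notes.ini are the same file).
    have = {Path(name).name.lower() for name in collected}
    return [f for f in REQUIRED[platform_name] if f.lower() not in have]
```

For example, `missing_files("NetWare", ["notes.ini", "AUTOEXEC.NCF"])` reports that STARTUP.NCF has not been collected yet.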


Troubleshooting double-check

At this point, you've collected a wealth of information about your server. Before you rush for the phone, step back for a moment and double-check all of the troubleshooting you've done up to this point. Also, don't overlook the obvious causes for the crash. Sometimes, if you're too close to the problem, or in a state of panic, it's easy to bypass the simple explanation.

Here are some pointers to help you out with troubleshooting:

  • Visit the Lotus Support Web site and search the Lotus Knowledge Base for tech notes documenting a similar problem. These tech notes might contain a suitable solution or work-around.
  • If you're running an older release of the Domino server, check the fix list in the Release Notes of more recent versions to see if your problem is resolved in a newer release.
  • Think about what's changed on the server. Changes might include, but are not limited to:
    • New or upgraded hardware
    • Updated firmware
    • Operating system upgrade
    • Operating system patch or service pack
    • Other software upgrade or new installation, especially new server add-in tasks
    • Out-of-date or recently updated device drivers

If something has changed, try to back out of the recent changes. Does the problem persist?

  • Thoroughly analyze the LOG.NSF database for specific error messages that might indicate the source of the crash. Focus on the Miscellaneous Events, Replication Events, and Router Events views. Also, temporarily increase the logging level of certain events to reveal greater detail about server activity (for more details, see the sidebar "Increasing logging levels").
  • Review the server's Stats and Events database, and specifically look at the following statistics:
    • Disk space
    • Available memory
    • Database replication
    • Mail router
  • Determine whether the crash happens under certain conditions or at a particular time of day. If so, what's unusual about these conditions or time of day? You can also use the Mean Time Between Failure (MTBF) tool to help determine crash patterns (for more details about the MTBF tool, see Measuring your Domino server's reliability).
  • If the problem is network-related, use the Trace Connection tool:
    1. Choose File - Tools - User Preferences, or File - Preferences - User Preferences.
    2. Click Ports.
    3. Click Trace with Full Trace Information enabled.
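If you keep your own record of crash times (for example, noted from LOG.NSF), a rough MTBF and a time-of-day tally take only a few lines. This sketch uses the generic definition of MTBF as the average gap between consecutive failures; that is an assumption, not necessarily how the MTBF tool calculates it, and the function names are hypothetical. Note that each gap includes any downtime before the restart.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List


def mean_time_between_failures(crashes: List[datetime]) -> timedelta:
    """Average gap between consecutive crashes (needs at least two crash times)."""
    if len(crashes) < 2:
        raise ValueError("need at least two crashes to measure a gap")
    ordered = sorted(crashes)
    # The consecutive gaps add up to the span from the first crash to the last.
    total = ordered[-1] - ordered[0]
    return total / (len(ordered) - 1)


def crashes_by_hour(crashes: List[datetime]) -> Dict[int, int]:
    """Count crashes per hour of day to expose time-of-day patterns."""
    return dict(Counter(c.hour for c in crashes))
```

With three crashes at about 2 a.m. two days apart, the MTBF comes out to two days and crashes_by_hour shows all three in hour 2, a pattern worth comparing against scheduled tasks or backups running at that time.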

For additional help troubleshooting a network problem, try NotesCONNECT. This tool helps test IP connectivity and runs on Windows NT and Windows 95 only.


Making that call

Now that everything is together and you've double-checked your troubleshooting, make that call to Lotus Support (you can find contact information in the Lotus Support Worldwide Guide). Keep in mind, however, that even when you're well prepared, troubleshooting a server crash takes time. The analyst on the other end of the phone isn't physically there with you, and it takes time to digest all of the information you've collected. Be patient and work with the analyst; they're there to help you.

