The purpose of a Health Check is to perform a reasonably deep analysis of the Domino server configuration and underlying services it requires (such as hardware resource, disk and network). A thorough health check aims to identify where best practices could be implemented. It should also uncover any vulnerabilities that may compromise integrity, availability or confidentiality of the information served by Domino.
This article provides Administrators with a checklist to help them get started with performing a Health Check. Additionally, certain points are explored in greater depth.
A health check comprises three key elements:
1. Inputs (Gathered by reviewing hardware and software configurations)
2. Analysis (A review of the inputs, in the context of your Domino environment)
3. Recommendations (Actions identified from the analysis phase)
High Level Checklist for Health Check
If you cover all these points, you will have performed a comprehensive health check:
- Inspect Server Hardware
- Inspect Server Configuration
- Inspect Domino Mail Routing
- Inspect SMTP Mail Routing
- Inspect Domino Replication
- Inspect Directory Services
- Inspect Domino Clustering
- Inspect Domino Security of Core Databases (names, admin4, events4)
- Inspect Domino Server Topology
- Inspect NOTES.INI files
- Inspect Server Logs
- Inspect Server Statistics
- Inspect Server Documents
- Inspect Connection Documents
- Inspect Server Configuration Documents
- Inspect Program Documents
- Inspect Active Server Tasks
- Report on Disaster Recovery Procedures
- Report on Backup Procedures
- Report on Number of Databases and Size
- Report on Database Activity
- Report on Agents and Runtime
Performing the Health Check
A health check is more than just looking at your environment. A good health check requires that you document your findings to be reviewed with your team or management. It is also a way to track your environment health. Throughout the rest of this document you will see not just the steps to perform the check, but guidelines on what data your report should include.
In order to provide some context to the health check it is useful to include high-level architecture diagrams. This will assist the reader to understand the key components that make up the Domino environment.
Diagrams should succinctly describe the following information to the reader:
- The number and physical location of Domino servers
- IP addresses/host names
- Domino clusters
- The client types accessing the environment
- The network environment
- External access (from the internet)
- Mail routing topology
- Replication topology
For additional information refer to LINK to Doc topic
Current Client Environment
Use this section to get a picture of the user base that the environment serves. Describe the approximate number of users and their access methods (Notes client, browser, mobile device, etc.). If you are unsure which client types are in use, Person documents in the Domino Directory often contain a list of the Notes client versions used to access the server. This information is stored in the Administration tab. You can also enable license tracking to give you an easier way of reviewing this information.
Domino Server Builds
This section of your health check is similar to an inventory. It lists the Domino servers in the environment together with their role, host operating system and version (including any hot fixes).
Take note of mixed-version environments, especially across cluster members. Best practice suggests that cluster members should all be at the same Domino version (e.g. 8.5.2).
Your health check should include an overview of the current hardware being used. Monitor key system resources over an extended period of time. You should aim to monitor the systems for weeks or months. This will help to smooth out any irregular variation. Reporting on resource utilization has the following benefits:
- Ascertain a steady-state baseline of resource utilization
- Help identify any pattern of spikes in resource utilization that can have an adverse effect on the service.
- Help plan when additional hardware or a server migration may be required
Operating system monitoring tools are useful for this. As a minimum, the following attributes should be measured:
- Total CPU utilization
- Total Memory (RAM) utilization
- Average Disk Queue Length
- Network utilization
The tools used and the exact statistics will vary slightly between platforms. Below is an example using Windows Performance Monitor (Perfmon.exe).
Figure 1 depicts a healthy server in terms of the resources outlined above. CPU utilization is typically low, with spikes not exceeding about 50% during busy periods. There is plenty of free memory throughout the monitoring period. The average disk queue length is typically under 2, except for occasional spikes.
Figure 1: Healthy server
Figure 2 depicts a server that reaches its CPU capacity and has most of its available memory utilized (up to 70%).
Figure 2: CPU capacity reaching peak
Figure 3 depicts a server where the average disk queue is very high, peaking at 80.
Figure 3: Average disk queue length too high
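The baseline and spike analysis described above can be sketched in a few lines of Python. This is only an illustration: it assumes the counter values have already been exported as a list of numbers (for example from a Perfmon counter log), and the sample data, the `summarize` helper and the `THRESHOLDS` table are hypothetical. The thresholds are taken from the healthy-server description (CPU spikes of about 50%, average disk queue length under 2) and should be treated as starting points rather than fixed limits.

```python
from statistics import median

# Illustrative thresholds based on the healthy-server description in the text.
# They are assumptions for this sketch, not limits published by the vendor.
THRESHOLDS = {"cpu_percent": 50.0, "disk_queue": 2.0}


def summarize(samples, threshold):
    """Return the steady-state baseline (median), the peak, and all spikes.

    A spike is any sample strictly above the threshold; each spike is
    reported as an (index, value) pair so it can be matched to a timestamp.
    """
    if not samples:
        raise ValueError("no samples to summarize")
    spikes = [(i, v) for i, v in enumerate(samples) if v > threshold]
    return {
        "baseline": median(samples),
        "peak": max(samples),
        "spike_count": len(spikes),
        "spikes": spikes,
    }


# Hypothetical CPU utilization samples (percent), one per monitoring interval.
cpu = [12.0, 15.0, 11.0, 48.0, 97.0, 14.0, 13.0]
report = summarize(cpu, THRESHOLDS["cpu_percent"])
print(report)
```

The median is used for the baseline because a few short spikes barely move it, whereas a mean would be pulled upwards by them.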
The disk subsystem is a key component in the Domino server configuration. Various factors such as the location of files, the RAID level and free space should not be overlooked.
Best practice suggests you should have separate disk volumes (separate spindles) for the following components:
- Operating System
- Domino binaries (Program Code)
- Domino data directory
- DAOS Repository
- Transaction Log drive
- View Rebuild drive
- Swap Drive
Note: On IBM i the operating system automatically manages your drives through its single-level storage capabilities. The operating system and all Domino components can reside in the primary auxiliary storage pool (ASP) with no negative performance side effects. Using the primary ASP is the default and recommended configuration for this platform.
As a rule of thumb, maintain at least 20% free disk space to reduce the amount of file fragmentation. Performance usually degrades as a system runs out of disk space. If no space remains, the server can crash or panic.
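As a rough illustration of the 20% rule of thumb, the following Python sketch checks the free-space fraction of a volume. The function names and the `MIN_FREE_FRACTION` constant are hypothetical. `shutil.disk_usage` is part of the Python standard library and reports total, used and free bytes for the volume that contains a given path.

```python
import shutil

MIN_FREE_FRACTION = 0.20  # the 20% rule of thumb from the text


def free_fraction(total_bytes, free_bytes):
    """Return free space as a fraction of the total volume size."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def has_enough_free_space(total_bytes, free_bytes, minimum=MIN_FREE_FRACTION):
    """True when the free fraction meets or exceeds the minimum."""
    return free_fraction(total_bytes, free_bytes) >= minimum


def check_volume(path):
    """Apply the rule to the volume that contains the given path."""
    usage = shutil.disk_usage(path)
    return has_enough_free_space(usage.total, usage.free)
```

Calling `check_volume` once for each Domino volume (data directory, DAOS repository, transaction log drive and so on) gives a quick pass or fail per volume for the report.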
RAID at the OS/Software level is not recommended. Hardware-based RAID controllers should be used in production Domino servers. There is always a balance to achieve between the disk IO performance, redundancy and usable disk space gained from each RAID level. Figure 4 lists the most appropriate RAID level for each Domino component.
Note: On IBM i/5, RAID level 5 or 6 is suitable for all volumes.
RAID 10 or RAID 5
RAID 1 or RAID 10
View Rebuild Temporary Files
| O/S Swap drive||RAID 1|
Figure 4: RAID Levels
Furthermore, it is best practice to use a separate RAID controller dedicated to the transaction log drive.
View Rebuild Drive
Dedicate a RAID 1 array to store temporary files used to perform view index rebuilds. This can improve performance, especially during busy times when many users open their Inbox for the first time. IBM Technote #1090462 documents the procedure in more detail. Best practice suggests that the rebuild drive is configured to use a dedicated RAID 1 array composed of two dedicated solid-state disk drives or physical disk drives.
Note: For IBM i/5, the view rebuild directory may be in the primary ASP, but can be located outside of the server's data directory for increased performance.
During a health check, review the Transaction Logging configuration. In most cases, Transaction Logging can safely be enabled in order to reap the benefits. An excellent guide to Transaction Logging best practices is available in IBM Technote #7009309.
Transaction Logging configuration should differ according to several factors. These include:
- Available Server Disk Layout
- Available disk space
- Domino Server Usage (Sametime server or Mail server etc.)
- Type of backup strategy and backup tool availability
This topic is discussed further in the Transaction Logging section of this wiki.
To determine the health of your NOTES.INI, consider the following:
- The server NOTES.INI configuration file contains settings that can modify default behavior and be used to tune the server. Care should be taken when modifying the NOTES.INI. Always take a backup and document any changes in your change management process.
- Directly modifying the NOTES.INI file can lead to mistakes. An accidental or incorrect change may cause Domino or Notes to run unpredictably. Therefore setting NOTES.INI parameters in the server configuration document or using the set config console command is safer.
- A text comparison tool is useful to highlight differences in NOTES.INI files. Look for consistency across server roles (e.g. mail, hub or application servers), and especially across cluster members.
- The NOTES.INI file tends to contain more obsolete parameters on servers that have undergone upgrades from earlier Domino releases. NOTES.INI parameters typically become obsolete because their function is superseded by a UI setting (in the server Configuration document, or the Server document itself).
An example is MAILCLUSTERFAILOVER=1. This parameter was superseded by a setting in the Router/SMTP tab of the server Configuration document in Domino R5. While the NOTES.INI parameter will not override the setting in the Configuration document, it can lead Administrators to think the setting is enabled when it is not.
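Where a dedicated text comparison tool is not to hand, the comparison of NOTES.INI files across cluster members can be approximated with a short script. The sketch below is a minimal example, assuming the files have already been read into strings and that parameter names can be compared case-insensitively. The helper names and the two sample files are hypothetical.

```python
def parse_ini(text):
    """Parse 'Name=Value' lines into a dict keyed by upper-cased name.

    Blank lines, section headers such as [Notes] and lines without '='
    are skipped. If a name appears twice, the last value wins.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip().upper()] = value.strip()
    return settings


def compare(a, b):
    """Return (names only in a, names only in b, names whose values differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    different = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, different


# Hypothetical extracts from two cluster members.
server_a = "[Notes]\nServerTasks=Replica,Router,Update\nMAILCLUSTERFAILOVER=1\n"
server_b = "[Notes]\nServerTasks=Replica,Router,Update,SMTP\n"

only_a, only_b, different = compare(parse_ini(server_a), parse_ini(server_b))
print("Only on A:", only_a)
print("Only on B:", only_b)
print("Different values:", different)
```

Parameters that appear on only one cluster member, such as the obsolete MAILCLUSTERFAILOVER entry in this sample, are good candidates for follow-up in the report.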
The ServerTasks parameter specifies tasks that begin automatically when the Domino server starts. These tasks consume memory and CPU, so it is important to ensure only tasks necessary to the server's role are included. Examples of redundant tasks are the Rooms and Resources Manager (RNRMGR) running on a Sametime community server, or SMTP running on an administration server. As part of a good health check you should ensure that all running tasks are still needed in the current environment.
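The ServerTasks review can be partly automated by comparing the configured tasks with a list of tasks expected for the server's role. The following sketch assumes the ServerTasks line has been copied out of NOTES.INI. The allow-list shown is purely hypothetical and must be replaced with whatever your own standards define for each role.

```python
def parse_server_tasks(ini_line):
    """Split a 'ServerTasks=a,b,c' line into a list of task names."""
    name, _, value = ini_line.partition("=")
    if name.strip().lower() != "servertasks":
        raise ValueError("not a ServerTasks line")
    return [task.strip() for task in value.split(",") if task.strip()]


def unexpected_tasks(tasks, allowed):
    """Return tasks not in the allow-list (compared case-insensitively)."""
    allowed_lower = {task.lower() for task in allowed}
    return [task for task in tasks if task.lower() not in allowed_lower]


# Hypothetical allow-list for one server role; adjust to your environment.
ALLOWED_FOR_ROLE = {"Replica", "Router", "Update", "AMgr", "AdminP"}

line = "ServerTasks=Replica,Router,Update,AMgr,AdminP,RnRMgr"
extra = unexpected_tasks(parse_server_tasks(line), ALLOWED_FOR_ROLE)
print("Tasks to review:", extra)
```

A task flagged here is not necessarily wrong; it is simply one to confirm with the server owner before recommending its removal.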
Domino Configuration Tuner
In a nutshell, the Domino Configuration Tuner (DCT) examines server configurations and compares them with an extensive set of best practice rules. It allows administrators to quickly and easily analyze an entire Domino domain and identify any parameters that are known to cause issues. The DCT is explored in more depth in 4.2 Domino Configuration Tuner (DCT).
As part of any health check you should use the DCT tool and document the results and recommended actions.
The convenience and versatility of electronic mail make it somewhat like an ever-growing file cabinet (or attic). Large mail files, especially ones that contain a large number of documents, have a negative impact in several ways. For example:
- They rapidly consume server disk space, especially when they are replicated to multiple servers.
- Database view indexes consume more disk space, consume more CPU time and take longer to update.
- The Inbox (and other folders) require more time to update and open.
- Full-text indexes are larger and take more server resources to maintain.
- Backups and restores take more time to complete.
- Retaining old documents may violate document retention policies.
- Having many large files open simultaneously can exhaust server resources, especially on Windows.
For a discussion of large Domino mail files, see the paper entitled How Large Databases Uniquely Affect IBM Lotus Domino Server Performance.
There are a variety of options available to Domino Administrators that help control the size of users' mail files. First, there are database size quotas, and the ability to withhold mail delivery from any mail file that is over its quota. Additionally, the size of individual messages that can be delivered by the router can be limited. Other advanced database properties discussed elsewhere (such as compression) can be effective at reducing the overall database size.
These topics are covered in more detail in 2.2 Managing a User's Inbox.
For best performance, keep as few documents in the Inbox as possible. As a rule of thumb, over 1000 documents is excessive. The Inbox Maintenance feature can be enabled and run periodically to move documents out of users' Inboxes and into another folder.
Ports and Network configuration
As part of a health check you should review your network configuration, including the ports defined on your server. Here is a list of tips to consider and document as part of your health check:
- The Domino server Ports configuration is defined by NOTES.INI parameters. The Ports parameter should contain all enabled ports defined on the server. The order is also important as the first port listed denotes which should be used for cluster communication.
- Port compression allows network traffic to be compressed before being sent to its destination. This may be either another Domino server, or a Lotus Notes client. Use the Administration client (Server -> Status tab, Tools -> Ports -> Setup) to enable or disable port compression. Do not enable port compression if your network switches and routers already compress traffic automatically, as there is a CPU overhead involved in compressing and decompressing the traffic.
- Best practice suggests a dedicated NIC and dedicated high speed network for Domino clustering to provide the fastest throughput. By using a dedicated network for cluster replication you offload the cluster's probe and replication network traffic from the user-access LAN, leaving more bandwidth for client communication with the cluster servers. This also provides a level of redundancy in the event of a network failure. For further information, read Configuring a Private LAN for a Domino Cluster.
Figure 6 is an extract from the server NOTES.INI of a cluster member that has the Server_Cluster_Default_Port parameter set. With this configuration, the cluster replication (clrepl) task only uses this defined port. If this port fails, the clrepl task does not fail over to any other port.
TCPIP=TCP, 0, 15, 0,,32
Figure 6: Cluster configuration extract from NOTES.INI
- The Server_Cluster_Auxiliary_Ports NOTES.INI parameter allows cluster replication to fail over to the other available ports, even when using the Server_Cluster_Default_Port= parameter. Its use is documented in IBM Technote #1259288. In this case, add the parameter like so:
- Verify that all servers are configured to use the same line speed and duplex options as the switch to which they are attached to avoid potential network performance issues.
As part of your health check your replication topology should be reviewed and analyzed. Consider the following:
- If you are using a hub-and-spoke topology, best practice suggests initiating replication from the hub. This maximizes the amount of resource available on the spoke servers to serve client requests.
- Best practice suggests the number of replicator tasks on a hub should equal the number of spoke servers with which the hub replicates. However, you should not exceed 20 replicators, to avoid putting too much load on the server. If the server is not a hub server, the recommended number of replicators equals the number of processors on the server.
- How often are you currently replicating? Is one replication event completing before the next event begins?
- How long does it take for changes to be completely propagated in your environment? Is this timing still acceptable, or have business conditions or requirements changed?
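The replicator sizing rule and the check for overlapping replication events can both be expressed as small functions. The sketch below is illustrative only. The function names are hypothetical, the floor of one replicator is an added assumption, and the run times are assumed to have been collected by hand from the server log as (start, end) pairs in minutes.

```python
MAX_REPLICATORS = 20  # upper limit suggested in the text


def recommended_replicators(is_hub, spoke_count=0, processor_count=1):
    """Apply the rule of thumb from the text.

    Hub servers: one replicator per spoke, capped at MAX_REPLICATORS.
    Other servers: one replicator per processor.
    A floor of one replicator is an assumption added for this sketch.
    """
    if is_hub:
        return max(1, min(spoke_count, MAX_REPLICATORS))
    return max(1, processor_count)


def overlapping_runs(runs):
    """Return pairs of consecutive runs where one starts before the previous ends.

    Each run is a (start, end) pair; runs are sorted by start time first.
    """
    ordered = sorted(runs)
    return [
        (previous, current)
        for previous, current in zip(ordered, ordered[1:])
        if current[0] < previous[1]
    ]


# Hypothetical replication runs, in minutes since midnight.
runs = [(0, 25), (30, 70), (60, 80)]
print(recommended_replicators(is_hub=True, spoke_count=35))
print(overlapping_runs(runs))
```

Any pair returned by `overlapping_runs` indicates a replication event that had not finished when the next one began, which suggests the schedule or the number of replicators needs to be revisited.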
As part of your health check your mail routing topology should be reviewed and analyzed. Consider the following:
- Is spam currently under control in your environment? Do you need to make changes to your SMTP inbound controls, or consider using a third-party service or product to better manage spam?
- Is dead mail accumulating in your mail boxes?
In a complete health check, security should also be considered. Here are a few tips. For a complete list, refer to 1.3 Security checklist.
- The Security tab of the Server Document contains fields that control server access and permissions. Organisations should have controls in place to ensure people are given the minimum amount of access necessary to perform a task. For instance, Database Administrators do not need Full Administrator Access to the Domino Domain.
- User and Server IDs can be attached to their respective documents in the Domino Directory; however, best practice is to store them securely outside of the NAMES.NSF and use features such as Certificate Authority and ID Vault.
- Rather than grant default ACL access to the Domino Directory, Administrators can set Default to No Access, then grant access only to the appropriate certifiers. For example, grant */ORG Reader access.
As part of a health check you should review your directory architecture. For information on using multiple directories, see 3.4 Multiple Directories.
Directory assistance is a feature that enables a server to look up information in a directory other than a local primary Domino Directory (NAMES.NSF). You can configure directory assistance to use either a remote LDAP directory or another Domino Directory. For best performance, secondary Domino Directories should be referenced through a replica that is local to the server. Administrators should therefore create replicas of each additional Domino Directory on all servers where Directory Assistance is enabled.
Also consider the following:
- Are additional secondary directories required?
- Are all directories currently being used still needed?
- Should a mobile directory catalog or extended directory catalog be created?
- Is a mobile directory catalog being used on a server instead of the recommended extended directory catalog architecture?
As part of your health check, a review of the Person documents is recommended. For example, field validation for the Person document does not require the Internet Address field to be completed. Mail routing from external senders can still occur if an internet address is stored in the User Name field. To help prevent incorrect mail routing, the Person document of every user with a Domino mail file should contain a valid external email address in the Internet Address field.
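A review of this kind is easier to repeat if the Person documents are exported and checked by a script. The sketch below assumes the documents have been exported to a list of dictionaries. The keys `FullName`, `MailFile` and `InternetAddress` are assumed names for the exported columns, the sample records are invented, and the address test is deliberately simple rather than a full validation of email syntax.

```python
def looks_like_address(value):
    """Very rough check: something before '@' and a dot in the domain part."""
    local, separator, domain = value.partition("@")
    return bool(local) and bool(separator) and "." in domain


def missing_internet_address(people):
    """Return the names of mail users whose Internet Address is empty or malformed."""
    flagged = []
    for person in people:
        if not person.get("MailFile"):
            continue  # no Domino mail file, so the rule does not apply
        address = (person.get("InternetAddress") or "").strip()
        if not looks_like_address(address):
            flagged.append(person.get("FullName", "(unknown)"))
    return flagged


# Hypothetical export of three Person documents.
people = [
    {"FullName": "Ann Example/ORG", "MailFile": "mail/aexample",
     "InternetAddress": "ann.example@example.com"},
    {"FullName": "Bob Example/ORG", "MailFile": "mail/bexample",
     "InternetAddress": ""},
    {"FullName": "Carol Example/ORG", "MailFile": "", "InternetAddress": ""},
]
print(missing_internet_address(people))
```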
Policies are a powerful tool that enables Administrators to control many users' client settings. These can be used to simply set client options (such as spell-check before sending e-mails). They can also be used to enforce company policy (such as e-mail message disclaimers).
As part of your health check review which policies are in force. Also check policy rules for any contradictory policies that might be applied to users. It is good practice to have at least a Desktop, Security and Registration policy:
- Desktop Policy to help standardize the Notes client. Benefits include a reduction in support calls and easier troubleshooting.
- Security Policy to enforce password settings in line with corporate requirements.
- Registration Policy to standardize creation of new users and their accounts. For example, ensure all new users have Editor access to their mail file (rather than Manager).
The topic of Policies is discussed in more detail in this wiki. See 2.3 Policies.
Backup and Restore Procedures
As part of your health check you should review and document, if it has not been done already, your backup and restore procedures.
- A clearly defined backup procedure is vital for any business. The backup schedule for Domino environments will depend on whether Archived Transaction Logging is enabled. If the Transaction Logging type is circular logging, or is disabled, then daily full backups of the Domino data (NSF / NTF) are most likely required.
- If the transaction logs are backed up with a third-party backup tool, then the schedule must accommodate this. Transaction logs should be backed up frequently. Each evening an incremental backup captures the Domino data. A full backup should still be taken once a week.
- Equally important as the backup procedure is the restore procedure. Whereas the backups tend to be automated, a restore is usually performed by the technical operations group or Domino Administrators. It is therefore useful to perform periodic “dry-runs”, to ensure databases are restored as expected and the procedure can be performed smoothly when the business demands it.
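The relationship between the Transaction Logging style and the backup schedule described above can be summarized as a simple lookup. The sketch below is only a restatement of those bullets in code. The function name, the dictionary keys and the style strings are hypothetical, and any logging style the text does not cover is rejected rather than guessed.

```python
def backup_plan(logging_style):
    """Map a Transaction Logging style to the schedule outlined in the text."""
    style = logging_style.strip().lower()
    if style == "archived":
        # Logs are backed up frequently by the backup tool, with a nightly
        # incremental backup and a weekly full backup of the Domino data.
        return {
            "transaction_logs": "frequent",
            "nightly": "incremental",
            "weekly": "full",
        }
    if style in ("circular", "disabled"):
        # Without archived logs, a full backup of NSF / NTF files is needed daily.
        return {"nightly": "full"}
    raise ValueError(f"logging style not covered by this sketch: {logging_style!r}")


print(backup_plan("Archived"))
print(backup_plan("circular"))
```

Recording the output of such a lookup next to the schedule actually configured in the backup tool makes any mismatch easy to spot in the report.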