Monitoring a Domino environment means the recurring, systematic collection and supervision of data about the environment, its processes, and the individual tasks within it. The main function of monitoring is to identify when certain parameters of a system or environment exceed their defined boundaries and to react in a defined way, for example, by alerting.
Because Domino is highly configurable and can perform a wide variety of tasks, this article aims to define a monitoring strategy that covers the most common components of a Domino environment. Treat this strategy as a baseline that requires further customization to fit your specific Domino environment.
Monitoring Options
We do not recommend a single solution or a single tool for all customers. Certainly large implementations of Domino or those with high usage have different needs than smaller Domino environments.
IBM offers the following monitoring options:
- Server Monitor
This is a basic monitoring option built into the Lotus Domino Administrator client. It is well suited to small environments or as a supplement to one of the following solutions.
- Domino Domain Monitoring (DDM)
This is a server feature built into Lotus Domino and enabled by default. DDM is great for detecting, understanding and acting on run time issues.
DDM probes log events. The administrators need to check the events that have been logged. Event generators and event handlers together with statistics collection can be used to monitor a Domino environment. For information on how they are handled, see:
http://www.ibm.com/developerworks/lotus/tutorials/lsdom6stats/index.html
- IBM Tivoli ITCAM for Applications
This is part of an enterprise-class monitoring solution that is extremely scalable and capable of monitoring much more than Lotus Domino alone.
Its functionality can be deployed agent-based or agentless. It leverages best-practice models that focus on performance monitoring of key Lotus Domino components including servers, mail routing, replication, calendar, database, and clusters. Deeper Domino administrative capabilities are available with IntelliWatch.
http://www-01.ibm.com/software/tivoli/products/composite-application-mgr-applications/index.html
- IBM Tivoli Intelliwatch Pinnacle for Distributed Systems
This product offers automated problem detection and correction, system-wide product configuration options, custom reporting capabilities, fault recovery, and more. For more information, see:
http://www-01.ibm.com/software/tivoli/products/intelliwatch/
Note that third-party solutions are also available on the market; they are not listed here.
What should be monitored?
A common mistake is to limit monitoring to the Lotus Domino application itself. A number of other components should also be monitored to ensure that they are working and to avoid spending time analyzing issues that originate in a completely different area.
This list provides a brief overview of which elements should be monitored:
- Network (LAN / WAN)
- Platform (Hardware, Operating system)
- Storage & Backup environment
- Application with its components
In this article, we focus on the last item, “Application”, which in our case is Lotus Domino.
Monitoring Profiles for Domino
For ease of reference, different monitoring profiles should be defined within an environment. By grouping monitors in this way, it is possible to create profiles which are applicable to specific server configurations or functions in the Domino environment.
The following server roles serve as an example:
- Generic Domino Servers – Applied to all Domino servers.
- Mail Servers – Servers hosting end user mailboxes.
- Web Servers – Servers hosting Web sites (HTTP services).
- Cluster Servers – Servers providing cluster service.
- Special Application Servers - Servers providing additional services.
Additional profiles can be defined based on your environment needs. Make sure to document additional server profiles and include a definition when to use which monitoring profile.
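The role-based grouping described above can be sketched as a small lookup structure. This is only an illustration: the role keys, the role assignments for ServerA/ITSO and ServerB/ITSO, and the `profiles_for` helper are assumptions made for this example and are not part of any Domino API.

```python
# Hypothetical mapping of server roles to monitoring profiles.
# Profile names mirror the roles listed in this article.
ROLE_PROFILES = {
    "mail": ["Mail"],
    "web": ["Web"],
    "cluster": ["Cluster"],
    "special": ["Special Application"],
}


def profiles_for(roles):
    """Return the ordered, de-duplicated profiles for a server's roles."""
    result = ["Generic"]  # the generic profile applies to every Domino server
    for role in roles:
        for profile in ROLE_PROFILES.get(role, []):
            if profile not in result:
                result.append(profile)
    return result


# Assumed role assignments, for illustration only.
SERVERS = {
    "ServerA/ITSO": ["mail", "web"],
    "ServerB/ITSO": ["web"],
}

for name, roles in SERVERS.items():
    print(name, profiles_for(roles))
```

Keeping the mapping in one place also gives you a simple way to document which profile applies to which server.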
Action
Monitoring by itself is useless unless you take actions in case of an event or problem. These actions can be defined for each response level and also for each event in detail. Which action is the most important or convenient depends on your corporate environment.
In small implementations of Lotus Domino, it might be enough to send a mail to the administrator, who then takes action some time later. Large environments might need a solution that supports 24x7 monitoring and alerting. In this scenario, it is often required to integrate Lotus Domino monitoring results into an enterprise-wide monitoring system or help desk system.
Actions depend on different factors, such as the size of the environment and the availability of systems for alerting or ticket management.
Lotus Domino supports a number of notification actions which can be used further on to build custom integrations to 3rd party systems, for example, to automatically open a help desk ticket in your custom help desk application.
Figure 1 shows event handler methods.

If a Tivoli Enterprise Console is already available, then forwarding events to this console is recommended. This is most likely the case for medium and large Lotus Domino installations.
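As a rough sketch of defining actions per response level and per individual event, the routing could be modeled as two lookup tables. The severity names are the ones Domino uses for events (Fatal, Failure, Warning (high), Warning (low), Normal). The action labels, the choice of actions per severity, and the override entry are assumptions for illustration; adapt them to your own alerting and ticketing systems.

```python
# Assumed default actions per severity. The action labels ("sms", "ticket",
# "mail", "log") are hypothetical names used only in this sketch.
ACTIONS_BY_SEVERITY = {
    "Fatal": ["sms", "ticket", "mail"],
    "Failure": ["ticket", "mail"],
    "Warning (high)": ["mail"],
    "Warning (low)": ["log"],
    "Normal": ["log"],
}

# Hypothetical per-event overrides; these take precedence over the default.
EVENT_OVERRIDES = {
    "HTTP task unavailable": ["sms", "ticket"],
}


def actions_for(event_name, severity):
    """Return the actions for an event, falling back to logging only."""
    if event_name in EVENT_OVERRIDES:
        return EVENT_OVERRIDES[event_name]
    return ACTIONS_BY_SEVERITY.get(severity, ["log"])


print(actions_for("Replica.Failed", "Failure"))
print(actions_for("HTTP task unavailable", "Warning (high)"))
```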
Hint: Cell Phone Alert
Some cell phone providers support forwarding email messages to a phone as text messages (SMS), which makes it possible to notify administrators via SMS when a critical event occurs. To use this, the email must be addressed to a specific email address and domain name defined by your provider.
In most cases, it is your cell phone number followed by the provider’s gateway domain name, for example, 0123456789@.. Consult your cell phone carrier for details about how to enable this configuration. Be aware that this may add extra cost to your phone bill.
To notify multiple people about the same event, create a group in your Domino directory (e.g. “AdminAlert-HTTPServers”) which contains a list of these special email addresses.
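The addressing scheme described above (phone number followed by the provider's gateway domain) can be sketched as follows. The gateway domain `sms.example.com` and the phone numbers are placeholders, not real provider values; the actual format is defined by your carrier.

```python
def sms_address(phone_number, gateway_domain):
    """Build an email-to-SMS address from a phone number and gateway domain."""
    # Keep digits only, so "0123 456-789" and "0123456789" give the same result.
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    if not digits:
        raise ValueError("phone number contains no digits")
    return f"{digits}@{gateway_domain}"


# Placeholder gateway domain; ask your provider for the real one.
GATEWAY = "sms.example.com"

# Members for a Domino directory group such as "AdminAlert-HTTPServers".
admin_phones = ["0123 456-789", "0987654321"]
group_members = [sms_address(number, GATEWAY) for number in admin_phones]
print(group_members)
```

The resulting list is what you would enter as the members of the alert group.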
Mapping Response Level to Severity Level
To make the configuration details later in this article easier to follow, we map the response levels to severity levels that are widely used in help desk systems.
- Highest level of attention required, serious impact to business expected.
- High attention required, system is functioning but may lead to service disruption if no action is taken.
- Requires attention of a Domino administrator, if not handled in a timely manner this may lead to further problems.
- Should be brought to administrators attention, but doesn’t require immediate attention.
- Previous severity now stabilized.
Profile: Generic
A default monitoring profile should be applied to every Domino server, regardless of its designated role.
In general, where a monitor is considered important and critical enough that it will impact server function, the monitor interval can be set to 5 or 10 minutes. Otherwise, an hourly interval is usual.
The Generic Domino Server Profile should include the following monitors:
|
|
|
|
|
|
|
|
|
Mail Delivery Monitoring probe
Send Interval: 10 minutes
Time out threshold: 10 minutes
|
|
|
|
is unavailable
is available
|
TCP Event Monitor
Every 5 min
|
|
|
|
Becomes Unavailable
Becomes Available
|
Task Status Monitor
Alternative : Hourly
|
|
|
|
Becomes Unavailable
Becomes Available
|
|
|
|
|
Becomes Unavailable
Becomes Available
|
|
|
|
|
Becomes Unavailable
Becomes Available
|
|
|
|
|
Becomes Unavailable
Becomes Available
|
|
|
|
|
Becomes Unavailable
Becomes Available
|
|
|
|
|
Becomes Unavailable
Becomes Available
|
|
|
|
|
|
|
|
|
|
Becomes Unavailable
Becomes Available
|
|
Domino Statistic ‘Replica.Failed’
|
|
Increase of 10
Increase of 10
|
Statistic Event Generator
Alternative : Hourly
|
Domino Statistic ‘Server.Sessions.Dropped’
|
|
Increase of 50
Increase of 100
|
Statistic Event Generator
Alternative : Hourly
|
Domino Statistic ‘Server.Users’
|
|
Increases above X
Increases above Y
|
Statistic Event Generator
(X and Y depend on size of server)
Alternative : Hourly
|
Domino Statistic ‘Agent.Hourly.UnsuccessfulRuns’
|
|
|
|
Domino Statistic ‘Agent.Daily.UnsuccessfulRuns’
|
|
|
|
|
|
|
|
|
|
|
|
|
|
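The statistic thresholds in this profile come in two forms: a statistic that increases by a given amount between two samples (for example, "Increase of 10"), and a statistic that rises above a fixed limit ("Increases above X"). The following sketch shows one way to evaluate both forms against two consecutive samples. It is an illustration only: the rule list, the stand-in value 500 for X, and the reading of "Increase of N" as "grew by at least N since the previous sample" are assumptions, and the code does not reflect how the Domino Statistic Event Generator is implemented.

```python
# (statistic name, rule kind, threshold); 500 is a stand-in for "X".
RULES = [
    ("Replica.Failed", "increase", 10),
    ("Server.Sessions.Dropped", "increase", 50),
    ("Server.Users", "above", 500),
]


def evaluate(previous, current, rules=RULES):
    """Return the names of statistics whose rule is triggered.

    `previous` and `current` are dicts of statistic name to numeric value,
    taken from two consecutive collection intervals.
    """
    triggered = []
    for stat, kind, threshold in rules:
        if stat not in current:
            continue  # statistic not collected in this sample
        if kind == "increase":
            # Needs a previous sample to compute the growth.
            if stat in previous and current[stat] - previous[stat] >= threshold:
                triggered.append(stat)
        elif kind == "above":
            if current[stat] > threshold:
                triggered.append(stat)
    return triggered


before = {"Replica.Failed": 2, "Server.Sessions.Dropped": 100, "Server.Users": 300}
after = {"Replica.Failed": 12, "Server.Sessions.Dropped": 120, "Server.Users": 501}
print(evaluate(before, after))
```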
Profile: Mail Server
The following monitors will be applied to all Domino servers designated as mail servers:
- Generic Monitoring Profile.
In addition, the following monitors are recommended:
|
|
|
|
|
|
|
|
Becomes Unavailable
Becomes Available
|
|
|
|
|
Becomes Unavailable
Becomes Available
|
|
Domino Statistic ‘Mail.Dead’
|
|
Increases above X
Increases above Y
Decreases below X
|
|
Domino Statistic ‘Mail.Waiting’
|
|
Increases above X
Increases above Y
|
Statistic Event Generator
Alternative : Every 10 minutes
X and Y depend on size of environment
|
Domino Statistic ‘Mail.Trans..Failures’
|
|
Increases above 100
Increases above 500
Decreases below X
|
|
Profile: Web Server
The following monitors will be applied to all Domino servers designated as Web servers:
- Generic Monitoring Profile.
- Domino Mail Server Monitors Profile (if needed).
In addition, the following monitor profile should be added:
|
|
|
|
|
|
|
|
Becomes Unavailable
Becomes Available
|
|
Domino Statistic ‘HTTP.PeakConnections’
|
|
Increases above X
Increases above Y
|
Statistic Event Generator
(X and Y depend on size of server)
Alternative: Every 60 minutes
For details see IBM Technote 1232603
|
Domino Statistic ‘Domino.Threads.Active.Peak’
|
|
Increases above X
Increases above Y
|
Statistic Event Generator
(X and Y depend on size of server)
Alternative: Every 60 minutes
For details see IBM Technote 1232603
|
Profile: Domino Cluster
Any Domino Servers configured as a Domino cluster should have the following Domino Cluster Server monitoring profile applied in addition to the basic profiles:
- Generic Monitoring Profile.
- Domino Mail Server Monitors Profile (if needed)
In addition, the following monitor profile should be added:
|
|
|
|
|
|
|
|
Becomes Unavailable
Becomes Available
|
|
|
|
|
Becomes Unavailable
Becomes Available
|
|
Domino Statistic ‘Replica.Cluster.Failed’
|
|
|
|
Domino Statistic ‘Server.Cluster.OpenRedirects.LoadBalanceByPath.Unsuccessful’
|
Warning
Critical
Fatal
Reset
|
|
|
Domino Statistic ‘Server.Cluster.OpenRedirects.LoadBalance.Unsuccessful’
|
Warning
Critical
Fatal
Reset
|
|
Statistic Event Generator
(X and Y depend on size of server)
Alternative: Every 60 minutes
For details see IBM Technote 1232603
|
Domino Statistic ‘Server.Cluster.OpenRedirects.FailoverByPath.Unsuccessful’
|
Warning
Critical
Fatal
Reset
|
|
|
Domino Statistic ‘Server.Cluster.OpenRedirects.Failover.Unsuccessful’
|
Warning
Critical
Fatal
Reset
|
|
|
Replica.Cluster.WorkQueueDepth.Avg
|
|
|
|
Example of documenting Monitoring Profiles
| Server Name | Generic | Mail | Hub | Web | Cluster | Custom App. | Add-on | Mail Delivery |
| ServerA/ITSO | Yes | Yes |
|
| Yes |
|
| Yes |
| ServerB/ITSO | Yes |
|
|
| Yes |
| Antivirus | Yes |
| SametimeA/ITSO | Yes |
|
|
|
|
|
|
|
Domino Event Monitoring
Although the profiles above can be implemented in different monitoring systems, it is also possible to monitor Lotus Domino events from the Domino Monitoring Configuration (events4.nsf) database.
To prevent too much information from being shown, administrators should monitor all Domino events defined as Fatal, Failure or Warning (high), as defined in the table below. Each event type subclassifies its messages by severity level. These severity levels are defined, in the Lotus Domino server, as:
|
|
|
|
|
|
|
|
|
|
|
Severe failure that does not cause a system crash.
|
|
|
|
Loss of function requiring intervention.
|
|
|
|
|
|
|
|
|
|
|
|
All of the above messages.
|
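The recommendation to watch only Fatal, Failure and Warning (high) events amounts to a simple severity filter. A minimal sketch, assuming events are available as dictionaries with `severity` and `text` keys (a hypothetical representation chosen for this example, not a Domino data format):

```python
# Severities that this article recommends monitoring.
MONITORED_SEVERITIES = {"Fatal", "Failure", "Warning (high)"}


def needs_attention(events):
    """Keep only events whose severity is in the monitored set."""
    return [e for e in events if e["severity"] in MONITORED_SEVERITIES]


# Sample events, invented for illustration.
sample = [
    {"severity": "Fatal", "text": "Router task is unavailable"},
    {"severity": "Warning (low)", "text": "Database is currently being indexed"},
    {"severity": "Normal", "text": "Replication completed"},
    {"severity": "Failure", "text": "Unable to replicate with ServerB/ITSO"},
]

for event in needs_attention(sample):
    print(event["severity"], "-", event["text"])
```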
For best results, you may wish to change the default settings for the events listed below. Remember to document changed defaults, so that you can reapply them after upgrading Lotus Domino to a higher version.
| Event message | Comment |
| Database is being Compacted; Compact must finish before use. | The Compact task is running against a database that is in use (for example, a system database). |
| Recovery Manager: Assigning new DBIID for (need new backup for media recovery). | Backup software is requested to take a new full backup of this application. |
| Recovery Manager: Restart Recovery complete. (/ databases needed full/partial recovery) | This only indicates that the server has been restarted completely. |
| Database is currently being indexed by another process | This is only informational. |
| Full Text Error (FTG): Exceeded max configured index size while indexing document NT in database index | We do not want to full-text index large attachments, so this error is normal. |
| Recipient user name not unique. Several matches found in Domino Directory. | We cannot do anything about this, because the recipient is chosen by the sender and is not validated by the client when the message is sent offline or to an email address. |
| User not listed in Domino Directory | This failure occurs every time a user writes a wrong name in the SendTo field. |
| Error registering mail rule for database | Rules are controlled by users; we cannot fix this every time, and it has no consequence for the server. |
|  | This is only informational. |
|  | This is only informational. |
| ATTEMPT TO ACCESS SERVER by was denied | Many users may try to access the Admin server or servers with limited access, for example, because they have had access before. |
| ATTEMPT TO ACCESS DATABASE by was denied | Normal (for example, users try to see calendar details but do not have public access or higher). |
| Failing over from for replica id , directing open to | Informational: a user has been redirected to a cluster server. |
| Failing over from , directing open to | Informational: a user has been redirected to a cluster server. |
| Unable to redirect failover from | Informational: a database could not fail over to a cluster server. |
| Operation cannot be performed at the current time - database compaction in progress. |  |
| A DDM report document (NoteID 0x) could not be opened. | This error occurs when a DDM report has been manually deleted and another instance of the same error is then logged. |
| Replicator was unable to initialize (from ): | This failure occurs every time a replica stub is made. |
| Your account is locked out; see your system administrator to reset it | Many users forget to change their password in time; we consider this something for the users to resolve themselves. |
| documents ( bytes) indexed in |  |
| LDAP Server: Warning: Invalid credentials specified on Bind request, DN is | Normal behavior, see IBM Technote 1219847. |
| Database was marked for delete and has been deleted | This is only informational. |
| Admin Process: does not appear in the ACLs of any databases designating as their Administration Server | Normal AdminP processing. |
| does not appear in the Readers or Authors fields of any databases designating as their Administration Server | Normal AdminP processing. |
| The database is transactionally logged. A full backup of it needs to be performed on for media recovery. | Backup software is requested to take a new full backup of this application. |
| Router: Message contains no recipients | Information on missing recipients in a message. |
| does not appear in the unread lists of the databases on . | Normal AdminP processing. |
| Admin Process: does not appear in design elements of any databases designating as their Administration Server | Normal AdminP processing. |
| Not all specified languages were found in design template | This error has to be handled; otherwise, the design refresh of the database fails. |
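If events are forwarded to an external system, the same idea of de-emphasizing known, harmless messages can be approximated there with a simple text filter. The sketch below is an assumption about how such a filter could look outside of Domino; it does not change any setting in events4.nsf. The fragments are taken from messages in the table above, and the sample log lines are invented for illustration.

```python
# Fragments of messages that this article treats as informational.
BENIGN_FRAGMENTS = [
    "Database is currently being indexed by another process",
    "Database was marked for delete and has been deleted",
    "Router: Message contains no recipients",
    "Restart Recovery complete",
]


def is_benign(message):
    """True when the message contains one of the known harmless fragments."""
    return any(fragment in message for fragment in BENIGN_FRAGMENTS)


def filter_noise(messages):
    """Drop known harmless messages and keep everything else."""
    return [m for m in messages if not is_benign(m)]


log_lines = [
    "Router: Message contains no recipients",
    "Not all specified languages were found in design template",
    "Recovery Manager: Restart Recovery complete.",
]
print(filter_noise(log_lines))
```

Note that "Not all specified languages were found in design template" is deliberately not in the list, because the article states that this error has to be handled.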
Further Reading
For more information about how to use and configure DDM, refer to the following IBM Technote and the IBM Redpaper: