The tell status command for Lotus Notes Traveler server is
tell traveler status.
If you run the command when the overall status is Green, the only message the system displays is
Lotus Notes Traveler overall status is GREEN. When the status is Yellow or Red, the system displays all the conditions causing noncompliance. The returned messages include both the reason for the noncompliance and the probable cause for the failure (when available). This status information is part of the
systemdump command.
The following section is an example of what the results may look like, given a status return of Red:
tell traveler status
The Lotus Notes Traveler task has been running since Tue Jun 15 17:08:37 EDT 2010.
The last successful device sync was on Mon Jun 21 06:43:01 EDT 2010.
Yellow Status Messages
The response times for opening databases on mail server CN=Mail1/O=Test are above the acceptable threshold.
The response times for opening databases on mail server CN=Mail7/O=Test are above the acceptable threshold.
Red Status Messages
17,238 errors have been logged for user CN=Joe Tester/OU=Test/O=IBM.
There have been 3,845 device sync failures for reasons other than the server is too busy.
The overall status of Lotus Notes Traveler is Red.
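When the status output is captured for monitoring, it can be split into the overall status and the Yellow and Red message lists. The following Python snippet is a minimal sketch that assumes the output looks like the example above; the function name parse_status and the regular expressions are illustrative and are not part of Lotus Notes Traveler.

```python
import re

SAMPLE = """tell traveler status
The Lotus Notes Traveler task has been running since Tue Jun 15 17:08:37 EDT 2010.
Yellow Status Messages
The response times for opening databases on mail server CN=Mail1/O=Test are above the acceptable threshold.
The response times for opening databases on mail server CN=Mail7/O=Test are above the acceptable threshold.
Red Status Messages
17,238 errors have been logged for user CN=Joe Tester/OU=Test/O=IBM.
There have been 3,845 device sync failures for reasons other than the server is too busy.
The overall status of Lotus Notes Traveler is Red.
"""

def parse_status(output):
    """Split console output into the overall status and per-severity messages."""
    result = {"overall": None, "Yellow": [], "Red": []}
    section = None  # severity section currently being read, if any
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.fullmatch(r"(Yellow|Red) Status Messages", line)
        if header:
            section = header.group(1)
            continue
        # Matches both "overall status is GREEN" and "overall status of ... is Red"
        overall = re.search(
            r"overall status (?:of Lotus Notes Traveler )?is (\w+)", line, re.IGNORECASE
        )
        if overall:
            result["overall"] = overall.group(1).capitalize()
            section = None
            continue
        if section:
            result[section].append(line)
    return result

status = parse_status(SAMPLE)
print(status["overall"], len(status["Yellow"]), len(status["Red"]))
```

Lines that appear before the first severity heading, such as the task start time, are ignored by this sketch.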
Threadchecks
The threshold values specified in these sections are default values. The red and yellow thresholds can be customized using configuration files. The configuration parameters are detailed later in this document.
DS or PS threads have run for a "long period" of time
Problem threshold:
- Yellow: Wall clock run time is greater than 30 minutes
- Red: Wall clock run time is greater than 120 minutes
Console Message: User {User name} on thread {thread name} has been running for {xx} minutes.
Probable cause: If the Red threshold is reached, then the thread is likely hung. In rare instances there may be a device sync or an extremely long prime sync that is working against a very large user database or a slow mail server, which is normal.
Corrective actions:
- Persistent yellow conditions might indicate a slow mail server or an overloaded Traveler server. Monitor and look for other status conditions that might have a better indication of a diagnosis.
- For the first occurrence, take a system dump, which includes information about all of the threads in the Traveler service. Use tell traveler systemdump and run an nsd at the Domino command line to gather native stacks. Collect the logs.
- Restart the Traveler service. There is a good chance this will require a complete Domino server restart, and you may need to kill the Domino server in order for it to shut down completely.
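The Yellow and Red comparison used by this check can be sketched as follows. This is an illustration only, assuming a strict "greater than" comparison as stated in the thresholds above; the function classify and the constant names are hypothetical and do not come from the product.

```python
def classify(value, yellow, red):
    """Return "Red", "Yellow", or "Green" for a value checked against two thresholds."""
    if value > red:
        return "Red"
    if value > yellow:
        return "Yellow"
    return "Green"

# Default thread run-time thresholds from this section, in minutes
THREAD_YELLOW_MINUTES = 30
THREAD_RED_MINUTES = 120

for minutes in (12, 45, 180):
    print(minutes, classify(minutes, THREAD_YELLOW_MINUTES, THREAD_RED_MINUTES))
```

A thread that has run for exactly 30 minutes is still Green in this sketch, because the threshold is described as "greater than".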
Percentage of Device Syncs that failed with 503 return code
Problem threshold:
- Yellow: The number of 503 syncs is more than 5%.
- Red: The number of 503 syncs is more than 10%.
Console Message: There have been {number of 503 RC} device sync failures because the server is too busy and returned status code 503.
Probable Cause: The most probable cause is that the server is running over capacity. 503 means that there are no threads available to handle a synchronization request, and the Traveler server continues to allocate threads until it becomes resource constrained.
Corrective actions: Either increase the memory, or move some of the users to another Lotus Notes Traveler server.
Percentage of Device Syncs failing with an error code other than 503
Problem threshold:
- Yellow: The number of unsuccessful syncs is more than 5%.
- Red: The number of unsuccessful syncs is more than 10%.
Console Message: There have been {number of error code other than 503 RC} device sync failures for reasons other than the server is too busy.
Probable cause: There are network connectivity issues between Lotus Notes Traveler server and the user's device(s).
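The two device sync failure checks can be illustrated with a short calculation. The sketch below assumes that each percentage is taken against the total number of device syncs, which this document does not state explicitly; the function name failure_status and its parameters are hypothetical.

```python
def failure_status(total_syncs, failed_503, failed_other, yellow_pct=5.0, red_pct=10.0):
    """Return (percentage, status) for the 503 and non-503 failure checks."""
    def check(failed):
        # Assumption: the denominator is the total number of device syncs
        pct = 100.0 * failed / total_syncs if total_syncs else 0.0
        if pct > red_pct:
            return pct, "Red"
        if pct > yellow_pct:
            return pct, "Yellow"
        return pct, "Green"
    return {"503": check(failed_503), "non-503": check(failed_other)}

print(failure_status(total_syncs=1000, failed_503=60, failed_other=120))
```

With these inputs, 6% of the syncs failed with 503 (Yellow) and 12% failed for other reasons (Red).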
HTTP thread allocations
Problem threshold:
- Yellow: The peak or current number of connections is greater than 80% of HTTP threads.
- Red: The peak or current number of connections is greater than 90% of HTTP threads.
Console Message:
- The number of active HTTP connections is {current percentage} percent of the available HTTP threads ({HTTP Threads}).
- The peak number of HTTP connections is {peak percentage} percent of the available HTTP threads ({HTTP Threads}).
Probable cause: This condition implies that there are not enough HTTP threads for the number of devices trying to use the Lotus Notes Traveler server.
Corrective actions:
- Increase the number of HTTP threads if there are enough memory and CPU resources.
- Move some of the users to another Lotus Notes Traveler server.
Memory checks
The threshold values specified are default values. The red and yellow thresholds can be customized using configuration files. The configuration parameters are detailed later in this document.
Native memory usage
Problem threshold:
- Yellow: Native Memory usage is greater than 85%
- Red: Native Memory usage is greater than 95%
Console Message: The current native memory usage is {current percentage} percent of the available memory.
Probable cause: Native memory includes memory that is shared with other Domino applications on the Domino server.
Corrective actions:
- Verify whether too many HTTP Threads are allocated.
- Reduce the number of applications running on the Domino server.
- Reduce the number of Lotus Notes Traveler users on the machine.
- Issue the tell traveler mem command to see the history of memory and CPU usage.
Java memory usage
Problem threshold:
- Yellow: Java Memory usage is greater than 75%
- Red: Java Memory usage is greater than 85%
Console Message: The current Java memory usage is {current percentage} percent of the available memory.
Probable cause: Not enough Java heap memory for the number of users on the system.
Corrective actions:
- Issue the tell traveler mem command to see the history of memory and CPU usage.
- Increase the Maximum Memory Size in the Domino server document under the Lotus Notes Traveler tab.
Other checks
The threshold values specified are default values. The red and yellow thresholds can be customized using configuration files. The configuration parameters are detailed later in this document.
CPU usage
Checks the current data to see if the system is overworked. The code checks from the present back through one complete interval. On average, the time period used for measuring CPU utilization is 1.5 times the interval length. By default, the interval is 15 minutes.
Problem threshold:
- Yellow: CPU threshold is 70%.
- Red: CPU threshold is 90%.
Console Message: The Lotus Notes Traveler's CPU usage is {current percentage} percent over the last {minutes} minutes of processing.
Corrective actions:
- Reduce the number of applications running on the Domino server.
- Reduce the number of Lotus Notes Traveler users on the machine.
- Issue the tell traveler mem command to see the history of memory and CPU usage.
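The 1.5 factor in the CPU measurement period can be illustrated with a small calculation. One possible reading, offered here as an assumption rather than a description of the actual code, is that the window covers the partially elapsed current interval plus one complete interval, so it ranges from one to two interval lengths. The function name cpu_window_minutes is hypothetical.

```python
def cpu_window_minutes(interval_minutes=15.0):
    """Return (shortest, average, longest) look-back window in minutes."""
    shortest = interval_minutes          # check runs right at an interval boundary
    longest = 2 * interval_minutes       # check runs just before the next boundary
    average = (shortest + longest) / 2   # assumes checks fall evenly within the interval
    return shortest, average, longest

print(cpu_window_minutes())  # (15.0, 22.5, 30.0)
```

With the default 15-minute interval, the average window under this reading is 22.5 minutes, which is 1.5 times the interval length.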
Error messages logged
Checks to see if the number of error messages logged for a user has reached the threshold. These thresholds are monitored per user, not for all users on the system.
Problem Threshold:
- Yellow: A user's error count is greater than 50 errors
- Red: A user's error count is greater than 100 errors
Console Message: {0} errors have been logged for user {1}.
Checks the time of database open for a given server.
Problem Thresholds:
- Yellow: 10% of the opens are above the “Yellow Open Threshold”
- Red: 5% of the opens are above the “Red Open Threshold”
Console Message: The response times for opening databases on mail server {mail server name} are above the acceptable threshold.
Probable Cause: Check for network delays between the Lotus Notes Traveler server and mail server.
Constraint processing
The constraint processing is proactive code that monitors the system to check whether it has entered a resource constraint state. Currently, the only resource that is monitored is system memory. Once the constraint state is detected, Traveler does not allow new device sync or prime sync threads to start. Other threads are allowed to complete, which may alleviate the constraint condition. If the constraint condition persists, the existing Traveler thread pool logic kills off the additional unused threads, further reducing the system's memory footprint. The minimum number of prime sync threads is 5 and the minimum number of device sync threads is 10. While the system is in constraint state, new device syncs are denied with the 503 status code (server is busy). The system logs information-level messages, including thread summary information, when entering and exiting constraint state. Whenever a constraint state lasts longer than 60 minutes, an error message is logged and a system dump is executed.
The system enters constraint mode when memory conditions hit the Red state, and exits when usage is 5% below the Red entry level. By default, the system enters constraint when native memory percentage usage is greater than
STATUS_NATIVE_MEMORY_RED, which is 95% by default or when Java memory is greater than
STATUS_JAVA_MEMORY_RED which is 85% by default. The system exits constraint when native memory usage is below 90% and when Java memory is below 80%.
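The entry and exit rules form a simple hysteresis: the system enters constraint when either memory value exceeds its Red threshold, and leaves only when both values are at least 5% below it. The following Python sketch illustrates that behavior under the defaults given above; the class ConstraintMonitor and its method names are hypothetical and are not the Traveler implementation.

```python
class ConstraintMonitor:
    """Tracks constraint state using the default Red thresholds and a 5% exit margin."""

    def __init__(self, native_red=95, java_red=85, exit_margin=5):
        self.native_red = native_red
        self.java_red = java_red
        self.exit_margin = exit_margin
        self.constrained = False

    def update(self, native_pct, java_pct):
        """Record a new memory sample and return True while in constraint state."""
        if not self.constrained:
            # Enter when either memory pool exceeds its Red threshold
            if native_pct > self.native_red or java_pct > self.java_red:
                self.constrained = True
        else:
            # Exit only when both pools are below their Red threshold minus the margin
            if (native_pct < self.native_red - self.exit_margin
                    and java_pct < self.java_red - self.exit_margin):
                self.constrained = False
        return self.constrained

monitor = ConstraintMonitor()
for native, java in [(80, 70), (96, 70), (92, 70), (89, 79)]:
    print(native, java, monitor.update(native, java))
```

In this run the state is entered at 96% native memory, is kept at 92% because that is not yet below 90%, and is left once native memory drops to 89% with Java memory at 79%.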
Since the sync thread thresholds are dynamically specified, there is no need to explicitly configure the
TSS_PRIMESYNC_THREADS,
TSS_SYNC_THREADS and
WORKER_THREADS. These parameters migrate out of the
NTSConfig.xml file, since they are no longer needed. The code sets a limit of 20 threads for prime sync and a limit of 5000 for device sync and worker threads. The
WORKER_THREADS configuration parameter is no longer needed and is completely removed from the system, but both the
TSS_PRIMESYNC_THREADS and
TSS_SYNC_THREADS can still be set in the
NTSConfig.xml. Constraint processing should make it unnecessary to explicitly code these values in the
NTSConfig.xml.
Stats
- GetAlarm.Time.Histogram
- NameLookup.Time.Histogram
- DCA.DB_OPEN
- DCA.DB_CLOSE
- ERRORS.<UserId>
- CPU.Pct.<% CPU Range in 10% increments> (000-010, 010-020, and so on)
- DATABASE.QUERY.HISTOGRAM<SimpleName>.(000-001,001-002,002-005, and so on)
Configuration parameters
The table below shows all of the
NTSConfig.xml parameters required to change the thresholds.
Table 1. Configuration parameters
| Parameter name | Default | Description |
| STATUS_THEAD_MAX_RUN_YELLOW | 30 | If a thread runs longer than this number of minutes, the state is considered to be Yellow. |
| STATUS_THEAD_MAX_RUN_RED | 120 | If a thread runs longer than this number of minutes, the state is considered to be Red. |
| STATUS_DS_FAILUER_503_YELLOW | 5 | Percentage of threads failing with a 503 error message to be considered in Yellow state. |
| STATUS_DS_FAILUER_503_RED | 10 | Percentage of threads failing with a 503 error message to be considered in Red state. |
| STATUS_DS_FAILUER_NON_503_YELLOW | 5 | Percentage of threads failing with a non-503 error message to be considered in Yellow state. |
| STATUS_DS_FAILUER_NON_503_RED | 10 | Percentage of threads failing with a non-503 error message to be considered in Red state. |
| STATUS_DB_OPEN_INTERVAL_YELLOW | 2 | Lower time limit interval index to open Databases in GENERAL_TIME_HISTOGRAM_BOUNDARIES_NAMES. The intervals are "000-001", "001-002", "002-005", "005-010", "010-030", "030-060", "060-120", "120-Inf". |
| STATUS_DB_OPEN_INTEVAL_RED | 8 | Upper time limit interval index to open Databases in GENERAL_TIME_HISTOGRAM_BOUNDARIES_NAMES. |
| STATUS_DB_OPEN_PCT_OVER_YELLOW | 5 | Percentage over the STATUS_DB_OPEN_INTERVAL_YELLOW to set status to Yellow. |
| STATUS_DB_OPEN_PCT_OVER_RED | 10 | Percentage over the STATUS_DB_OPEN_INTERVAL_RED to set status to Red. |
| STATUS_CPU_PCT_YELLOW_THRESHOLD | 70 | Yellow CPU percentage threshold. |
| STATUS_CPU_PCT_RED_THRESHOLD | 90 | Red CPU percentage threshold. |
| STATUS_ERROR_COUNT_YELLOW_USER | 50 | For each user, if their error count is above this value, the status will be set to Yellow. |
| STATUS_ERROR_COUNT_RED_USER | 100 | For each user, if their error count is above this value, the status will be set to Red. |
| STATUS_HTTP_THREAD_PCT_YELLOW | 80 | If the peak HTTP thread usage is above this limit, the status will be set to Yellow. |
| STATUS_HTTP_THREAD_PCT_RED | 90 | If the peak HTTP thread usage is above this limit, the status will be set to Red. |
| STATUS_NATIVE_MEMORY_YELLOW | 85 | Yellow native memory percentage threshold. |
| STATUS_NATIVE_MEMORY_RED | 95 | Red native memory percentage threshold. |
| STATUS_JAVA_MEMORY_YELLOW | 75 | Yellow Java memory percentage threshold. |
| STATUS_JAVA_MEMORY_RED | 85 | Red Java memory percentage threshold. |
| THREADS_MINIMAL_PRIMESYNC | 5 | The number of Prime Sync threads allowed to run when in constraint state. |
| THREADS_MINIMAL_DEVICESYNC | 10 | The number of Device Sync threads allowed to run when in constraint state. |
Notes.ini parameters
There are some new
notes.ini parameters used with health check.
- NTS_STATUS_CHECK_INTERVAL_SECONDS: The number of seconds between each health check monitoring and logging interval. The default value is 900 seconds, or 15 minutes.
- NTS_STATUS_CHECK_CACHE_SIZE: The number of cache entries to save. The cache entries contain a snapshot of the current CPU usage, current Java memory usage, and C native memory usage. The default value is 100 entries, so that by default there will be more than 24 hours of data cached.
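The amount of history held in the cache follows directly from these two values. The short calculation below shows the arithmetic for the defaults; the function name cache_coverage_hours is illustrative only.

```python
def cache_coverage_hours(interval_seconds=900, cache_size=100):
    """Hours of history covered by the cache: one entry is saved per interval."""
    return interval_seconds * cache_size / 3600

print(cache_coverage_hours())          # 25.0
print(cache_coverage_hours(300, 100))  # shorter interval, less history
```

With the defaults, 100 entries at 900 seconds each cover 25 hours. Shortening the interval without increasing the cache size reduces the history that is kept.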
Performance considerations
Highly efficient system performance while running the health check commands is not absolutely critical, because the check runs only periodically (every 15 minutes by default). However, because it runs repeatedly, the process should be as efficient as possible. The new method for determining whether the system is in constraint state is critical to performance, because it executes each time a new device sync begins.
The other critical piece for performance is the collection of additional stats. Because the current procedure already batch writes stats, adding more stats should not cause any further degradation of performance.
The additional Java memory usage is moderate: a cache holds the CPU and memory statistics that are retrieved every 15 minutes, for a total of 100 entries. This is only a small amount of memory when compared to the memory usage of the system as a whole.