When running Lotus Domino, you may encounter crashes or memory errors and collect memory dumps for analysis. The root cause of a memory crash or low-memory error can be difficult to trace. This article helps customers read a memory dump and explains what to look for in it.
Memory errors reported on console logs might take the form of one or more of the following:
1. "Insufficient Memory."
2. "Maximum Number of Memory Segments that Notes Can Support Has Been Exceeded."
3. "Insufficient Memory. NSF Pool is Full." (or any other Pool name).
4. "AllocHandle: OUT OF SHARED HANDLES! -- pid xxxxxxxx Handles used so far 255743, Maximum handles = 324646, error = 0x107".
5. "AllocHandle: OUT OF PRIVATE HANDLES! -- pid xxxxxxx Handles used so far 10495, Maximum handles = 10495, error = 0x107".
6. "Memory Is Low for Indexing".
7. "Not enough memory for Full Text Indexing or Search".
Or it can result in a Domino server crash or hang with one of the following messages on the server console:
1. "Maximum number of memory segments that Notes can support has been exceeded".
2. "PANIC: Cannot attach to shared memory region, due to insufficient access (probably owned by another user or group)".
3. "PANIC: Insufficient memory".
4. "Memory allocation error, out of system memory".
Other errors with the same meaning may also be reported.
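When a console log is long, it can help to scan it for the messages listed above. The following is a minimal sketch, not an IBM tool: the substring list is taken from the messages in this article, and the function names `find_memory_errors` and `handle_usage` are hypothetical. It assumes the log is available as plain text lines.

```python
import re

# Substrings taken from the console messages listed above (matched case-insensitively).
MEMORY_ERROR_PATTERNS = [
    "insufficient memory",
    "maximum number of memory segments",
    "out of shared handles",
    "out of private handles",
    "memory is low for indexing",
    "not enough memory for full text indexing",
    "cannot attach to shared memory region",
    "memory allocation error",
]

# Matches the two counters printed in an AllocHandle message.
HANDLE_RE = re.compile(
    r"Handles used so far (\d+), Maximum handles = (\d+)", re.IGNORECASE
)


def find_memory_errors(lines):
    """Return (line_number, text) pairs for lines containing a known memory message."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(pattern in lowered for pattern in MEMORY_ERROR_PATTERNS):
            hits.append((number, line.strip()))
    return hits


def handle_usage(line):
    """Return (used, maximum) from an AllocHandle message, or None if the line has none."""
    match = HANDLE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Comparing the two values returned by `handle_usage` shows how close a process is to its handle limit at the time the message was logged.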
In all of the messages above, the root cause is memory, but the underlying problem can take different forms:
A. Insufficient memory on the OS leading to Domino running out of memory. For example, running a mail server with 1000 users on 1GB of RAM.
B. Poor memory management on clustered servers where many Domino partitions running on the same OS compete for physical memory. Memory management notes.ini parameters must be used to control how much memory each partition may use.
Examples of such parameters are:
NSF_BUFFER_POOL_SIZE_MB
PercentAvailSysResources
AIX_LIMIT_SHM_SEGMENTS
ConstrainedSHM
On UNIX systems, the notes.ini parameters below can cause issues if they are not set correctly:
Debug_Enable_Sys_V_SHM
MEM_SHMSegmentSizeMB
Mem_EnablePreAlloc
Note: It is best to consult with the IBM Lotus Unix Support team before setting these parameters on Domino servers, especially if the host server has multiple partitions.
C. Leak in Domino memory pools (private or shared) allocated through the Domino memory manager.
D. Leak in Domino private memory that tasks allocate directly from the OS.
E. Leak of Domino private or shared handles allocated through the Domino memory manager.
F. Leak in third-party applications running on the same OS, causing Domino to run out of system resources.
G. High JVM memory usage, or a possible JVM leak, causing a crash in the Agent Manager or HTTP tasks.
H. Memory fragmentation in private or shared memory.
The following notes.ini parameters can be useful:
Notes_Shared_DPoolSize
PRIVATE_DPOOLSIZE_THRESHOLD (for small allocations)
From memory dumps, you can verify utilization percentages:
113 process private memory pools
755 MB total pools size
11 MB total pools used
1.59% pool utilization
1.01 pools searched per allocation
1.22 free blocks searched per allocation
1.21 free blocks searched per free
I. LotusScript leaks. NSD will not show these leaks.
J. Memory fragmentation caused by the operating system, for example through use of the OS environment variable MALLOCMULTIHEAP=1 on AIX.
K. Memory semaphore key conflicts on multi-partition 64-bit AIX servers. This is reported in technote 1381843, "Segmentation fault occurs when starting multiple Domino partitions at the same time". The solution is to set PARTITION_KEYNUMBER=X in the notes.ini of each Domino partition, as follows:
server 1 : notes.ini PARTITION_KEYNUMBER=1
server 2 : notes.ini PARTITION_KEYNUMBER=2
and so on.
The analysis of a memory dump has two parts.
1. Raw memory dump and what information it provides.
Raw memory dumps can provide the following:
A. Fragmentation in shared and private Notes Pools.
Shared Pool Fragmentation means poor pool utilization. In most cases, shared memory pool utilization should be 90% or above:
192 system shared memory pools
716 MB total pools size
672 MB total pools used
93.97% pool utilization
If shared memory utilization drops below 75%, it indicates fragmentation, which needs to be investigated because it has two possible causes. If the poor utilization develops over a few weeks, it is normal, since any software's use of memory leads to fragmentation over time. The solution is to restart Domino. IBM recommends restarting Domino on a weekly or biweekly basis to reduce fragmentation in shared memory.
If the poor utilization develops over a few days, it indicates bad pool utilization caused by an agent or task. One solution is to increase the shared DPool size using the notes.ini parameter Notes_Shared_DPoolSize.
Private Pool Fragmentation in private memory of tasks can be normal to some degree as shown here:
16 process private memory pools
7 MB total pools size
3 MB total pools used
51.90% pool utilization
Private Pool Fragmentation in private memory of tasks is not normal in these conditions:
331 process private memory pools
165 MB total pools size
3 MB total pools used
2.25% pool utilization
or
113 process private memory pools
755 MB total pools size
11 MB total pools used
1.59% pool utilization
The poor utilization examples above need to be investigated to find what is causing the task to allocate large pools.
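The pool summaries above follow a fixed four-line layout, so they can be extracted and checked automatically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the 75% limit for shared pools comes from this article, and the 5% default for private pools is only an illustrative value chosen to separate the normal example (51.90%) from the abnormal ones (2.25% and 1.59%). It is not a documented threshold.

```python
import re

# Patterns for the four summary lines printed for a set of pools.
_SUMMARY_PATTERNS = {
    "pools": r"(\d+)\s+(?:process private|system shared) memory pools",
    "total_mb": r"(\d+)\s+MB total pools size",
    "used_mb": r"(\d+)\s+MB total pools used",
    "utilization": r"([\d.]+)%\s+pool utilization",
}


def parse_pool_summary(text):
    """Return the pool count, sizes in MB, utilization and pool kind from a summary block."""
    summary = {"shared": "system shared memory pools" in text}
    for key, pattern in _SUMMARY_PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            continue
        value = match.group(1)
        summary[key] = float(value) if key == "utilization" else int(value)
    return summary


def needs_investigation(summary, private_threshold=5.0):
    """Flag poor utilization.

    Shared pools use the 75% limit from the text; private pools use
    private_threshold, which is an illustrative assumption.
    """
    threshold = 75.0 if summary["shared"] else private_threshold
    return summary["utilization"] < threshold
```

The sketch reads the utilization percentage printed in the dump rather than recomputing it, because the MB figures in the summary are rounded and a recomputed ratio would differ slightly from the reported one.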
B. Another important piece of information supplied by the raw memory dump is the LotusScript heap sizes. This section is not included in the annotated memory dump. To find it, search the raw dump for "LotusScript Memory Usage for Process".
An example of a leak in LSLitPool:
LotusScript Memory Usage for Process:
Heap 'MM Internal Heap': 398112 bytes in use out of 540077 bytes in 90 allocations (addr 0xBF6E10)
Heap 'General Heap': 16 bytes in use out of 4096 bytes in 1 allocations (addr 0xBF7E40)
Heap 'LSKeyWords': 12720 bytes in use out of 16384 bytes in 390 allocations (addr 0xBF8E70)
Heap 'LSLitPool': 232852552 bytes in use out of 745168896 bytes in 3638322 allocations (addr 0xBFCF30)
Heap 'LSIAdtClassTable': 9368 bytes in use out of 12288 bytes in 96 allocations (addr 0x80820C8)
Heap 'ObjectManager': 136 bytes in use out of 12288 bytes in 2 allocations (addr 0x80830F8)
Heap 'Dynamic Array Heap': 0 bytes in use out of 4096 bytes in 0 allocations (addr 0x8085158)
Heap 'Dynamic List Heap': 0 bytes in use out of 8192 bytes in 0 allocations (addr 0x8086188)
Total Heap Usage: 233272904 bytes in use out of 745766317 bytes in 3638901 allocations
Analysis of the above points to the following Software Problem Reports (SPRs):
MKIN6ZGS8P Leak of LSLitPool (LotusScript 'literal' pool) memory on each execution of an agent.
MKIN72TU4Q Leak #2 of LSLitPool (LotusScript 'literal' pool) memory on each execution of an agent.
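To spot an oversized heap such as LSLitPool quickly, the "Heap" lines of the LotusScript section can be parsed and sorted by size. The following is a minimal sketch that assumes the line layout shown in the example above; `parse_heaps` and `largest_heap` are hypothetical helper names, not part of NSD or Domino.

```python
import re

# Matches one "Heap '<name>': ..." line from the LotusScript memory usage section.
HEAP_RE = re.compile(
    r"Heap '(?P<name>[^']+)': (?P<used>\d+) bytes in use out of "
    r"(?P<size>\d+) bytes in (?P<allocs>\d+) allocations"
)


def parse_heaps(text):
    """Return one dict per heap line found in the text."""
    heaps = []
    for match in HEAP_RE.finditer(text):
        heaps.append(
            {
                "name": match.group("name"),
                "used": int(match.group("used")),
                "size": int(match.group("size")),
                "allocations": int(match.group("allocs")),
            }
        )
    return heaps


def largest_heap(heaps):
    """Return the heap with the largest total size, or None if the list is empty."""
    if not heaps:
        return None
    return max(heaps, key=lambda heap: heap["size"])
```

Running `largest_heap(parse_heaps(dump_text))` on dumps taken at different times shows whether the same heap keeps growing, which is the pattern to look for before matching against an SPR.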
2. Annotated memory and what information it provides.
Annotated memory dumps show the total pool sizes and handle counts of these pools in each Domino package and provide a summary of all packages at the end.
Example:
NSF (0x0200)
+117 952 119952 PRCMEM BLK_NSFT - NSF per-thread data
+130 1 1056 PRCHDL BLK_USER_NAMES_LIST - NoteReplicate allocated names list
+143 5 100 SHRHDL BLK_FOLDER_NAMELIST - Folder reader/writer list for privilege checking
+145 38 2485656 SHRHDL BLK_NSF_FOLDERPOOL - NSF global folder pool
+146 15 270 PRCMEM BLK_DBDIR_PROCESS - DB Dir Properties per-process callback vector
+152 1 65412 SHRHDL BLK_DBUHASH_POOL - NSF DBU hash table pool
+167 1 65412 SHRHDL BLK_SCHOBJ_POOL - Schedule object container pool
+181 1 314656 SHRHDL BLK_NSF_DBCACHE - NSF database cache
+197 1 20738 SHRMEM BLK_NSF_UBM_GDESC - Global descriptor for universal buffer manager
+204 148 9662328 SHRMEM BLK_UBMBCB - universal buffer manager buffer control block array
+205 148 582566776 SHRMEM BLK_UBMBUFFER - universal buffer manager buffer array
The second and third columns show the handle count and pool size, respectively. Any pool size exceeding 200MB is a candidate for a leak and needs to be compared across other memory dumps to see whether that pool is growing all of the time or fluctuating. Trapleak might need to be enabled.
BLK_UBMBUFFER is always an exception, because its size always works out to roughly 3/8 of the memory addressable by Domino, after rounding.
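The 200MB rule and the BLK_UBMBUFFER exception can be applied mechanically to the rows of an annotated dump. The following is a minimal sketch, assuming rows in the layout shown above (offset, handle count, size in bytes, memory type, pool name, description) and assuming that 200MB means binary megabytes. The function names are hypothetical.

```python
import re

# One row of an annotated dump:
# +offset, handle count, size in bytes, memory type, pool name, description.
ROW_RE = re.compile(r"^\+(\d+)\s+(\d+)\s+(\d+)\s+(\w+)\s+(\w+)\s+-\s+(.*)$")

LEAK_CANDIDATE_BYTES = 200 * 1024 * 1024  # 200MB from the text; binary MB is an assumption
EXCLUDED_POOLS = {"BLK_UBMBUFFER"}  # the exception named in the text


def parse_rows(text):
    """Return one dict per pool row; lines that are not pool rows are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match is None:
            continue
        offset, handles, size, kind, name, description = match.groups()
        rows.append(
            {
                "offset": int(offset),
                "handles": int(handles),
                "size": int(size),
                "type": kind,
                "name": name,
                "description": description,
            }
        )
    return rows


def leak_candidates(rows):
    """Return names of pools larger than the limit, ignoring excluded pools."""
    return [
        row["name"]
        for row in rows
        if row["size"] > LEAK_CANDIDATE_BYTES and row["name"] not in EXCLUDED_POOLS
    ]
```

A pool returned by `leak_candidates` is only a starting point: it still has to be compared across several dumps to tell steady growth from normal fluctuation.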
Handle count is another indicator of a leak and is normally reported on the server console with the message:
AllocHandle: OUT OF PRIVATE HANDLES!
or
AllocHandle: OUT OF SHARED HANDLES!
When trapleak is enabled, the correct pool code must be derived from the package code and the row for the pool name. As an example, if we want to trapleak the pool:
+181 1 314656 SHRHDL BLK_NSF_DBCACHE - NSF database cache
Then the proper pool code is the package code (0x0200, hexadecimal) plus the row offset (181, decimal), which is 0x200 + 0xB5 = 0x2B5 in hexadecimal.
Note: In the NSD, the Top 10 table would show the pool code as 0x82B5. This is normal: for shared memory pools, NSD displays the pool code with the 0x8000 bit set, so 0x02B5 appears as 0x82B5.
--g-- 0x82b5 count= 1, size= 235578, h=0x00001a73 [ 59: 224] BLK_NSF_DBCACHE
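The pool code arithmetic is easy to get wrong because it mixes a hexadecimal package code with a decimal row offset. The following is a minimal sketch of the calculation; `pool_code` is a hypothetical helper, and the 0x8000 flag for shared pools is inferred from the 0x2B5 and 0x82B5 values in the example above rather than taken from documentation.

```python
SHARED_FLAG = 0x8000  # assumption: bit set on shared memory pool codes in NSD output


def pool_code(package_code, offset, shared=False):
    """Combine a package code with a decimal row offset into a pool code.

    With shared=True the result carries the shared flag, which is how the
    code appears in the NSD Top 10 table for shared memory pools.
    """
    code = package_code + offset
    if shared:
        code |= SHARED_FLAG
    return code


# BLK_NSF_DBCACHE: package NSF (0x0200), row offset +181.
trapleak_code = pool_code(0x0200, 181)
nsd_code = pool_code(0x0200, 181, shared=True)
print(format(trapleak_code, "#06x"))  # 0x02b5
print(format(nsd_code, "#06x"))       # 0x82b5
```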