As with any system that maintains data, corruption is an issue that will be encountered, and an understanding of the options available leads to an optimal return to productivity for the impacted database(s). This article is intended for all levels of Lotus Notes and Domino administrators and details the methods available to repair corruption as well as the options available to reduce exposure to it.
It is important to note that the recommendations that follow do not guarantee prevention or absolute repair of a corrupt database and do not circumvent the need to back up all data as frequently as possible.
1. What causes corruption?
When beginning to troubleshoot database corruption, it is beneficial to consider probable causes of the occurrence. In most cases, corruption events are caused by the following:
- Server crashes or hangs
- Conflicts between tasks running on the server, resulting in two tasks attempting to operate on the same database at the same time
- 3rd-party tools that interact with Notes/Domino
This list is not all-inclusive, but most cases of corruption are found to be related to one of these causes. The root cause of a single corruption event is likely not attainable. If working with IBM Support on a corruption issue, root cause can be sought once a reproducible scenario has been defined.
2. Reducing exposure to data corruption
The foremost measure for reducing exposure to data corruption in a Domino environment is to enable transaction logging on all servers. With transaction logging enabled, every transaction that occurs in a database is also recorded to a transaction log. After a system failure, the transaction logs are replayed to restore and recover the database, which greatly reduces server restart time. Additional details concerning this best practice, including its benefits and configuration recommendations, are available in Technote 7009309 (
http://www-01.ibm.com/support/docview.wss?uid=swg27009309).
Along with the enablement of transaction logging for all servers in the domain, regularly scheduled maintenance will provide for greater database integrity. These recommended maintenance tasks (
http://www-10.lotus.com/ldd/dominowiki.nsf/dx/maintenance-tasks) should be coordinated with other maintenance activities taking place in the environment to afford improved overall system health.
3. Repairing corrupt databases
Determining the appropriate maintenance for the situation
When beginning to repair a corrupted database, regardless of whether the corruption is indicated by a specific error or by "questionable behavior" when working with the database, it is recommended that the Fixup task be the first tool used to attempt to resolve the issue. If the corruption is specific to one or more views within the database, consider using Updall (detailed below) to attempt to repair the view index before leveraging Fixup.
When running Fixup, consider the type of database the task will be run against. More specifically, consider whether the target is a system database such as names.nsf, admin4.nsf, or log.nsf. When a system database is the target of maintenance, the tasks associated with it should be stopped. For example, to run Fixup against the Administration Requests (admin4.nsf) database, stop the Administration Process (adminp) task. When the Domino directory (names.nsf) or the Domino log (log.nsf) is the target of maintenance, the maintenance should be run offline, with the Domino server stopped. Fixup can be run against non-system databases such as users' mail databases without stopping any other tasks.
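The rules above can be summarized as a small lookup. The following Python sketch is illustrative only: the function name and the returned wording are assumptions made for this example, and the database names are limited to the examples given in this article, so other system databases would need their own entries.

```python
# Preconditions drawn from the examples in this article; not an exhaustive list.
OFFLINE_DATABASES = {"names.nsf", "log.nsf"}   # run with the Domino server stopped
TASKS_TO_STOP = {"admin4.nsf": ["adminp"]}     # stop the associated task(s) first


def maintenance_precondition(database_path):
    """Return a short description of what to do before running maintenance."""
    # Take the file name only, accepting either path separator.
    name = database_path.replace("\\", "/").rsplit("/", 1)[-1].lower()
    if name in OFFLINE_DATABASES:
        return "stop the Domino server and run the maintenance offline"
    if name in TASKS_TO_STOP:
        return "stop task(s): " + ", ".join(TASKS_TO_STOP[name])
    return "no other tasks need to be stopped"
```

For example, maintenance_precondition("admin4.nsf") reports that adminp should be stopped, while a mail database falls through to the last case.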
Fixing up a database that is not transaction logged and does not participate in the Domino Attachment and Object Service (DAOS)
It is recommended that a full scan take place to ensure integrity of the impacted database. In order to accomplish a full scan, the following command should be used:
load fixup -F database.nsf
The '-F' parameter forces the Fixup task to scan all documents of the database. Without it, Fixup only scans documents modified since its last run. See Fixup options (
http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/index.jsp?topic=/com.ibm.help.domino.admin85.doc/H_FIXUP_OPTIONS_8097_STEPS.html) for more details.
Fixing up a transaction logged database that does not participate in the Domino Attachment and Object Service (DAOS)
A transaction logged database does not typically require that Fixup be run against it, but when it is necessary, the following command should be used:
load fixup -J database.nsf
The '-J' parameter allows the Fixup task to scan a transaction logged database. If a backup utility certified for Lotus Domino is in use, ensure that a full backup of the database is scheduled as soon as possible, as a Fixup run against a transaction logged database will assign a new Database Instance ID (DBIID). See Fixup options (
http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/index.jsp?topic=/com.ibm.help.domino.admin85.doc/H_FIXUP_OPTIONS_8097_STEPS.html) for more details.
Fixing up a transaction logged database that participates in the Domino Attachment and Object Service (DAOS)
In order to repair a DAOS-enabled database that is encountering a corruption issue, the following command should be used:
load fixup -J -D database.nsf
The '-J' parameter is required for the Fixup task to operate on a DAOS-enabled database, as DAOS requires transaction logging for participating databases. The '-D' parameter purges or fixes documents in the specified databases when the document is corrupt, when the DAOS ticket is outdated, or when the NLO associated with the document is missing. If a backup utility certified for Lotus Domino is in use, ensure that a full backup of the database is scheduled as soon as possible, as a Fixup run against a transaction logged database will assign a new Database Instance ID (DBIID). See Fixup options (
http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/index.jsp?topic=/com.ibm.help.domino.admin85.doc/H_FIXUP_OPTIONS_8097_STEPS.html) for more details.
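The three Fixup cases above differ only in their options. As a minimal sketch, assuming the caller already knows whether the database is transaction logged and whether it participates in DAOS, the choice can be expressed as follows. The function names are hypothetical and the code only builds the console command text; it does not run anything against a server.

```python
def fixup_command(database, transaction_logged=False, daos_enabled=False):
    """Build the Fixup console command described in this article."""
    if daos_enabled:
        # DAOS requires transaction logging, so -J always accompanies -D.
        options = ["-J", "-D"]
    elif transaction_logged:
        options = ["-J"]
    else:
        # Not transaction logged: force a scan of all documents.
        options = ["-F"]
    return " ".join(["load", "fixup"] + options + [database])


def needs_full_backup(transaction_logged=False, daos_enabled=False):
    """Fixup with -J assigns a new DBIID, so a full backup should follow."""
    return transaction_logged or daos_enabled
```

Keeping the backup check next to the command builder serves as a reminder that any run using '-J' should be followed by a full backup.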
After running Fixup, test the state of the database by performing the same operations or following the same steps that originally revealed the corruption. If the corruption has not been resolved, the next course of action is to compact the impacted database.
When running the Compact task, give the same consideration to the type of database it will operate on as when running the Fixup task. The type of compaction to perform on a corrupt database is a copy-style compact. A copy-style compact requires that no process access the database for the duration of the task's operation, and it will terminate before completion if any entity opens the database for read or write. When planning to copy-style compact a corrupt database, ensure that no database users or other tasks related to the database are accessing it. The Compact task needs no special consideration as to whether a database is transaction logged or participates in DAOS. However, a copy-style compaction will result in a new Database Instance ID (DBIID), so a full backup of the database should be scheduled as soon as possible following the compaction.
Compacting a corrupt database
When compacting a corrupt database, the following command should be used:
load compact -c -i database.nsf
The '-c' parameter designates that a copy-style compaction take place on the designated database. The '-i' parameter allows the compaction to continue even if errors such as document corruption are encountered.
Database.nsf will be streamed into a temporary file until all data elements have been successfully copied. When the copying has completed, the designated database is deleted from the filesystem and the temporary file is renamed to replace the file designated in the compact command. This does not impact the replica ID of the database that Compact is run against. Because a second copy of the data is written, be certain that there is sufficient disk space on the Domino server to allow the copy to complete. See Compact options (
http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/index.jsp?topic=/com.ibm.help.domino.admin85.doc/H_COMPACT_OPTIONS_1761_STEPS.html) for more details.
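The disk space requirement described above can be checked before the compaction is started. The following Python sketch is one possible check, not an IBM-provided tool. It assumes that the temporary copy is written to the same volume as the database and that the copy is no larger than the original; the function names and the headroom factor are hypothetical.

```python
import os
import shutil


def has_room_for_copy(database_bytes, free_bytes, headroom=1.0):
    """Return True if free space covers a full copy of the database.

    headroom is a hypothetical safety factor; 1.0 means exactly one copy.
    """
    return free_bytes >= database_bytes * headroom


def can_copy_style_compact(database_path, headroom=1.0):
    """Check the volume holding database_path for room to hold a second copy."""
    database_bytes = os.path.getsize(database_path)
    directory = os.path.dirname(os.path.abspath(database_path))
    # Assumption: the temporary file is created on this same volume.
    free_bytes = shutil.disk_usage(directory).free
    return has_room_for_copy(database_bytes, free_bytes, headroom)
```

A headroom value above 1.0 leaves a margin for other activity on the volume while the compaction runs.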
Compacting a corrupt database and discarding view indexes
It may be difficult to discern whether corruption is limited to a database's documents or to its view indexes. Under these circumstances, it is recommended that a copy-style compaction be run against the database and that the currently built view indexes be discarded at the same time. To accomplish this, the following command should be used:
load compact -c -D -i database.nsf
The '-D' parameter discards the built view indexes for the database specified in the command. While this parameter does ensure that a complete rebuild of view indexes occurs, initial access to each view after this Compact will be delayed while the view index is built. After running Compact with the '-D' parameter, there is no need to run Updall on the database, as all views will be rebuilt upon their initial access. See Compact options (
http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/index.jsp?topic=/com.ibm.help.domino.admin85.doc/H_COMPACT_OPTIONS_1761_STEPS.html) for more details.
If corruption is specific to a view or views within a database, the Updall task is a beneficial tool to leverage. The following recommendations can be run online, with the server running. However, lookups against a view may be impacted until the task has completed its operation, so give consideration to the view or database to be operated on.
Rebuilding view indexes
When rebuilding view indexes for a database, the following command should be used:
load updall -R database.nsf
The '-R' parameter will rebuild all currently built view indexes within the targeted database and is resource-intensive. See Updall options (
http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/index.jsp?topic=/com.ibm.help.domino.admin85.doc/H_UPDALL_OPTIONS_3277_STEPS.html) for more details.
Rebuilding specific view indexes
If the issue has been narrowed to a specific view, it is possible to target this view for rebuild rather than rebuilding all views for a database. By specifying a particular view, the overhead of view rebuilding is significantly reduced. The following command should be used:
load updall -T viewname -R database.nsf
The '-T' parameter is used to specify the particular view to rebuild the index for. See Updall options (
http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/index.jsp?topic=/com.ibm.help.domino.admin85.doc/H_UPDALL_OPTIONS_3277_STEPS.html) for more details.
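The two Updall commands above differ only in whether a view is named. A minimal Python sketch of that choice follows. The function name is hypothetical, the code only builds the console command text, and view names containing spaces are not handled.

```python
def updall_command(database, view_name=None):
    """Build the Updall rebuild command for all views or for one view."""
    options = []
    if view_name is not None:
        # -T limits the rebuild to the named view, reducing overhead.
        options += ["-T", view_name]
    # -R rebuilds the view index or indexes.
    options.append("-R")
    return " ".join(["load", "updall"] + options + [database])
```

Calling updall_command("database.nsf", "viewname") produces the targeted form of the command, and omitting the view name produces the full rebuild.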
Other means of repairing corruption
Once the maintenance prescribed for a corruption incident has been run, test the state of the database by performing the same operations or following the same steps that originally revealed the corruption. If the corruption has not been resolved, the next course of action is to attempt to create a new replica of the database. If the impacted database has other replicas in the environment, replacing its instance with an operating system-level copy of a non-impacted replica is often the most effective method of restoring operation to the database.
If neither maintenance nor the creation of a new replica offers relief and no other replicas exist, the final step in returning the database to operation is to restore it from backup. Consult the vendor of the environment's backup solution if assistance is required to restore a database.
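The overall order of escalation in this section can be sketched as a simple loop. The Python below is illustrative only: the step descriptions paraphrase this article, and run_step and still_corrupt are hypothetical callables that the administrator would supply. For corruption limited to views, Updall would be tried before the first step listed here.

```python
ESCALATION_STEPS = [
    "run Fixup",
    "run a copy-style Compact",
    "create a new replica, or replace the database with a copy of a healthy replica",
    "restore the database from backup",
]


def repair(run_step, still_corrupt):
    """Work through the steps in order, stopping once corruption is resolved.

    run_step(step) performs one step. still_corrupt() re-tests the database by
    repeating the operations that first revealed the corruption.
    Returns the list of steps that were attempted.
    """
    attempted = []
    for step in ESCALATION_STEPS:
        run_step(step)
        attempted.append(step)
        if not still_corrupt():
            break
    return attempted
```

The loop re-tests after every step so that no more invasive action is taken than the situation requires.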
The recommendations made for repairing a corrupt database should only be put into practice when a corrupt database is encountered. More importantly, these steps are resource intensive and are counter-productive if built into a regularly scheduled maintenance cycle. Scheduled maintenance recommendations are made in the "Reducing exposure" section of this article.
4. Other corruption scenarios
The details above describe methods for resolving database corruption in what can be considered single occurrences for databases. In the event that corruption recurs for a database and you need to engage IBM Support for troubleshooting assistance, there are a handful of steps that can be taken prior to opening the Problem Management Record (PMR) to help resolve the PMR as quickly as possible.
Notes.ini debug
Enable the following settings in the server's notes.ini:
console_log_enabled=1
debug_threadid=1
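Before waiting for the corruption to recur, it can be useful to confirm that both settings are actually present. The following Python sketch scans the text of a notes.ini file for them. The function name is hypothetical, and the sketch assumes that setting names are matched without regard to case.

```python
REQUIRED_SETTINGS = {"console_log_enabled": "1", "debug_threadid": "1"}


def missing_debug_settings(ini_text):
    """Return the required settings that are absent or set to another value."""
    found = {}
    for line in ini_text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            # Assumption: setting names are compared case-insensitively.
            found[key.strip().lower()] = value.strip()
    return {
        key: value
        for key, value in REQUIRED_SETTINGS.items()
        if found.get(key) != value
    }
```

An empty result means both debug settings are in place; anything returned still needs to be added to notes.ini.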
File monitoring
It is necessary to determine which process could be inducing the corruption in a database. On Microsoft Windows, the Process Monitor (
http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx) tool offers real-time file system and process-level monitoring.
Please note that a corruption occurrence must take place while these additional data collection methods are in place so that data relevant to the issue is gathered.
Sending in info
When opening the PMR, please provide the following:
console.log
ProcessMon log