Java content repository TextSearch Support tools for IBM WebSphere Portal: Overview and usage
Added by Leslie Gallo | Edited by Leslie Gallo on September 21, 2010 | Version 3
This article provides an overview of the support tools that are used for monitoring and debugging Java™ content repository (JCR) TextSearch in IBM® WebSphere® Portal. It is intended for WebSphere Portal developers, administrators, users, and technical specialists who use JCR TextSearch and need to administer and troubleshoot problems within Search.
These are the JCR TextSearch support tools that are used extensively for search-related issues:
- TextSearch Status utility
- TextSearch Reorg utility
- Juru Index Toolbox utility
To get the most from this article, you should be familiar with the JCR TextSearch processes in WebSphere Portal. If you are new to JCR TextSearch, refer to the developerWorks® article, Java content repository TextSearch in IBM WebSphere Portal and IBM Lotus Web Content Management: Overview and troubleshooting.
TextSearch Status utility
This utility monitors the progress of the Index Maintenance process. The rebuild index process may run for many hours and may have a propensity for failure, depending on the type and volume of content in the repository.
Often administrators and users can feel frustrated when they see that the Index Directory is still building; however, there is a way to determine how the Index Maintenance process is progressing.
The TextSearch (TS) Status utility lets you view the indexing progress for each workspace through a Web interface (the TSStatus.jsp file). You can obtain a copy of the TSStatus.jsp file from IBM Support.
To view the TS Index Progress, place the TSStatus.jsp file under
\installedApps\\wcm.ear\ilwwcm.war folder in WebSphere Portal, as shown in figure 1. This can be accessed as
There is no need to restart WebSphere Portal for this file addition to take effect.
Figure 1. Location of the TSStatus.jsp file in WebSphere Portal
The .jsp page displays the Workspace ID, Workspace name, and Index Maintenance status. The status can be one of the following: RUNNING, REINDEXING, SLEEPING, or NOT STARTED.
Now let's view some screenshots of TS Status utility displaying the different statuses as the Index Maintenance is progressing.
Suppose WebSphere Portal has just been installed on the system and the index has not yet been built, so the text search index directory does not yet exist (check the location \PortalServer\jcr\search; the folder 1 does not exist).
When you check the status using the .jsp file, you'll see that the utility displays the window shown in figure 2, in which the status is NOT STARTED.
Figure 2. Output of the utility when Index directory ‘1’ is not found
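The same check for index folder 1 can be made from a script. The following is a minimal sketch, not part of the TS Status utility; the function name and the search_root parameter are illustrative, and you must pass the full path of your own jcr\search folder, which depends on where WebSphere Portal is installed.

```python
import tempfile
from pathlib import Path


def index_directory_exists(search_root: str, workspace_id: str = "1") -> bool:
    """Return True if the index folder for the workspace exists under search_root.

    search_root is the jcr/search folder of the installation (an assumption
    supplied by the caller); workspace_id defaults to the folder name "1".
    """
    return (Path(search_root) / workspace_id).is_dir()


# Demonstration against a temporary folder standing in for jcr/search.
with tempfile.TemporaryDirectory() as root:
    print(index_directory_exists(root))  # False: index not built yet
    (Path(root) / "1").mkdir()
    print(index_directory_exists(root))  # True: index folder now present
```

A result of False corresponds to the NOT STARTED status described above.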
So now you want to start the rebuild index from scratch for the existing documents in WebSphere Portal.
To do this, edit any IBM Lotus® Web Content Management™ content, save it, and then wait until the next index maintenance interval. During the index maintenance cycle, the index directory is rebuilt entirely for the existing documents in WebSphere Portal.
Once the index maintenance starts building the index, the utility will display the window shown in figure 3. Here, the status is REINDEXING.
Figure 3. Output of the utility when reindexing is just started
As you can see in the above figure, when the status is REINDEXING, the following is displayed:
- Workspace ID
- Processed Events: Total Count of documents indexed until now during this cycle.
- Pending Events: The remaining number of the pending documents to be indexed during this cycle.
Once the reindexing process completes, the index status changes to COMPLETE. When you click the Refresh button, it goes to the SLEEP state, and the utility displays the windows shown in figures 4 and 5.
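The two counters reported in the REINDEXING state can be combined into a rough completion percentage. The utility itself does not display a percentage; this is only an illustrative calculation, and it assumes the pending count does not grow while the cycle is running. The function name is hypothetical.

```python
def reindex_progress(processed_events: int, pending_events: int) -> float:
    """Return the share of this cycle's documents already indexed, in percent.

    Assumption: processed + pending is the total work for the cycle.
    """
    if processed_events < 0 or pending_events < 0:
        raise ValueError("event counts cannot be negative")
    total = processed_events + pending_events
    if total == 0:
        # Nothing queued for this cycle, so treat it as finished.
        return 100.0
    return 100.0 * processed_events / total


print(reindex_progress(1500, 500))  # 75.0
```

For example, 1500 processed events with 500 still pending means the cycle is about three quarters done.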
Figure 4. Output of the utility with index maintenance in COMPLETE state
Figure 5. Output of the utility with index maintenance in SLEEP state
When the status is SLEEPING, the utility displays:
- Workspace ID
- Last Indexed time
Now the index directory is built and is ready for a search operation. Let's say you've performed a few updates in the repository, after which you want to check how the index is progressing for the incremental updates in the repository.
To do this, create or edit some Web Content Management content and wait until the next Index Maintenance cycle. The Utility will display the screen as shown in figure 6, in which the status is RUNNING for incremental updates.
Figure 6. Output of the utility when index maintenance is RUNNING
In this case, in which the status is RUNNING (for Incremental Pending Updates), it displays:
- Workspace ID
- Pending Documents: The remaining number of the pending documents to be indexed.
- Last Indexed Time
Once the Index Maintenance process completes, the index status changes to COMPLETE. When you click Refresh, it goes to the SLEEP state, and the utility displays the windows shown in figures 7 and 8.
Figure 7. Output of the utility with index maintenance in COMPLETE state
Figure 8. Output of the utility with index maintenance in SLEEP state
TextSearch Reorg utility
This utility compacts the size of the underlying searchable Juru index (the text engine used by JCR), which keeps growing during the course of Index maintenance. The Reorg tool performs a clean-up activity by permanently removing deleted document entries from the index.
The utility is run in WebSphere Portal environments in which many updates are performed that result in a large number of internal deletions in the Juru Search Index directory. In such scenarios, the index directory grows in size to several gigabytes, and the search and index times can be significantly affected, depending on the number of deleted documents.
You invoke this utility from the command line, and you can run it from any directory. Be sure that all the .jar files under the directory \ jcr\prereq.jcr\lib are set in the classpath. This tool ships with the WebSphere Portal installation.
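Listing every .jar file by hand is error prone, so you may prefer to generate the classpath value. The following is a minimal sketch under the assumption that you pass the full path of your own lib directory; the function name is illustrative and is not part of the product.

```python
import os
from pathlib import Path


def build_classpath(lib_dir: str) -> str:
    """Join every .jar file directly under lib_dir into one classpath string.

    os.pathsep is ';' on Windows and ':' on UNIX-like systems, which matches
    the separator the Java launcher expects on each platform.
    """
    jars = sorted(str(path) for path in Path(lib_dir).glob("*.jar"))
    return os.pathsep.join(jars)
```

You can then assign the returned string to the CLASSPATH environment variable before running the utility.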
The reorg command is used as follows:
cmcfgdbu -prod TS -t check [ -wslist < wsname1> ... ]
cmcfgdbu -prod TS -t reorg [ -deleteHighMark | -force ] [ -wslist ... ]
Note that the arguments in square brackets [ ] are optional.
Here are the details of the main arguments:
- check. Provides the number of deleted documents in the requested Juru Index, and you can use it to decide whether the reorg operation should be performed.
The utility can be invoked with the check argument as shown in figure 9.
Figure 9. Utility invoked from command line with check argument
- reorg. Performs the 'Reorg' operation on the specified Juru index. By default, the utility does the reorg when the number of deleted documents is equal to or greater than the default threshold limit of 1000 documents.
The index reorganization is resource intensive; therefore, use it with caution, generally after 1000 deleted documents.
The optional arguments used along with reorg are:
- -deleteHighmark. Specifies a custom threshold that is compared with the number of deleted documents in the current Juru index to decide whether to perform a reorg.
- -force. This option forces reorg to execute, even if the threshold (1000 deleted docs) is not yet reached.
- -wslist. Specifies the list of Juru index names.
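The threshold behavior described for reorg, -deleteHighmark, and -force can be summarized as a small decision rule. This is a sketch of the logic as the text describes it, not the utility's actual implementation; the function and constant names are hypothetical.

```python
DEFAULT_DELETE_HIGH_MARK = 1000  # default threshold stated in this article


def should_reorg(deleted_docs: int,
                 delete_high_mark: int = DEFAULT_DELETE_HIGH_MARK,
                 force: bool = False) -> bool:
    """Decide whether a reorg would run for an index.

    force models the -force option and bypasses the threshold;
    delete_high_mark models a custom -deleteHighmark value.
    """
    if force:
        return True
    return deleted_docs >= delete_high_mark


print(should_reorg(999))                        # False: below the default threshold
print(should_reorg(1000))                       # True: threshold reached
print(should_reorg(500, delete_high_mark=200))  # True: custom threshold reached
print(should_reorg(10, force=True))             # True: forced
```

The command syntax offers -deleteHighMark and -force as alternatives, so in practice you supply one or the other; the sketch simply lets force take precedence.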
The utility can be invoked with the deleteHighmark and force arguments as shown in figures 10 and 11.
Figure 10. Utility invoked from command line with reorg and deleteHighmark argument
Figure 11. Utility invoked from command line with reorg and force argument
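If you run the check or reorg tasks from a script, it can help to assemble the argument list in one place. The sketch below only builds the list and does not execute anything. It uses the arguments shown in the command syntax above (-prod TS, -t, -force, -wslist) and leaves out -deleteHighMark because its value syntax is not shown here. The function name is hypothetical, and the launcher name cmcfgdbu may need a path or file extension on your platform.

```python
from typing import List, Optional, Sequence


def ts_command(task: str,
               workspaces: Optional[Sequence[str]] = None,
               force: bool = False) -> List[str]:
    """Build the argument list for a TextSearch check or reorg invocation."""
    if task not in ("check", "reorg"):
        raise ValueError("task must be 'check' or 'reorg'")
    if force and task != "reorg":
        raise ValueError("-force applies only to the reorg task")
    argv = ["cmcfgdbu", "-prod", "TS", "-t", task]
    if force:
        argv.append("-force")
    if workspaces:
        # Index names follow -wslist, separated by spaces on the command line.
        argv += ["-wslist", *workspaces]
    return argv


print(" ".join(ts_command("check", ["ws1", "ws2"])))
# cmcfgdbu -prod TS -t check -wslist ws1 ws2
```

The workspace names ws1 and ws2 are placeholders; substitute the Juru index names from your own environment.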
Juru Index Toolbox utility
This utility lets you check whether the search index directory contains an indexed document. You can provide a query word, search against the index directory, and get the results as a Document ID, that is, the ID of the document that contains the search word.
The tool is available from IBM Support and provides the options shown in figure 12. You must enter the path of the index directory “1”, and you can then choose the language in which to search the index directory.
Figure 12. Utility invoked from a command line
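To make the idea of "query word in, Document IDs out" concrete, here is a toy in-memory inverted index. It is not how Juru stores or queries its index, and it does not read a real index directory; the document IDs and function names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the set of document IDs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], word: str) -> List[str]:
    """Return the sorted IDs of documents that contain the query word."""
    return sorted(index.get(word.lower(), set()))


docs = {
    "doc-1": "portal search overview",
    "doc-2": "index maintenance for search",
}
index = build_index(docs)
print(search(index, "search"))   # ['doc-1', 'doc-2']
print(search(index, "missing"))  # []
```

An empty result for a word you expect to find suggests that the document has not been indexed yet.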
This article has explained the different JCR TextSearch support tools in WebSphere Portal and how to use these tools to administer and troubleshoot TextSearch index maintenance and search processes.
Participate in the discussion forum.
Read the developerWorks® article, "Introducing the JCR API: Learn how JSR-170 is used for building content management applications."
Browse the WebSphere Portal Server Information Center, to learn how to configure and use Java Content Repository TextSearch in WebSphere Portal.
About the authors
Malarvizhi Kandasamy is a Staff Software Engineer working at IBM since April 2005. She has 10 years of experience in the software industry and is an IBM Certified Solution Designer for Content Manager v8.3, IBM Certified Database Associate for DB2 v9, and Sun Certified Java Programmer 1.5. She holds a Bachelor of Computer Science degree from Madras University. You can reach her at email@example.com.
Ramgopal (Ram) Kanasani is a Software Engineer working at IBM since August 2008. He holds a Bachelor of Technology degree from IIIT Allahabad. You can reach him at firstname.lastname@example.org.