Performing a background crawl

Added by IBM on February 11, 2013 | Version 1 (Original)
You can use a SearchService command to perform a background crawl of the Search seedlists without creating a Search index.
Before you begin
See Starting the wsadmin client for information about how to start the wsadmin command-line tool.
About this task
The SearchService.startBackgroundCrawl command allows you to crawl the application seedlists and save those seedlists to a specified location. You might want to use this command if you are experiencing issues with crawling and you want to verify that the crawling process is completing successfully.
To perform a background crawl of the Search seedlists, complete the following steps.
- Start the wsadmin client from the following directory of the system on which you installed the Deployment Manager:
app_server_root/profiles/dm_profile_root/bin
where app_server_root is the WebSphere® Application Server installation directory and dm_profile_root is the Deployment Manager profile directory, typically dmgr01.
You must start the client from this directory or subsequent commands that you enter do not execute correctly.
- After the wsadmin command environment has initialized, enter the following command to initialize the Search environment and start the Search script interpreter:
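The initialization command itself is missing above. In IBM Connections deployments this step is typically performed by running the Search administration script from the wsadmin prompt (script name assumed here from the standard product layout):

```
execfile("searchAdmin.py")
```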
If prompted to specify a service to connect to, type 1 to pick the first node in the list. Most commands can run on any node. If the command writes or reads information to or from a file using a local file path, you must pick the node where the file is stored.
When the command is run successfully, the following message displays:
Search Administration initialized
- Enter the following command:
SearchService.startBackgroundCrawl(String persistenceLocation, String components)
Crawls the seedlists for the specified applications and then saves the seedlists to the specified location. This command does not build an index.
The command takes the following parameters:
persistenceLocation
A string that specifies the path to which the seedlists are to be saved.
components
A string that specifies the applications whose seedlists are to be crawled. The following values are valid: activities, all_configured, blogs, calendar, communities, dogear, files, forums, profiles, status_updates, and wikis. Use all_configured when you want to crawl all the applications.
For example:
SearchService.startBackgroundCrawl("/opt/IBM/Connections/backgroundCrawl", "activities, forums, communities, wikis")
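Because the usual reason for a background crawl is to verify that crawling completes successfully, it can help to confirm that non-empty seedlist files actually landed in the persistence location. The following standalone sketch (the path is an assumption matching the example above, not part of the product) lists what was written:

```python
import os

def check_seedlists(persistence_dir):
    """Report the seedlist files saved by a background crawl.

    Returns a list of (filename, size_in_bytes) tuples for every
    non-empty file under persistence_dir. An empty result suggests
    the crawl did not persist anything.
    """
    results = []
    for root, _dirs, files in os.walk(persistence_dir):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size > 0:
                results.append((name, size))
    return results

if __name__ == "__main__":
    # Hypothetical persistence location from the example command.
    for name, size in check_seedlists("/opt/IBM/Connections/backgroundCrawl"):
        print("%s: %d bytes" % (name, size))
```

If the list is empty or files are zero bytes, the crawl did not complete and the Search logs should be checked before building any index from these seedlists.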
What to do next
After completing a background crawl, complete one of the following options:
- Extract file content. For more information, see Extracting file content.
- Create a background index. For more information, see Creating a background index.
- Create a foreground index. For more information, see Recreating the Search index.
If you want to create a foreground index, copy the persisted seedlists from the persistence location that you specified when you used the startBackgroundCrawl command to the CRAWLER_PAGE_PERSISTENCE_DIR directory on the node that is doing the indexing.
In a multi-node system, you might want to copy the seedlists to the CRAWLER_PAGE_PERSISTENCE_DIR directory on all nodes. Alternatively, you can set the CRAWLER_PAGE_PERSISTENCE_DIR variable to a network location and copy the persisted seedlists from the persistence location you specified to that location.
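The copy step above can be scripted. This sketch (directory paths are placeholders, not product defaults) copies every persisted seedlist file into each node's CRAWLER_PAGE_PERSISTENCE_DIR, creating the target directories if needed:

```python
import os
import shutil

def copy_seedlists(persistence_dir, crawler_page_dirs):
    """Copy every persisted seedlist file into each target directory.

    persistence_dir   -- the location passed to startBackgroundCrawl
    crawler_page_dirs -- one CRAWLER_PAGE_PERSISTENCE_DIR path per node,
                         or a single network location shared by all nodes
    """
    for target in crawler_page_dirs:
        os.makedirs(target, exist_ok=True)
        for name in os.listdir(persistence_dir):
            src = os.path.join(persistence_dir, name)
            if os.path.isfile(src):
                shutil.copy2(src, os.path.join(target, name))
```

For mounted network locations, pointing CRAWLER_PAGE_PERSISTENCE_DIR at the share and running a single copy, as the text suggests, avoids the per-node step entirely.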