Retrieving file content
Added by IBM on February 11, 2013 | Version 1 (Original)
Use SearchService commands to perform file content retrieval tasks.
Before you begin
To edit configuration files, you must use the IBM® WebSphere® Application Server wsadmin client. See Starting the wsadmin client.
About this task
Depending on the number of files being indexed in your deployment, it can take a long time to retrieve file content. To ensure that all content is retrieved and indexed, you can run the indexNow command to retrieve all content before the document indexing service finishes, or you can run it after the document indexing service has finished.
For example, to manually index files and all file content, you might run the following commands:
The document indexing service can run on multiple nodes, making the download and conversion process faster. When the document indexing task is scheduled, the Search application sends a message to all the nodes to tell them to start the document indexing process locally. Each Search server starts taking files from the cache and downloading and converting them. When a node retrieves a file, it flags the file in the cache as claimed so that other nodes do not try to get content for that file.
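The claiming behavior described above can be illustrated with a small simulation. This is only a sketch in standalone Python, not the Search application's implementation: the `FileCache` class, the node names, and the file IDs are all hypothetical, and a lock stands in for whatever mechanism the real cache uses to flag a file as claimed.

```python
import threading


class FileCache:
    """Hypothetical in-memory stand-in for the shared file cache."""

    def __init__(self, file_ids):
        self._lock = threading.Lock()
        # Maps each file ID to the node that claimed it (None = unclaimed).
        self._claimed_by = {file_id: None for file_id in file_ids}

    def claim_next(self, node):
        """Atomically flag the next unclaimed file as claimed by `node`."""
        with self._lock:
            for file_id, owner in self._claimed_by.items():
                if owner is None:
                    self._claimed_by[file_id] = node
                    return file_id
        return None  # nothing left to claim

    def owners(self):
        """Return a snapshot of which node claimed each file."""
        with self._lock:
            return dict(self._claimed_by)


def run_node(node, cache, processed):
    """Claim files until the cache is exhausted, recording each one."""
    while True:
        file_id = cache.claim_next(node)
        if file_id is None:
            break
        # A real node would download and convert the file at this point.
        processed.append((node, file_id))


cache = FileCache(["file-%d" % i for i in range(10)])
processed = []
nodes = [threading.Thread(target=run_node, args=(name, cache, processed))
         for name in ("node1", "node2", "node3")]
for thread in nodes:
    thread.start()
for thread in nodes:
    thread.join()

# Each file appears once, regardless of which node claimed it.
print(sorted(file_id for _, file_id in processed))
```

Because the claim is made while holding the lock, no two simulated nodes can take the same file, which is the property the claimed flag provides in the cache.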
To perform file content retrieval tasks, complete the following steps.
- Start the wsadmin client from the following directory of the system on which you installed the Deployment Manager:
is the WebSphere Application Server installation directory and dm_profile_root is the Deployment Manager profile directory, typically dmgr01.
You must start the client from this directory or subsequent commands that you enter do not execute correctly.
- After the wsadmin command environment has initialized, enter the following command to initialize the Search environment and start the Search script interpreter:
If prompted to specify a service to connect to, type 1 to pick the first node in the list. Most commands can run on any node. If the command writes or reads information to or from a file using a local file path, you must pick the node where the file is stored.
When the command is run successfully, the following message displays:
Search Administration initialized
- Use the following commands to perform file content retrieval tasks.
Launches the file content retrieval task. This command iterates over the file cache, downloading and converting files that don't have any content.
This command takes a string value, which is the name of the application whose content is to be retrieved. The following values are valid:
Retries failed attempts at downloading and converting files for the specified application.
This command takes a string value, which is the name of the application whose content is to be downloaded and converted. The following values are valid:
A file download or conversion task can fail for a number of reasons, for example, hardware or network issues. Failures are flagged in the cache and can be retried.
SearchService.addFileContentTask(String taskName, String schedule, String startBy, String applicationNames, failuresOnly)
Creates a scheduled file content retrieval task.
This command takes the following arguments:
- taskName. The name of the scheduled task. This argument is a string value, which must be unique.
- schedule. The time at which the scheduled task starts. This argument is a string value that must be specified in Cron format. For more information about the Cron schedule, see Scheduling tasks.
- startBy. The time given to a task to fire before it is automatically canceled. This argument is a string value that must be specified in Cron format. For more information about the Cron schedule, see Scheduling tasks.
- applicationNames. The name (or names) of the IBM Connections application to be indexed when the task is triggered. This argument is a string value. To index multiple applications, use a comma-delimited list. The following values are valid:
- failuresOnly. A flag that indicates that only the content of files for which the download and conversion tasks failed should be retrieved. This argument is a boolean value.
SearchService.addFileContentTask("mine", "0 0 1 ? * MON-FRI", "0 10 1 ? * MON-FRI", "wikis,files","true")
When the command runs successfully, 1 is printed to the wsadmin console. If the command does not run successfully, 0 is printed to the wsadmin console.
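When several tasks need to be defined, it can help to generate the command text rather than type it by hand. The following is a minimal sketch in standalone Python, assuming you paste the resulting line into the wsadmin client yourself. The helper name `build_add_file_content_task` is hypothetical and is not part of SearchService; it only assembles the five documented arguments in order.

```python
def build_add_file_content_task(task_name, schedule, start_by,
                                application_names, failures_only):
    """Return the command text for SearchService.addFileContentTask.

    Hypothetical helper: it formats the call but does not run it.
    """
    if not task_name:
        raise ValueError("taskName must be a non-empty, unique string")
    if not application_names:
        raise ValueError("at least one application name is required")
    # Multiple applications are passed as one comma-delimited string.
    apps = ",".join(name.strip() for name in application_names)
    # The documented examples pass the flag as the text "true" or "false".
    flag = "true" if failures_only else "false"
    return 'SearchService.addFileContentTask("%s", "%s", "%s", "%s", "%s")' % (
        task_name, schedule, start_by, apps, flag)


command = build_add_file_content_task(
    "mine", "0 0 1 ? * MON-FRI", "0 10 1 ? * MON-FRI",
    ["wikis", "files"], True)
print(command)
```

The printed line has the same arguments as the example above, so it can be compared against it before being entered in the wsadmin console.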
You can also use the SearchService.addFileContentTask command to replace the task definition for the default 20min-file-retrieval-task. By default, this task runs every 20 minutes, except for a one-hour period between 01:00 and 02:00. To replace the default task settings, first remove the existing task using the SearchService.deleteTask(String taskName) command. Then use the SearchService.addFileContentTask command to create a new task with the values that you specify. For example:
SearchService.addFileContentTask("20min-file-retrieval-task", "0 1/20 0,2-23 * * ?", "0 10/20 0,2-23 * * ?", "all_configured", "false")
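To see how the schedule strings in this example produce "every 20 minutes, except between 01:00 and 02:00", the minute and hour fields can be expanded by hand or with a short script. The sketch below is standalone Python and rests on two assumptions that are not stated on this page: that the first three fields are second, minute, and hour, and that `a/b` means "start at a, then every b". The function names are hypothetical.

```python
def expand_field(field, low, high):
    """Expand one numeric Cron field into a sorted list of values.

    Handles '*', '?', single values, 'a-b' ranges, 'a/b' steps, and
    comma-separated combinations of these.
    """
    values = set()
    for part in field.split(","):
        if part in ("*", "?"):
            values.update(range(low, high + 1))
        elif "/" in part:
            start, step = part.split("/")
            first = low if start == "*" else int(start)
            values.update(range(first, high + 1, int(step)))
        elif "-" in part:
            first, last = part.split("-")
            values.update(range(int(first), int(last) + 1))
        else:
            values.add(int(part))
    return sorted(values)


def hours_and_minutes(schedule):
    """Return (hours, minutes) for a schedule, ignoring the date fields."""
    _seconds, minutes, hours = schedule.split()[:3]
    return expand_field(hours, 0, 23), expand_field(minutes, 0, 59)


start_hours, start_minutes = hours_and_minutes("0 1/20 0,2-23 * * ?")
_, start_by_minutes = hours_and_minutes("0 10/20 0,2-23 * * ?")

print(start_minutes)      # [1, 21, 41]
print(start_by_minutes)   # [10, 30, 50]
print(1 in start_hours)   # False
```

Under these assumptions the task fires at minutes 1, 21, and 41 of every hour except hour 1, and each run has until minutes 10, 30, and 50 respectively to start before it is canceled.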
Lists all the scheduled file content retrieval tasks.
This command does not take any input parameters.
SearchService.enableTask(String taskName)
Enables the specified task.
This command takes a single argument:
- taskName. The name of the task to be enabled. This argument is a string value.
Disables the specified task.
This command takes a single argument:
- taskName. The name of the task to be disabled. This argument is a string value.