The activity stream search service is automatically configured to crawl the activity stream seedlist at regular intervals. By default, the interval is set to 30 seconds. After an initial crawl of the activity stream, subsequent crawls are incremental: only new events generated since the previous crawl are collected. When you install IBM® Connections, the crawler is disabled by default.
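The incremental crawl behavior can be illustrated with a minimal sketch. This is not the IBM Connections API; the `Event` and `Crawler` classes and their fields are hypothetical names used only to model what the paragraph above describes: a crawler that is disabled after installation, a 30-second default interval, a full initial crawl, and later crawls that collect only events created since the previous crawl.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical activity stream event."""
    event_id: str
    created_at: float  # seconds since an arbitrary epoch


@dataclass
class Crawler:
    """Hypothetical model of an incremental seedlist crawler."""
    interval_seconds: int = 30              # default crawl interval stated in the text
    enabled: bool = False                   # the crawler is disabled after installation
    last_crawl_at: Optional[float] = None   # None until the initial crawl has run

    def crawl(self, seedlist: List[Event], now: float) -> List[Event]:
        """Return the events collected by one crawl that runs at time `now`."""
        if not self.enabled:
            return []
        if self.last_crawl_at is None:
            # Initial crawl: collect everything that exists so far.
            collected = [e for e in seedlist if e.created_at <= now]
        else:
            # Incremental crawl: only events newer than the previous crawl.
            collected = [
                e for e in seedlist
                if self.last_crawl_at < e.created_at <= now
            ]
        self.last_crawl_at = now
        return collected
```

In this model, a disabled crawler collects nothing and does not advance its crawl timestamp, so the first crawl after it is enabled is still a full crawl.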
Crawling and indexing are carried out on one of the servers in the cluster where the News application is deployed. This server is chosen automatically by the WebSphere High Availability (HA) Manager. If News becomes unavailable on this server, WebSphere HA chooses a different server that is running News to replace it. For each crawling session, the indexing server creates a delta index in a shared file system and sends a notification to the other nodes in the cluster. The other nodes read this delta index from the shared file system and merge it into the main index on their local disk. All the cluster nodes serve search requests by reading from their local index. Configuration and status information for the crawlers is stored in database tables that are available to all the nodes. Delta indexes are stored for 24 hours. If a node is down for more than 24 hours, you must manually copy the index to that node from another node. If a node is unavailable, the other nodes continue to serve search requests without interruption.
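The 24-hour retention rule for delta indexes can be sketched as follows. This is an illustrative model, not product code: `DeltaIndex`, `catch_up`, and the list-based "index" are hypothetical, and the sketch assumes that a node's downtime can be approximated by the time elapsed since it last merged a delta.

```python
from dataclasses import dataclass
from typing import List

DELTA_RETENTION_SECONDS = 24 * 60 * 60  # delta indexes are stored for 24 hours


@dataclass
class DeltaIndex:
    """Hypothetical delta index written to the shared file system."""
    created_at: float      # when the indexing server wrote the delta
    documents: List[str]   # stand-in for the indexed content


def catch_up(local_index: List[str], deltas: List[DeltaIndex],
             last_merged_at: float, now: float) -> bool:
    """Merge unmerged deltas into the node's local index.

    Returns False, leaving the local index untouched, when the node has been
    behind for longer than the retention period. In that case some deltas may
    already have been deleted, so the index has to be copied manually from
    another node.
    """
    if now - last_merged_at > DELTA_RETENTION_SECONDS:
        return False
    # Apply deltas in creation order, skipping those already merged.
    for delta in sorted(deltas, key=lambda d: d.created_at):
        if delta.created_at > last_merged_at:
            local_index.extend(delta.documents)
    return True
```

A node that returns after a short outage merges only the deltas it missed, while a node that was away for more than 24 hours is flagged for a manual index copy instead of being partially updated.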
Administrative users can manage the activity stream search service from a user interface that is accessed using a URL. From the Activity Stream Search Administration page, you can enable or disable the crawler, and edit the crawl schedule. You can also clear the current indexed content and perform a full crawl if required. To access the page, you must be assigned the search-admin role. For more information about this role, see the Roles