After crawling the data for the first time, crawl it again on a regular basis to stay up to date with the changes made by the people using IBM Connections.
Before you begin
You must have performed an initial data crawl and saved the value of the <wplc:timestamp> element before you can perform this procedure. See Crawling data for the first time for more details.
About this task
This procedure collects all content that was created, updated, or deleted up to the server time at which you started the crawl. Because content changes constantly, the list of entries in the seedlist responses is likely to differ between two crawls that use the same Timestamp parameter value but start at different times.
To subsequently crawl data, complete the following steps:
- Send a GET request to the seedlist feed for the application whose data you want to crawl. Include the following parameters on the request:
  Range
    Optional. Specifies the number of entries to return in the seedlist. Use this parameter to limit or increase the number of entries returned in a seedlist response. The default range is 500 entries. Setting this value too high can cause excessive load on IBM Connections applications.
  Timestamp
    Required. The string value of the <wplc:timestamp> element in the body of the last response returned by the previous seedlist crawling session. This value cannot be manually composed.
These parameter values are case-sensitive. All other parameters used by the seedlist SPIs are considered internal; do not set them manually.
The following list contains the seedlist URLs for each application:
The activities seedlist also contains content from community activities.
The blogs seedlist also contains content from community blogs.
Community Events
The wikis seedlist also contains content from community wikis.
- Process the returned feed. Find the rel=next link and send a GET request to the web address specified by its href attribute.
- Repeat the previous two steps until the response includes a <wplc:timestamp> element in its body.
- Store the value of the <wplc:timestamp> element; you must pass that value as a parameter when you perform a subsequent crawl of the data.
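
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. It assumes that the seedlist response is an Atom-style XML feed whose paging link is a link element with rel="next", and that the page-size parameter is named Range. The function names (build_seedlist_url, parse_page, crawl_changes) and the host in the usage note are hypothetical. Authentication is omitted, and elements are matched by local name so that no particular namespace URI is assumed.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen
import xml.etree.ElementTree as ET


def build_seedlist_url(seedlist_url, timestamp, entries_per_page=None):
    """Append the required Timestamp and the optional page-size parameter.

    The timestamp is passed through unchanged (only URL-encoded); it must be
    the stored value from the previous crawl, never a hand-built string.
    """
    params = {"Timestamp": timestamp}
    if entries_per_page is not None:
        params["Range"] = str(entries_per_page)  # parameter name is an assumption
    separator = "&" if "?" in seedlist_url else "?"
    return seedlist_url + separator + urlencode(params)


def parse_page(xml_text):
    """Return (next_href, timestamp) for one seedlist page; either may be None."""
    root = ET.fromstring(xml_text)
    next_href, timestamp = None, None
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue
        name = element.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        if name == "link" and element.get("rel") == "next":
            next_href = element.get("href")
        elif name == "timestamp" and element.text:
            timestamp = element.text.strip()
    return next_href, timestamp


def http_get(url):
    """Plain unauthenticated GET; a real crawler would add credentials."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def crawl_changes(first_url, fetch=http_get, handle_page=lambda xml_text: None):
    """Follow rel=next links until a page carries a timestamp, then return it."""
    url = first_url
    while True:
        xml_text = fetch(url)
        handle_page(xml_text)  # index or otherwise process the entries here
        next_href, timestamp = parse_page(xml_text)
        if timestamp is not None:
            return timestamp  # store this value for the next crawl
        if next_href is None:
            raise RuntimeError("page has neither a rel=next link nor a timestamp")
        url = urljoin(url, next_href)  # tolerate relative href values
```

A caller would pass the stored value to build_seedlist_url, for example build_seedlist_url("https://host.example/seedlist", saved_timestamp), hand the result to crawl_changes, and persist the returned string in place of the old one. Passing fetch as a parameter keeps the paging logic testable without a live server.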