Subsequently crawling data
Added by IBM | Edited by developerWorks Lotus Team on July 8, 2014

After crawling the data for the first time, perform a crawl on a regular basis to stay up-to-date with the changes being made by the people using IBM® Connections.




Before you begin

You must have performed an initial data crawl and saved the value of the <wplc:timestamp> element before you can perform this procedure. See Crawling data for the first time for more details.

About this task

This procedure collects all content that was created, updated, or deleted up to the server time at which you started the crawl. Because content changes constantly, two crawls that use the same Timestamp parameter value but start at different times are likely to return different lists of entries in their seedlist responses.

Procedure

To subsequently crawl data, complete the following steps:
  1. Send a GET request to the seedlist feed for the application whose data you want to crawl. Include the following parameters on the request:
    Range
    Optional. Specifies the number of entries to return in the seedlist. Use this parameter to limit or increase the number of entries returned in a seedlist response. The default range is 500 entries. Setting the value of this parameter too high can place excessive load on IBM Connections applications.
    Timestamp
    Required. The string value of the <wplc:timestamp> element in the body of the last response returned by the previous seedlist crawling session. This value cannot be manually composed.
    These parameter values are case-sensitive. All other parameters used by the seedlist SPIs are considered internal; do not set them manually.

    The following list contains the seedlist URLs for each application:

    Activities
    http://servername/activities/seedlist/myserver
    The activities seedlist also contains content from community activities.

    Blogs
    http://servername/blogs/seedlist/myserver
    The blogs seedlist also contains content from community blogs.

    Bookmarks
    http://servername/dogear/seedlist/myserver

    Communities
    http://servername/communities/seedlist/myserver

    Community Events
    http://servername/communities/calendar/seedlist/myserver

    Files
    http://servername/files/seedlist/myserver

    Forums
    http://servername/forums/seedlist/myserver

    Profiles
    http://servername/profiles/seedlist/myserver

    Wikis
    http://servername/wikis/seedlist/myserver
    The wikis seedlist also contains content from community wikis.
For example:

https://example.org/files/seedlist/myserver?Timestamp=AAABJRVgyWw%3D
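A request URL like the one above can be assembled with standard URL encoding, which takes care of percent-encoding the trailing `=` in the timestamp value. This is a minimal sketch; the server name, saved timestamp, and Range value are placeholders to substitute with your own:

```python
from urllib.parse import urlencode

# Placeholder values -- use your own server name and the <wplc:timestamp>
# value saved from the previous crawl.
base_url = "https://example.org/files/seedlist/myserver"
saved_timestamp = "AAABJRVgyWw="  # raw value; urlencode percent-encodes the '='

params = {
    "Timestamp": saved_timestamp,  # required; parameter values are case-sensitive
    "Range": 500,                  # optional; the default is 500 entries
}
request_url = f"{base_url}?{urlencode(params)}"
print(request_url)
```

The encoded result matches the form of the example URL, with `=` rendered as `%3D`.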

  2. Process the returned feed. Find the rel="next" link and send a GET request to the web address specified by its href attribute.
  3. Repeat the previous two steps until the response includes a <wplc:timestamp> element in its body.
  4. Store the value of the <wplc:timestamp> element; you must pass that value as a parameter when you perform a subsequent crawl of the data.
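The paging logic in the steps above — follow rel="next" links until a page carries <wplc:timestamp>, then save that value — can be sketched as a small parsing helper. The wplc namespace URI and the sample feed bodies below are assumptions for illustration; use the namespace declared in your server's actual seedlist responses:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed namespace URI for illustration only; take the real one from the
# xmlns:wplc declaration in your seedlist responses.
WPLC = "{http://www.ibm.com/wplc/atom/1.0}"

def process_page(feed_xml):
    """Return ('timestamp', value) when the final page carries <wplc:timestamp>,
    or ('next', href) when the page has a rel="next" link to follow."""
    root = ET.fromstring(feed_xml)
    ts = root.find(f".//{WPLC}timestamp")
    if ts is not None:
        return "timestamp", ts.text
    for link in root.findall(f"{ATOM}link"):
        if link.get("rel") == "next":
            return "next", link.get("href")
    return None, None

# Inline samples standing in for live seedlist responses.
page = """<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="next" href="https://example.org/next-page"/>
  <entry><title>doc-1</title></entry>
</feed>"""

final_page = """<feed xmlns="http://www.w3.org/2005/Atom"
  xmlns:wplc="http://www.ibm.com/wplc/atom/1.0">
  <wplc:timestamp>AAABJRWqQxs=</wplc:timestamp>
</feed>"""

kind, href = process_page(page)          # 'next': send a GET to href
kind2, saved = process_page(final_page)  # 'timestamp': store saved for the next crawl
```

In a real crawler you would fetch each href over HTTP (with authentication), feed the response body to `process_page`, and loop until it reports a timestamp.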