Crawling data for the first time
Added by IBM (IBM contributor) on July 8, 2014

The first time that you crawl IBM® Connections data, you must crawl all content, so the initial crawl takes longer than subsequent crawls.

Procedure

To crawl the data for the first time, complete the following steps:
  1. Send a GET request to the seedlist feed for the application whose data you want to crawl. Do not specify any parameters on the request.

    Activities
    http://servername/activities/seedlist/myserver
    The activities seedlist also contains content from community activities.

    Blogs
    http://servername/blogs/seedlist/myserver
    The blogs seedlist also contains content from community blogs.

    Bookmarks
    http://servername/dogear/seedlist/myserver

    Communities
    http://servername/communities/seedlist/myserver

    Community Events
    http://servername/communities/calendar/seedlist/myserver

    Files
    http://servername/files/seedlist/myserver

    Forums
    http://servername/forums/seedlist/myserver

    Profiles
    http://servername/profiles/seedlist/myserver

    Wikis
    http://servername/wikis/seedlist/myserver
    The wikis seedlist also contains content from community wikis.

    Status Update
    http://servername/news/seedlist/myserver

    Events
    http://servername/communities/calendar/seedlist/myserver

    ECM files (FileNet)
    http://servername:port/dm/atom/seedlist/myserver

    For example:

    https://example.org/files/seedlist/myserver

  2. Process the returned feed. Find the rel=next link and send a GET request to the web address specified by its href attribute.
  3. Repeat the previous two steps until the response includes a <wplc:timestamp> element in its body.
  4. Store the value of the <wplc:timestamp> element; you must pass that value as a parameter when you perform a subsequent crawl of the data.
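
The procedure above can be sketched as a short loop. This is a minimal illustration in Python, not an official client. The function names (parse_seedlist_page, http_get, initial_crawl) and the handle_page callback are hypothetical. Elements are matched by local name only, because the namespace URI bound to the wplc prefix is not stated in this article. Authentication, which a real deployment will likely require, is left out.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_seedlist_page(xml_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (next_href, timestamp) found in one seedlist page.

    Assumption: the rel=next link is a <link> element and the timestamp
    is an element whose local name is 'timestamp' (wplc:timestamp).
    """
    root = ET.fromstring(xml_text)
    next_href = None
    timestamp = None
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "link" and elem.get("rel") == "next":
            next_href = elem.get("href")
        elif name == "timestamp" and elem.text:
            timestamp = elem.text.strip()
    return next_href, timestamp


def http_get(url: str) -> str:
    """Plain GET with no extra parameters; authentication is omitted."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def initial_crawl(
    seedlist_url: str,
    fetch: Callable[[str], str] = http_get,
    handle_page: Callable[[str], None] = lambda xml_text: None,
) -> str:
    """Follow rel=next links until a page carries a timestamp; return it."""
    url = seedlist_url
    while True:
        xml_text = fetch(url)          # steps 1 and 2: GET the feed page
        handle_page(xml_text)          # index or store the page's entries
        next_href, timestamp = parse_seedlist_page(xml_text)
        if timestamp is not None:      # step 3: a timestamp ends the crawl
            return timestamp           # step 4: caller must persist this
        if next_href is None:
            raise RuntimeError("page has neither a rel=next link nor a timestamp")
        url = urljoin(url, next_href)  # tolerate relative hrefs
```

A call such as initial_crawl("https://example.org/files/seedlist/myserver") would return the timestamp value to store for the next crawl. Passing fetch and handle_page as parameters keeps the paging logic separate from networking, so it can be exercised against canned feed pages.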