Author: Andreas Prokoph
The instructions in this document assume that you have the following software:
WebSphere Portal V22.214.171.124
Lotus Connections V3.0.1
Product documentation references:
Lotus Connections – Seedlist SPI
WebSphere Portal – Setting up search collections and crawlers
The search-admin J2EE role is used by each Lotus Connections application to secure access to seedlists and search engines. By default, the search-admin role is populated with the Lotus Connections administrator user account. To add users to the search-admin role for each Lotus Connections application, see Switching to unique administrator IDs for system level communication.
WebSphere Portal Search
The instructions in this article assume that:
- A search collection (search index) has already been created.
- Both WebSphere Portal and Lotus Connections are within the same single sign-on domain. This is required if security filtering needs to be applied to the Lotus Connections resources.
To integrate Lotus Connections content into the WebSphere Portal search, you must complete the following tasks. All of the procedures are performed from the WebSphere Portal administration portlets.
Creating Lotus Connections crawlers
Content crawlers are sometimes called content sources.
Log in to WebSphere Portal as an administrator.
Navigate to the Administration portlets.
Open the Manage Search portlet and click Search collections.
From the list of search collections, click the one that should store the Lotus Connections resources, such as Wiki articles or Blog entries.
Screen capture of the Manage Search
For this example, a search collection called Lotus Connections has been created.
Click the link for that search collection to view a list of all content sources already available for that search collection. If the list is empty, you can create a content source.
Screen capture of available search collections
The example screenshot shows four crawlers already available.
For Lotus Connections, you can crawl and index the following resources. The URLs in the list point to the seedlists which serve as input for the content sources.
For more information, see: Crawling data for the first time.
- Activities: http://servername/activities/seedlist/myserver
- Blogs: http://servername/blogs/seedlist/myserver
- Bookmarks: http://servername/dogear/seedlist/myserver
- Communities: http://servername/communities/seedlist/myserver
- Files: http://servername/files/seedlist/myserver
- Forums: http://servername/forums/seedlist/myserver
- Profiles: http://servername/profiles/seedlist/myserver
- Wikis: http://servername/wikis/seedlist/myserver
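The seedlist URLs listed above all follow one pattern, so they can be generated for a given host name. The following Python code is a minimal sketch of that idea, not part of either product: the function name seedlist_urls and its parameters are illustrative assumptions, while the path segments are copied from the list above.

```python
# Path segment used by each Lotus Connections application,
# copied from the list of seedlist URLs above.
SEEDLIST_APPS = {
    "Activities": "activities",
    "Blogs": "blogs",
    "Bookmarks": "dogear",
    "Communities": "communities",
    "Files": "files",
    "Forums": "forums",
    "Profiles": "profiles",
    "Wikis": "wikis",
}


def seedlist_urls(host, scheme="http"):
    """Return a dict mapping application name to its seedlist URL.

    `host` is the server host name, optionally followed by ":port".
    This helper is hypothetical and only builds strings.
    """
    return {
        name: f"{scheme}://{host}/{path}/seedlist/myserver"
        for name, path in SEEDLIST_APPS.items()
    }


if __name__ == "__main__":
    for name, url in seedlist_urls("servername").items():
        print(f"{name}: {url}")
```

Each generated URL can then be used for one content source.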
Creating a content source
To create a content source, click New Content Source at the top left of the list.
Screen capture of New Content Source form
For the Content source type, you must select Seedlist provider. The default is Web site crawler.
In the Collect documents linked from this URL field, you must specify the respective Lotus Connections seedlist URL. For example, for Wiki articles, you would specify:
http://servername/wikis/seedlist/myserver
The variable servername must be replaced by the host name, and if required the port number, of the server.
For better performance, you can deselect Force Complete Crawl. Once all items have been loaded, the crawler then processes only updates, which makes the turnaround time much faster. Updates include resources that have been modified or deleted.
On the Advanced Parameters tab, set the Number of parallel processes from 2 to 5. This is only required if you have a very large number of documents or resources that need to be processed in a timely manner.
Screen capture of the Advanced Parameters tab
Set the default character encoding to utf-8.
If the Lotus Connections content that needs to be crawled is secured, then you must provide the corresponding user credentials on the Security tab.
If the crawlers return an error message, such as "cannot access page or seedlist" or "blocked pages", you can verify the configuration as follows.
Copy the URL that you entered in the Collect documents linked from this URL field.
Log in to Lotus Connections, using the same credentials you entered on the Security tab.
Paste the URL into the browser address bar.
Verify that the page can be retrieved.
In most cases, if you can get to the URL from a browser then the crawler should be able to get to it too.
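The same check can be approximated from a script. The following Python code is a minimal sketch that uses only the standard library. It assumes that the seedlist URL accepts HTTP Basic authentication, which may not match your deployment. The function names build_seedlist_request and can_retrieve are hypothetical.

```python
import base64
import urllib.request


def build_seedlist_request(url, user, password):
    """Build a GET request with HTTP Basic credentials (assumed auth scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def can_retrieve(url, user, password, timeout=30):
    """Return True if the server answers the seedlist request with HTTP 200."""
    request = build_seedlist_request(url, user, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

Call can_retrieve with the URL from the Collect documents linked from this URL field and the credentials from the Security tab. A result of False suggests the same access problem that the crawler reports.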
Crawling the content
After the crawlers are configured, you are ready to crawl the content. To avoid overloading the Lotus Connections server, start one crawler at a time and wait between starting each one. For example, start a crawler and wait until it has fetched all resources and its status changes to “idle”. Then start the next crawler. Continue this process until all crawlers have completed their initial load of content.
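The one-at-a-time procedure can be expressed as a simple loop. The following Python code is only a sketch of the control flow: the start and get_status callables are hypothetical placeholders that you would have to supply, and they do not correspond to any documented WebSphere Portal API. In the administration portlets, the same steps are performed by hand.

```python
import time


def crawl_one_at_a_time(sources, start, get_status, poll_seconds=60, sleep=time.sleep):
    """Start each content source in turn, waiting for "idle" before the next.

    `start(source)` and `get_status(source)` are hypothetical callables
    supplied by the caller; they are not part of any product API.
    """
    completed = []
    for source in sources:
        start(source)
        # Poll until the crawler reports that it has finished fetching.
        while get_status(source) != "idle":
            sleep(poll_seconds)
        completed.append(source)
    return completed
```

Because the next crawler starts only after the previous one reports “idle”, the Lotus Connections server handles a single initial load at any time.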
Screen capture of the crawler interface
Starting the content sources manually
After you have crawled the content and verified that the crawlers pick up the right content, you need to set up schedules for the crawlers so that they resume their task at a specified time, for example, every 24 hours at 5:00 in the morning.
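To illustrate what such a daily schedule means, the following Python sketch computes the next start time for a crawl that runs every 24 hours at 5:00. The function name next_run is a hypothetical helper for illustration only. The actual schedule is configured in the Manage Search portlet.

```python
from datetime import datetime, time, timedelta


def next_run(now, run_at=time(5, 0)):
    """Return the next datetime at which a daily crawl at `run_at` would start."""
    candidate = datetime.combine(now.date(), run_at)
    # If today's slot has already passed (or is exactly now), use tomorrow's.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

For example, a crawler checked at 23:30 would next run at 5:00 on the following day.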