Allowing WebSphere Portal Search to crawl WCM content is a powerful way to provide search functionality to your WCM site.
The following articles describe how to set up that integration:
http://www.ibm.com/developerworks/lotus/library/wcm-search/
http://www-10.lotus.com/ldd/portalwiki.nsf/dx/advanced-wcmwebsphere-portal-search-integration
However, although the portal search crawler can index files that are attached directly to WCM content in these scenarios, it cannot index WCM File Resource Components that are stored only in the library.
There are 2 possible approaches to index the files:
1) Use the WCM API to generate URLs to all the library file resource components, then point a crawl URL to the .jsp. A sample .jsp is provided.
2) Use the WCM API to create one piece of content per library file resource component. A sample .jsp is provided.
For the first approach, the basic idea is:
* The JSP uses the WCM API to generate URLs to all library file resource components. It does this by creating a rendering context for a default piece of content, then writing an anchor tag for each file resource component.
* Then, provide the URL of the .jsp to Portal Search as a crawl URL.
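The link page that the crawler reads can be pictured with the minimal sketch below. It is not the provided searchCrawl.jsp and it does not call the WCM API. It assumes the file names and URLs have already been obtained, and the class name, method names, and example URL are hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of the WCM API or the provided searchCrawl.jsp.
class CrawlLinkPage {

    // Escape the characters that are unsafe inside HTML text and attribute values.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Build one anchor tag per entry (file name -> URL), one per line,
    // so that a crawler following one level of links reaches every file.
    static String buildLinkList(Map<String, String> fileUrlsByName) {
        StringBuilder html = new StringBuilder();
        for (Map.Entry<String, String> e : fileUrlsByName.entrySet()) {
            html.append("<a href=\"")
                .append(escapeHtml(e.getValue()))
                .append("\">")
                .append(escapeHtml(e.getKey()))
                .append("</a><br/>\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        // Placeholder URL, assumed for illustration only.
        files.put("report.pdf", "/wps/wcm/myconnect/example/report.pdf");
        System.out.print(buildLinkList(files));
    }
}
```

In the real JSP the map would be filled from the library through the WCM API; the sketch only shows the shape of the output that the crawler needs.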
To leverage this approach:
1) Modify the provided searchCrawl.jsp for the following values:
- ws.getDocumentLibrary("CKTestWFLibrary")
Change "CKTestWFLibrary" to the name of the library the files reside in.
- RenderingContext rc = ws.createRenderingContext(request, response, new HashMap(),"http://cmknightsun.raleigh.ibm.com:10038/wps/wcm","/myconnect");
Change cmknightsun.raleigh.ibm.com:10038 to a valid host:port in your system.
- siteAreaIdIterator = ws.findByName(DocumentTypes.SiteArea, "CKTestSiteArea");
Change "CKTestSiteArea" to a valid site area in your site. Ensure that the site area has default content and can be rendered.
2) After the .jsp is modified, place the .jsp in the following directory:
/PROFILENAME/installedApps/nodename/wcm.ear/ilwwcm.war/jsp/html/
3) After the .jsp is in place, validate that you get a list of links by opening the URL:
http://host:port/wps/wcm/jsp/html/searchCrawl.jsp
Next, point a WebSphere Portal search collection at this .jsp:
4) Navigate to WP Admin page -> Search Administration -> Manage Search -> Search Collections. Click on one of the search collections, or create one.
5) In the search collection form, click on New Content Source. Name the content source whatever you like. For the URL, enter the URL from step 3 above: http://host:port/wps/wcm/jsp/html/searchCrawl.jsp. Modify the other parameters as necessary. Specifically, set it to follow only 1 level of links.
6) Click create.
Once the content source is created, validate it and force a crawl.
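Before forcing a crawl, it can help to confirm that the page from step 3 really contains one link per file. The sketch below is a hypothetical checker, not part of Portal Search or WCM: it takes the HTML of the page as a string (saved from a browser, for example) and lists the href values that a one-level crawl would follow. It assumes double-quoted href attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker for the link page produced by the crawl JSP.
class CrawlLinkCheck {

    // Matches <a ... href="..."> and captures the href value.
    // Assumes double-quoted attribute values.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    // Return every href found in the page, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }
}
```

If the number of links returned differs from the number of file resource components in the library, the JSP settings from step 1 are the first thing to recheck.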
For the second approach, the basic idea is:
* JSP uses API to create a piece of WCM content in a specific site area. The content has a ComponentReference that is populated with a reference to the file resource component.
* When creating the content, the JSP will first check to see whether a piece of content already exists with the same name at the same path. When the content is created, it is given the same name as the file resource component.
* The end result is a content object wrapper for every file resource component in your library. Then, since the search crawler will process these content objects, the files will be search indexed.
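The check-then-create logic described above can be sketched as follows. This is an in-memory illustration only: a map stands in for the site area, no WCM API calls are made, and the class and method names are hypothetical. It also assumes that a file whose wrapper content already exists is simply skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the wrapper-content approach.
class WrapperContentSketch {

    // Stand-in for the site area: content name -> referenced file resource component name.
    private final Map<String, String> siteArea = new LinkedHashMap<>();

    // Create one wrapper per file resource component that has no wrapper yet.
    // Returns the names of the wrappers created during this run.
    List<String> syncWrappers(List<String> fileComponentNames) {
        List<String> created = new ArrayList<>();
        for (String name : fileComponentNames) {
            if (siteArea.containsKey(name)) {
                continue; // content with this name already exists at this path
            }
            // The wrapper takes the same name as the file resource component
            // and holds a reference to it (modelled here as the name itself).
            siteArea.put(name, name);
            created.add(name);
        }
        return created;
    }

    // Number of wrapper content items currently in the stand-in site area.
    int size() {
        return siteArea.size();
    }
}
```

Because existing names are skipped, running the logic again after new files are added to the library creates wrappers only for the new files.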
To leverage this approach:
1) Create an authoring template named "AT - BasicFileAt". Populate with just one element, a ComponentReference element named CompRef.
2) Create a presentation template named "PT - BasicFilePT". Add the following tag:
Set the security on the presentation template based on which users you want to be able to search the files.
3) Create a site area somewhere under the site that you've marked as searchable. Name it "SearchFileSiteArea". Map the authoring template "AT - BasicFileAt" to the "PT - BasicFilePT" presentation template. Set the security on the site area based on which users you want to be able to search the files.
4) Now you are ready to run the .jsp. Upload the attached attachFiles.jsp to /profilehome/installedApps/nodename/wcm.ear/ilwwcm.war/jsp/html. You will have to modify the JSP to change the following values:
ws.setCurrentDocumentLibrary(ws.getDocumentLibrary("CKTestLib"));
Change CKTestLib to the name of your library
tempIterator = ws.findByName(DocumentTypes.Workflow, "Publish Workflow");
Change Publish Workflow to the name of the workflow you want your content to use. Note: this sample assumes a one-stage workflow that publishes the content as soon as it is saved. If that is not desirable, the JSP will have to be changed to implement publishing.
5) When the .jsp is run, it first checks that a content object doesn't already exist at the path. It then creates a new content object using AT - BasicFileAt, stores it under SearchFileSiteArea, and assigns it the workflow. Finally, it populates the ComponentReference with a reference to the current file resource component.
6) The end result is that SearchFileSiteArea in your site structure holds one content object per file resource component. When the WP Search crawler crawls the top site, your file resource components are indexed.