IBM® WebSphere® Portal (hereafter called “Portal”), with or without IBM Web Content Manager™ (hereafter called “WCM”), performs well only with effective caching. There are multiple levels and types of caching within Portal/WCM, including HTTP caching, DynaCache caching, and other custom Java™ caches.
Out of the box, Portal/WCM is not configured (from a caching perspective) for production loads; instead, it's configured for development usage with all features enabled. For a production environment, Portal (without cache enablement and tuning) is not able to support medium to large transaction rates effectively.
Preparing Portal for a production environment requires effective caching to improve the performance of the Portal/WCM server. This includes improving the caching of the pages, portlets, themes, and static resources.
Although there are various layers of caching available, the single most effective cache option is to cache the HTML rendered for the whole Portal page, including caching the static resources referenced by the page.
The next most effective solution is to cache the HTML for each portlet prior to rendering (portlet caching), and this solution is followed by caching the content of each portlet so that dynamic generation of the portlet is faster.
Note that, as pages and portlets become more cached (that is, there is a higher probability that what the user needs already exists in a cache), the content must become less personalized to the user.
Because it is not enabled by default, caching of anonymous Portal pages is rarely done. An anonymous Portal page is a page that can be rendered to a user without that user being authenticated to Portal. In contrast, an authenticated page is protected by access control and requires a user to be logged in to Portal to view it.
Note that a page may be both available to authenticated as well as anonymous users. The page may be identical in either case, or the content may be tailored based on authenticated versus anonymous access.
As anonymous pages are often the initial page of a Portal (also known as the “homepage”), the homepage often has the highest render count of any page in that Portal. For that reason, assuming there is no caching, more CPU cycles are spent rendering this anonymous homepage than almost any other page. Therefore, if the caching of this anonymous homepage can be improved, the CPU usage of the Portal can be reduced and the overall throughput improved.
The best practice is to wait to authenticate a user until that user actually needs to see pages that are personalized for their individual needs. The homepage, like most other anonymous pages, usually informs users about the company's services or products in general, so by making the homepage both anonymous and cached, you can greatly improve Portal performance.
This article explains the rationale for using anonymous pages as well as a process for configuring Portal, WCM, and the Web server to yield maximum performance.
Why use and cache anonymous pages?
CPU cost of anonymous pages
As stated above, an anonymous user is one who renders pages from Portal but does not authenticate to it; Portal renders to that user a generic page. This distinction of authenticated versus anonymous access is important in Portal for many reasons.
Most importantly, the act of accessing an authenticated page in Portal for the first time is much more expensive (in terms of CPU cycles) than hitting that same page anonymously. This is because there are many internal caches that must be built “per user” for access control purposes as the user authenticates. Sessions must also be created for authenticated users.
Authentication can occur through Portal's login page or portlet, or through SiteMinder, WebSEAL, or any other authentication proxy. Note that even when these authentication proxies are used, Portal still spends many CPU cycles the first time it renders content to these authenticated users, because it must still build many internal tables “per user” at initial rendering.
Complex anonymous pages
If a page requires many CPU cycles to render, or a long time to render (because of backend services required to render it), this page is deemed “complex”. Complexity can be attributed to the theme, the individual portlets on the page, and/or the aggregate number of portlets on the page.
Even though anonymous pages are much cheaper to render than authenticated pages, an anonymous page can still be expensive to render if the portlets on it are expensive.
An example would be rendering a WCM “menu” to anonymous users when the menu is expensive or putting a portlet on an anonymous page that must hit slow backend services. These types of anonymous pages are excellent candidates for caching at the page level.
Caching anonymous pages outside Portal
In addition to caching an anonymous page inside Portal, it is possible to cache that page in the browser as well as in intermediate proxy caches. By doing this, you can greatly reduce both the render time for the page and the number of CPU cycles Portal needs to render the anonymous page.
If done as recommended, the entire HTML of the page is cached, and Portal uses no cycles to render the page until the cache times out.
If a page is anonymous and not highly dynamic, it should be considered for caching. The page can then be served from a proxy, such as mod_disk_cache on Apache or IBM HTTP Server (IHS), as well as from the user's browser cache.
If an entire anonymous page can be served from a cache (as opposed to totally regenerated on the server), then the render cost on the server, as well as the render time for that page, can be reduced dramatically.
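Here is a back-of-the-envelope model to make "reduced dramatically" more concrete. It is our own illustration, not a measurement. If a fraction of anonymous requests (the hit ratio) is answered by a browser or proxy cache, Portal renders only the remaining misses. The request rate, hit ratio, and per-render CPU cost in the example are hypothetical.

```python
def portal_render_load(requests_per_sec: float, hit_ratio: float,
                       cpu_sec_per_render: float) -> float:
    """Return the CPU-seconds per second Portal spends rendering one page.

    Requests answered by a browser or proxy cache never reach Portal,
    so only cache misses cost render cycles.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    misses_per_sec = requests_per_sec * (1.0 - hit_ratio)
    return misses_per_sec * cpu_sec_per_render


# Hypothetical workload: 200 homepage requests/s, 0.05 CPU-seconds per render.
uncached = portal_render_load(200, 0.0, 0.05)   # about 10 CPUs busy rendering
cached = portal_render_load(200, 0.95, 0.05)    # about 0.5 CPU busy rendering
print(f"uncached: {uncached:.1f} CPU-s/s, 95% cached: {cached:.1f} CPU-s/s")
```

The render cost scales with the miss rate, not the request rate. Raising the hit ratio of a heavily used anonymous page therefore has an outsized effect.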
Only force authentication when required
If a portal supports anonymous users, either exclusively or as a prelude to becoming an authenticated (logged-in) user, then best practice is to make the initial page rendered to any user an anonymous page.
This page would typically include information of value to all users of the portal, as well as a login portlet for users who wish to authenticate to view protected content. A user would then authenticate only once, and only if they need to see personalized content. This practice strikes a good balance between personalization and the cycles Portal uses to render content.
As long as the content on an anonymous page is not refreshed often, the architect and/or designer of the portal should enable all such anonymous pages to be cached. The Portal administrator should then make these pages available to caches to improve Portal performance.
This article also covers the invalidation of the different caches since it's usually important to understand how long it takes until changes are reflected in the browser of the end users.
Note that HTTPS (SSL) pages are restricted by specification from being cached, unless caching is explicitly allowed by the “Cache-Control” header. If you use SSL between the Web server (for example, IHS or Apache) and Portal, a reverse proxy cache in the Web server cannot cache SSL content from Portal, because Portal does not override the Cache-Control headers to allow it.
This restriction is one of several reasons why it is recommended to terminate SSL at the Web server and allow traffic to flow between the Web server and Portal as non-SSL.
This section explains the exact process to make anonymous pages and all referenced resources cached, as well as how to configure the WCM caches for best anonymous page performance.
Enabling caching of the page
The developerWorks® article, “IBM WebSphere Developer Technical Journal: Develop high performance Web sites with both static and dynamic content using WebSphere Portal V5.1,” provides many of the details for enabling anonymous caching. Although it was written for Portal 5.1, it is still relevant to this topic; this article builds on it and updates it for Portal 7.0 and later.
To support anonymous page caching, all the artifacts on the page must have the access set for “anonymous users” and a non-zero time-out with a “shared by all users” cache scope. These artifacts include the page theme, the page itself, and all the portlets on the page.
In addition, there are several properties in the Resource Environment Provider called “WP NavigatorService” (accessible through the WebSphere Application Server or Deployment Manager administration console). These artifacts and properties are discussed below.
- Theme. We can enable the theme on a page for cached access by setting the cache scope to “shared” and a cache time-out to greater than zero seconds. However, we cannot set the scope of a theme via the Portal administration GUI; instead, it must be done via XMLAccess. Appendix 1 covers setting this scope.
NOTE: In Portal 7.0.x, a bug exists whereby the theme is ignored in setting the anonymous page caching behavior. Until this bug is resolved, the theme scope does not have to be set.
By forcing a session on anonymous users, the PageBuilder2 theme issues a “set-cookie” response header that mandates the use of the “remoteCacheInfo.response.header.vary” NavigatorService property (see below).
- Page. Set the page cache scope to “Shared across all users” with an appropriate time-out. An initial suggestion would be 12 hours; because the time-out is specified in seconds, that is 43200 seconds (12 × 3600). This setting is controlled either via XMLAccess or via the Portal Admin GUI in the Manage Pages section.
Figure 1 shows the option in the administration portlet.
Figure 1. Page Cache Options window
- Portlet. Set the portlet cache scope to “Shared across all users” with an appropriate time-out. An initial suggestion would be 12 hours, or 43200 seconds. This setting is controlled either via XMLAccess or via the Portal Admin GUI in the “Portlets” section.
If the portlet is used on several different pages, but on some of those pages you do not want caching enabled for it, consider (1) making a copy of the portlet, (2) setting the cache scope on the copy for anonymous use, and (3) using this copy on the anonymous page instead of the original portlet. You can easily make portlet copies via the Portal administration console. Figure 2 shows this for the WCM rendering portlet.
Figure 2. Portlet Cache Options window
Note also that, when you enable portlets for anonymous page caching, they are also enabled for portlet fragment caching. This means that Portal will cache the HTML for individual portlets, greatly improving portlet render times even on authenticated pages (assuming that the portlet content changes infrequently).
To enable portlet fragment caching, you must enable caching on the individual portlets and also enable the global “Portlet fragment caching” parameter in the WebSphere Application Server console, for each Portal server, in the Portlet Container section.
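The requirement stated above can be expressed as a small checklist: every artifact on the page needs a shared scope and a non-zero time-out. The sketch below is a hypothetical helper, not a Portal API. The scope labels "shared" and "private" are illustrative and are not Portal's literal XMLAccess values. The sketch also assumes that the page's overall lifetime is bounded by the shortest artifact time-out, which is consistent with the public.expires description later in this article.

```python
from dataclasses import dataclass


@dataclass
class CacheSettings:
    name: str         # "theme", "page", or a portlet name (illustrative)
    scope: str        # "shared" or "private" (illustrative labels only)
    timeout_sec: int  # cache time-out in seconds


def hours_to_seconds(hours: float) -> int:
    """Convert hours to the seconds value entered in Portal."""
    return int(hours * 3600)


def check_anonymous_page(artifacts: list[CacheSettings]):
    """Return (cacheable, problems, lifetime_sec) for a page's artifacts.

    Assumption: the shortest time-out among the artifacts bounds the
    lifetime of the whole page; lifetime is 0 when the page is not cacheable.
    """
    problems = []
    for a in artifacts:
        if a.scope != "shared":
            problems.append(f"{a.name}: scope is {a.scope!r}, expected 'shared'")
        if a.timeout_sec <= 0:
            problems.append(f"{a.name}: time-out must be greater than zero")
    if problems:
        return False, problems, 0
    return True, [], min(a.timeout_sec for a in artifacts)


twelve_hours = hours_to_seconds(12)  # 43200
page = [
    CacheSettings("theme", "shared", twelve_hours),
    CacheSettings("page", "shared", twelve_hours),
    CacheSettings("WCM rendering portlet", "shared", twelve_hours),
    CacheSettings("login portlet", "private", 0),
]
ok, problems, lifetime = check_anonymous_page(page)
```

In this hypothetical page, the private login portlet alone is enough to keep the page out of the shared cache. Giving that portlet a shared, non-zero setting (or using a shared copy of it) makes the page cacheable for 43200 seconds.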
WP NavigatorService is a cell-scoped Resource Environment Provider. It is found via the console on either WebSphere Application Server or the Deployment Manager:
- Navigate to Resources > Resource Environment > Resource Environment Providers in the left-hand panel of the administration console.
- Select WP NavigatorService and click Custom Properties. Note that out of the box, there are no custom properties. The properties must be added or modified as follows:
- remoteCacheInfo.response.header.vary: This parameter controls the “Vary” header placed on Portal pages. In Portal 7 it defaults to “User-Agent, Cookie”. Unfortunately, the “Cookie” portion of this default makes caches keep a separate copy of the page for every distinct Cookie request header, so a cached copy is reused only when a request's cookies exactly match those of the request that populated the cache. Therefore, a user whose browser still holds cookies from a previous Portal session can force the proxy caches to fetch the page from Portal again.
- public.expires: This parameter controls the “Expires” and “Cache-Control” headers for anonymous pages served by Portal. In practice, Portal sets these headers to the minimum of this parameter and the cache time-outs of the theme, page, and portlets on that page. So think of this parameter as a “worst case” (maximum) lifetime of an anonymous page.
- remote.cache.expiration: This parameter controls the “Expires” and “Cache-Control” headers for both anonymous and authenticated pages served by Portal. In practice, Portal sets these headers to the minimum of this parameter and the cache time-outs of the theme, page, and portlets on that page. So, again, think of this parameter as a “worst case” (maximum) lifetime of a page.
- public.session: While not strictly a contributor to the cache time-outs for anonymous pages, this parameter controls whether or not each anonymous user gets a session. Normally, anonymous users share a single session in Portal. However, if this parameter is set to “true”, each anonymous user gets a session along with a Portal response containing a “set-cookie” header that assigns a JSESSIONID cookie.
If you must set this parameter to “true”, you must also set “remoteCacheInfo.response.header.vary” to “User-Agent” (without “Cookie”). Otherwise, each unique JSESSIONID cookie produces a separate cache entry, and anonymous responses cannot be shared from a cache.
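To see why “Cookie” in the Vary header defeats a shared cache, it helps to model how a cache builds its lookup key. A cache stores one entry per combination of URL and the request headers named in Vary. The sketch below is a simplified model, not the implementation of mod_disk_cache or any other proxy. The page URL and cookie values are hypothetical.

```python
def cache_key(url: str, request_headers: dict[str, str], vary: str) -> tuple:
    """Return the key a shared cache would use to store or look up a response.

    HTTP header names are case-insensitive, so they are lower-cased. Only the
    request headers listed in the response's Vary header become part of the key.
    """
    headers = {name.lower(): value for name, value in request_headers.items()}
    varied = [h.strip().lower() for h in vary.split(",") if h.strip()]
    return (url,) + tuple((h, headers.get(h, "")) for h in varied)


# Two anonymous users with the same browser but different session cookies.
user_a = {"User-Agent": "Firefox/10.0", "Cookie": "JSESSIONID=0001AAAA"}
user_b = {"User-Agent": "Firefox/10.0", "Cookie": "JSESSIONID=0002BBBB"}
url = "/wps/portal/Home"  # hypothetical anonymous page URL

default_vary = "User-Agent, Cookie"
tuned_vary = "User-Agent"

# Default Vary: each session cookie yields its own key, so user_b misses.
same_with_default = cache_key(url, user_a, default_vary) == cache_key(url, user_b, default_vary)
# Vary without Cookie: both users map to the same cached copy.
same_with_tuned = cache_key(url, user_a, tuned_vary) == cache_key(url, user_b, tuned_vary)
print(same_with_default, same_with_tuned)  # False True
```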
Table 1 lists the four parameters above along with their suggested values for enabling anonymous page caching for a period of 12 hours. However, these settings should be tuned for each particular use case and represent only a very broad starting point.
Table 1. Suggested value for custom parameters
Enabling caching of static resources
Static files in the file system of the WebSphere Application Server.
The application server's file system servlet does not add a cache header to the files it serves. Without a Cache-Control header, neither browsers nor caching proxies receive explicit instructions to cache the resource, so by default it is not reliably cached. The best place to add cache headers to this content is at the HTTP Server level. See the section on HTTP Server tuning for details.
Static files in WebDav.
When storing themes, skins, or other static resources in WebDav, the Portal configuration determines which cache headers are set. The same holds true when files are combined by the modularization framework. The configuration is described in the product documentation topic, “Administering the theme modularization framework.”
For the default configuration we recommend setting:
- com.ibm.wps.resourceaggregator.cache.info.0.max-age to a long value
- com.ibm.wps.resourceaggregator.cache.info.0.cache-scope to “public”
Static files in WCM.
Files can be stored in WCM (for example, as File Resources) and referenced by their URLs, typically from the WCM rendering portlet or from the theme. To ensure that WCM sends a correct cache header with a sufficiently long browser cache time-out for those URLs, set “resourceserver.browserCacheMaxAge” in the WCM resource environment provider “WCMConfigService” to a high value.
Our recommendation is either to use the theme modularization framework to disable all features that are not needed or, when using the PageBuilder2 theme, to follow the wiki article, “Optimizing Portal 7 Page Builder Theme for performance,” to reduce the complexity of the theme.
Configuring internal caches
WebSphere Portal and Web Content Manager provide a set of internal caches that prevent expensive data from being computed repeatedly.
For anonymous users, the most important Portal caches are those storing the navigation and page information. These are enabled by default; the only settings to check are the cache timeouts and cache sizes.
From a WCM perspective, the most important caches to configure are Advanced caching and Fragment caching:
- Fragment caching. Fragment caching at the Web Content Viewer portlet stores the complete HTML fragment inside the cache. To enable this, configure caching across users inside the portlet, making sure that Fragment caching is globally enabled for the server.
- Advanced caching. If authenticated users are using WCM, we recommend setting “connect.moduleconfig.ajpe.contentcache.defaultcontentcache=SECURED” in the WCM WCMConfigService. If only anonymous users access the system, you can improve performance by choosing “connect.moduleconfig.ajpe.contentcache.defaultcontentcache=SITE” and configuring a long timeout for the setting “connect.moduleconfig.ajpe.contentcache.contentcacheexpires”.
For a deeper description of all the different caches in Portal and WCM, refer to the wiki white paper, “IBM WebSphere Portal 7 Performance Tuning Guide,” and to the product documentation topic, “Web content cache types.”
HTTP Server tuning: Cache headers, caching, and compression
There are three main performance tunings to be configured at the HTTP Server for optimal performance:
(1) “Cache-control” and “expires” headers for the resources that do not have them
(2) Compression, to compress content before being sent over the network
(3) HTTP server caching (reverse proxy caching), to cache the content at the HTTP Server and thus prevent requests from going back to the Portal server.
Refer to the related blog post from Alex Lang for details on how to configure this for IHS and Apache.
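For item (1), the goal is for each static resource to leave the HTTP Server with both a relative Cache-Control max-age and an absolute Expires date. In IHS or Apache this is typically done with the mod_expires or mod_headers modules, and the blog post covers the actual configuration. The Python sketch below only shows which header values you are aiming for. The helper name is hypothetical.

```python
import time
from email.utils import formatdate
from typing import Optional


def static_cache_headers(max_age_sec: int, now: Optional[float] = None) -> dict[str, str]:
    """Build Cache-Control and Expires headers for a cacheable static resource.

    Expires is an absolute HTTP date in GMT for older clients and HTTP/1.0
    proxies; Cache-Control max-age is relative and takes precedence over
    Expires in HTTP/1.1 caches.
    """
    if now is None:
        now = time.time()
    return {
        "Cache-Control": f"public, max-age={max_age_sec}",
        "Expires": formatdate(now + max_age_sec, usegmt=True),
    }


# Using the epoch as "now" makes the output reproducible: 12 hours later.
headers = static_cache_headers(43200, now=0)
print(headers)
# {'Cache-Control': 'public, max-age=43200', 'Expires': 'Thu, 01 Jan 1970 12:00:00 GMT'}
```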
To validate the cache-control settings, IBM recommends that developers and administrators use tools that show the cache headers of the requested resources, such as Firebug, the Network tab of the Google Chrome developer tools, or Fiddler.
When hitting the Web site in question for the second time, ideally all resources would come from the browser cache and would not be requested from the server. In that case, the responses should be nearly instantaneous.
In Figure 3, Firebug was used to trace the requests sent by the browser to the server for a page. In the sample we are hitting the page for a subsequent time, meaning that the browser cache is already filled. From the URL, you can see that the public area of Portal is addressed.
Figure 3. Firebug trace requests
All the requested resources should have a public cache header with a sufficiently long cache time and a long enough Expires header. The total size of the files requested should be small; in our test case it was 70 KB.
Another useful tool for analyzing the performance of a site is YSlow, which analyzes the Web site and suggests ways to improve performance. In Figure 4, YSlow was used to analyze a Portal public site that uses Portal 188.8.131.52.
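You can turn the Firebug check described above into a quick script. Copy the response headers from Firebug, Chrome, or Fiddler and flag anything that would keep a resource out of the caches. The sketch below is a hypothetical helper that encodes this article's criteria: a public Cache-Control with a sufficient max-age, an Expires header, and no “Cookie” in Vary. The threshold you pass in is your own choice.

```python
import re


def audit_cache_headers(headers: dict[str, str], min_max_age: int) -> list[str]:
    """List problems that would keep a response out of shared or browser caches."""
    h = {name.lower(): value for name, value in headers.items()}
    directives = [d.strip().lower()
                  for d in h.get("cache-control", "").split(",") if d.strip()]
    issues = []
    if "public" not in directives:
        issues.append("Cache-Control does not contain 'public'")
    max_age = None
    for d in directives:
        match = re.fullmatch(r"max-age=(\d+)", d)
        if match:
            max_age = int(match.group(1))
    if max_age is None:
        issues.append("Cache-Control has no max-age")
    elif max_age < min_max_age:
        issues.append(f"max-age={max_age} is below {min_max_age}")
    if "expires" not in h:
        issues.append("no Expires header")
    vary = [v.strip().lower() for v in h.get("vary", "").split(",")]
    if "cookie" in vary:
        issues.append("Vary includes Cookie")
    return issues


good = {"Cache-Control": "public, max-age=43200",
        "Expires": "Thu, 01 Jan 1970 12:00:00 GMT", "Vary": "User-Agent"}
bad = {"Cache-Control": "no-cache", "Vary": "User-Agent, Cookie"}
print(audit_cache_headers(good, 43200))  # []
print(audit_cache_headers(bad, 43200))
```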
The page contains a WCM portlet, and the cache timeouts were increased to a value higher than 48 hours because the YSlow “out of the box” rule requires a value this high. You will need to decide based on your scenario how long the entries should live in the cache.
Figure 4. YSlow performance report
Note that we had to implement additional tunings for the Portal 184.108.40.206 theme (see Appendix 1 for details).
When content is updated, the caches might serve stale data to the end user. It is therefore important to understand when new data becomes available to users and, possibly, how to reduce that time. The most important factor in cache invalidation is the cache timeout.
Although some DynaCache instances (such as the Portal Access Control caches) invalidate entries immediately when there are changes, most caches simply follow their predefined timeout. Because multiple cache layers are used, in the worst case the time before the data refreshes could be the sum of the cache timeouts of all the involved caches.
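The worst case adds up because each layer may refresh its copy from the layer below just before that lower layer's copy expires. Each layer then keeps the stale copy for its own full timeout. The sketch below applies this to a set of hypothetical layer timeouts. Substitute the values configured in your environment.

```python
def worst_case_staleness(layer_timeouts_sec: dict[str, int]) -> int:
    """Upper bound, in seconds, on how long an update can take to reach a user.

    Worst case: every layer re-fetches a still-stale copy from the layer
    below just before that copy expires, so the timeouts add up.
    """
    return sum(layer_timeouts_sec.values())


# Hypothetical layer timeouts, from the innermost cache to the browser.
layers = {
    "WCM content cache": 3600,
    "portlet fragment cache": 43200,
    "IHS mod_disk_cache": 43200,
    "browser cache": 43200,
}
total = worst_case_staleness(layers)
print(f"worst case: {total} s ({total / 3600:g} h)")  # worst case: 133200 s (37 h)
```

If WCM content is syndicated from an authoring server, the syndication interval adds to this bound as well.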
Caches on WebSphere Application Server can be invalidated manually with tools such as the “Extended Cache Monitor”. The basic cache monitor tool ships with WebSphere Application Server; the “Extended” capability is available as a download from the developerWorks article, “IBM Extended Cache Monitor for IBM WebSphere Application Server technology preview.”
Caches in the HTTP Server (mod_disk_cache) can be flushed with tools like “htcacheclean,” which ships with the IHS Web server.
The browser cache is the most difficult cache to clear. Ideally, choose a timeout that suits your business use case. Browser cache timeouts are controlled by the “Cache-Control” and “Expires” headers on the content; if the content has neither header, browsers apply heuristics to decide how long to cache it.
Note that it is possible to avoid clearing the browser cache for static resources by changing the URL (for example, the name) of the content and the reference to that content. Typically, this is done by versioning the name of the included resources; for example, “picture.gif” is renamed as “picture1.gif”.
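The article's example bumps a counter (“picture.gif” becomes “picture1.gif”). A common alternative, which we swap in here, is to derive the version from a hash of the file's content. The name then changes exactly when the content changes. The sketch below is a hypothetical build-time helper. You would still need to update the references in the theme or WCM content to use the new name.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Return a file name that changes whenever the file's content changes.

    Example: "images/picture.gif" -> "images/picture.<8 hex digits>.gif",
    where the hex digits are the start of the content's SHA-256 digest.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


v1 = versioned_name("images/picture.gif", b"first version of the image")
v2 = versioned_name("images/picture.gif", b"second version of the image")
print(v1, v2)  # two different names; each is stable for the same content
```

Because every new version gets a new URL, the old name can keep a very long browser cache timeout without ever serving stale content.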
If WCM is used, typically the content is not created on the rendering server but rather on the authoring server and is syndicated between the environments. If syndication is configured to occur at certain time intervals, this time also must be kept in mind when considering when updates will become visible.
Google Chrome-specific notes
- Google Chrome sends the first HTTP(S) GET request for content with a “Cache-Control: max-age=0” request header. Not only does this prevent the page from being served from the browser cache, it also forces intermediate caches to refresh the page from Portal, which defeats the whole purpose of page caching.
- If Chrome fetches the content by following a link (an “HREF”), it does not send the “max-age=0” header.
- Also, if this is the first request for an item after the browser has been opened, Chrome does not send “max-age=0”. Thus, it is possible to get one instance of a page from an intermediate cache.
- The WebSphere Portal 220.127.116.11 and 8.0 themes are modularized themes. Based on the module configuration, Portal combines files and sends them out with a cache header. The cache and expiry headers can be configured in the WebSphere Application Server Resource Environment Provider, WP ConfigService:
- If storing the themes in WebDav, ensure that the timeouts are set for filestore.cache.expiration.0.seconds, filestore.cache.expiration.1.seconds, filestore.cache.expiration.2.seconds, and filestore.cache.expiration.3.seconds.
- For performance reasons, the theme does not generate full URLs for the pages referenced by the current page; instead, it appends the identifier of the page, which leads to an extra redirect when the user selects that page.
- Redirects cannot be cached. For an anonymous site, it is more beneficial to leverage browser caching than to avoid generating the full URLs. To disable the redirect, configure the base URL for the theme and disable friendly URLs.
About the authors
Alex Lang joined IBM in 1982, since which time he has had various technical and management assignments in networking, digital signal processing, Java advocacy, and IBM WebSphere. He is currently the Technical Team Lead for the WebSphere Portal SEAL team. His primary focus is resolving critical customer situations involving the architecture, deployment, and operation of WebSphere Portal and Web Content Manager.
is a Software Architect at IBM's Research Triangle Park Development Lab. He has worked on the WebSphere Portal development team for ten years, focusing on various components including security and virtual portals. In his current role he supports clients as a lab-based services consultant and works as Chief Programmer on the development of the product.