Introduction
When you develop a Web site with IBM® Web Content Manager™ (WCM), you must first define a Library, then a Site Area for your Library, and so on, resulting in a long URL with a common path of the form /wps/wcm/connect/<LibraryName>/<SiteArea>/...
For improved Search Engine Optimization (SEO), your home page must answer directly at the root URL of your site, without an HTTP 301 or 302 redirect response.
Then you replace your common path, like /wps/wcm/connect/<LibraryName>, with a short path, like /it/ for an Italian Web site. The typical solution is to apply some rewrite rule; however, this is an incomplete solution.
Every href in the HTML page still contains the long path, so with a rewrite rule alone you end up with two URLs for every object: one short and one long. The search engine notices two URLs pointing to the same resource and divides the ranking between them, thwarting our efforts to improve SEO.
For example, if you call
http://www.iulm.it/ and display the source page, you can find the long URL, as shown in figure 1.
Figure 1. Long URL example
And if you call
http://www.iulm.it/wps/wcm/connect/iulmit/iulm-it/Home, you obtain the same page that you obtained with
http://www.iulm.it/. If a search engine reaches the page through a link to the long URL and determines that it serves the same content as the short URL, the ranking is halved.
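To check whether a page still exposes long URLs, you can scan its links for the common WCM prefix. The following is a minimal sketch, not part of the original solution; the class and function names and the sample markup are hypothetical, and only the /wps/wcm/connect/ prefix comes from the article.

```python
from html.parser import HTMLParser

LONG_PREFIX = "/wps/wcm/connect/"  # common WCM path described in the article


class LongLinkFinder(HTMLParser):
    """Collect href/src values that still contain the long WCM prefix."""

    def __init__(self):
        super().__init__()
        self.long_links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name in ("href", "src") and value and LONG_PREFIX in value:
                self.long_links.append(value)


def find_long_links(html):
    """Return every link in the page that would give a search engine a second URL."""
    finder = LongLinkFinder()
    finder.feed(html)
    return finder.long_links


# Hypothetical page fragment modelled on the iulm.it example.
sample = (
    '<a href="/wps/wcm/connect/iulmit/iulm-it/Home">Home</a>'
    '<a href="/it/contatti">Contatti</a>'
)
print(find_long_links(sample))
```

An empty result for a page means that none of its links expose the long path.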
Solution
As an example of correcting this problem, we will look at
http://www.sonus.com. We approach the solution in two phases: first we make the home page answer when only the FQDN is requested, then we remove the common part of the URL.
Set HomePage answer
It’s quite simple to configure the IBM HTTP Server (IHS) with rewrite rules that hide the context path of a WCM home page. For example, on
http://www.sonus.com/:
1. Activate mod_rewrite in the httpd.conf file:
a) Open the httpd.conf file, found in <installHTTPServer>/conf.
b) Find the term rewrite_module and remove the # at the beginning of the line; then save the file.
2. Define a specific VirtualHost for your FQDN as follows:
a) Find the NameVirtualHost directive and activate it, for example, NameVirtualHost <yourIP>:80
b) Find the VirtualHost tag and activate it as shown in listing 1.
Listing 1. Activate VirtualHost tag
<VirtualHost <yourIP>:80>
DocumentRoot www/<mySite>
ServerName <myFQDNServerName>
ErrorLog logs/www/<mySite>/error.log
CustomLog logs/www/<mySite>/access.log common
</VirtualHost>
3. Insert the Rewrite rule by adding the lines in listing 2 before the </VirtualHost> tag.
Listing 2. Rewrite rule code
# Comment out the next line to deactivate rewriting
RewriteEngine On
# Comment out the next two lines to deactivate the rewrite log
RewriteLog <path>/<FileLog>.log
RewriteLogLevel 4
#--------------------- short URL WCM
RewriteCond %{HTTP_HOST} ^<fqdn>
RewriteCond %{REQUEST_URI} ^(/)?$
RewriteRule ^(/)?$ /<path to the HomePage>/ [PT,NC]
#-------------------------------------- End short Url WCM
4. Save it and restart the HTTP server.
Now your Web site answers at http://<yourSiteFQDN>, just as
http://www.sonus.com does (see figure 2).
Figure 2. Sonus Web Site
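The logic of the rule in listing 2 can be illustrated outside the server. The sketch below is a simplified model, not how mod_rewrite is implemented: the function name rewrite_home and its parameters are hypothetical, and the FQDN is escaped before matching, which the listing itself does not do.

```python
import re


def rewrite_home(host, uri, fqdn, home_path):
    """Return the path served internally for (host, uri), modelled on listing 2.

    Consecutive RewriteCond lines are ANDed: the Host header must start
    with the FQDN and the request URI must be empty or "/".
    """
    host_ok = re.match("^" + re.escape(fqdn), host) is not None
    root_ok = re.match(r"^(/)?$", uri) is not None
    if host_ok and root_ok:
        # Internal rewrite: no redirect is sent, so the browser URL is unchanged.
        return home_path
    return uri


# Hypothetical values based on the mysite example used later in the article.
HOME = "/wps/wcm/connect/mysite/site/home/homepage"
print(rewrite_home("www.mysite.com", "/", "www.mysite.com", HOME))
print(rewrite_home("www.mysite.com", "/news", "www.mysite.com", HOME))
```

Only a request for the bare root of the matching host is mapped to the home page; every other URI passes through untouched.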

Remove the common path
This step is more complex. The idea is to dynamically rewrite every link in the page, provided that these conditions hold:
- Links must be relative, such as /wps/wcm/connect/<libraryName>/<SiteArea>/...
- The link to your home page must be /
- The resource is in your WCM application
To apply this solution you must install an Apache module that provides HTML proxying, that is, the dynamic rewriting of the URLs in a proxied page. If necessary, you can recompile the module from source.
In the .zip file attached to this article, the modules are divided by folder. If you are working with IHS 7 in a Microsoft® Windows® environment, copy the contents of each folder into the folder of the same name under your HTTP Server installation path.
The rewrite process works via a Reverse Proxy functionality, an example of which is shown in figure 3.
Figure 3. Example rewrite process
As the figure shows:
- The user requests a page such as http://www.mysite.com/it/home.
- The public VirtualHost receives the request and proxies it to an internal VirtualHost, such as www1.mysite.com.
- The internal VirtualHost sends the request to the application server.
- The application server answers with a standard page that contains long URL links.
- The internal VirtualHost passes the answer back, and the public VirtualHost dynamically rewrites all the links in the page.
- The rewritten page is returned to the user.
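The link-rewriting step can be pictured as an ordered prefix substitution on the link attributes of the page. The sketch below is a deliberate simplification: the real module parses the HTML rather than using a regular expression, and the names URL_MAP, map_url, and rewrite_links, as well as the sample markup, are hypothetical.

```python
import re

# Ordered (long_prefix, short_prefix) pairs; the first matching prefix is applied.
URL_MAP = [
    ("http://www1.mysite.com/wps/wcm/connect/mysite/site/", "/it/"),
    ("/wps/wcm/connect/mysite/site/", "/it/"),
]

# Matches double-quoted href, src, and action attributes only (a simplification).
ATTR_RE = re.compile(r'(\b(?:href|src|action)=")([^"]*)(")', re.IGNORECASE)


def map_url(url, url_map=URL_MAP):
    """Replace the first matching long prefix of a URL with its short form."""
    for old, new in url_map:
        if url.startswith(old):
            return new + url[len(old):]
    return url


def rewrite_links(html, url_map=URL_MAP):
    """Rewrite every link attribute in the page, leaving other text untouched."""
    return ATTR_RE.sub(
        lambda m: m.group(1) + map_url(m.group(2), url_map) + m.group(3), html
    )


page = (
    '<a href="/wps/wcm/connect/mysite/site/home/homepage">Home</a>'
    '<img src="/images/logo.png">'
)
print(rewrite_links(page))
```

Links that do not start with a mapped prefix, such as the image path in the sample, are returned unchanged.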
Install the Apache module
To do this:
- Expand the attached .zip file into a temporary directory.
- Copy the contents of lib into <installHTTPServer>/lib.
- Copy the contents of modules into <installHTTPServer>/modules.
- Copy the contents of bin into <installHTTPServer>/bin.
- Copy the contents of conf into <installHTTPServer>/conf.
Now add the following line in your httpd.conf file after your declaration of
:
include conf\proxy-html.conf
Next, activate the proxy modules. In the LoadModule section of httpd.conf, find the proxy_module keyword and uncomment the LoadModule lines as follows:
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_connect_module modules/mod_proxy_connect.so
LoadModule proxy_ftp_module modules/mod_proxy_ftp.so
LoadModule proxy_http_module modules/mod_proxy_http.so
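If you prefer to script this step rather than edit the file by hand, the uncommenting can be sketched as a small text transformation. This is an illustrative helper only; the function name enable_modules and the sample text are hypothetical, and you should run anything like it on a copy of httpd.conf.

```python
import re

# A commented-out LoadModule line: "#LoadModule <module_name> <path>"
COMMENTED_RE = re.compile(r"^\s*#\s*(LoadModule\s+(\S+)\s+\S+.*)$")


def enable_modules(conf_text, module_names):
    """Uncomment the LoadModule lines for the given module names."""
    out = []
    for line in conf_text.splitlines():
        m = COMMENTED_RE.match(line)
        if m and m.group(2) in module_names:
            out.append(m.group(1))  # drop the leading "#"
        else:
            out.append(line)        # leave every other line as it is
    return "\n".join(out)


sample_conf = (
    "#LoadModule proxy_module modules/mod_proxy.so\n"
    "#LoadModule status_module modules/mod_status.so\n"
    "LoadModule rewrite_module modules/mod_rewrite.so"
)
print(enable_modules(sample_conf, {"proxy_module"}))
```

Only the modules you name are activated; other commented lines stay commented.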
Edit your Proxy_Html.conf file to map to your library environment (see listing 3).
Listing 3. Edit Proxy_Html.conf file
#First, to load the module with its prerequisites. Note: mod_xml2enc
# is not always necessary, but without it mod_proxy_html is likely to
# mangle pages in encodings other than ASCII or Unicode (utf-8).
# For Unix-family systems:
# LoadFile /usr/lib/libxml2.so
# LoadModule proxy_html_module modules/mod_proxy_html.so
# LoadModule xml2enc_module modules/mod_xml2enc.so
# For Windows
# LoadFile <HttpInstallPath>/bin/zlib.dll
# LoadFile <HttpInstallPath>/bin/iconv.dll
# LoadFile <HttpInstallPath>/bin/libxml2.dll
# LoadModule proxy_html_module modules/mod_proxy_html.so
# LoadModule xml2enc_module modules/mod_xml2enc.so
The file assumes the UTF-8 charset; if you use a different one, modify the “Charset Section” in accordance with http://apache.webthing.com/mod_proxy_html/config.html.
Design the VirtualHost
In this example:
- The public FQDN is www.mysite.com.
- The private FQDN is www1.mysite.com, which is unreachable for Internet users.
- The short root is /it (that is, /it is substituted for the /wps/wcm/connect/ path).
Listing 4 shows the code for the public VirtualHost.
Listing 4. Public VirtualHost
<VirtualHost <yourIP>:80>
DocumentRoot www/mysite
ServerName www.mysite.com
ErrorLog logs/www/mysite/error.log
CustomLog logs/www/mysite/access.log common
#--------------------- short URL WCM
# manage redirect hidden to /
RewriteEngine On
# Write the rewrite log; comment out the next two lines to disable it
RewriteLog logs/www/mysite/rewrite.log
RewriteLogLevel 4
RewriteCond %{HTTP_HOST} ^www\.mysite\.com
RewriteCond %{REQUEST_URI} ^(/)?$
RewriteRule ^(/)?$ /it/<pathToHomePage> [PT,NC]
# <pathToHomePage> is the part of the complete URL that follows the
# prefix mapped to /it/ by the ProxyPass directive below
#
# for example
#
# my standard complete URL is:
# http://www.mysite.com/wps/wcm/connect/mysite/site/home/homepage
#
# ProxyPass maps /it/ to /wps/wcm/connect/mysite/site/, so
# my pathToHomePage = "home/homepage"
# Write debug log entries to the error.log file; set to Off to disable
ProxyHTMLLogVerbose On
LogLevel Debug
ProxyPass /it/ http://www1.mysite.com/wps/wcm/connect/mysite/site/
ProxyHTMLURLMap http://www1.mysite.com/wps/wcm/connect/mysite/site/ /it/ [c]
<Location /it/>
ProxyHTMLEnable On
ProxyPassReverse http://www1.mysite.com/wps/wcm/connect/mysite/site/
SetOutputFilter proxy-html
ProxyHTMLURLMap /wps/wcm/connect/mysite/site/ /it/
ProxyHTMLURLMap /it /it
</Location>
</VirtualHost>
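The two directions of the proxy mapping in the public VirtualHost can be sketched as follows. This is a simplified model under stated assumptions, not the mod_proxy implementation: the function names to_backend and fix_location are hypothetical, and only the two prefixes are taken from the ProxyPass line in the listing.

```python
PUBLIC_PREFIX = "/it/"
BACKEND_PREFIX = "http://www1.mysite.com/wps/wcm/connect/mysite/site/"


def to_backend(public_path):
    """Forward direction (ProxyPass): map a public path to the backend URL.

    Returns None when the path is outside the proxied prefix.
    """
    if public_path.startswith(PUBLIC_PREFIX):
        return BACKEND_PREFIX + public_path[len(PUBLIC_PREFIX):]
    return None


def fix_location(location, public_base="http://www.mysite.com"):
    """Reverse direction (ProxyPassReverse): rewrite a backend redirect target
    so that the private host name never reaches the browser."""
    if location.startswith(BACKEND_PREFIX):
        return public_base + PUBLIC_PREFIX + location[len(BACKEND_PREFIX):]
    return location


print(to_backend("/it/home/homepage"))
print(fix_location("http://www1.mysite.com/wps/wcm/connect/mysite/site/azienda/"))
```

ProxyPassReverse covers only response headers such as Location; links inside the page body are handled separately by the ProxyHTMLURLMap rules.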
Listing 5 shows the code for the private VirtualHost.
Listing 5. Private VirtualHost
<VirtualHost <yourIP>:80>
DocumentRoot www/mysite
ServerName www1.mysite.com
ErrorLog logs/www/mysite/error.log
CustomLog logs/www/mysite/access.log common
</VirtualHost>
Restart the HTTP server, and enjoy your new shortened URL page. If your configuration is correct, when you restart the HTTP Server you can see the following lines in your error.log file, indicating that the workers are initialized:
[Tue Feb 14 14:35:53 2012] [debug] proxy_util.c(1808): proxy: grabbed scoreboard slot 0 in child 6384 for worker http://www1.site.com/wps/wcm/connect/it/
[Tue Feb 14 14:35:53 2012] [debug] proxy_util.c(1904): proxy: initialized worker 0 in child 6384 for (www1.site.com) min=0 max=600 smax=600
When mod_proxy_html is working correctly, you can see lines like the following:
[Tue Feb 14 14:37:55 2012] [debug] mod_proxy_http.c(56): proxy: HTTP: canonicalising URLwww1.site.com/wps/wcm/connect/it/azienda/
[Tue Feb 14 14:37:55 2012] [debug] proxy_util.c(1494): [client 87.5.140.181] proxy: http: found worker http://www1.site.com/wps/wcm/connect/it/ for http://www1.site.com/wps/wcm/connect/it/azienda/
[Tue Feb 14 14:37:55 2012] [debug] mod_proxy.c(1000): Running scheme http handler (attempt 0)
[Tue Feb 14 14:37:55 2012] [debug] mod_proxy_http.c(1942): proxy: HTTP: serving URL http://www1.site.com/wps/wcm/connect/it/azienda/
[Tue Feb 14 14:37:55 2012] [debug] proxy_util.c(2001): proxy: HTTP: has acquired connection for (www1.site.com)
[Tue Feb 14 14:37:55 2012] [debug] proxy_util.c(2057): proxy: connecting http://www1.site.com/wps/wcm/connect/it/azienda/ to www1.site.com:80
[Tue Feb 14 14:37:55 2012] [debug] proxy_util.c(2155): proxy: connected /wps/wcm/connect/it/azienda/ to www1.site.com:80
[Tue Feb 14 14:37:55 2012] [debug] proxy_util.c(2310): proxy: HTTP: fam 2 socket created to connect to www1.site.com
[Tue Feb 14 14:37:55 2012] [debug] proxy_util.c(2416): proxy: HTTP: connection complete to 172.24.254.17:80 (www1.site.com)
[Tue Feb 14 14:37:55 2012] [debug] mod_proxy_http.c(1725): proxy: start body send
[Tue Feb 14 14:37:55 2012] [debug] mod_xml2enc.c(203): [client 87.5.140.181] Content-Type is text/html; charset=UTF-8
[Tue Feb 14 14:37:55 2012] [info] [client 87.5.140.181] Got charset UTF-8 from HTTP headers
[Tue Feb 14 14:37:55 2012] [info] [client 87.5.140.181] H: matched /wps/wcm/connect/it/, substituting /it/
[Tue Feb 14 14:37:55 2012] [info] [client 87.5.140.181] H: matched /wps/wcm/connect/it/, substituting /it/
[Tue Feb 14 14:37:55 2012] [info] [client 87.5.140.181] H: matched /wps/wcm/connect/it/, substituting /it/
[Tue Feb 14 14:37:55 2012] [info] [client 87.5.140.181] H: matched /wps/wcm/connect/it/, substituting /it/
[Tue Feb 14 14:37:55 2012] [info] [client 87.5.140.181] H: matched /it, substituting /it
[Tue Feb 14 14:37:55 2012] [info] [client 87.5.140.181] H: matched /it, substituting /it
[Tue Feb 14 14:37:55 2012] [info] [client 87.5.140.181] H: matched /it, substituting /it
[Tue Feb 14 14:37:55 2012] [info] [client 87.5.140.181] H: matched /it, substituting /it
[Tue Feb 14 14:37:55 2012] [debug] mod_proxy_http.c(1818): proxy: end body send
[Tue Feb 14 14:37:55 2012] [debug] proxy_util.c(2019): proxy: HTTP: has released connection for (www1.site.com)
[Tue Feb 14 14:38:00 2012] [debug] mod_proxy_http.c(56): proxy: HTTP: canonicalising URLwww1.site.com/wps/wcm/connect/it/azienda/
[Tue Feb 14 14:38:00 2012] [debug] proxy_util.c(1494): [client 87.5.140.181] proxy: http: found worker http://www1.site.com/wps/wcm/connect/it/ for http://www1.site.com/wps/wcm/connect/it/azienda/
[Tue Feb 14 14:38:00 2012] [debug] mod_proxy.c(1000): Running scheme http handler (attempt 0)
[Tue Feb 14 14:38:00 2012] [debug] mod_proxy_http.c(1942): proxy: HTTP: serving URL http://www1.site.com/wps/wcm/connect/it/azienda/
[Tue Feb 14 14:38:00 2012] [debug] proxy_util.c(2001): proxy: HTTP: has acquired connection for (www1.site.com)
[Tue Feb 14 14:38:00 2012] [debug] proxy_util.c(2057): proxy: connecting http://www1.site.com/wps/wcm/connect/it/azienda/ to www1.site.com:80
[Tue Feb 14 14:38:00 2012] [debug] proxy_util.c(2155): proxy: connected /wps/wcm/connect/it/azienda/ to www1.site.com:80
[Tue Feb 14 14:38:00 2012] [debug] mod_proxy_http.c(1822): proxy: header only
[Tue Feb 14 14:38:00 2012] [info] [client 87.5.140.181] No content-type; bailing out of proxy-html filter
[Tue Feb 14 14:38:00 2012] [info] [client 87.5.140.181] No content-type; bailing out of proxy-html filter
[Tue Feb 14 14:38:00 2012] [debug] proxy_util.c(2019): proxy: HTTP: has released connection for (www1.site.com)
NOTE: Before implementing this in a production environment, be sure to disable verbose logging.
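While verbose logging is still enabled in a test environment, you can summarize the substitution lines in error.log to confirm that each mapping is actually being applied. The following is a minimal sketch; the function name count_substitutions is hypothetical, and the pattern is based only on the "H: matched ..., substituting ..." lines shown above.

```python
import re
from collections import Counter

# Matches the substitution messages shown in the log excerpt above.
SUBST_RE = re.compile(r"H: matched (\S+), substituting (\S+)")


def count_substitutions(log_lines):
    """Count how often each (matched, substituted) pair appears in the log."""
    counts = Counter()
    for line in log_lines:
        m = SUBST_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


sample_log = [
    "[Tue Feb 14 14:37:55 2012] [info] [client 87.5.140.181] H: matched /wps/wcm/connect/it/, substituting /it/",
    "[Tue Feb 14 14:37:55 2012] [info] [client 87.5.140.181] H: matched /wps/wcm/connect/it/, substituting /it/",
    "[Tue Feb 14 14:37:55 2012] [info] [client 87.5.140.181] H: matched /it, substituting /it",
    "[Tue Feb 14 14:37:55 2012] [debug] mod_proxy_http.c(1818): proxy: end body send",
]
for (old, new), n in count_substitutions(sample_log).items():
    print(old, "->", new, ":", n)
```

A mapping that never shows up in the counts is a hint that its prefix does not match the links that the application server generates.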
To see a working example, visit http://www.sonus.com, where the short root is /content.
About the author

Andrea Fontana currently works as a System Architect, defining, organizing, and configuring complex IBM product-based solutions. In particular, he works with WebSphere Portal and its collaborative environment including Domino 8.0.x, 8.5, IBM Connections 3.0.1, IBM Lotus Quickr 8.0.x, and IBM Sametime, setting up SSO Kerberos integration solutions and configuring systems with a r-proxy solution with SSL integration. His past experience includes roles as an Application Developer, Database Administrator, and Project Manager in a wide variety of business applications. He graduated from the ITIS Zuccante C., Mestre (Venice), specializing in Industrial Electronics. You can reach Andrea at a.fontana@net2action.com.