Troubleshooting IBM WebSphere Portal and IBM Lotus Web Content Management server hangs 


 

 

Anuradha D Chitta

Advisory Software Engineer

IBM Software Group

Bangalore, KA India

 

 

October 2009

 

 

Summary: This article presents a troubleshooting guide for IBM® WebSphere® Portal and IBM Lotus® Web Content Management server hangs, explaining how to identify and isolate their root causes.

 

 

Contents

1 Introduction

2 Administrator’s checklist

2.1 Check for batch jobs being run on the server

2.2 Check verbosegc for heap usage

2.2.1 Determining the cause of the memory outage

2.3 WebContainer thread contention

2.4 Lack of response from external sources

2.5 Checking logs for hung threads

3 Conclusion

4 Resources

About the author

 

 

 

1 Introduction

When a production server becomes unresponsive, administrators are inclined to restart the server as quickly as possible, to reduce the downtime. However, restarting the server without collecting the diagnostics will leave you with little information to troubleshoot what has caused the hang.

 

In this article we discuss how to identify what caused the hang and explain the necessary information to collect before restarting the server.

 

Some of the common factors that lead to server unresponsiveness include:

 

  • Lack of heap space
  • WebContainer Thread contention
  • Lack of response from external sources

 

2 Administrator’s checklist

Below is a checklist that administrators can use to troubleshoot WebSphere Portal and Web Content Management server hangs:

 

  • Check for batch jobs being run on the server
  • Check verbosegc for heap usage
  • Generate and review threaddumps
  • Check logs for hung threads

 

2.1 Check for batch jobs being run on the server

First, check for batch jobs or scheduled tasks running at the time of outage. Some of the tasks that can take up resources on the server include search crawls and Web Content Management tasks like memberfixer and Java™ Content Repository (JCR) indexing.

 

  • You can check the WebSphere Portal server CPU and memory usage by using vmstat and top on UNIX® servers.

 
  • Web Content Management search crawls are scheduled to run every 4 hours by default; if the content is not changing too frequently, make sure the crawls are spaced out to reduce load on the server.

 
  • Make sure the batch jobs taking up resources are scheduled during off-peak hours.

 

2.2 Check verbosegc for heap usage

Before moving the servers to production, you should have set WebSphere Portal / Web Content Management Java Virtual Machine (JVM) heap (memory) settings to optimal values, after tuning the server through load tests.

 

Even with this tuning exercise, however, unexpected load and large object requests coming from the application code can make the server run out of heap space and fail to satisfy Java object allocation requests. This can lead to excessive garbage collection cycles, each of which pauses the application threads, resulting in a server hang.

 

2.2.1 Determining the cause of the memory outage

First, let’s discuss the memory limitations on 32-bit platforms. The total size of a process can reach up to 2G on 32-bit platforms; this total includes both the Java heap and the native memory that native code (JNI, JDBC prepared statements, OS native calls, etc.) needs to allocate objects.

In such cases, make sure you do not let the maximum heap size grow larger than 1.5G, leaving 500M for native memory allocations.

 

There are two types of Out of Memory (OOM) conditions:

 

(1) Complete heap exhaustion. When the server is totally out of heap space, the garbage collection cycles take longer and, during that process, the application threads are paused until the garbage collection cycle ends.

 

Make sure verbosegc is enabled on the server, check the native_stderr.log for allocation failures just before the outage, and check the amount of free heap space. 

 

Look for this output in the verbosegc log:

 

 

Typical Allocation Failure when the server is totally out of heap space:

 

  15:48:55 2009

  0% free (588040/1342175744), in 10088ms>

 

  = 32), weak 0, final 2, phantom 0>

 

  15:49:02 2009

 

 

  = 32), weak 0, final 5, phantom 0>

 

JVMDG217: Dump Handler is Processing OutOfMemory - Please Wait.

JVMDG315: JVM Requesting Heap dump file

JVMDG318: Heap dump file written to D:\IBM\WEBSPH~1\APPSER~1

\heapdump.20090822.154902.5580.phd

JVMDG303: JVM Requesting Java core file

JVMDG304: Java core file written to D:\IBM\WEBSPH~1\APPSER~1

\javacore.20090822.154944.5580.txt

JVMDG274: Dump Handler has Processed OutOfMemory.

 

If the verbosegc logs show the server running below 10% free heap for a long period of time, you might want to increase the maximum heap size.
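The 10% check can be sketched in Python as follows. This is a minimal illustration, assuming the GC summary fragment looks like the `0% free (588040/1342175744), in 10088ms>` text in the sample above. The verbosegc format differs between JDK levels, so the regular expression and the function name `low_heap_cycles` are assumptions, not part of any IBM tool.

```python
import re

# Assumed shape of the GC summary fragment, taken from the sample above:
#   "0% free (588040/1342175744), in 10088ms>"
GC_LINE = re.compile(r"(\d+)% free \((\d+)/(\d+)\), in (\d+)ms")

def low_heap_cycles(lines, threshold_pct=10.0):
    """Return (free_pct, pause_ms) for each GC record below the threshold."""
    hits = []
    for line in lines:
        m = GC_LINE.search(line)
        if not m:
            continue
        free_bytes = int(m.group(2))
        total_bytes = int(m.group(3))
        pause_ms = int(m.group(4))
        # Recompute the percentage from the byte counts; the printed
        # percentage is rounded to a whole number.
        free_pct = 100.0 * free_bytes / total_bytes
        if free_pct < threshold_pct:
            hits.append((round(free_pct, 2), pause_ms))
    return hits

sample = ["  0% free (588040/1342175744), in 10088ms>"]
print(low_heap_cycles(sample))
```

A long run of consecutive hits, rather than a single record, is what suggests the heap is undersized or something is holding on to memory.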

 

At the same time, investigate what is consuming the heap by generating a Heapdump or by analyzing the generated Heapdumps. For more information, refer to the IBM Support Techdoc, “Webcast replay: Using IBM HeapAnalyzer to diagnose Java heap issues.”

 

(2) Large object request failure. If the JVM is not able to satisfy an allocation request for an object of reasonable size, even when there is a lot of free heap, it indicates the heap is highly fragmented.

 

Make sure the KCluster is tuned, and avoid making too many large object requests, for example by uploading large files.

 

The size of the object being requested by the applications can be identified from the allocation failure records in native_stderr.log as follows:

 

 

The above Allocation Failure is due to the code making a very large object request (36M) from the heap. Once you know this to be the cause of the failure, set the following environment variable:

ALLOCATION_THRESHOLD=5000000

This prints to native_stderr.log the stack trace of every allocation request that is larger than 5M.

 

Once you have the stack of the code making such large object requests, you can get the owner of that code to consider reducing the object sizes, to avoid such large object allocations.

 

If this code cannot be changed for some reason, then you can set aside a chunk of heap for large objects alone so that it remains fairly unfragmented, satisfying large requests by providing enough contiguous heapspace.

 

You can do this using the JVM option -Xloratio, which sets aside n% of your heap for large objects only. You can find more information on setting the KCluster and loratio properties in the IBM Support Technotes, “Avoiding Java heap fragmentation with Java SDK V1.4.2” and “How to allocate large objects into Large Object Area on IBM SDK 1.4.2 SR1 and later.”

 

2.3 WebContainer thread contention

Requests coming into WebSphere Portal / Web Content Management are served by WebContainer threads. The WebContainer thread-pool setting needs to be tuned according to the expected load during peak times. When the server stops responding, the first step is to generate threaddumps, using the following mechanisms:

 

Microsoft® Windows®:

wsadmin.bat [-host host_name] [-port soap_port_number] [-user userid] [-password password]

wsadmin> set jvm [$AdminControl completeObjectName type=JVM,process=WebSphere_Portal,*]

wsadmin> $AdminControl invoke $jvm dumpThreads

 

UNIX:  

kill -3 PID

 

The path to the location of the Javacore file will be in the verbosegc output. On Solaris the threaddumps are printed into the native_stdout.log.
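Because a single threaddump is only a snapshot, it is common to take several at intervals. The following Python sketch shows one way to script the UNIX `kill -3` step. The count of three dumps and the 60-second interval are assumptions for illustration, not values from this article, and the function name `request_thread_dumps` is hypothetical.

```python
import os
import signal
import time

def request_thread_dumps(pid, count=3, interval_s=60,
                         send=os.kill, sleep=time.sleep):
    """Send SIGQUIT (the signal behind `kill -3`) to `pid`, `count` times.

    `count` and `interval_s` are illustrative defaults. `send` and `sleep`
    are injectable so the function can be exercised without signalling a
    real process. UNIX only: SIGQUIT is not available on Windows.
    """
    for i in range(count):
        send(pid, signal.SIGQUIT)
        # Wait between dumps, but not after the last one.
        if i < count - 1:
            sleep(interval_s)
```

Calling `request_thread_dumps(portal_pid)` with the process ID of the WebSphere_Portal JVM asks that JVM for three Javacores, one minute apart.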

 

You can examine Javacores (threaddumps) to see what the WebContainer threads are doing and check for any deadlocks reported. Look at the state of the WebContainer threads. If most of them are in state:R (Runnable), it indicates that the server is under excessive load.

Now look at the code executing on these threads and generate subsequent threaddumps, to see how the threads are progressing. If most of the threads are in state:CW (Condition Wait), check what condition these threads are waiting on.
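Tallying thread states by hand is tedious for a large pool. The following Python sketch counts the states of the WebContainer threads, assuming the `3XMTHREADINFO` line layout shown in the samples later in this article. The function name `webcontainer_states` is hypothetical.

```python
import re
from collections import Counter

# Assumed layout: 3XMTHREADINFO  "name" (TID:..., state:XX, ...) prio=N
THREAD_INFO = re.compile(r'^3XMTHREADINFO\s+"([^"]+)".*?state:(\w+)')

def webcontainer_states(javacore_lines):
    """Count WebContainer threads per state (for example R or CW)."""
    counts = Counter()
    for line in javacore_lines:
        m = THREAD_INFO.match(line)
        if m and m.group(1).startswith("WebContainer"):
            counts[m.group(2)] += 1
    return counts

sample = [
    '3XMTHREADINFO     "WebContainer : 1" (TID:0x56F8DD00, '
    'sys_thread_t:0x51DF0478, state:CW, native ID:0x00000B58) prio=5',
    '3XMTHREADINFO      "WebContainer : 8" (TID:0x776154F8, '
    'sys_thread_t:0x4C89C9A8, state:CW, native ID:0xBDFC) prio=5',
]
print(dict(webcontainer_states(sample)))
```

To run it against a real Javacore, pass the open file object, since iterating a file yields its lines.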

 

Sample threads waiting on each other resulting in a deadlock:

Deadlock detected !!!

NULL          


2LKDEADLOCKTHR  Thread "WebContainer: 15" (0x58BD5520)

3LKDEADLOCKWTR    is waiting for:

4LKDEADLOCKMON      sys_mon_t:0x588AB898 infl_mon_t: 0x588AAE38:

4LKDEADLOCKOBJ      org.apache.log4j.Logger@37E9FCF8/37E9FD00:

3LKDEADLOCKOWN    which is owned by:

2LKDEADLOCKTHR  Thread "WebContainer: 8" (0x56C5C7A0)

3LKDEADLOCKWTR    which is waiting for:

4LKDEADLOCKMON      sys_mon_t:0x58523918 infl_mon_t: 0x00000000:

4LKDEADLOCKOBJ      java.lang.StringBuffer@3B9F5148/3B9F5150:

3LKDEADLOCKOWN    which is owned by:

2LKDEADLOCKTHR  Thread "WebContainer: 15" (0x58BD5520)

 

 

Review the code executing on the above threads and engage the respective developer/owner of that code.
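To find which threads to review, the deadlock section can be scanned for its `2LKDEADLOCKTHR` records. The following Python sketch assumes the record layout of the sample above, with straight double quotes around the thread name. The function name `deadlock_threads` is hypothetical.

```python
import re

# Assumed layout: 2LKDEADLOCKTHR  Thread "name" (0x...)
DEADLOCK_THREAD = re.compile(r'^2LKDEADLOCKTHR\s+Thread\s+"([^"]+)"')

def deadlock_threads(javacore_lines):
    """Return the distinct thread names in deadlock records, in order seen."""
    names = []
    for line in javacore_lines:
        m = DEADLOCK_THREAD.match(line)
        # A deadlock cycle repeats its first thread at the end; keep one copy.
        if m and m.group(1) not in names:
            names.append(m.group(1))
    return names

sample = [
    '2LKDEADLOCKTHR  Thread "WebContainer: 15" (0x58BD5520)',
    '3LKDEADLOCKWTR    is waiting for:',
    '2LKDEADLOCKTHR  Thread "WebContainer: 8" (0x56C5C7A0)',
    '3LKDEADLOCKWTR    which is waiting for:',
    '2LKDEADLOCKTHR  Thread "WebContainer: 15" (0x58BD5520)',
]
print(deadlock_threads(sample))
```

An empty result means the Javacore reported no deadlock, in which case the thread states and stacks are the next place to look.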

 

Sample thread stacks and the corresponding activity:

An idle WebContainer thread that is waiting for incoming requests looks like this:

 

3XMTHREADINFO     "WebContainer : 1" (TID:0x56F8DD00, sys_thread_t:0x51DF0478, state:CW, native ID:0x00000B58) prio=5

4XESTACKTRACE          at java/lang/Object.wait(Native Method)

4XESTACKTRACE          at java/lang/Object.wait(Object.java:231(Compiled Code))

4XESTACKTRACE          at com/ibm/ws/util/BoundedBuffer.waitGet_(BoundedBuffer.java:190(Compiled Code))

4XESTACKTRACE          at com/ibm/ws/util/BoundedBuffer.take(BoundedBuffer.java:545(Compiled Code))

4XESTACKTRACE          at com/ibm/ws/util/ThreadPool.getTask(ThreadPool.java:817(Compiled Code))

4XESTACKTRACE          at com/ibm/ws/util/ThreadPool$Worker.run(ThreadPool.java:1480(Compiled Code))

 

 

If all the WebContainer threads are in idle state as shown above, and the server is still not responding to any requests, it indicates that the Web Server or something in front of WebSphere Portal is causing a bottleneck.
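The "all threads idle" condition can be checked mechanically. The following Python sketch treats a WebContainer thread as idle when its stack contains the `com/ibm/ws/util/ThreadPool.getTask` frame from the sample above. That frame name may differ in other WebSphere Application Server levels, and the function names are hypothetical.

```python
import re

THREAD = re.compile(r'^3XMTHREADINFO\s+"([^"]+)"')
FRAME = re.compile(r'^4XESTACKTRACE\s+at\s+(\S+)')
# Frame taken from the idle-thread sample; an assumption for other versions.
IDLE_FRAME = "com/ibm/ws/util/ThreadPool.getTask"

def webcontainer_stacks(javacore_lines):
    """Map each WebContainer thread name to its list of stack frames."""
    stacks, current = {}, None
    for line in javacore_lines:
        t = THREAD.match(line)
        if t:
            name = t.group(1)
            current = name if name.startswith("WebContainer") else None
            if current is not None:
                stacks[current] = []
            continue
        f = FRAME.match(line)
        if f and current is not None:
            stacks[current].append(f.group(1))
    return stacks

def all_idle(javacore_lines):
    """True when at least one WebContainer thread exists and all are idle."""
    stacks = webcontainer_stacks(javacore_lines)
    return bool(stacks) and all(
        any(frame.startswith(IDLE_FRAME) for frame in frames)
        for frames in stacks.values())
```

If `all_idle` returns True while users still see no response, the evidence points away from the portal JVM and toward the Web Server or network tier.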

 

Examine the code stack executing on thread WebContainer : 1, and follow the progress of this thread in subsequent Javacores:

 

Multiple WebContainer threads waiting on a lock, held by “WebContainer : 1”:

3LKMONOBJECT       java/lang/Object@070000003A981CD8/070000003A981CF0: owner "WebContainer : 1" (0x000000011C8CF700), entry count 1

3LKNOTIFYQ            Waiting to be notified:

3LKWAITNOTIFY        "WebContainer : 2" (0x000000011D5AC800)

3LKWAITNOTIFY        "WebContainer : 3" (0x000000011DF18A00)

3LKWAITNOTIFY        "WebContainer : 4" (0x000000011DGDA600)
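The monitor section can be summarized as a map from each lock owner to the threads queued behind it. The following Python sketch assumes the `3LKMONOBJECT` and `3LKWAITNOTIFY` layouts shown in the sample above. Monitor records in other formats (for example, unowned monitors) are skipped, and the function name `lock_waiters` is hypothetical.

```python
import re

# Assumed layout: 3LKMONOBJECT  <object>: owner "name" (0x...), entry count N
MONITOR = re.compile(r'^3LKMONOBJECT\s+(\S+): owner "([^"]+)"')
# Assumed layout: 3LKWAITNOTIFY  "name" (0x...)
WAITER = re.compile(r'^3LKWAITNOTIFY\s+"([^"]+)"')

def lock_waiters(javacore_lines):
    """Map each monitor owner to the threads listed under its monitor."""
    result, owner = {}, None
    for line in javacore_lines:
        if line.startswith("3LKMONOBJECT"):
            m = MONITOR.match(line)
            # Reset on every monitor record so waiters are never
            # attributed to the owner of an earlier monitor.
            owner = m.group(2) if m else None
            if owner is not None:
                result.setdefault(owner, [])
            continue
        w = WAITER.match(line)
        if w and owner is not None:
            result[owner].append(w.group(1))
    return result
```

The owner with the longest waiter list is usually the thread whose stack deserves attention first in the subsequent Javacores.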

 

2.4 Lack of response from external sources

Applications often rely on external sources such as a Database or an LDAP server to process requests. Application servers access the Database through Datasource connection pools. When the connections in the pool run out, the WebContainer threads hang, waiting for a connection from the pool.

 

To determine whether this is the reason for server hang, you can look in the threaddumps to see how many threads are waiting on the connections from the pool.

 

A typical thread stack waiting on a Datasource pooled connection:

3XMTHREADINFO "WebContainer : 27" (TID:0x807C4D68,sys_thread_t:0x4533CE28, state:CW, native ID:0x83D2) prio=5

4XESTACKTRACE    at java.lang.Object.wait(Native Method)

4XESTACKTRACE    at com.ibm.ejs.j2c.poolmanager.FreePool.queueRequest(FreePool.java(Compiled Code))

4XESTACKTRACE    at com.ibm.ejs.j2c.poolmanager.FreePool.createOrWaitForConnection(FreePool.java(Compiled Code))

4XESTACKTRACE    at com.ibm.ejs.j2c.poolmanager.PoolManager.reserve(PoolManager.java(Compiled Code))

4XESTACKTRACE    at com.ibm.ejs.j2c.ConnectionManager.allocateMCWrapper(ConnectionManager.java(Compiled Code))

 

When a large number of requests are waiting on pooled connections, make sure the Connection pool size is set greater than the Threadpool size. Also, make sure the Database is not slow to release these established connections.
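Counting how many threads are queued for a connection gives a quick measure of pool starvation. The following Python sketch looks for the `FreePool.createOrWaitForConnection` frame from the stack above. The function name `threads_waiting_for_connection` is hypothetical, and the frame name is assumed to be stable across the versions in use.

```python
import re

THREAD = re.compile(r'^3XMTHREADINFO\s+"([^"]+)"')
# Frame taken from the sample stack of a thread waiting on the pool.
POOL_WAIT = "com.ibm.ejs.j2c.poolmanager.FreePool.createOrWaitForConnection"

def threads_waiting_for_connection(javacore_lines):
    """Return names of threads whose stack includes the pool-wait frame."""
    waiting, current = [], None
    for line in javacore_lines:
        m = THREAD.match(line)
        if m:
            current = m.group(1)
        elif (line.startswith("4XESTACKTRACE") and POOL_WAIT in line
              and current is not None and current not in waiting):
            waiting.append(current)
    return waiting
```

Comparing `len(threads_waiting_for_connection(lines))` with the WebContainer pool size shows what share of the request threads are blocked on the Datasource.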

 

A typical thread stack waiting on a Database response:

3XMTHREADINFO      "WebContainer : 1" (TID:0x3030FC00, sys_thread_t:0x806FE328, state:R, native ID:0x4ED9) prio=5

4XESTACKTRACE          at java.net.SocketInputStream.socketRead0(Native Method)

4XESTACKTRACE          at java.net.SocketInputStream.read(SocketInputStream.java(Compiled Code))

4XESTACKTRACE          at com.ibm.db2.jcc.b.gb.b(gb.java(Compiled Code))

4XESTACKTRACE          at com.ibm.db2.jcc.b.gb.c(gb.java(Compiled Code))

4XESTACKTRACE          at com.ibm.db2.jcc.b.gb.c(gb.java(Compiled Code))

…….

4XESTACKTRACE          at com.ibm.db2.jcc.c.lf.c(lf.java(Compiled Code))

4XESTACKTRACE          at com.ibm.db2.jcc.c.lf.next(lf.java(Compiled Code))

4XESTACKTRACE          at com.ibm.ws.rsadapter.jdbc.WSJdbcResultSet.next(WSJdbcResultSet.java(Compiled Code))

4XESTACKTRACE          at com.ibm.wps.datastore.impl.ResourcePersister.loadDependants(ResourcePersister.java(Compiled Code))

4XESTACKTRACE          at com.ibm.wps.datastore.impl.ResourcePersister.findInternal(ResourcePersister.java(Compiled Code))

 

 

When threads remain in the above state for a long time, which you can determine from the subsequent Javacores, it indicates that either there is a very large query being run, or the Database is not responding.
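Comparing subsequent Javacores can be sketched in Python as follows. The sketch assumes each Javacore has already been reduced to a mapping of thread name to a tuple of stack frames. That input shape and the function name `stuck_threads` are assumptions for illustration.

```python
def stuck_threads(dumps):
    """Return names of threads whose stack is identical in every dump.

    `dumps` is a list of dicts, one per Javacore, oldest first, each
    mapping a thread name to a tuple of its stack frames.
    """
    if len(dumps) < 2:
        # A single snapshot cannot show whether a thread is progressing.
        return []
    first = dumps[0]
    return sorted(
        name for name, frames in first.items()
        if frames and all(d.get(name) == frames for d in dumps[1:])
    )

db_stack = ("java.net.SocketInputStream.socketRead0",
            "com.ibm.ws.rsadapter.jdbc.WSJdbcResultSet.next")
dump1 = {"WebContainer : 1": db_stack, "WebContainer : 2": ("a.b.c",)}
dump2 = {"WebContainer : 1": db_stack, "WebContainer : 2": ("d.e.f",)}
print(stuck_threads([dump1, dump2]))
```

An unchanged stack is a hint rather than proof: idle threads waiting for work also look identical from one dump to the next, so filter those out before drawing conclusions.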

 

When using Web Content Management, the JCR queries are auto-generated based on the selected criteria, so make sure the Menu and Navigation components are optimally designed and the Database is well maintained by running reorg and dbstats on a regular basis.

 

Typical Thread stack showing lack of response from LDAP server:

3XMTHREADINFO      "WebContainer : 8" (TID:0x776154F8, sys_thread_t:0x4C89C9A8, state:CW, native ID:0xBDFC) prio=5

4XESTACKTRACE          at java.lang.Object.wait(Native Method)

4XESTACKTRACE          at com.sun.jndi.ldap.Connection.readReply(Connection.java(Compiled Code))

4XESTACKTRACE          at com.sun.jndi.ldap.LdapClient.getSearchReply(LdapClient.java(Compiled Code))

4XESTACKTRACE          at com.sun.jndi.ldap.LdapClient.search(LdapClient.java(Compiled Code))

4XESTACKTRACE          at com.sun.jndi.ldap.LdapCtx.doSearch(LdapCtx.java(Compiled Code))

4XESTACKTRACE          at com.sun.jndi.ldap.LdapCtx.searchAux(LdapCtx.java(Compiled Code))

4XESTACKTRACE          at com.sun.jndi.ldap.LdapCtx.c_search(LdapCtx.java:1751)

4XESTACKTRACE          at com.sun.jndi.toolkit.ctx.ComponentDirContext.p_search(ComponentDirContext.java:386)

4XESTACKTRACE          at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search(PartialCompositeDirContext.java:347)

4XESTACKTRACE          at javax.naming.directory.InitialDirContext.search(InitialDirContext.java:259)

4XESTACKTRACE          at com.ibm.ws.wmm.ldap.LdapConnectionImpl.searchAll(LdapConnectionImpl.java:3528)

4XESTACKTRACE          at com.ibm.ws.wmm.ldap.LdapConnectionImpl.search(LdapConnectionImpl.java:3678)

4XESTACKTRACE          at com.ibm.ws.wmm.ldap.LdapConnectionImpl.search(LdapConnectionImpl.java:2120)

 

 

When there is no response from the external sources, engage the appropriate administrators to check the problem on the Database or LDAP side. Make sure all the Web Content Management and WebSphere Portal best practices and tuning guides are followed when tuning the backend resources.

 

2.5 Checking logs for hung threads

IBM WebSphere Application Server provides a hung-thread detection function, whereby the thread monitor checks all managed threads in the system. When ThreadMonitor detects that a thread has been active longer than the time defined by the thread monitor threshold, the application server logs a warning in the WebSphere Application Server log.

 

This warning indicates the name of the thread that is hung and how long it has already been active.

 

The following message is written to the log:

 

[8/25/09 17:15:30:335 EST] 00000020 ThreadMonitor W   WSVR0605W: Thread "WebContainer : 0" (0000004a) has been active for 722918 milliseconds and may be hung.  There is/are 2 thread(s) in total in the server that may be hung.
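Hung-thread warnings can be pulled out of the log with a short script. The following Python sketch matches the WSVR0605W message text shown above. The function name `hung_threads` is hypothetical, and the pattern assumes the English message wording.

```python
import re

# Pattern built from the WSVR0605W sample message above.
HUNG = re.compile(
    r'WSVR0605W: Thread "([^"]+)" \((\w+)\) '
    r'has been active for (\d+) milliseconds')

def hung_threads(log_lines):
    """Return (thread_name, thread_id, active_ms) per WSVR0605W record."""
    found = []
    for line in log_lines:
        m = HUNG.search(line)
        if m:
            found.append((m.group(1), m.group(2), int(m.group(3))))
    return found

sample = [
    '[8/25/09 17:15:30:335 EST] 00000020 ThreadMonitor W   WSVR0605W: '
    'Thread "WebContainer : 0" (0000004a) has been active for 722918 '
    'milliseconds and may be hung.  There is/are 2 thread(s) in total '
    'in the server that may be hung.',
]
print(hung_threads(sample))
```

The thread names returned here are the ones to look up in the Javacore taken at the same time.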

 

Starting with WebSphere Application Server 6.0.2.29, a new property is available, which can be used to automatically generate threaddumps whenever a hung thread is detected:

 

com.ibm.websphere.threadmonitor.dump.java

 

NOTE:

  • Value: Set to true to cause a Javacore to be created when a hung thread is detected and a WSVR0605W message is printed.

  • The thread reported in SystemOut.log can be cross-checked with the Javacore.

  • The thread has been active, executing the stack shown in the Javacore, for the number of milliseconds reported by the ThreadMonitor.

 

3 Conclusion

Hopefully you now understand how to identify the areas that can cause a WebSphere Portal server to become unresponsive, including how to determine whether the issue is caused by a lack of memory or by activity on the threads, how to identify which backend resources are causing the bottleneck, and what actions to take next.

 

4 Resources

MustGather: No response (hang) or performance degradation for IBM WebSphere Portal 5.1:

http://www-01.ibm.com/support/docview.wss?rs=688&uid=swg21209459


IBM WebSphere Portal version 6.1.x Tuning Guide:

http://www-01.ibm.com/support/docview.wss?uid=swg27013972

 

IBM WebSphere Portal Performance Troubleshooting Guide:

http://www-01.ibm.com/support/docview.wss?uid=swg27007059&aid=1

 

WebSphere Portal and Lotus Web Content Management performance tuning guides and supplemental content:

http://www-01.ibm.com/support/docview.wss?uid=swg21314715&loc=en_US&cs=utf-8&lang=en

 

Best practices for using IBM Workplace Web Content Management V6:

http://www.ibm.com/developerworks/websphere/library/techarticles/0701_devos/0701_devos.html

 

About the author

Anuradha Chitta is an Advisory Software Engineer working with the Web Content Management team at IBM's Pune, India, facility. She was a team lead for Portal Performance and search components in IBM US, and worked extensively with JVM issues related to hangs, crashes, high CPU usage, etc., before relocating to IBM India. Anu holds a Master's degree in Computer Science from LSU, and is an IBM Certified WebSphere ND 6.1 and Portal V6.0 System Administrator.


Category:
Resources for WebSphere Portal administrators and developers, Troubleshooting Web Content Manager,
Tags:
Troubleshooting

This Version: Version 5 March 30, 2011 4:39:16 PM by Sunil Bhatnagar  IBMer
   