[opencms-dev] Repeated (daily) crashes of OpenCms 7.0.5

Nick Straguzzi nick.straguzzi at credosystems.com
Wed Feb 18 17:38:46 CET 2009


All:

We need some experienced admin help from the OpenCms community to find the cause of repeated system crashes on our new server. If any of the details below look familiar to you, please let us know ASAP! Thanks in advance, Nick


THE PROBLEM:

Once a day (usually during the nighttime hours), OpenCms stops responding.  The logs and error messages are of little help.  The problem always ends in a stack overflow, but the root cause is never listed.  Thus it is extraordinarily difficult for us to figure out what's going on.  Bouncing Tomcat always clears up the problem for another 24 hours or so.
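When a trace is too deep to read, it can still help to pull out the exception at the bottom of the `getCause()` chain and count how many layers wrap it. The following is a minimal sketch that assumes the exception object is reachable in code, for example in a catch block or a custom error handler. The `RootCause` class and its methods are hypothetical helpers, not part of OpenCms, and the exceptions built in `main` only stand in for the real ones.

```java
import java.io.FileNotFoundException;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical helper: walks a Throwable's cause chain down to its innermost exception. */
final class RootCause {

    /** Returns the innermost cause of t (t must not be null), stopping if the chain loops. */
    static Throwable innermost(Throwable t) {
        Map<Throwable, Boolean> seen = new IdentityHashMap<Throwable, Boolean>();
        Throwable current = t;
        // put() returns null only the first time a Throwable is visited.
        while (current.getCause() != null && seen.put(current, Boolean.TRUE) == null) {
            current = current.getCause();
        }
        return current;
    }

    /** Counts the getCause() links between t and its innermost cause. */
    static int depth(Throwable t) {
        Map<Throwable, Boolean> seen = new IdentityHashMap<Throwable, Boolean>();
        Throwable current = t;
        int depth = 0;
        while (current.getCause() != null && seen.put(current, Boolean.TRUE) == null) {
            current = current.getCause();
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        // Stand-ins for a nested log entry; real exceptions would come from OpenCms.
        Exception root = new FileNotFoundException(
            "/system/modules/resources/images/featurephoto.jpg");
        Exception top = new RuntimeException("Error in JSP Bean.",
            new RuntimeException("Error reading resource", root));
        System.out.println("nesting depth: " + depth(top));
        System.out.println("innermost cause: " + innermost(top));
    }
}
```

The identity map guards against cause chains that loop back on themselves, so the helper always terminates.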


THE BASICS:

The server crashes because all of its available processing threads are in a wait state. All web server processes are also waiting, but there is no indication of what they are waiting for. (See trace below.)

The most interesting clues are the errors logged in opencms.1. As noted, it is always a stack overflow with a very deeply nested trace, far too deep to put into an email. However, there is one quirk that may help you pin down the problem. The top of the trace (and of each nested error) begins as follows:

   16 Feb 2009 13:07:22,693 ERROR [org.opencms.jsp.CmsJspBean: 298] Error in JSP Bean.
   org.opencms.file.CmsVfsResourceNotFoundException: Error reading resource
   from path "/system/modules/resources/images/featurephoto.jpg".

...note that the path of the resource it cannot find begins "/system/modules/resources/...": the module name is missing between "modules" and "resources". (We use the standard OpenCms folder structure in our system tree.) We do not believe that any of our outgoing pages contain such a URI; we suspect that OpenCms is somehow dropping the module name internally when it maps a URL to a URI. In any case, even if this nonexistent resource were being requested, why would it cause repeated errors and a stack overflow?
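One way such a path could arise, offered only as a guess that nothing in the logs confirms, is a relative link resolved with one `../` too many, or from a folder one level higher than intended. The sketch below uses `java.net.URI.resolve` to show the effect. The module name `my.module` and the `templates` folder are hypothetical, and this is not a claim about how OpenCms itself maps URLs to VFS paths.

```java
import java.net.URI;

/** Hypothetical illustration: how one extra "../" drops the module folder from a path. */
final class RelativePathDemo {

    /** Resolves a relative link against a folder path using RFC 2396 rules. */
    static String resolve(String baseFolder, String relativeLink) {
        // URI.resolve merges the paths and then normalizes away "." and ".." segments.
        return URI.create(baseFolder).resolve(relativeLink).getPath();
    }

    public static void main(String[] args) {
        // Hypothetical template folder inside a module.
        String base = "/system/modules/my.module/templates/";
        // One "../" climbs to the module folder, so the module name is kept.
        System.out.println(resolve(base, "../resources/images/featurephoto.jpg"));
        // Two "../" climb past the module folder, so the module name disappears.
        System.out.println(resolve(base, "../../resources/images/featurephoto.jpg"));
    }
}
```

Run as-is, it prints the module-qualified path first and the `/system/modules/resources/...` form second.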


THE ENVIRONMENT:

*  Linux with 2GB of memory.
*  Tomcat 5.5.27
*  MySQL 4.1.22
*  Apache front end
*  Java 6 (1.6.0_11)

Java settings:
-Xms512m -Xmx1024m -XX:MaxPermSize=256m -server -verbosegc -XX:+PrintGCDetails -Djava.awt.headless=true
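To confirm which flags the running Tomcat JVM actually received, rather than what the startup script is believed to pass, a small report like the following could be run inside that JVM, for example from a test JSP or servlet. This is a sketch: the `JvmSettingsReport` class is hypothetical, and it relies only on `java.lang.management` and `java.lang.Runtime`, both available in Java 6.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.List;

/** Hypothetical diagnostic: reports the startup flags and heap sizes of the current JVM. */
final class JvmSettingsReport {

    /** Formats a byte count as whole megabytes. */
    static String mb(long bytes) {
        return (bytes / (1024 * 1024)) + " MB";
    }

    public static void main(String[] args) {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        // The JVM options as passed on the command line (excluding main-class arguments).
        List<String> flags = runtime.getInputArguments();
        System.out.println("JVM flags: " + flags);

        Runtime rt = Runtime.getRuntime();
        // maxMemory() is close to, but may be slightly below, the -Xmx value.
        System.out.println("max heap:       " + mb(rt.maxMemory()));
        // totalMemory() is the heap currently committed; it starts near -Xms.
        System.out.println("committed heap: " + mb(rt.totalMemory()));
        System.out.println("free (of committed): " + mb(rt.freeMemory()));
    }
}
```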

OpenCms is installed as the ROOT application.
The main servlet (opencms) has been renamed to "scee".
Thus, a typical URI is:   /scee/en/index.htm

It is a multisite environment.  Apache forwards all requests beginning with /scee and /export to Tomcat.  All sites are mapped properly in opencms-system.xml.

There are two separate Tomcat instances on this server, each running its own Java (the other one uses Java 5). They share only the MySQL server, and each uses its own database. The other Tomcat runs smoothly with no problems.


OTHER INFORMATION:

(A thread dump taken after the system crashes shows this:)

"http-9090-Processor12" daemon prio=10 tid=0x0845f000 nid=0x35ad in Object.wait() [0x634ad000..0x634ae0a0]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x755e0238> (a org.apache.tomcat.util.threads.ThreadPool$ControlRunnable)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:661)
        - locked <0x755e0238> (a org.apache.tomcat.util.threads.ThreadPool$ControlRunnable)
        at java.lang.Thread.run(Thread.java:619)
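One thing that may be worth checking: in Tomcat 5.5's thread pool, a worker with no request to serve typically sits in `wait()` inside `ThreadPool$ControlRunnable.run`, which is the frame shown above, so threads like this one may be idle rather than stuck. The sketch below, meant to run inside the Tomcat JVM (for example from a diagnostic JSP), counts threads by state and separates idle pool workers from busy ones. The `ThreadStateSummary` class and its frame-matching heuristic are illustrative assumptions, not part of Tomcat.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical diagnostic: thread counts by state, plus idle vs busy Tomcat pool workers. */
final class ThreadStateSummary {

    static final String POOL_WORKER = "ThreadPool$ControlRunnable";

    /** Counts threads per Thread.State. */
    static Map<Thread.State, Integer> countByState(Map<Thread, StackTraceElement[]> traces) {
        Map<Thread.State, Integer> counts =
            new EnumMap<Thread.State, Integer>(Thread.State.class);
        for (Thread t : traces.keySet()) {
            Thread.State state = t.getState();
            Integer n = counts.get(state);
            counts.put(state, n == null ? 1 : n + 1);
        }
        return counts;
    }

    /** True if the innermost frame outside java.lang.Object belongs to a class ending in suffix. */
    static boolean waitingIn(StackTraceElement[] stack, String classSuffix) {
        for (StackTraceElement frame : stack) {
            if (!frame.getClassName().equals("java.lang.Object")) {
                return frame.getClassName().endsWith(classSuffix);
            }
        }
        return false;
    }

    /** True if any frame belongs to a class ending in suffix. */
    static boolean hasFrame(StackTraceElement[] stack, String classSuffix) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().endsWith(classSuffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        int idle = 0;
        int busy = 0;
        for (Map.Entry<Thread, StackTraceElement[]> e : traces.entrySet()) {
            StackTraceElement[] stack = e.getValue();
            if (!hasFrame(stack, POOL_WORKER)) {
                continue; // not a Tomcat pool worker
            }
            if (e.getKey().getState() == Thread.State.WAITING && waitingIn(stack, POOL_WORKER)) {
                idle++; // waiting in ControlRunnable.run for new work
            } else {
                busy++; // running deeper frames, e.g. serving a request
            }
        }
        System.out.println("threads by state: " + countByState(traces));
        System.out.println("pool workers idle=" + idle + " busy=" + busy);
    }
}
```

If most workers come out idle while the site is unresponsive, the hang is probably somewhere other than the pool threads themselves.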


(Heap report at time of crash)

 PSYoungGen      total 57152K, used 49283K [0xadcb0000, 0xb1590000, 0xb4e70000)
  eden space 56064K, 87% used [0xadcb0000,0xb0c8ab98,0xb1370000)
  from space 1088K, 25% used [0xb1370000,0xb13b60b0,0xb1480000)
  to   space 1024K, 0% used [0xb1490000,0xb1490000,0xb1590000)
 PSOldGen        total 466048K, used 43548K [0x74e70000, 0x91590000, 0xadcb0000)
  object space 466048K, 9% used [0x74e70000,0x778f72b0,0x91590000)
 PSPermGen       total 23296K, used 23253K [0x64e70000, 0x66530000, 0x74e70000)
  object space 23296K, 99% used [0x64e70000,0x665256d0,0x66530000)

...we are not sure why PermGen uses only about 24 MB when the maximum is 256 MB. Any thoughts?
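A possible reading of the heap report: the `total` figure for each generation appears to be the space currently committed, while the bracketed range shows the full reservation. For PSPermGen that range (0x64e70000 to 0x74e70000) spans 256 MB, matching -XX:MaxPermSize, and the JVM can grow the committed space toward it on demand, so "99% used" would describe the 23296K committed, not the 256 MB ceiling. The sketch below prints used, committed and max for every memory pool so the two can be compared on the live server. The `MemoryPoolReport` class is hypothetical; pool names vary by JVM and collector (under the parallel collector on Java 6 the permanent generation is typically reported as "PS Perm Gen").

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical diagnostic: prints used, committed and max sizes for every JVM memory pool. */
final class MemoryPoolReport {

    /** Formats bytes as kilobytes; getMax() returns -1 when a pool has no defined maximum. */
    static String kb(long bytes) {
        return bytes < 0 ? "undefined" : (bytes / 1024) + "K";
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // getUsage() returns null for a pool that is no longer valid
            }
            System.out.println(pool.getName()
                + ": used=" + kb(usage.getUsed())
                + " committed=" + kb(usage.getCommitted())
                + " max=" + kb(usage.getMax()));
        }
    }
}
```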


