[opencms-dev] Repeated (daily) crashes of OpenCms 7.0.5
Nick Straguzzi
nick.straguzzi at credosystems.com
Thu Feb 19 18:07:06 CET 2009
All: Thank you for the replies to my message yesterday. Let me provide a little more information about the problem that we discovered this morning, which might be of assistance. After that, I'll answer all the questions I can that you've asked so far (see below), in one email for convenience.
NEW INFORMATION
- The issue with the missing module name in the URI ("/system/modules/resources/images/featurephoto.jpg") is probably not relevant to the problem. That URI is calculated dynamically from a custom folder property, which contains the module name to use. It seems that OpenCms was unable (due to a timeout) to read that property, so the module name defaulted to the empty string. In other words, this is a consequence of the problem, not its cause.
- There's some evidence that the threads are waiting on MySQL. We observe this behavior: our sites remain up and running, but no one can log into the Workplace. The stack trace from opencms.log is shown below (heavily clipped to save space). Our IT people say there is nothing unusual in the MySQL logs at the times of these exceptions.
org.opencms.db.CmsDbSqlException: An SQL error occurred when executing the following query: .
    at org.opencms.db.generic.CmsUserDriver.readUserInfos(CmsUserDriver.java:1473)
    [snip]
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error: Timeout waiting for idle object
    at org.apache.commons.dbcp.PoolingDriver.connect(PoolingDriver.java:184)
    at java.sql.DriverManager.getConnection(DriverManager.java:582)
    at java.sql.DriverManager.getConnection(DriverManager.java:207)
    at org.opencms.db.CmsSqlManager.getConnectionByUrl(CmsSqlManager.java:104)
    at org.opencms.db.generic.CmsSqlManager.getConnection(CmsSqlManager.java:231)
    at org.opencms.db.generic.CmsUserDriver.readUserInfos(CmsUserDriver.java:1445)
    ... 47 more
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
    at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:825)
    at org.apache.commons.dbcp.PoolingDriver.connect(PoolingDriver.java:176)
    ... 52 more
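For reference, the innermost "Caused by" comes from the commons-pool/commons-dbcp frames inside our own JVM: the pool waited for a free connection and gave up. The following toy model shows one way that message can arise, assuming connections are borrowed and never returned. We have not confirmed that this is what is happening to us. TinyPool and PoolLeakDemo are hypothetical names, and the Semaphore only stands in for the real pool.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy stand-in for a bounded connection pool; NOT the commons-pool implementation. */
class TinyPool {
    private final Semaphore permits;
    private final long maxWaitMillis;

    TinyPool(int maxActive, long maxWaitMillis) {
        this.permits = new Semaphore(maxActive);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Waits up to maxWaitMillis for a free slot, then fails with the message seen in the trace. */
    void borrow() {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a slot", e);
        }
        if (!acquired) {
            throw new NoSuchElementException("Timeout waiting for idle object");
        }
    }

    /** Returns a slot to the pool; a code path that skips this call leaks a connection. */
    void giveBack() {
        permits.release();
    }
}

class PoolLeakDemo {
    /** Borrows `requests` times, returning each slot only when `leak` is false; counts timeouts. */
    static int countTimeouts(TinyPool pool, int requests, boolean leak) {
        int timeouts = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.borrow();
                if (!leak) {
                    pool.giveBack();
                }
            } catch (NoSuchElementException e) {
                timeouts++;
            }
        }
        return timeouts;
    }

    public static void main(String[] args) {
        // Pool of 3 slots, 50 ms wait, 10 sequential requests.
        System.out.println("well-behaved: " + countTimeouts(new TinyPool(3, 50), 10, false)); // 0
        System.out.println("leaking:      " + countTimeouts(new TinyPool(3, 50), 10, true));  // 7
    }
}
```

In the leaking case the first three requests succeed and every later one times out, which is at least consistent with a system that runs fine for hours and then stops serving anything that needs the database.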
ANSWERS TO YOUR QUESTIONS
Manfred Schenk wrote:
>
>Hi, could you describe your installation a bit further?
>Do you use a "Standard" installation or use some extra-stuff like "stripping
>away the opencms prefixes of the URLs" or a combination of apache and tomcat?
>
>I also had a stack overflow some months ago when I misconfigured the 404-
>handler so that it had been called recursively until the stack overflowed.
>
It's a "strip" implementation, but nothing too out of the ordinary. We installed OpenCms as the ROOT application to eliminate the /opencms application prefix. Instead of removing the second /opencms prefix (the one for the main servlet), we simply renamed it to /scee. Our customers are perfectly happy with a URL such as http://www.foo.com/scee/mypage.htm, so there was no urgency in getting rid of that second prefix.
If one of our domains is served by opencms, then Apache just passes it to Tomcat. Our rewrite rules are pretty simple:
www.foo.com/ -> transform to www.foo.com/scee/en/
*/scee/* -> forward to Tomcat
*/export/* -> forward to Tomcat
Hmm...the 404 handler seems like a good place to look into the stack overflow issue. We have not configured that at all. Certainly I would not want any handler to be used on a JPG, and in particular not on a JPG that is part of the site template (which is the case with that featurephoto.jpg file above). Otherwise, if I try to show a nice 404 message in the user's site template, and the template itself references the missing image, that would seem to cause a recursive 404 and a stack overflow. Can anyone tell me how to fix that?
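To make the question concrete, here is roughly the behavior I am after, as a minimal Java sketch. The names (NotFoundPolicy, decide, Action) and the extension list are made up for illustration and are not OpenCms APIs; where such a check would actually hook into OpenCms is exactly what I don't know. It also assumes the URI carries no query string.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical decision helper; none of these names come from OpenCms. */
class NotFoundPolicy {
    enum Action { PLAIN_404, TEMPLATED_404 }

    // Illustrative list of resource types that should never get a templated error page.
    private static final Set<String> STATIC_EXTENSIONS =
            new HashSet<String>(Arrays.asList("jpg", "jpeg", "gif", "png", "css", "js", "ico"));

    /**
     * Chooses how to answer a request for a missing resource.
     *
     * @param uri                  the requested path, without a query string
     * @param alreadyHandlingError true if we are already rendering an error page
     */
    static Action decide(String uri, boolean alreadyHandlingError) {
        // Never render an error page from inside an error page: this breaks the recursion.
        if (alreadyHandlingError) {
            return Action.PLAIN_404;
        }
        int slash = uri.lastIndexOf('/');
        int dot = uri.lastIndexOf('.');
        if (dot > slash) {
            String ext = uri.substring(dot + 1).toLowerCase(Locale.ROOT);
            // Images and other static files just get a bare 404 status.
            if (STATIC_EXTENSIONS.contains(ext)) {
                return Action.PLAIN_404;
            }
        }
        return Action.TEMPLATED_404;
    }

    public static void main(String[] args) {
        System.out.println(decide("/system/modules/resources/images/featurephoto.jpg", false));
        System.out.println(decide("/scee/en/missing.htm", false));
        System.out.println(decide("/scee/en/missing.htm", true));
    }
}
```

Either of the two checks alone would stop the loop I am worried about; having both means a missing template image can never pull the template back in.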
Christian Steinert wrote:
>
>Sadly, I have no idea about your actual problem, but you have your
>memory parameters the wrong way around: "-Xms1024m -Xmx512m " would
>mean: start with 1024M but don't use more than 512M.
>
>Do you have any idea, whether the crashes always happen around the
>same time of the night? Do you maybe have any jobs running around then
> - inside or outside of OpenCms?
>
My fault on the memory parameters; I mistranscribed them in the email. :-) They are in fact -Xmx1024m and -Xms512m, just as you'd expect.
The crashes don't seem to happen at any particular time - it is more like 18-24 hours after each reboot. The only job that we do have running overnight is the system backup service, which does both the Tomcat directory and the MySQL database. The crashes don't seem to coincide with that.
Farnaz Fotrousi (and similarly Georgi Naplatanov) wrote:
>
>I had similar problem and increasing max_connection in mysql worked for me.
>You can edit "max_connections=100" in "my.ini or my.cnf" ( mysql configuration file).
>
Thank you Farnaz and Georgi, we'll try that. I see that there is no active max_connections setting in our my.cnf file (the line max_connections=200 is commented out).
OTHER THOUGHTS:
- Like Manfred, I am now starting to wonder about the possibility of a 404 recursion. A few of those over time would certainly tie up threads to the point where we'd see the behavior we're seeing. Certainly I do not want any 404 processing on an image file of any sort; just send a 404 response and be done with it. How can I check to see if that's the issue?
- Also, is there any way to configure the maximum number of MySQL database connections that OpenCms will use? Perhaps that's a Tomcat setting? Keep in mind that we have a separate Tomcat also hitting MySQL, and it has no problems at all - it continues to run just fine while OpenCms repeatedly crashes. Thus, I think that if we're running out of threads/connections, it's inside our Tomcat and not in MySQL (else, the other application would be crashing too, correct?)
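The reasoning in that last point can be written down as a small check: which limit does one application hit first, its own pool ceiling or the server-wide connection ceiling shared with the other Tomcat? This is only a sketch. PoolBudget and its methods are hypothetical, all numbers are invented, and the property names db.pool.default.maxActive and db.pool.default.maxWait are what I believe OpenCms reads from WEB-INF/config/opencms.properties; please treat them as an assumption to verify against your own file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper for comparing a client pool limit with a server connection limit. */
class PoolBudget {
    /** Reads an integer property, falling back to a default when the key is absent. */
    static int intProp(Properties p, String key, int fallback) {
        String v = p.getProperty(key);
        return v == null ? fallback : Integer.parseInt(v.trim());
    }

    /**
     * Names the limit reached first when this application wants `demand` connections at once.
     *
     * @param poolMaxActive        this application's own pool ceiling
     * @param otherAppsInUse       connections already held by other applications
     * @param serverMaxConnections the database server's global ceiling
     */
    static String firstLimit(int demand, int poolMaxActive, int otherAppsInUse, int serverMaxConnections) {
        int serverHeadroom = serverMaxConnections - otherAppsInUse;
        if (demand <= poolMaxActive && demand <= serverHeadroom) {
            return "none";
        }
        return poolMaxActive <= serverHeadroom ? "client pool" : "database server";
    }

    public static void main(String[] args) throws IOException {
        // Assumed key names and invented values; check your own opencms.properties.
        String sample = "db.pool.default.maxActive=25\n" + "db.pool.default.maxWait=2000\n";
        Properties p = new Properties();
        p.load(new StringReader(sample));

        int maxActive = intProp(p, "db.pool.default.maxActive", 8);
        // 40 simultaneous requests, 20 connections used by the other Tomcat, server limit 100.
        System.out.println(firstLimit(40, maxActive, 20, 100)); // client pool
    }
}
```

With numbers like these the pool inside the OpenCms Tomcat runs dry long before MySQL does, which would explain why the other application keeps running.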
Thanks again,
Nick
-----Original Message-----
From: Nick Straguzzi
Sent: Wednesday, February 18, 2009 11:39 AM
To: 'opencms-dev at opencms.org'
Subject: Repeated (daily) crashes of OpenCms 7.0.5
All:
We're in need of some good admin assistance from the OpenCms community to help us figure out the cause of repeated system crashes on our new server. Please, if any of the stuff below looks familiar to you, let us know ASAP! Thanks in advance - Nick
THE PROBLEM:
Once a day (usually during the nighttime hours), OpenCms stops responding. The logs and error messages are of little help. The problem always ends in a stack overflow, but the root cause is never listed. Thus it is extraordinarily difficult for us to figure out what's going on. Bouncing Tomcat always clears up the problem for another 24 hours or so.
THE BASICS:
The server crashes because all of its available processing threads are in a wait state. All web server processes are also waiting, but there is no indication what they're waiting for. (See trace below)
The most interesting part is the errors logged in opencms.1. As noted, it's always a stack overflow with a very deep nested trace, far too deep to put into an email. However, there is one quirk that may help you figure out the problem. The top of the trace (and the nested errors too) begins as follows:
16 Feb 2009 13:07:22,693 ERROR [org.opencms.jsp.CmsJspBean: 298] Error in JSP Bean.
org.opencms.file.CmsVfsResourceNotFoundException: Error reading resource
from path "/system/modules/resources/images/featurephoto.jpg".
...note that the resource it cannot find begins "/system/modules/resources/..." The module name is missing in between "modules" and "resources". (We are using a standard OpenCms folder structure in our system tree.) We do not believe that any of our outgoing pages include such a URI; we think the module name is for some reason being dropped internally by OpenCms when it maps a URL to a URI. At any rate, even if this nonexistent resource was being requested, why would it cause repeated errors and a stack overflow?
THE ENVIRONMENT:
* Linux with 2GB of memory.
* Tomcat 5.5.27
* MySQL 4.1.22
* Apache front end
* Java6 (1.6.0_11)
Java settings:
-Xms1024m -Xmx512m -XX:MaxPermSize=256m -server -verbosegc -XX:+PrintGCDetails -Djava.awt.headless=true
OpenCms is installed as the ROOT application.
The main servlet (opencms) has been renamed to "scee"
Thus, a typical URI is: /scee/en/index.htm
It is a multisite environment. Apache forwards all requests beginning with /scee and /export to Tomcat. All sites are mapped properly in opencms-system.xml.
There are two separate Tomcats using two separate Javas (the other is Java5) running on this server. They meet only at MySQL, but they use separate databases. The other Tomcat runs smoothly with no problems.
OTHER INFORMATION:
(the thread dump after the system crashes shows this:)
"http-9090-Processor12" daemon prio=10 tid=0x0845f000 nid=0x35ad in Object.wait() [0x634ad000..0x634ae0a0]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x755e0238> (a org.apache.tomcat.util.threads.ThreadPool$ControlRunnable)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:661)
    - locked <0x755e0238> (a org.apache.tomcat.util.threads.ThreadPool$ControlRunnable)
    at java.lang.Thread.run(Thread.java:619)
(Heap report at time of crash)
PSYoungGen total 57152K, used 49283K [0xadcb0000, 0xb1590000, 0xb4e70000)
eden space 56064K, 87% used [0xadcb0000,0xb0c8ab98,0xb1370000)
from space 1088K, 25% used [0xb1370000,0xb13b60b0,0xb1480000)
to space 1024K, 0% used [0xb1490000,0xb1490000,0xb1590000)
PSOldGen total 466048K, used 43548K [0x74e70000, 0x91590000, 0xadcb0000)
object space 466048K, 9% used [0x74e70000,0x778f72b0,0x91590000)
PSPermGen total 23296K, used 23253K [0x64e70000, 0x66530000, 0x74e70000)
object space 23296K, 99% used [0x64e70000,0x665256d0,0x66530000)
...we are not sure why the PermGen shows a total of only about 23 MB (and is 99% full) when the maximum is set to 256 MB. Any thoughts?
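One possible reading, offered as a sketch rather than a diagnosis: as far as I understand the HotSpot report format, "total" is the space currently committed to a generation, and the three addresses in brackets are the start, the end of the committed space, and the end of the reserved space. Under that assumption the arithmetic on the PSPermGen line above works out as follows. HeapReportMath and spanKb are hypothetical names; the second half of main just prints the same three figures for whatever JVM runs it, using the standard java.lang.management API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: sizes implied by the address ranges in a heap report line. */
class HeapReportMath {
    /** Size in kilobytes of the address range [from, to). */
    static long spanKb(long from, long to) {
        return (to - from) / 1024;
    }

    public static void main(String[] args) {
        // Addresses copied from the PSPermGen line of the heap report above.
        long start = 0x64e70000L;
        long committedEnd = 0x66530000L;
        long reservedEnd = 0x74e70000L;

        System.out.println("PSPermGen committed: " + spanKb(start, committedEnd) + "K");
        System.out.println("PSPermGen reserved:  " + spanKb(start, reservedEnd) + "K");

        // The same used/committed/max figures for the JVM running this program.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            if (u == null) {
                continue; // pool is no longer valid
            }
            String max = u.getMax() < 0 ? "undefined" : (u.getMax() / 1024) + "K";
            System.out.println(pool.getName() + ": used " + (u.getUsed() / 1024) + "K, committed "
                    + (u.getCommitted() / 1024) + "K, max " + max);
        }
    }
}
```

The committed figure comes out at 23296K, matching the "total" in the report, and the reserved figure at 262144K, which is exactly 256 MB. If that reading is right, the 256 MB maximum is in effect and the generation has simply not grown beyond what is in use so far.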