Difference between revisions of "Diag/swa-problems"

From Scalix Wiki

Revision as of 18:09, 12 March 2008

Diagnosing Problems with Scalix Web Access

Introduction

SWA depends on several major subsystems for proper operation. These include the Apache web server, the Tomcat application server running several web applications, the Scalix LDAP service and the primary mail store, which is accessed via IMAP. When SWA users experience problems that are systemic in nature (that is, not a specific error tied to one operation or one message), it can be hard to pin down the source of the trouble. The many interacting subsystems, each with its own logs and configuration, make diagnosis challenging. This document provides guidance on how to proceed and collects the techniques that have proven useful for gathering diagnostic information from production SWA deployments.

Architecture

Beginning at the SWA user's browser, the system is structured as follows. The browser fetches static content and JavaScript code from a web server. Once the browser-resident JavaScript code is running, it too sends requests to the web server. The web server is the standard Apache 2.0 or 2.2 running on the machine that hosts the SWA service (which we will call the front-end server). Apache's job is to handle client HTTP connections, to perform SSL processing and to send the client requests on to Tomcat. TCP connections made by users' browsers to Apache are on port 80, or 443 for SSL.

Tomcat is a web server written in Java that supports the servlet interface, which gives it the special capability to run Java programs in-process. We use Apache, rather than Tomcat, to handle the end-user browser connections because Apache has better scaling and security than Tomcat, and because SSL is more efficiently handled by Apache. Communication between Apache and Tomcat is via the AJP protocol, which is similar but not identical to HTTP. The use of AJP allows the number of connections made to Tomcat to be kept reasonably low, because AJP, unlike HTTP, supports connection re-use for requests from different clients. AJP connections between Apache and Tomcat use port 8009.

A servlet is a bundle of Java code that accepts HTTP requests and processes them to produce responses. The servlet concept has been expanded over time to include various associated aspects of application deployment such as packaging and the specification of URL paths, leading to the definition of the 'web application'. Although many web applications run in Tomcat, two are of primary importance for SWA: the SWA server (called 'webmail') and the Scalix messaging services, or platform (called 'api'). The webmail servlet is responsible for serving all SWA static content: images, HTML pages, CSS files and JavaScript. It also handles all AJAX (XMLHttpRequest) calls made by the browser-resident JavaScript.

To perform certain operations the webmail servlet will in turn make a request to the messaging services servlet. The use of messaging services is optional, but it is enabled in the default configuration. In smaller deployments the messaging services servlet running on the same machine as webmail will be used (hence running in the same Tomcat instance). In larger deployments, however, multiple messaging services web applications may have been deployed on several machines. Both webmail and messaging services in turn connect to LDAP and IMAP servers to access user information and message store content. All the subsystems (Apache, Tomcat, webmail, messaging services, IMAP and LDAP) must be available and working properly in order for SWA to function correctly.

Problem Diagnosis Methodology

The best approach when presented with a problem reported by SWA users is to begin at the client and work progressively 'inwards' towards the IMAP service, examining service health at each subsystem along the way. Techniques for the analysis of each subsystem are discussed below:

JavaScript Client

SWA's browser-resident JavaScript code has a diagnostic logging capability. This feature is disabled by default for security and performance reasons. Follow these instructions to enable client diagnostic logging.

Apache

First check that Apache is running. Use:

/etc/init.d/httpd status

Then verify that Apache is serving basic static web pages. Point a browser at:

http://<server>/

This should display the default Apache home page or, if Apache has been configured to do so, a redirect to /webmail. You can also use the telnet command to verify that Apache is accepting connections on ports 80 and 443:

telnet <server> 80
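
The same check can be scripted when telnet is not installed or several ports need testing. The following is only a sketch: the function names port_open and check_apache_ports are made up for this example, and it assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are available.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device, so bash must be installed.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Report whether each of the Apache ports (80 and 443) accepts connections.
check_apache_ports() {
    for port in 80 443; do
        if port_open "$1" "$port"; then
            echo "$1:$port open"
        else
            echo "$1:$port closed"
        fi
    done
}
```

Run it as check_apache_ports <server>. A "closed" result means only that the connection was refused or timed out; it does not say whether Apache or a firewall is responsible.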

Next, examine Apache's log files. These are stored at /var/log/httpd:

[root@goat bin]# ls -lt /var/log/httpd
total 5268
-rw-r--r-- 1 root root 1173498 Feb 20 00:08 access_log
-rw-r--r-- 1 root root    2458 Feb 20 00:06 error_log
-rw-r--r-- 1 root root     316 Feb 15 04:02 error_log.1
-rw-r--r-- 1 root root    4015 Feb  8 04:02 error_log.2
-rw-r--r-- 1 root root 4048819 Feb  5 02:53 access_log.1
-rw-r--r-- 1 root root     947 Feb  1 04:02 error_log.3
-rw-r--r-- 1 root root  125984 Jan 31 16:25 access_log.2
-rw-r--r-- 1 root root     739 Jan 28 23:00 access_log.3
-rw-r--r-- 1 root root     316 Jan 25 04:02 error_log.4
-rw-r--r-- 1 root root     739 Jan 14 11:36 access_log.4

If SSL is used there will be a second set of log files with the ssl_ prefix. access_log and ssl_access_log contain one line for each completed client request. This information is useful for verifying that clients are successfully contacting Apache, and for seeing what type of requests they are making. error_log and ssl_error_log will generally be empty on a server that has no problems, except for occasional messages like these:

[Sun Feb 15 04:02:04 2009] [notice] Apache/2.2.3 (CentOS) configured -- resuming normal operations
[Thu Feb 19 21:17:49 2009] [error] [client 69.145.82.247] File does not exist: /var/www/html/favicon.ico
[Thu Feb 19 21:23:50 2009] [error] [client 69.145.82.247] File does not exist: /var/www/html/favicon.ico
[Thu Feb 19 21:46:32 2009] [error] [client ::1] Directory index forbidden by Options directive: /var/www/html/
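
On a busy server it can help to strip these routine messages before reading the log. A minimal sketch, assuming the only noise to drop is the three kinds of message shown above; the function name filter_error_log is hypothetical and the pattern list should be adjusted to suit the site.

```shell
# Print only the error_log lines that are NOT routine noise.
# Reads the log on stdin. The patterns dropped are the benign
# examples quoted in this document: startup notices, missing
# favicon.ico and forbidden directory indexes.
filter_error_log() {
    grep -v -e '\[notice\]' \
            -e 'File does not exist: .*favicon\.ico' \
            -e 'Directory index forbidden by Options directive'
}
```

For example: filter_error_log < /var/log/httpd/error_log. Note that grep exits with status 1 when every line was filtered out, which here simply means nothing unusual was found.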

If there are large numbers of messages, or messages indicating that errors have occurred, these may be useful in diagnosing the problem and should be retained for analysis. For example, here are some messages showing that Tomcat was unavailable:

[Wed Feb 04 03:35:37 2009] [error] proxy: AJP: disabled connection for (xxxxxxx)
[Wed Feb 04 03:48:18 2009] [error] (111)Connection refused: proxy: AJP: attempt to connect to xx.xx.xx.xx:8009 (xxxxxxx) failed
[Wed Feb 04 03:48:18 2009] [error] ap_proxy_connect_backend disabling worker for (xxxxxxx)
[Wed Feb 04 03:48:18 2009] [error] proxy: AJP: failed to make connection to backend: xxxxxxx
[Wed Feb 04 03:49:59 2009] [error] (111)Connection refused: proxy: AJP: attempt to connect to xx.xx.xx.xx:8009 (xxxxxxx) failed
[Wed Feb 04 03:49:59 2009] [error] ap_proxy_connect_backend disabling worker for (xxxxxxx)
[Wed Feb 04 03:49:59 2009] [error] proxy: AJP: failed to make connection to backend: xxxxxxx
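
To see when Tomcat was unreachable, the failed connection attempts can be counted per hour. This is a sketch that assumes the timestamp layout shown in the excerpt above (weekday, month, day, time, year); the function name ajp_failures_per_hour is invented for the example.

```shell
# Count "AJP: attempt to connect ... failed" lines per hour.
# Reads an error_log on stdin. With whitespace-separated fields,
# $2 is the month, $3 the day and $4 the hh:mm:ss time.
ajp_failures_per_hour() {
    awk '/proxy: AJP: attempt to connect to .* failed/ {
             hour = $2 " " $3 " " substr($4, 1, 2) ":00"
             count[hour]++
         }
         END { for (h in count) print h, count[h] }' | sort
}
```

Run against the excerpt above, this prints a single line, "Feb 04 03:00 2", because both failed attempts fall in the 03:00 hour. The output is sorted as text, so it is in chronological order only within a single month.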

The most common types of Apache error seen are:

# Failure to contact Tomcat via AJP, meaning that Tomcat is either down or unresponsive.
# Connection overlimit errors, meaning that for some reason operations are taking a long time to process, leading to more and more connections accumulating under high load.
# Operations errors in Apache 2.0 (RHEL4) AJP proxy code.

Error messages are most useful when their timestamps can be matched to the time at which the problem under investigation occurred. Some of the messages present may have nothing to do with the problem at hand.

Check the number of Apache processes running. A large number (hundreds) would be unusual. Use the following command, in which the [h] bracket keeps grep from counting its own process:

ps -ef | grep '[h]ttpd' | wc -l

Also check the number of sockets open on the Apache ports. Use:

netstat --numeric-ports -t -p | grep ":80 "

and

netstat --numeric-ports -t -p | grep ":443 "

Large numbers of open connections (more than one per active user), or large numbers of sockets in unconnected states (TIME_WAIT, etc.), might indicate trouble.
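
A per-state summary makes such imbalances easier to spot than a raw listing. The sketch below assumes the Linux net-tools column layout, in which the fourth column of netstat -tn output is the local address and the sixth is the connection state; the function name count_states is hypothetical.

```shell
# Summarise TCP connections on one local port by state.
# Reads `netstat -tn` output on stdin; $1 is the port number.
# Only the local address column ($4) is examined, so outbound
# connections to the same port number elsewhere are ignored.
count_states() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $4 ~ (":" port "$") { n[$6]++ }
        END { for (s in n) print s, n[s] }' | sort
}
```

For example: netstat -tn | count_states 80, and likewise for 443. Each output line gives a state name followed by the number of sockets in that state.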

Tomcat

First check that Tomcat is running:

[root@goat /]# /etc/init.d/scalix-tomcat status
Instance (goat) is not running

You should also see the Tomcat process (its executable is 'java'). Its pid will be useful later:

[root@goat /]# ps -ef | grep java
root     13136     1  0 Feb05 pts/3    00:20:43 /usr/java/jre1.5.0_13/bin/java -server -Djava.net.preferIPv4Stack=true 
-Xms256m -Xmx256m -Dscalix.instance=/var/opt/scalix/gt -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager 
-Djava.util.logging.config.file=/var/opt/scalix/gt/tomcat/conf/logging.properties 
-Djava.endorsed.dirs=/var/opt/scalix/gt/tomcat/common/endorsed -classpath
/usr/java/jre1.5.0_13/lib/tools.jar:/var/opt/scalix/gt/tomcat/bin/bootstrap.jar:/var/opt/scalix/gt/tomcat/bin/commons-logging-api.jar 
-Dcatalina.base=/var/opt/scalix/gt/tomcat -Dcatalina.home=/var/opt/scalix/gt/tomcat 
-Djava.io.tmpdir=/var/opt/scalix/gt/tomcat/temp org.apache.catalina.startup.Bootstrap start
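
Since the pid is needed later, it can be pulled out of the listing directly. A minimal sketch, assuming the standard ps -ef column order (pid in column 2, command in column 8) and the Bootstrap class name seen in the listing above; the function name tomcat_pids is made up for this example.

```shell
# Print the pid of every Tomcat JVM found in `ps -ef` output on stdin.
# A line qualifies when its command ($8) ends in "java" and its
# arguments mention Tomcat's Bootstrap class. The grep and awk
# processes themselves never match because their command is not java.
tomcat_pids() {
    awk '/org\.apache\.catalina\.startup\.Bootstrap/ && $8 ~ /java$/ { print $2 }'
}
```

For example: ps -ef | tomcat_pids. For the listing above this prints 13136. If more than one Scalix instance runs on the machine, one pid per instance is printed.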

Webmail

Messaging Services

IMAP