All Services aborted - Help! [solved]


Post by mephisto » Mon Mar 13, 2006 4:34 am

Hi,

after a power outage I'm getting the following on my Scalix 10 server:

omstat -a
PC Monitor                    Aborted        NON-STOP       0
Directory Relay Server        Aborted        02.20.06
Notification Server           Aborted        02.20.06       0
Shared memory daemon          Aborted        NON-STOP
Notification Monitor          Aborted        NON-STOP
Session Monitor               Aborted        NON-STOP
Container Access Monitor      Aborted        NON-STOP
Item Structure Server         Stopped
Database Monitor              Aborted        02.20.06
Licence Monitor Daemon        Aborted        NON-STOP
LDAP Daemon                   Aborted        02.21.06
Queue Manager                 Aborted        NON-STOP
Item Delete Daemon            Aborted        NON-STOP
IMAP Server Daemon            Aborted        02.20.06
SMTP Relay                    Aborted        02.20.06
Mime Browser Controller       Aborted        02.20.06


The only thing in /var/opt/scalix/logs is

SERIOUS ERROR           Administration(omstat        ) Mon Mar 13 09:27:04 2006
[OM.DMON 1013] The queue manager is not running.
Pid of logging process: 24050


SERIOUS ERROR           Administration(omstat        ) Mon Mar 13 09:27:04 2006
[OM.DMON 1013] The queue manager is not running.
Pid of logging process: 24052


SERIOUS ERROR           Administration(omstat        ) Mon Mar 13 09:27:04 2006
[OM.DMON 1013] The queue manager is not running.
Pid of logging process: 24054


How can I diagnose this any further? Please help me.

Thanks,

Mephisto
Last edited by mephisto on Mon Mar 13, 2006 5:18 am, edited 1 time in total.

Post by mephisto » Mon Mar 13, 2006 5:18 am

OK, I found the culprit. It was a stale lockfile: /var/opt/scalix/tmp/omrc.running
After I deleted it, the system came up fine. Shouldn't there be some kind of error checking for cases like this? I'm running CentOS; is there an init script where I can safely add a command to delete such a stale lockfile?

Post by ScalixSupport » Mon Mar 13, 2006 6:50 am

Hi,

Glad you got it working so quickly, good job. This is a bit of a philosophical question, as our scripts are meant to shut down and start the server in an orderly way. If there is a power outage and a sudden restart of the system, I am not sure how much more resilience we can put into the system. In fact, human intervention is almost always needed to resolve things like that (a UPS helps, too ;-). I'll take it to dev and see if we can close that window even further.

Cheers,

Sascha.

Post by ScalixSupport » Mon Mar 13, 2006 11:03 am

This file is just a check to ensure that another omrc is not running; it does not guard the whole instance. It gets deleted during omrc processing.

In a failover scenario, i.e. power outage and failover to another server, our startup scripts work quite happily.

What you've found is a corner case where something fails just as omrc is being run, so that "lock" file is left behind.

Cheers

Dave
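
For anyone who wants to script the cleanup asked about above, here is a minimal sketch, not an official Scalix procedure. It removes a lock file only when no process with a given name is running. The function name remove_stale_lock is made up for illustration, and the assumption that a running omrc shows up under the process name "omrc" is not confirmed anywhere in this thread, so check that on your own system before relying on it. Only the lock file path comes from the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Scalix): delete a lock file only if
# no process with the given name is currently running.
# Usage: remove_stale_lock LOCKFILE PROCESS_NAME
remove_stale_lock() {
    lockfile=$1
    procname=$2

    # Nothing to do if the lock file does not exist.
    [ -e "$lockfile" ] || return 0

    # Refuse to guess if pgrep is unavailable; keep the lock in that case.
    if ! command -v pgrep >/dev/null 2>&1; then
        echo "pgrep not found; leaving $lockfile in place" >&2
        return 2
    fi

    # pgrep -x matches the exact process name; exit status 0 means a match.
    if pgrep -x "$procname" >/dev/null 2>&1; then
        echo "$procname is running; leaving $lockfile in place" >&2
        return 1
    fi

    echo "removing stale lock file $lockfile" >&2
    rm -f -- "$lockfile"
}

# Example call. The path is the one reported in this thread; the process
# name "omrc" is an assumption.
# remove_stale_lock /var/opt/scalix/tmp/omrc.running omrc
```

If you call something like this from a boot-time script, it would have to run before the Scalix services are started, so that a lock left over from before the crash is not mistaken for a live one.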

