Problems With Server Backup (possible solutions)

Post by burhankhalid » Thu Nov 09, 2006 5:22 am

Hello:

After reading the forums and the wiki, I decided to try a full backup and restore operation to simulate a hardware failure. The new machine would have the same hostname but a different IP address. During this process, we also updated our DNS records to reflect additional IP addresses from our ISP.

The install process worked very well: I copied the mailstore from one machine to the other and then installed the Scalix components, and to my amazement, everything worked right from the start! Then problems started arising. Some I managed to fix; others still linger.

Since the IP addresses changed, I had to update smtpd.cfg to reflect the new addresses. Before I did this, all emails were being rejected as relaying attempts.

First Problem:

A flood of messages like this started filling up my maillog:
Nov 9 05:36:38 avalon sendmail[2362]: kA92acAu002358: smtpquit: mailer scalix exited with exit value 1
Nov 9 05:36:38 avalon sendmail[2362]: kA92acAu002358: to=<foo@mydomain.com>, delay=00:00:00, xdelay=00:00:00, mailer=scalix, pri=122055, relay=avalon, dsn=4.0.0, stat=Deferred: 451 Message rejected by OM.UX Gateway on 'avalon.mydomain.com': Fatal error encountered


I don't know if this fixed it, but I realized that SpamAssassin was not set up on the new machine, so I went back into smtpd.cfg and disabled the filter setting (by commenting it out).
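In case it helps anyone triaging a similar flood, here is a minimal Python sketch that counts deferred deliveries per mailer and DSN code, so you can see whether the rate drops after a config change. It assumes the sendmail delivery lines look like the ones I pasted above (mailer=, dsn= and stat= fields); the function name summarize_deferrals is my own, not anything shipped with Scalix or sendmail.

```python
import re
from collections import Counter

# Field patterns based on the delivery lines quoted above (an assumption:
# other sendmail versions or syslog formats may differ).
MAILER_RE = re.compile(r"\bmailer=([^,\s]+)")
DSN_RE = re.compile(r"\bdsn=([^,\s]+)")
STAT_RE = re.compile(r"\bstat=(.*)$")


def summarize_deferrals(lines):
    """Count deferred deliveries, keyed by (mailer, dsn)."""
    counts = Counter()
    for line in lines:
        stat = STAT_RE.search(line.rstrip("\n"))
        # Lines without a stat= field (e.g. the smtpquit line) are skipped.
        if not stat or not stat.group(1).startswith("Deferred"):
            continue
        mailer = MAILER_RE.search(line)
        dsn = DSN_RE.search(line)
        key = (mailer.group(1) if mailer else None,
               dsn.group(1) if dsn else None)
        counts[key] += 1
    return counts
```

You would feed it the lines of your mail log (on my machine that is the maillog file; the path on yours may differ) and compare the counts before and after the change.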

Second problem:

Nov 9 04:37:32 avalon sendmail[13666]: kA90bmH5004887: kA91bRfH013666: DSN: Local configuration error


This error kept popping up repeatedly, and I think it was because /etc/mail/local-host-names was blank (whoops). I edited that file, and it seems to have solved the problem.
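A check like the following would have caught the blank file before any mail bounced. It is only a sketch, assuming the file holds one host name per line with optional # comments; the helper names and the example domain names are placeholders of mine.

```python
def listed_hosts(text):
    """Return the host names found in local-host-names style text."""
    hosts = set()
    for raw in text.splitlines():
        # Drop anything after '#' and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line:
            hosts.add(line.lower())
    return hosts


def missing_hosts(text, expected):
    """Return the expected names that the text does not list, sorted."""
    present = listed_hosts(text)
    return sorted(h for h in expected if h.lower() not in present)
```

For example, reading the file and calling missing_hosts(contents, ["mydomain.com", "avalon.mydomain.com"]) returns both names when the file is blank, which is the state mine was in.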

Third problem:

The SAC stopped working unexpectedly. The error was 'Administration server cannot be contacted, please check server logs'. In the logs I found only these errors, and I don't know enough to tell whether they are significant:

2006-11-09 11:05:46,216 INFO [RESNamedInstances.initNamedInstances:61] Loading Scalix named instances.
2006-11-09 11:05:46,228 INFO [RESNamedInstances.initNamedInstances:100] Finised reading Named Instances. Total Named instances in the global config file are = 1
2006-11-09 11:05:46,230 INFO [RESNamedInstances.initNamedInstances:101] Checking Named Instances state..
2006-11-09 11:05:46,237 DEBUG [CmdExecution.executeCmd:138] ENVIRONMENT: LANG=en_US.UTF-8 OM_CHAR=UTF-8
2006-11-09 11:05:46,239 DEBUG [CmdExecution.executeCmd:140] COMMAND: /opt/scalix/bin/sxchkinstances
2006-11-09 11:05:46,323 INFO [RESNamedInstances.initNamedInstances:116] avalon - is an Active named instance
2006-11-09 11:05:46,329 INFO [RESNotifier.startRegistrationThread:80] Launching Registration-thread
2006-11-09 11:05:46,331 DEBUG [RESDispatcherServlet.initialize:373] : Almost finished initialization, now waiting for registration
2006-11-09 11:05:46,384 INFO [RegistrationEventPoller.run:113] Added Registration event to the Notification Event Queue queue:Event=[register|http://avalon.am-ul.com:80/res/RESDispatcher|LISTEN|300|avalon|avalon.am-ul.com|10.0.1] (try 100 retries)
2006-11-09 11:05:46,384 DEBUG [RegistrationEventPoller.run:116] Notified monitor queue that is has a new event
2006-11-09 11:05:46,387 DEBUG [RegistrationEventPoller.run:121] Sleeping for 5000 ms before trying again
2006-11-09 11:05:56,394 ERROR [Notifier.sendNotification:120] java.net.SocketTimeoutException: Receive timed out
2006-11-09 11:06:01,398 DEBUG [Notifier.sendNotification:132] Notifier-thread-0/ntries=4/Event=[register|http://avalon.am-ul.com:80/res/RESDispatcher|LISTEN|300|avalon|avalon.am-ul.com|10.0.1]
2006-11-09 11:06:01,399 INFO [RegistrationEventPoller.run:113] Added Registration event to the Notification Event Queue queue:Event=[register|http://avalon.am-ul.com:80/res/RESDispatcher|LISTEN|300|avalon|avalon.am-ul.com|10.0.1] (try 100 retries)


This error repeats for a while and is then followed by this output, which I suspect is the real error:

2006-11-09 11:12:10,432 DEBUG [CmdExecution.executeCmd:138] ENVIRONMENT: LANG=en_US.UTF-8 OM_CHAR=UTF-8 OMCURRENT=avalon
2006-11-09 11:12:10,433 DEBUG [CmdExecution.executeCmd:140] COMMAND: /opt/scalix/bin/omshowrt -q all
2006-11-09 11:12:42,419 DEBUG [HeartBeatEventPoller.run:67] Added heartbeatEventEvent=[heartbeat|http://avalon.am-ul.com:80/res/RESDispatcher|LISTEN|300|avalon|avalon.am-ul.com|10.0.1] to the Notification Event Queue
2006-11-09 11:12:42,421 DEBUG [Notifier.sendNotification:130] Notifier-thread-1/ACK/Event=[heartbeat|http://avalon.am-ul.com:80/res/RESDispatcher|LISTEN|300|avalon|avalon.am-ul.com|10.0.1]
2006-11-09 11:13:10,477 DEBUG [RESDispatcherServlet.dumpResponse:285] ---> RES Sending Response XML Document <-----
2006-11-09 11:13:10,478 DEBUG [RESDispatcherServlet.dumpResponse:292] <?xml version="1.0" encoding="UTF-8"?>
<ResResponse>
<Command name="omshowrt">
<Status>FAILED:3</Status>
<Output>
<Line value="omshowrt : [OM 8001] Someone else is currently configuring Scalix - please try again"/>
<Line value=""/>
</Output>
</Command>
</ResResponse>
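To pull failures like this out of a long debug log, a small parser is enough. This is a rough sketch: it assumes the XML has already been cut out of the log starting at the ResResponse element, and it treats any Status beginning with "FAILED" as a failure, since that is the only status value I have actually seen. The function name failed_commands is my own.

```python
import xml.etree.ElementTree as ET


def failed_commands(xml_text):
    """Return (command name, status, non-empty output lines) per failure."""
    root = ET.fromstring(xml_text)
    failures = []
    for cmd in root.iter("Command"):
        status = (cmd.findtext("Status") or "").strip()
        if not status.startswith("FAILED"):
            continue
        # Each <Line value="..."/> carries one line of command output.
        lines = [ln.get("value", "") for ln in cmd.iter("Line")]
        failures.append((cmd.get("name"), status, [l for l in lines if l]))
    return failures
```

Run against the response above, it reports omshowrt with status FAILED:3 and the single OM 8001 line.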


Despite that OM 8001 message, there is no one else on the system, as it's freshly installed with a new password. I decided to reconfigure the component from the installer, but that failed. I don't remember the exact error code, but the error concerned omshowmn (returned status code 3). When I tried the command from a shell, it would just hang indefinitely. I restarted Scalix, and then I was able to reconfigure SAC and RES, and things are operational again.
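Since the command simply hung in my shell, next time I would wrap it in a timeout so a stuck command shows up as a clear result rather than a frozen terminal. A minimal sketch using Python's standard subprocess module (3.7 or later); run_with_timeout is a name I made up, and the timeout value is arbitrary.

```python
import subprocess


def run_with_timeout(argv, seconds):
    """Run argv; return (exit code, output), or (None, "") if it hangs."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True,
                              timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return None, ""
    return done.returncode, done.stdout + done.stderr
```

For instance, run_with_timeout(["/opt/scalix/bin/omshowrt", "-q", "all"], 30) uses the same command the RES log shows; a None exit code would mean it hung for the full 30 seconds.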

Question: How can I find out why the Administration Server is hanging so I can prevent this from happening again?
