TB-2007-02-HACLUSTER

Overview

The most common high-availability setup is two nodes in a clustered configuration, each running an active instance of Scalix and set up for mutual failover. With both nodes up and running, instance "mail1" runs on Node A while instance "mail2" runs on Node B.

If either node fails, the active instance on that node fails over to the other node, so the one remaining node runs both instances. From a user perspective, this is transparent because each instance is associated with a virtual hostname (instance hostname or service hostname) and IP address through which all user interaction takes place.

HowTos-HA-ScalixCluster.png

The Scalix message store and database are kept on external shared storage. Physically, the storage can be accessed by both nodes at all times. However, each instance has its own separate message store, so the cluster software manages access to the storage in such a way that each message store is only accessed by the node that is running the associated instance.

Various shared storage technologies can be used, such as multi-initiator SCSI or a SAN based on iSCSI or Fibre Channel. Scalix does not support the use of NFS for shared storage.

Scalix Enterprise Edition software provides multi-instance and virtual hostname support. Cluster monitoring, failover, storage access, and network address and hostname management are provided by the clustering software. We have tested HA-Scalix with RedHat Cluster Suite; however, it should be possible to create similar configurations with most other cluster products.

Prerequisites

Software Versions

  • This document assumes you are running Scalix 11.0.4 or later on SuSE Linux Enterprise Server 10 or on RedHat Enterprise Linux 4

Network Address and Hostname Layout

Each of the nodes has one static IP address of its own. This address is used only for administrative purposes: it stays with the node, never fails over, and is not used to access Scalix services.

HowTos-HA-ScalixCluster-Network.png

In the example, we use the following IP addresses and hostnames:

Node A                        Node B
nodea.scalix.demo             nodeb.scalix.demo
192.168.100.11                192.168.100.12

Each of the instances also has an associated IP address. This address is used for end-user access and moves between the nodes with the instance. In the example, we use the following IP addresses and hostnames:

Instance 1                    Instance 2
mail1.scalix.demo             mail2.scalix.demo
192.168.100.21                192.168.100.22

All IP addresses and hostnames should be registered in DNS so that they can be resolved both forward and reverse. Also, on the cluster nodes, the IP address mappings should be recorded in the /etc/hosts config file so that the cluster does not depend on DNS availability.
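
For the example above, the corresponding /etc/hosts entries on both cluster nodes would look roughly like this (taken from the address tables above; adjust names and domain to your environment):

192.168.100.11   nodea.scalix.demo   nodea
192.168.100.12   nodeb.scalix.demo   nodeb
192.168.100.21   mail1.scalix.demo   mail1
192.168.100.22   mail2.scalix.demo   mail2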

Storage Configuration

Each of the instances needs its own dedicated storage area. Scalix recommends the use of LVM, as this enables filesystem resizing as well as the ability to snapshot a filesystem for backup purposes. From an LVM point of view, each instance should have its own LVM Volume Group (VG), because a VG can only be activated and accessed by one of the hosts at a time. For the example configuration, we will use a simple disk layout as follows:

ScalixCluster-SharedDisk.png

ScalixCluster-LVM.png


We take two disks or LUNs on the shared storage, one for each instance, wipe any existing partition table, then create a volume group and a logical volume for the message store and Scalix data area. You will also need to create a filesystem on the logical volume:

  • On Node A:
[root@nodea ~]# dd if=/dev/zero of=/dev/sdd bs=512 count=1
 1+0 records in
 1+0 records out
[root@nodea ~]# pvcreate /dev/sdd
 Physical volume "/dev/sdd" successfully created
[root@nodea ~]# vgcreate vgmail1 /dev/sdd
 Volume group "vgmail1" successfully created
[root@nodea ~]# lvcreate -L 140G -n lvmail1 vgmail1
 Logical volume "lvmail1" created
[root@nodea ~]# vgscan
 Reading all physical volumes.  This may take a while...
 Found volume group "vgmail1" using metadata type lvm2
 Found volume group "VolGroup00" using metadata type lvm2
[root@nodea ~]# mkfs.ext3 /dev/vgmail1/lvmail1 
 mke2fs 1.35 (28-Feb-2004)
 ...
 Writing superblocks and filesystem accounting information: done
[root@nodea ~]# vgchange -a y vgmail1
 1 logical volume(s) in volume group "vgmail1" now active
[root@nodea ~]# vgchange -c y vgmail1
 Volume group "vgmail1" successfully changed
  • On Node B:
[root@nodeb ~]# dd if=/dev/zero of=/dev/sde bs=512 count=1
 1+0 records in
 1+0 records out
[root@nodeb ~]# pvcreate /dev/sde
 Physical volume "/dev/sde" successfully created
[root@nodeb ~]# vgcreate vgmail2 /dev/sde
 Volume group "vgmail2" successfully created
[root@nodeb ~]# lvcreate -L 140G -n lvmail2 vgmail2
 Logical volume "lvmail2" created
[root@nodeb ~]# vgscan
 Reading all physical volumes.  This may take a while...
 Found volume group "vgmail2" using metadata type lvm2
 Found volume group "VolGroup00" using metadata type lvm2
[root@nodeb ~]# mkfs.ext3 /dev/vgmail2/lvmail2 
 mke2fs 1.35 (28-Feb-2004)
 ...
 Writing superblocks and filesystem accounting information: done
[root@nodeb ~]# vgchange -a y vgmail1
 1 logical volume(s) in volume group "vgmail1" now active
[root@nodeb ~]# vgchange -a y vgmail2
 1 logical volume(s) in volume group "vgmail2" now active
[root@nodeb ~]# vgchange -c y vgmail1
 Volume group "vgmail1" successfully changed
[root@nodeb ~]# vgchange -c y vgmail2
 Volume group "vgmail2" successfully changed
  • On Node A:
[root@nodea ~]# vgchange -a y vgmail2
 1 logical volume(s) in volume group "vgmail2" now active
[root@nodea ~]# vgchange -c y vgmail2
 Volume group "vgmail2" successfully changed


Next, you need to create the mountpoints. For this, you need to determine each instance's root directory. The pathname will be

/var/opt/scalix/az

where az is the first and last letter of your instance name. For example, if your instances are named mail1 and mail2, your mountpoints will be

/var/opt/scalix/m1

and

/var/opt/scalix/m2

respectively.
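
If you want to derive the directory shorthand from an instance name in a shell script, a small bash fragment like the following illustrates the rule (illustration only, not part of the Scalix tooling):

INSTANCE=mail1
echo "/var/opt/scalix/${INSTANCE:0:1}${INSTANCE: -1}"    # prints /var/opt/scalix/m1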

Create those mountpoints on both nodes:

  • On Node A:
[root@nodea init.d]# mkdir -p /var/opt/scalix/m1
[root@nodea init.d]# mkdir -p /var/opt/scalix/m2
  • On Node B:
[root@nodeb init.d]# mkdir -p /var/opt/scalix/m1
[root@nodeb init.d]# mkdir -p /var/opt/scalix/m2

Cluster Software Setup

RedHat Cluster Suite (RHCS)

Installation

Refer to the RedHat documentation for hardware prerequisites and basic cluster software installation. Watch out for the following items:

  • Install the cluster software on both nodes.
  • Run system-config-cluster on the first node; after performing the basic configuration, copy the cluster.conf file to the other node (as described in the cluster documentation).
  • You can use the default Distributed Lock Manager (DLM). You don't need the Global Lock Manager.
  • If you have more than one cluster on the same subnet, make sure to modify the default cluster name "alpha_cluster". This can be found in the cluster.conf file.

Creating the Scalix service

The service that represents a Scalix instance to the cluster should be created before starting the Scalix installation, because it provides the virtual IP address and shared storage mountpoint that Scalix is installed against. To create the Scalix services, follow these steps:

1. Make sure the cluster software is started on both nodes:

service ccsd start; service cman start; service rgmanager start; service fenced start
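
If you also want the cluster daemons to start automatically at boot (usual for a production cluster, but check against your RHCS version's documentation), you would enable them with chkconfig on both nodes:

chkconfig ccsd on; chkconfig cman on; chkconfig fenced on; chkconfig rgmanager on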

2. Using system-config-cluster, add the services for both mail1 and mail2:

  • Use the instance name, "mail1", as the service name
  • Uncheck "Autostart this Service" and set recovery policy to "Disable"
  • Create a new resource for the service to represent the virtual IP address defined for the instance
  • Create a new resource for the service to represent the filesystem and mountpoint:
Name:              mail1fs
File System Type:  ext3
Mountpoint:        /var/opt/scalix/m1
Device:            /dev/vgmail1/lvmail1
Options:           
File System ID:    
Force Unmount:     Checked
Reboot host node:  Checked 
Check file system: Checked
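
For reference, the resulting definitions in /etc/cluster/cluster.conf look roughly like the sketch below for mail1. Treat this as an illustration rather than a file to copy verbatim; attribute names can differ slightly between RHCS versions, and system-config-cluster writes the file for you:

<rm>
  <resources>
    <ip address="192.168.100.21" monitor_link="1"/>
    <fs name="mail1fs" device="/dev/vgmail1/lvmail1" mountpoint="/var/opt/scalix/m1"
        fstype="ext3" force_unmount="1" self_fence="1" force_fsck="1"/>
  </resources>
  <service name="mail1" autostart="0" recovery="disable">
    <ip ref="192.168.100.21"/>
    <fs ref="mail1fs"/>
  </service>
</rm>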

3. Bring up the services on the respective primary node:

clusvcadm -e mail1 -m nodea.scalixdemo.com
clusvcadm -e mail2 -m nodeb.scalixdemo.com

4. At this point, it is a good idea to check that the IP address and filesystem resources are available.

  • On Node A:
[root@nodea ~]# clustat
Member Status: Quorate
 Member Name                              Status
 ------ ----                              ------
 nodea.scalixdemo.com                     Online, Local, rgmanager
 nodeb.scalixdemo.com                     Online, rgmanager
 Service Name         Owner (Last)                   State         
 ------- ----         ----- ------                   -----         
 mail1                (nodea.scalixdemo.com)         disabled        
 mail2                (nodeb.scalixdemo.com)         disabled        

[root@nodea ~]# mount
...
/dev/mapper/vgmail1-lvmail1 on /var/opt/scalix/m1 type ext3 (rw)

[root@nodea ~]# ip addr show
...
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
   link/ether 00:06:5b:04:f4:25 brd ff:ff:ff:ff:ff:ff
   inet 169.254.0.51/24 brd 169.254.0.255 scope global eth0
   inet 169.254.0.61/32 scope global eth0
   inet6 fe80::206:5bff:fe04:f425/64 scope link 
      valid_lft forever preferred_lft forever

Scalix Setup

Installation

You can now install Scalix. Follow the steps documented in the Scalix Installation Guide, with the following changes:

  • When starting the installer, specify both the instance name and the fully-qualified hostname on the command line:
  • On Node A:
[root@nodea ~]# ./scalix-installer --instance=mail1 --hostname=mail1.scalixdemo.com
  • Select 'Typical install'
  • On Node B:
[root@nodeb ~]# ./scalix-installer --instance=mail2 --hostname=mail2.scalixdemo.com
  • Select 'Typical install'
  • When asked for the 'Secure Communications' password, make sure to use the same password on both nodes.

Post-Install Tasks

Setting up Scalix Management Services

One of the instances needs to be nominated as the Admin instance ("Ubermanager"); the other instances will be managed through it. In this example, we assume that "mail1" will be the Admin instance. For the instances not running the Admin server, follow these steps:

  • De-register the Management Console and Server from the node:
[root@nodeb ~]# /opt/scalix-tomcat/bin/sxtomcat-webapps --del mail2 caa sac 
  • Reconfigure the Management Agent to report to the Admin instance:
[root@nodeb ~]# ./scalix-installer --instance=mail2 --hostname=mail2.scalixdemo.com
  • Select "Reconfigure Scalix Components"
  • Select "Scalix Management Agent"
  • Enter the fully-qualified virtual hostname of the Admin instance when asked for the host where Management Services are installed.
  • Re-enter the 'Secure Communications' password. Make sure you use the same value as above.
  • Don't opt to Create Admin Groups

Setting the Tomcat shutdown port

Each instance in the cluster needs a cluster-wide unique Tomcat shutdown port number. By default, all instances are set up to use 8005, so this has to be modified on all but one instance. Use the following steps:

  • Set a unique shutdown port number with the following command:
[root@nodeb ~]# /opt/scalix-tomcat/bin/sxtomcat-modify-instance -p 8006 mail2
  • Restart Tomcat
[root@nodeb ~]# service scalix-tomcat restart
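
To verify the change, you can look at the shutdown port in the instance's Tomcat server.xml (path shown for the m2 instance directory used in this example; adjust to your instance):

[root@nodeb ~]# grep 'Server port' /var/opt/scalix/m2/tomcat/conf/server.xml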

Disabling Scalix Auto-Start

The Scalix services need to be excluded from starting when the system boots before they can be integrated into the cluster. First, shut down Scalix manually. To do this, execute the following commands on all nodes:

[root@node ~]# service scalix-tomcat stop
[root@node ~]# service scalix-postgres stop
[root@node ~]# service scalix stop

Remove the Scalix services from the system auto-start configuration. Again, this must be done on all nodes:

[root@node ~]# chkconfig --del scalix 
[root@node ~]# chkconfig --del scalix-tomcat
[root@node ~]# chkconfig --del scalix-postgres
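
A quick way to confirm that the services are no longer registered for auto-start is to filter the chkconfig listing; once the services have been deleted, this should return no output:

[root@node ~]# chkconfig --list | grep scalix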

Registering all instances on all nodes

Each instance is only registered on the node where it was created. For clustered operations, all instances need to be registered on all nodes. Instance registration information is kept in the /etc/opt/scalix/instance.cfg config file. You need to merge the contents of these files and install the combined file on all nodes. At the same time, you should also disable instance autostart in the registration file. In the example, the file should look like this on all nodes:

OMNAME=mail1
OMHOSTNAME=mail1.scalixdemo.com
OMDATADIR=/var/opt/scalix/m1/s
OMAUTOSTART=FALSE
#
OMNAME=mail2
OMHOSTNAME=mail2.scalixdemo.com
OMDATADIR=/var/opt/scalix/m2/s
OMAUTOSTART=FALSE
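
One way to produce the merged file is to append the other node's registration to the local file and force autostart off, for example on Node A (a sketch using the example hostnames; adjust paths and names, and review the result before copying it back):

[root@nodea ~]# scp nodeb:/etc/opt/scalix/instance.cfg /tmp/instance-nodeb.cfg
[root@nodea ~]# echo "#" >> /etc/opt/scalix/instance.cfg
[root@nodea ~]# cat /tmp/instance-nodeb.cfg >> /etc/opt/scalix/instance.cfg
[root@nodea ~]# sed -i 's/^OMAUTOSTART=.*/OMAUTOSTART=FALSE/' /etc/opt/scalix/instance.cfg
[root@nodea ~]# scp /etc/opt/scalix/instance.cfg nodeb:/etc/opt/scalix/instance.cfg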

Setting up Apache integration on all nodes

Registering the Rules Wizard with the virtual host

For the web-based Scalix Rules Wizard to be available, the scalix-web-client Apache configuration must be registered in the Tomcat connector config directories for each instance. Execute the following commands:

  • On Node A:
[root@nodea ~]# cd /etc/opt/scalix-tomcat/connector
[root@nodea connector]# cp /opt/scalix/global/httpd/scalix-web-client.conf jk/app-mail1.srw.conf
[root@nodea connector]# cp /opt/scalix/global/httpd/scalix-web-client.conf ajp/app-mail1.srw.conf
  • On Node B:
[root@nodeb ~]# cd /etc/opt/scalix-tomcat/connector
[root@nodeb connector]# cp /opt/scalix/global/httpd/scalix-web-client.conf jk/app-mail2.srw.conf
[root@nodeb connector]# cp /opt/scalix/global/httpd/scalix-web-client.conf ajp/app-mail2.srw.conf
Registering all workers for mod_jk

All workers for Apache mod_jk.so must be registered in /etc/opt/scalix-tomcat/connector/jk/workers.conf. In the example, the file should look like this on all nodes:

JkWorkerProperty worker.list=mail1,mail2
Copying the Apache Tomcat connector config files

All files in /etc/opt/scalix-tomcat/connector/ajp and /etc/opt/scalix-tomcat/connector/jk must now be copied between all nodes so that those directories have the same contents on all nodes in the cluster. In the example, the directories must contain these files on both nodes:

[root@nodea connector]# ls -R *
ajp:
app-mail1.api.conf  app-mail1.srw.conf      app-mail2.srw.conf
app-mail1.caa.conf  app-mail1.webmail.conf  app-mail2.webmail.conf
app-mail1.m.conf    app-mail2.api.conf      instance-mail1.conf
app-mail1.res.conf  app-mail2.m.conf        instance-mail2.conf
app-mail1.sac.conf  app-mail2.res.conf
app-mail1.sis.conf  app-mail2.sis.conf 

jk:
app-mail1.api.conf  app-mail1.srw.conf      app-mail2.webmail.conf
app-mail1.caa.conf  app-mail1.webmail.conf  instance-mail1.conf
app-mail1.m.conf    app-mail2.api.conf      instance-mail2.conf
app-mail1.res.conf  app-mail2.m.conf        workers.conf
app-mail1.sac.conf  app-mail2.res.conf
app-mail1.sis.conf  app-mail2.sis.conf
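
One simple way to keep the two directories in sync is to copy each instance's connector files to the other node with scp, for example (hostnames as in this example; run the equivalent copy for the mail2 files from Node B to Node A):

[root@nodea connector]# scp jk/*mail1* nodeb:/etc/opt/scalix-tomcat/connector/jk/
[root@nodea connector]# scp ajp/*mail1* nodeb:/etc/opt/scalix-tomcat/connector/ajp/
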
Restarting Apache

After applying all Apache configuration changes, restart Apache on both nodes:

service httpd restart

Scalix Cluster Integration

RedHat Cluster Suite (RHCS)

t.b.d.

Upgrading Scalix in the Cluster

For an upgrade, the cluster should be in a healthy state, i.e. each node should be running one instance. On each node, follow the instructions for upgrading Scalix in the install guide, but specify the instance and hostname on the installer command line, e.g.:

./scalix-installer --instance=mail1 --hostname=mail1.scalixdemo.com

After performing the upgrade on a node hosting an instance which is not running the Management server, you will again need to disable SAC from running on this instance:

  • De-register the Management Console and Server from the node:
[root@nodeb ~]# /opt/scalix-tomcat/bin/sxtomcat-webapps --del mail2 caa sac 
  • Restart Tomcat
[root@nodeb ~]# service scalix-tomcat restart mail2