Cloud Servers - High Availability with heartbeat on CentOS 5.5

This article describes how to set up high availability on CentOS using the Heartbeat software. We will have two web servers (named web01 and web02 in this article) that share an IP address (a virtual IP) between them; this virtual IP is active on only one web server at any time. This is an Active/Passive high-availability setup: one web server is active (holds the virtual IP) and the other is passive (waiting for the first server to fail). All your web requests are directed to the virtual IP address through DNS. Both servers have the heartbeat package installed and configured, and each uses the heartbeat service to check whether the other is still alive or has failed. So, let’s get on with it. I’m going to use Rackspace Cloud Servers to set this up.

Creating the cloud servers and shared IP:

Log in to your Rackspace Cloud Control Panel at https://manage.rackspacecloud.com and create two CentOS 5.5 Cloud Servers. Choose the configuration that suits your resource requirements and give the servers descriptive names so that you can easily identify them, e.g. web01 and web02. Once your two Cloud Servers are created, you will have to open a support ticket to get a shared IP for them, as described in this FAQ entry: cloudservers.rackspacecloud.com/index.php/Frequently_Asked_Questions#Can_I_buy_extra_IPs.3F

Installing heartbeat software:

Note: Unless noted otherwise, all of the following commands need to be run on both cloud servers (web01 and web02).

You will need to install the heartbeat packages to set up heartbeat monitoring between the two cloud servers.

[root@ha01 /]# yum update
[root@ha01 /]# yum install heartbeat-pils heartbeat-stonith  heartbeat

Once all of the above packages are installed, you can confirm them by running the following command:

[root@ha01 /]# rpm -qa | grep heartbeat
heartbeat-pils-2.1.3-3.el5.centos
heartbeat-stonith-2.1.3-3.el5.centos
heartbeat-2.1.3-3.el5.centos
[root@ha01 /]# 
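If you prefer a scripted check over reading the rpm -qa output by eye, the following is a minimal bash sketch. The function name missing_heartbeat_pkgs is hypothetical; it only assumes that rpm -qa prints one name-version-release entry per line, as in the output above.

```shell
#!/bin/bash
# Hypothetical helper: reads `rpm -qa` output on stdin and prints the name of
# every required heartbeat package that does not appear in it.
# Returns 0 when nothing is missing, 1 otherwise.
missing_heartbeat_pkgs() {
  local installed pkg missing=0
  installed="$(cat)"
  for pkg in heartbeat-pils heartbeat-stonith heartbeat; do
    # In rpm -qa output the name is followed by "-<version>", so match
    # "<name>-<digit>"; this keeps "heartbeat" from matching "heartbeat-pils-...".
    if ! grep -Eq "^${pkg}-[0-9]" <<<"$installed"; then
      echo "$pkg"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: rpm -qa | missing_heartbeat_pkgs && echo "all heartbeat packages are installed"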

Configuring heartbeat:

First, copy the sample configuration files from the /usr/share/doc/heartbeat-2.1.3 directory to the /etc/ha.d directory:

[root@ha01 ha.d]# cd /usr/share/doc/heartbeat-2.1.3/
[root@ha01 heartbeat-2.1.3]# cp ha.cf authkeys haresources /etc/ha.d
[root@ha01 heartbeat-2.1.3]# cd /etc/ha.d/
[root@ha01 ha.d]# ls
authkeys  ha.cf  harc  haresources  rc.d  README.config  resource.d  shellfuncs
[root@ha01 ha.d]# 
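The copy step can also be wrapped in a small bash function. This is a sketch assuming the sample files live in /usr/share/doc/heartbeat-2.1.3 (adjust the path if your heartbeat version differs); the name copy_sample_configs is hypothetical. It stops if a sample file is missing and never overwrites a file already present in /etc/ha.d.

```shell
#!/bin/bash
# Hypothetical helper: copy the sample heartbeat configs into place.
# Usage: copy_sample_configs [SRC_DIR] [DEST_DIR]
copy_sample_configs() {
  local src="${1:-/usr/share/doc/heartbeat-2.1.3}"
  local dest="${2:-/etc/ha.d}"
  local f
  # Fail early if any sample file is missing (e.g. a different heartbeat version).
  for f in ha.cf authkeys haresources; do
    if [ ! -f "$src/$f" ]; then
      echo "missing sample file: $src/$f" >&2
      return 1
    fi
  done
  mkdir -p "$dest" || return 1
  for f in ha.cf authkeys haresources; do
    # Never overwrite a config file that is already in place (it may be edited).
    if [ -e "$dest/$f" ]; then
      echo "keeping existing $dest/$f" >&2
    else
      cp "$src/$f" "$dest/$f" || return 1
    fi
  done
}
```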

Next, populate the authkeys file with an MD5 key. You can generate a random key with the following command:

[root@ha01 ha.d]#  dd if=/dev/urandom bs=512 count=1 2>/dev/null | openssl md5
ea6cdc1133c424e432aed155dd48a49d

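Now enter the key into the authkeys file so that it looks like the following. The key shown here is only an example: use the key you generated, and use the same key on both servers, since the nodes authenticate each other’s heartbeat messages with it. Also restrict the file with chmod 600 authkeys, because heartbeat refuses to start if authkeys is readable by group or other users.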

[root@ha01 ha.d]# cat authkeys 
auth 1
1 md5 a77030a32d0cc2b6cac31f9cddfe4b09
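A minimal bash sketch of the authkeys step, with hypothetical function names generate_md5_key and write_authkeys. When no key is passed, it generates one the same way as the command above; it then writes the two-line file and sets mode 600. Because both nodes need the same key, a reasonable pattern is to generate the key once and pass it explicitly on each server.

```shell
#!/bin/bash
# Hypothetical helper: prints a random MD5 key, generated as in the article.
generate_md5_key() {
  # Newer OpenSSL prints "(stdin)= <hash>" or "MD5(stdin)= <hash>"; older
  # versions print only the hash. The last field is the hash either way.
  dd if=/dev/urandom bs=512 count=1 2>/dev/null | openssl md5 | awk '{print $NF}'
}

# Hypothetical helper: write_authkeys [DIR] [KEY]
write_authkeys() {
  local dir="${1:-/etc/ha.d}"
  local key="${2:-}"
  if [ -z "$key" ]; then
    key="$(generate_md5_key)"
  fi
  # Give up if no key could be produced (e.g. openssl is not installed).
  [ -n "$key" ] || return 1
  # A restrictive umask keeps a newly created file from ever being readable by
  # others; chmod 600 afterwards covers a file that already existed (the sample).
  ( umask 077 && printf 'auth 1\n1 md5 %s\n' "$key" > "$dir/authkeys" ) || return 1
  chmod 600 "$dir/authkeys"
}
```

For example, run key="$(generate_md5_key)" once, then write_authkeys /etc/ha.d "$key" on both web01 and web02 with that same value.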

Next, configure ha.cf by adding or updating the following parameters with appropriate values. Change the node parameters to match your cloud servers’ host names (as reported by uname -n), and point each server’s ucast line at the other server’s private IP address.

On web01:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 10
udpport 694
bcast eth1
ucast eth1 <private IP address of web02>
auto_failback on
node web01
node web02

On web02:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 10
udpport 694
bcast eth1
ucast eth1 <private IP address of web01>
auto_failback on
node web01
node web02
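Because the two ha.cf files differ only in the ucast address, they can be generated from one function. This sketch assumes exactly the directives listed above; write_ha_cf and node_listed are hypothetical helper names. node_listed checks that a host name appears on a node line, since heartbeat identifies the local node by its uname -n name.

```shell
#!/bin/bash
# Hypothetical helper: writes the ha.cf directives used in this article.
# Usage: write_ha_cf DIR PEER_PRIVATE_IP NODE NODE...
# PEER_PRIVATE_IP is the eth1 (private) address of the *other* server.
write_ha_cf() {
  if [ "$#" -lt 4 ]; then
    echo "usage: write_ha_cf DIR PEER_PRIVATE_IP NODE NODE..." >&2
    return 1
  fi
  local dir="$1" peer_ip="$2" node
  shift 2
  {
    echo "debugfile /var/log/ha-debug"
    echo "logfile /var/log/ha-log"
    echo "logfacility local0"
    echo "keepalive 2"
    echo "deadtime 10"
    echo "udpport 694"
    echo "bcast eth1"
    echo "ucast eth1 $peer_ip"
    echo "auto_failback on"
    for node in "$@"; do
      echo "node $node"
    done
  } > "$dir/ha.cf"
}

# Succeeds if NAME appears as an exact "node NAME" line in FILE.
node_listed() {
  grep -qxF "node $1" "$2"
}
```

For example, on web01: write_ha_cf /etc/ha.d <private IP address of web02> web01 web02, then node_listed "$(uname -n)" /etc/ha.d/ha.cf || echo "this host is not listed as a node".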

Next, configure haresources and add your resources to it. The haresources file lists the resources that move from machine to machine as nodes go down and come up in the cluster. Do not include any fixed IP addresses in this file.

Note: The haresources file MUST BE IDENTICAL on all nodes of the cluster.

The node name listed in front of the resource group is the name of the preferred node to run the service; it is not necessarily the name of the current machine. In the example below, I have chosen web01 as the preferred node to run the httpd service, but if web01 is not available, httpd will be started on web02. Whether the service moves back to web01 once it becomes available again is controlled by the auto_failback on setting in the ha.cf file.

So, add the following line to the haresources file on both servers:

web01 <shared IP address>/24/eth0 httpd
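The haresources line can be written the same way on both servers and the two copies compared afterwards. In this sketch write_haresources and same_haresources are hypothetical names; the /24 prefix, the eth0 interface and the httpd resource are taken from the example line above, so adjust them if your shared IP is set up differently.

```shell
#!/bin/bash
# Hypothetical helper: writes the single resource line used in this article.
# Usage: write_haresources DIR PREFERRED_NODE SHARED_IP
write_haresources() {
  if [ "$#" -ne 3 ]; then
    echo "usage: write_haresources DIR PREFERRED_NODE SHARED_IP" >&2
    return 1
  fi
  local dir="$1" preferred="$2" shared_ip="$3"
  printf '%s %s/24/eth0 httpd\n' "$preferred" "$shared_ip" > "$dir/haresources"
}

# haresources must be identical on all nodes: succeeds only if both files
# have exactly the same bytes.
same_haresources() {
  cmp -s "$1" "$2"
}
```

For example, after scp web02:/etc/ha.d/haresources /tmp/haresources.web02 on web01, run same_haresources /etc/ha.d/haresources /tmp/haresources.web02 || echo "haresources differs between nodes".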

Starting the heartbeat service:

Now, let’s enable and start the heartbeat service on both nodes using the following commands:

[root@web01 /]# chkconfig heartbeat on
[root@web01 /]# service heartbeat start
Starting High-Availability services: 
2011/04/27_08:16:04 INFO:  Resource is stopped
                                                           [  OK  ]
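Before running service heartbeat start, a quick sanity check of /etc/ha.d can catch the most common mistakes in this setup. This is a minimal sketch with a hypothetical function name, ha_preflight. It only checks what this article configures (the three files exist, authkeys has mode 600, ha.cf lists at least two nodes, haresources has a non-comment line) and is no substitute for reading /var/log/ha-log.

```shell
#!/bin/bash
# Hypothetical pre-flight check. Usage: ha_preflight [DIR]
# Prints one line per problem found; returns 0 if there are none.
ha_preflight() {
  local dir="${1:-/etc/ha.d}" problems=0 f
  for f in ha.cf authkeys haresources; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f"
      problems=1
    fi
  done
  # Heartbeat rejects an authkeys file that group or others can read.
  if [ -f "$dir/authkeys" ] && [ "$(stat -c %a "$dir/authkeys")" != "600" ]; then
    echo "authkeys should have mode 600"
    problems=1
  fi
  if [ -f "$dir/ha.cf" ] && [ "$(grep -c '^node ' "$dir/ha.cf")" -lt 2 ]; then
    echo "ha.cf should list at least two node lines"
    problems=1
  fi
  # The copied sample haresources contains only comments; require a real line.
  if [ -f "$dir/haresources" ] && ! grep -Eq '^[^#[:space:]]' "$dir/haresources"; then
    echo "haresources has no resource line"
    problems=1
  fi
  return "$problems"
}
```

For example: ha_preflight /etc/ha.d && service heartbeat start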

Now if you check the httpd service status, it should be running, and your shared IP address should be up on the web01 node.

[root@web01 ~]# service httpd status
httpd (pid  23938) is running...

The ifconfig -a command shows all configured IP addresses. Running ifconfig -a on web01 lets you confirm that the virtual IP address is up on it.
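To check for the virtual IP from a script instead of reading the ifconfig -a output, you can search that output for the shared address. The sketch below uses a hypothetical function name, holds_vip, and accepts both the older inet addr:<IP> format printed by ifconfig on CentOS 5 and the newer inet <IP> format. The shared IP may appear on an alias interface such as eth0:0.

```shell
#!/bin/bash
# Hypothetical helper: reads `ifconfig -a` output on stdin and succeeds if
# the given IPv4 address is configured on any interface (including aliases).
holds_vip() {
  local vip="$1"
  # The trailing space stops 203.0.113.1 from matching 203.0.113.10.
  grep -qF -e "inet addr:${vip} " -e "inet ${vip} "
}
```

For example: ifconfig -a | holds_vip <shared IP address> && echo "this node holds the shared IP"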

Testing the failover using heartbeat service

Let’s test the high availability. Shut down the web01 node using the halt command. The virtual IP address and the httpd service should automatically fail over to web02.
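While web01 is down, you can poll from another machine until the shared IP answers HTTP requests again, this time from web02. The sketch below is a generic retry loop with a hypothetical name, wait_until. With deadtime 10 in ha.cf, web02 should declare web01 dead roughly 10 seconds after the last heartbeat, so allow some margin beyond that for the IP takeover and httpd startup.

```shell
#!/bin/bash
# Hypothetical helper: run a command repeatedly until it succeeds or the
# attempt limit is reached. Usage: wait_until ATTEMPTS INTERVAL_SECONDS CMD...
# Returns 0 on success, 1 if every attempt failed.
wait_until() {
  local attempts="$1" interval="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      echo "succeeded after $i attempt(s)"
      return 0
    fi
    sleep "$interval"
  done
  return 1
}
```

For example: wait_until 30 2 curl -sf --max-time 5 -o /dev/null http://<shared IP address>/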

w