Starting and testing CTDB
The CTDB log is in /var/log/log.ctdb so look in this file if something
did not start correctly.
You can make sure that ctdb is running on all nodes with:
onnode all service ctdb start
Verify that the CTDB daemon started properly. There should normally be at least 2 processes started for CTDB, one for the main daemon and one for the recovery daemon.
onnode all pidof ctdbd
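The "at least 2 processes" rule can be checked mechanically. The following is a minimal sketch, not part of the ctdb package: has_both_daemons is a hypothetical helper name, and it assumes the input is the single line of space-separated process IDs that pidof prints.

```shell
# Hypothetical helper (not shipped with ctdb).
# Reads the output of `pidof ctdbd` on stdin and succeeds only if it
# lists at least two process IDs (main daemon plus recovery daemon).
has_both_daemons() {
    read -r pids || true    # pidof prints nothing when no process matches
    set -- $pids            # split the PID list into positional parameters
    [ "$#" -ge 2 ]
}
```

On a single node it could be used like this:
pidof ctdbd | has_both_daemons && echo "ctdbd looks started"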
Once all CTDB nodes have started, verify that they are correctly
talking to each other.
There should be one TCP connection from the private ip address on each
node to TCP port 4379 on each of the other nodes in the cluster.
onnode all netstat -tn | grep 4379
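To turn the connection check into a number, the sketch below counts the established connections a node has made to port 4379 on its peers. count_ctdb_peers is a hypothetical helper name, and the column positions assume the usual Linux "netstat -tn" layout (protocol in column 1, foreign address in column 5, state in column 6). On a healthy node the result should equal the number of other nodes in the cluster.

```shell
# Hypothetical helper (not shipped with ctdb).
# Reads `netstat -tn` output on stdin and prints how many ESTABLISHED TCP
# connections go to port 4379 on a remote host.
# Assumes the Linux netstat layout: $1 = proto, $5 = foreign address, $6 = state.
count_ctdb_peers() {
    awk '$1 ~ /^tcp/ && $5 ~ /:4379$/ && $6 == "ESTABLISHED" { n++ }
         END { print n + 0 }'
}
```

Run on one node of a 4-node cluster, this would be expected to print 3:
netstat -tn | count_ctdb_peers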
Automatically restarting CTDB
If you wish to cope with software faults in ctdb, or want ctdb to
automatically restart when an administrator kills it, then you may
wish to add a cron entry for root like this:
* * * * * /etc/init.d/ctdb cron > /dev/null 2>&1
Testing CTDB
Once your cluster is up and running, you may wish to know how to test that it is functioning correctly. The following tests may help with that.
The ctdb tool
The ctdb package comes with a utility called ctdb that can be used to
view the behaviour of the ctdb cluster.
If you run it with no options it will provide some terse usage information. The most commonly used commands are:
ctdb status
ctdb ip
ctdb ping
ctdb status
The status command provides basic information about the cluster and the status of the nodes. When you run it you will get some output like:
Number of nodes:4
vnn:0 10.1.1.1 OK (THIS NODE)
vnn:1 10.1.1.2 OK
vnn:2 10.1.1.3 OK
vnn:3 10.1.1.4 OK
Generation:1362079228
Size:4
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
hash:3 lmaster:3
Recovery mode:NORMAL (0)
Recovery master:0
The important parts are the state shown for each node and the
recovery mode. The node list tells us that all 4 nodes are in
a healthy state.
It also tells us that recovery mode is normal, which means that the
cluster has finished a recovery and is running in a normal fully
operational state.
Recovery state will briefly change to "RECOVERY" when there has been a
node failure or something is wrong with the cluster.
If the cluster remains in RECOVERY state for very long (many seconds)
there might be something wrong with the configuration. See
/var/log/log.ctdb.
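The two things to look for in the status output (every node OK, recovery mode NORMAL) can be checked by a small filter. This is only a sketch: cluster_healthy is a hypothetical helper name, and it assumes the output has exactly the layout of the sample above, with node lines starting "vnn:" and a "Recovery mode:" line.

```shell
# Hypothetical helper (not shipped with ctdb).
# Reads `ctdb status` output on stdin, in the layout of the sample above.
# Exits 0 only if at least one node line is present, every node line
# reports OK, and the recovery mode is NORMAL.
cluster_healthy() {
    awk '
        /^vnn:/           { nodes++; if ($3 != "OK") bad++ }
        /^Recovery mode:/ { sub(/^Recovery mode:/, ""); mode = $1 }
        END { exit !(nodes > 0 && bad == 0 && mode == "NORMAL") }
    '
}
```

For example:
ctdb status | cluster_healthy || echo "check /var/log/log.ctdb"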
ctdb ip
This command prints the current status of the public ip addresses and which physical node is currently serving that ip.
Number of nodes:4
192.168.1.1 0
192.168.1.2 1
192.168.2.1 2
192.168.2.1 3
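To see at a glance how the public addresses are spread over the nodes, the output can be summarised per node. The sketch below assumes the two-column "address node" layout of the sample above; ips_per_node is a hypothetical helper name.

```shell
# Hypothetical helper (not shipped with ctdb).
# Reads `ctdb ip` output on stdin (lines of "<address> <node number>") and
# prints one line per node: "<node number> <count of addresses it serves>".
# Lines that do not have exactly two fields, such as the header, are ignored.
ips_per_node() {
    awk 'NF == 2 && $2 ~ /^[0-9]+$/ { count[$2]++ }
         END { for (n in count) print n, count[n] }' | sort -n
}
```

For example:
ctdb ip | ips_per_node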
ctdb ping
This command tries to "ping" each of the CTDB daemons in the cluster.
ctdb ping -n all
response from 0 time=0.000050 sec (13 clients)
response from 1 time=0.000154 sec (27 clients)
response from 2 time=0.000114 sec (17 clients)
response from 3 time=0.000115 sec (59 clients)
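A quick way to confirm that every daemon answered is to count the response lines and compare the count with the number of nodes. This is a sketch assuming the "response from N" layout of the sample above; count_ping_responses is a hypothetical helper name.

```shell
# Hypothetical helper (not shipped with ctdb).
# Reads `ctdb ping -n all` output on stdin and prints the number of
# "response from <node>" lines it contains.
count_ping_responses() {
    awk '/^response from [0-9]+ / { n++ } END { print n + 0 }'
}
```

In the 4-node example above this would be expected to print 4:
ctdb ping -n all | count_ping_responses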