This is one of the most common problems you can run into with a MariaDB Galera Cluster: after restarting the cluster, the nodes do not rejoin. If you look at your log, you may find a line like this:
It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1.
Keep reading; these are the steps I took to recover mine.
If all nodes are down at some point, take note of which node was the last to go offline. This node needs to be the first one started. Start that node with the --wsrep-new-cluster flag. On CentOS 6, and any Linux that still uses SysV init, this can be done with a service mysql start --wsrep-new-cluster command. If you are using CentOS 7 or any distribution with systemd, you need to do something like systemctl set-environment _WSREP_NEW_CLUSTER='--wsrep-new-cluster' && systemctl start mariadb && systemctl set-environment _WSREP_NEW_CLUSTER=''. The final systemctl set-environment call is important because systemd keeps that environment variable until it is changed, and it is very unlikely that this node will again be the last to go offline next time, not to mention that it is very easy to forget the flag is still set.
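One weakness of chaining the three systemd commands with && is that the flag stays set if the start itself fails. Here is a minimal sketch of the same sequence that always clears the variable afterwards. The function name bootstrap_node and the SYSTEMCTL override (handy for a dry run with echo) are my own assumptions for illustration, not part of MariaDB or systemd.

```shell
#!/bin/sh
# Sketch: bootstrap the first Galera node under systemd.
# SYSTEMCTL is a hypothetical override; set SYSTEMCTL=echo to print the
# commands instead of running them.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

bootstrap_node() {
    # Hand the bootstrap flag to the mariadb unit through systemd's environment.
    $SYSTEMCTL set-environment _WSREP_NEW_CLUSTER='--wsrep-new-cluster' || return 1

    # Start MariaDB and remember the result without returning yet.
    $SYSTEMCTL start mariadb
    status=$?

    # Clear the flag whether or not the start succeeded, so a later
    # restart of this node does not bootstrap a new cluster by accident.
    $SYSTEMCTL set-environment _WSREP_NEW_CLUSTER=''

    return "$status"
}
```

Run bootstrap_node as root on the node that was last to leave the cluster, and only there. With SYSTEMCTL=echo it just prints the three commands, which is a cheap way to check the sequence before touching a real server.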
Go to the other server and restart MariaDB there. Check the log: you should see rsync messages and a WSREP entry giving the cluster name and reporting a new node in the cluster.
If this doesn't fix your cluster, continue with the second attempt.
Find a file named grastate.dat; it is usually in the /var/lib/mysql directory. This file is very interesting because it contains the cluster state information. Edit it on both servers and set the safe_to_bootstrap field to 1. Start the first server (the one from the first attempt). On the second server, run the galera_recovery script. This script recovers the position you need to start your cluster. It will output something like WSREP: Recovered position <uuid>, where the position is the cluster UUID followed by a sequence number. The next step is to start MariaDB with that position using the --wsrep_start_position flag.
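The manual edits above can be sketched as a few small shell helpers. This is only an illustration under some assumptions: the function names are made up, the default path is the /var/lib/mysql/grastate.dat mentioned above, and the log parsing assumes the recovered position is the last word on the "Recovered position" line, which may differ between versions.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting and editing grastate.dat.
# The default path is an assumption; adjust it to your data directory.
GRASTATE="${GRASTATE:-/var/lib/mysql/grastate.dat}"

# Print the current safe_to_bootstrap value from the state file.
get_safe_to_bootstrap() {
    awk -F': *' '$1 == "safe_to_bootstrap" { print $2 }' "${1:-$GRASTATE}"
}

# Set safe_to_bootstrap to 1, keeping a .bak copy of the original.
# Writing back through a redirect keeps the file's owner and permissions.
force_safe_to_bootstrap() {
    file="${1:-$GRASTATE}"
    cp -p "$file" "$file.bak" || return 1
    sed 's/^safe_to_bootstrap:.*/safe_to_bootstrap: 1/' "$file.bak" > "$file"
}

# Read log text on stdin and print the last word of the last
# "WSREP: Recovered position" line, if there is one.
recovered_position() {
    awk '/WSREP: Recovered position/ { pos = $NF }
         END { if (pos != "") print pos }'
}
```

With MariaDB stopped, something like force_safe_to_bootstrap followed by get_safe_to_bootstrap confirms the edit took effect, and piping the recovery output through recovered_position gives the value to pass to --wsrep_start_position. Forcing safe_to_bootstrap overrides a safety check, so keep the .bak file until the cluster is healthy again.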
In my case, I started to see rsync synchronization and after a while, my second server was online.
Good luck!