WhiskeyTech

Octavia 911

Recently I was at a customer whose compute nodes all rebooted at once, but that wasn’t the primary issue. The issue was that when those nodes came back up, the Octavia load balancers deployed as part of their OpenShift install were hosed. The load balancers were stuck in a “PENDING_UPDATE” state and the associated VMs were gone. They simply no longer appeared in the output of “openstack server list --all”. [1]

This customer told me that they had contacted support about the issue and were told there was no recovery path from this state, so they had to re-install OpenShift to fix it. That was fine the first time. However, due to various issues, they had the compute nodes reboot several more times, and each time they had to go through the re-install. Needless to say, the developers were less than thrilled with this, and to me it sounded like using a sledgehammer to drive a nail.

I started looking at one of their “downed” environments and realized that everything I needed to rebuild the load balancers was readily available: all of the Octavia data still existed, and the only things missing were the actual virtual machines. So I wrote a script that reads that data and generates one script per load balancer, containing the commands to create that load balancer. The end result is the attached script.

When you run this script, it creates a new script for each load balancer it finds in your “openstack loadbalancer list” output. Each generated script is the set of commands needed to recreate that load balancer. At that point, you can delete the existing load balancer from Octavia, run the associated script, and be back in business.
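To make the idea concrete, here is a minimal sketch of how such a generator could work. It is not the attached rebuild_lb.sh; the function names and the output file name are hypothetical, and it only covers the load balancer itself and its listeners (pools and members would follow the same pattern). It assumes the openstack CLI with the Octavia plugin is installed and that credentials are already sourced.

```shell
#!/bin/sh
# Sketch only, not the author's rebuild_lb.sh. All function names and the
# rebuild_<id>.sh file name are hypothetical.

# Print the command that recreates a load balancer.
# $1 = name, $2 = VIP subnet ID, $3 = VIP address
emit_lb_create() {
    printf "openstack loadbalancer create --name '%s' --vip-subnet-id '%s' --vip-address '%s'\n" "$1" "$2" "$3"
}

# Print the command that recreates one listener.
# $1 = listener name, $2 = protocol, $3 = port, $4 = load balancer name
emit_listener_create() {
    printf "openstack loadbalancer listener create --name '%s' --protocol '%s' --protocol-port '%s' '%s'\n" "$1" "$2" "$3" "$4"
}

# Read one load balancer's saved attributes from Octavia and write a
# recovery script for it. Nothing is queried until this function is called.
# $1 = load balancer ID
generate_lb_script() {
    lb_id="$1"
    out="rebuild_${lb_id}.sh"
    name=$(openstack loadbalancer show "$lb_id" -f value -c name)
    subnet=$(openstack loadbalancer show "$lb_id" -f value -c vip_subnet_id)
    vip=$(openstack loadbalancer show "$lb_id" -f value -c vip_address)
    {
        echo '#!/bin/sh'
        emit_lb_create "$name" "$subnet" "$vip"
    } > "$out"
    echo "wrote $out"
}
```

Calling generate_lb_script with an ID from “openstack loadbalancer list” writes one script for that load balancer. A real generator would also need to wait for the new load balancer to become ACTIVE before adding listeners, since Octavia rejects changes to a load balancer that is still in a PENDING state.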

If your load balancer is stuck in PENDING_UPDATE, unfortunately, at this time the only way to delete it is to go into the database and set its provisioning_status to ERROR. After that, you can delete it with “openstack loadbalancer delete”. To update the database, SSH to one of your overcloud controller nodes and perform the following (OSP 13+):

[root@overcloud-controller-0 ~]# docker exec -u root -it $(docker ps | grep galera-bundle-docker | awk '{print $NF}') /bin/bash

()[root@overcloud-controller-0 /]# mysql

Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 141262
Server version: 10.1.20-MariaDB MariaDB Server
Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> use octavia;

Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed

MariaDB [octavia]> update load_balancer set provisioning_status = 'ERROR' where id = '<load-balancer-id>';

MariaDB [octavia]> exit;
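The UPDATE statement in the session above needs the ID of the stuck load balancer. As a hedged sketch, the two helpers below pick out the stuck IDs and build the statement for one of them. Both function names are hypothetical, and the extra “AND provisioning_status = 'PENDING_UPDATE'” condition is my own added guard, not part of the original procedure, so that a mistyped ID cannot change a healthy load balancer.

```shell
#!/bin/sh
# Sketch only; both function names are hypothetical.

# Read "id provisioning_status" pairs on stdin and print the IDs of
# load balancers that are stuck in PENDING_UPDATE.
filter_pending_update() {
    awk '$2 == "PENDING_UPDATE" { print $1 }'
}

# Print the UPDATE statement for one load balancer ID.
# Refuses anything that is not shaped like a UUID.
# $1 = load balancer ID
build_error_sql() {
    if ! printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
        echo "refusing: '$1' is not a UUID" >&2
        return 1
    fi
    # The status check in the WHERE clause is an added safety guard (assumption).
    printf "UPDATE load_balancer SET provisioning_status = 'ERROR' WHERE id = '%s' AND provisioning_status = 'PENDING_UPDATE';\n" "$1"
}
```

For example, “openstack loadbalancer list -f value -c id -c provisioning_status | filter_pending_update” lists the candidates, and the output of build_error_sql for one of those IDs can be pasted into the mysql prompt after “use octavia;”.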

I’ve suggested to the customer that, going forward, they run rebuild_lb.sh after each OCP deployment they perform, so they always have a recovery path until our supported solution comes out.

Hopefully this is useful to others.
