
Red Hat OSP13+: Ad-hoc Patching Containers

Recently I noticed a couple of unhealthy containers in my OSP13 deployment:

[root@overcloud-controller-0 ~]# docker ps | grep unheal
8d1596305185 172.16.0.11:8787/rhosp13/openstack-gnocchi-statsd:13.0-54 "kolla_start" 23 hours ago Up 23 hours (unhealthy) gnocchi_statsd
ef2aeb43a2f0 172.16.0.11:8787/rhosp13/openstack-nova-placement-api:13.0-58 "kolla_start" 23 hours ago Up 30 minutes (unhealthy) nova_placement
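
As an aside, newer Docker releases can filter on health state directly and save you the grep (a sketch; check that your Docker version supports the health filter):

[root@overcloud-controller-0 ~]# docker ps --filter health=unhealthy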

Taking a closer look at one of them gave me something to go on:

[root@overcloud-controller-0 ~]# docker inspect nova_placement --format='{{json .State.Health}}' | jq .
{
  "Log": [
    {
      "Output": "\n403 172.16.1.20:8778 0.011 seconds\ncurl: (22) The requested URL returned error: 403 Forbidden\n",
      "ExitCode": 1,
      "End": "2018-09-20T14:22:51.05793874-04:00",
      "Start": "2018-09-20T14:22:49.888372665-04:00"
    },
    {
      "Output": "\n403 172.16.1.20:8778 0.011 seconds\ncurl: (22) The requested URL returned error: 403 Forbidden\n",
      "ExitCode": 1,
      "End": "2018-09-20T14:23:21.996468599-04:00",
      "Start": "2018-09-20T14:23:21.063831743-04:00"
    },
    {
      "Output": "\n403 172.16.1.20:8778 0.006 seconds\ncurl: (22) The requested URL returned error: 403 Forbidden\n",
      "ExitCode": 1,
      "End": "2018-09-20T14:23:52.160645916-04:00",
      "Start": "2018-09-20T14:23:51.996602458-04:00"
    },
    {
      "Output": "\n403 172.16.1.20:8778 0.008 seconds\ncurl: (22) The requested URL returned error: 403 Forbidden\n",
      "ExitCode": 1,
      "End": "2018-09-20T14:24:22.570588091-04:00",
      "Start": "2018-09-20T14:24:22.161032051-04:00"
    },
    {
      "Output": "\n403 172.16.1.20:8778 0.005 seconds\ncurl: (22) The requested URL returned error: 403 Forbidden\n",
      "ExitCode": 1,
      "End": "2018-09-20T14:24:52.823193549-04:00",
      "Start": "2018-09-20T14:24:52.570781415-04:00"
    }
  ],
  "FailingStreak": 58,
  "Status": "unhealthy"
}

[root@overcloud-controller-0 ~]#

As it turns out, there were already open BZs for this issue, as well as for the gnocchi_statsd issue:

Bug 1630129 – openstack-nova-placement-api reports unhealthy

Bug 1623463 – [FFU] Gnocchi-statsd container reported unhealthy after FFU

While both bugs are limited to the healthcheck and do not impact OpenStack functionality, I wanted to fix them so that if I did a deployment in the field for a Proof of Concept, I wouldn't leave a customer with "unhealthy" containers.

At first, I figured I would just patch the code under /var/lib/config-data/puppet-generated and restart the containers, but there was no /openstack/healthcheck to be found in there. Prior to the container world, I would simply patch the TripleO code in /usr/share and either redeploy or run a stack update to fix it. Unfortunately, that doesn't work in this case either, since the healthcheck scripts are baked into the container images pulled down from registry.access.redhat.com. I needed to figure out how to patch the code inside the container itself. First, I patched all of the scripts in /usr/share/openstack-tripleo-common/healthcheck per the BZs above, then rebuilt the affected containers. For example, to fix the nova-placement healthcheck, I performed the steps below.
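
Before patching anything, you can confirm the script really is baked into the image rather than bind-mounted by peeking inside a running container (a quick sanity check, assuming the container is up):

[root@overcloud-controller-0 ~]# docker exec nova_placement cat /openstack/healthcheck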

(undercloud) [stack@ds-osp13-uc ~]$ mkdir rebuild_image # Create a working directory


(undercloud) [stack@ds-osp13-uc ~]$ cd rebuild_image


(undercloud) [stack@ds-osp13-uc rebuild_image]$ cp /usr/share/openstack-tripleo-common/healthcheck/nova-placement . # Copy the patched script here


(undercloud) [stack@ds-osp13-uc rebuild_image]$ grep NovaPlacement ~/templates/overcloud_images.yaml | awk '{print $NF}' | sort -u # Identify the image being used for nova-placement

172.16.0.11:8787/rhosp13/openstack-nova-placement-api:13.0-58


(undercloud) [stack@ds-osp13-uc rebuild_image]$ cat Dockerfile # The Dockerfile I created

FROM 172.16.0.11:8787/rhosp13/openstack-nova-placement-api:13.0-58
ADD ./nova-placement /openstack/healthcheck


(undercloud) [stack@ds-osp13-uc rebuild_image]$ docker build -t 172.16.0.11:8787/rhosp13/openstack-nova-placement-api:13.0-58-custom . # Build the new image


(undercloud) [stack@ds-osp13-uc rebuild_image]$ sudo docker push 172.16.0.11:8787/rhosp13/openstack-nova-placement-api:13.0-58-custom # Push the new image (as root; see the note at the end)


(undercloud) [stack@ds-osp13-uc rebuild_image]$ sed -i 's/openstack-nova-placement-api:13.0-58/openstack-nova-placement-api:13.0-58-custom/g' ~/templates/overcloud_images.yaml # Update the service to use the newly built custom image
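
Re-running the earlier grep should now show the -custom tag, confirming the substitution took:

(undercloud) [stack@ds-osp13-uc rebuild_image]$ grep NovaPlacement ~/templates/overcloud_images.yaml | awk '{print $NF}' | sort -u

172.16.0.11:8787/rhosp13/openstack-nova-placement-api:13.0-58-custom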


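For reference, the analogous Dockerfile for the gnocchi_statsd fix would look something like this. This is a sketch: it assumes the patched script in the healthcheck directory is named gnocchi-statsd, and it reuses the image tag from the docker ps output at the top of this post:

FROM 172.16.0.11:8787/rhosp13/openstack-gnocchi-statsd:13.0-54
ADD ./gnocchi-statsd /openstack/healthcheck
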
I performed a similar process for gnocchi_statsd using a Dockerfile like the one sketched above, and once I was done, I performed a stack update to push the new images out to my environment. It was near the end of Step 3 that I saw the nova_placement container restart, and Step 5 for gnocchi_statsd:

[root@overcloud-controller-0 ~]# docker ps | egrep '(nova-placement|gnocchi-statsd)'

8c36a0ef5550 172.16.0.11:8787/rhosp13/openstack-gnocchi-statsd:13.0-54-custom "kolla_start" 17 hours ago Up 17 hours (healthy) gnocchi_statsd
84fc00607146 172.16.0.11:8787/rhosp13/openstack-nova-placement-api:13.0-58-custom "kolla_start" 19 hours ago Up 51 minutes (healthy) nova_placement


[root@overcloud-controller-0 ~]#
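
For completeness, the stack update was just a re-run of my original overcloud deploy command pointed at the updated overcloud_images.yaml. A sketch; the exact environment files must match whatever you originally deployed with:

(undercloud) [stack@ds-osp13-uc ~]$ openstack overcloud deploy --templates \
    -e ~/templates/overcloud_images.yaml \
    -e <your other environment files>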

Now my containers are reporting healthy once again. One thing to note: the docker push must be run as root. If it isn't, your overcloud nodes will not see the image, even though docker images on the undercloud shows it, and you'll be left struggling to figure out what's going on.
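
If you do hit that, you can ask the undercloud registry directly which tags it is actually serving. A sketch, assuming the default insecure docker-distribution registry on port 8787 (jq was already available on my controller, as used above):

[root@overcloud-controller-0 ~]# curl -s http://172.16.0.11:8787/v2/rhosp13/openstack-nova-placement-api/tags/list | jq .

The -custom tag should appear in the returned list once the push has actually succeeded.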
