WhiskeyTech

CephFS Backed Red Hat Virtualization

I have been working on a home lab, and since I haven’t had much experience with Gluster and we now support CephFS, I wanted to take a swing at deploying a CephFS-backed RHV cluster. A caveat first: as of this writing, this is not a deployment configuration supported by Red Hat. That said, this entire project has been a science experiment, so your mileage may vary!

Here’s a quick architecture drawing of what I am trying to do with my home lab:

In summary, I have 3 nodes that will be “hyperconverged,” running both RHV and Ceph. I also have one standalone node running KVM and hosting a VM I refer to as “Deployer,” from which I will execute ceph-ansible to perform the installation on my 3 hyperconverged nodes.

After installing RHEL on the 3 nodes, I ran some performance tests on the SSDs installed in the baremetal nodes before starting this endeavor, so I would have a performance baseline once everything is up and running.

[root@server50 ~]# mkfs.xfs -f /dev/sdb1
meta-data=/dev/sdb1              isize=512    agcount=4, agsize=30524097 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=122096385, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=59617, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@server50 ~]# mount /dev/sdb1 /mnt
[root@server50 ~]# cd /mnt
[root@server50 mnt]# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       466G   33M  466G   1% /mnt
[root@server50 mnt]# for X in $(seq 1 10)
> do
> dd if=/dev/zero of=./test${X}.img bs=1M count=1000 conv=fdatasync
> done
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.48825 s, 421 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.46448 s, 425 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.4612 s, 426 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.46239 s, 426 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.46171 s, 426 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.4624 s, 426 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.46319 s, 426 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.46367 s, 426 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.46112 s, 426 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.46366 s, 426 MB/s
[root@server50 mnt]#

It quickly occurred to me that my limiting factor here is going to be my network, which is a 1Gb network. I tested network bandwidth using iperf3:

[root@server50 ~]# iperf3 -c 172.16.210.60 -f M
Connecting to host 172.16.210.60, port 5201
[  4] local 172.16.210.50 port 47624 connected to 172.16.210.60 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   114 MBytes   114 MBytes/sec    2    423 KBytes
[  4]   1.00-2.00   sec   113 MBytes   113 MBytes/sec    0    491 KBytes
[  4]   2.00-3.00   sec   112 MBytes   112 MBytes/sec    0    494 KBytes
[  4]   3.00-4.00   sec   112 MBytes   112 MBytes/sec    0    495 KBytes
[  4]   4.00-5.00   sec   113 MBytes   113 MBytes/sec    0    495 KBytes
[  4]   5.00-6.00   sec   111 MBytes   111 MBytes/sec    0    495 KBytes
[  4]   6.00-7.00   sec   113 MBytes   113 MBytes/sec    0    495 KBytes
[  4]   7.00-8.00   sec   112 MBytes   112 MBytes/sec    0    495 KBytes
[  4]   8.00-9.00   sec   112 MBytes   112 MBytes/sec    0    495 KBytes
[  4]   9.00-10.00  sec   112 MBytes   112 MBytes/sec    0    495 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.10 GBytes   112 MBytes/sec    2             sender
[  4]   0.00-10.00  sec  1.10 GBytes   112 MBytes/sec                  receiver

iperf Done.
[root@server50 ~]#

While I can get 400+ MB/s on my drives, it looks like I should expect closer to 100 MB/s from CephFS, at least until I upgrade my networking gear.

Lastly, before diving in: when deploying Ceph using ceph-ansible, you have the option to deploy Ceph in containers or not. Initially, I thought containerizing Ceph would be the way to go to ensure a clean separation of dependencies between Ceph and RHV; however, this simply resulted in more headaches than it was worth. At the end of the day, when you deploy containerized Ceph with NFS-Ganesha, the rpcbind service is containerized into the Ganesha container as well. This is problematic because VDSM also uses rpcbind. Figuring out how to make containerized Ceph and VDSM work together is another science project for another day…

Getting Started With Ceph

For the most part, I followed the official Red Hat documentation to install Ceph. The first thing I did was install RHEL on the 3 nodes in my lab (server50, server60 and server70). As of this writing, RHEL 7.7 is out; however, there was an issue with RHV on 7.7 with regards to some hardware virtualization, so I opted to version-lock RHEL to 7.6 when I installed and updated it. If you’ve never done this before, here are the quick Cliff’s Notes:

subscription-manager repos --disable=*
subscription-manager release --set=7.6
subscription-manager repos --enable=rhel-7-server-extras-rpms \
                     --enable=rhel-7-server-rh-common-rpms \
                     --enable=rhel-7-server-rpms \
                     --enable=rhel-7-server-rhceph-3-tools-rpms \
                     --enable=rhel-7-server-ansible-2-rpms \
                     --enable=rhel-7-server-rhv-4-mgmt-agent-rpms
yum update -y

In the command listing above, you will notice that my repos differ slightly from what is in the official documentation. Instead of using “rhel-7-server-ansible-2.6-rpms”, I went with “rhel-7-server-ansible-2-rpms” because the RHV installation instructions require a version greater than 2.6. At the time of this writing, using the repos shown above, Ansible 2.8 was installed, which worked fine for my purposes.

Once patched and up to date, reboot.

Version-locking RHEL does cause an issue for the Ceph installation, because the Ceph RPMs are built against the latest version of selinux-policy-base. The install therefore fails with the following error:

Package: 2:ceph-selinux-12.2.12-48.el7cp.x86_64 (rhel-7-server-rhceph-3-mon-rpms)
           Requires: selinux-policy-base >= 3.13.1-252.el7.1

To work around this, I used yumdownloader on my deployer node, which is at RHEL 7.7 (latest as of this writing), pulled down the latest versions of selinux-policy and selinux-policy-targeted, copied the RPMs to my RHV/Ceph nodes, and used yum to install them.

[root@deployer ~]# mkdir pkg-deps
[root@deployer ~]# cd pkg-deps/
[root@deployer pkg-deps]# yumdownloader selinux-policy
Loaded plugins: product-id, subscription-manager
selinux-policy-3.13.1-252.el7.1.noarch.rpm                                                                                                                                                  | 492 kB  00:00:00     
[root@deployer pkg-deps]#  yumdownloader selinux-policy-targeted
Loaded plugins: product-id, subscription-manager
selinux-policy-targeted-3.13.1-252.el7.1.noarch.rpm                                                                                                                                         | 7.0 MB  00:00:00     
[root@deployer pkg-deps]# for NDX in 50 60 70
> do
> scp * root@server${NDX}:~
> done
selinux-policy-3.13.1-252.el7.1.noarch.rpm                                                                                                                                       100%  492KB  66.4MB/s   00:00    
selinux-policy-targeted-3.13.1-252.el7.1.noarch.rpm                                                                                                                              100% 7141KB  89.1MB/s   00:00    
selinux-policy-3.13.1-252.el7.1.noarch.rpm                                                                                                                                       100%  492KB  67.1MB/s   00:00    
selinux-policy-targeted-3.13.1-252.el7.1.noarch.rpm                                                                                                                              100% 7141KB  89.3MB/s   00:00    
selinux-policy-3.13.1-252.el7.1.noarch.rpm                                                                                                                                       100%  492KB  71.4MB/s   00:00    
selinux-policy-targeted-3.13.1-252.el7.1.noarch.rpm                                                                                                                              100% 7141KB  89.6MB/s   00:00    
[root@deployer pkg-deps]# for NDX in 50 60 70; do ssh root@server${NDX} 'yum install ~/*.rpm -y'; done
Loaded plugins: product-id, search-disabled-repos, subscription-manager
Examining /root/selinux-policy-3.13.1-252.el7.1.noarch.rpm: selinux-policy-3.13.1-252.el7.1.noarch
Marking /root/selinux-policy-3.13.1-252.el7.1.noarch.rpm as an update to selinux-policy-3.13.1-229.el7_6.15.noarch
Examining /root/selinux-policy-targeted-3.13.1-252.el7.1.noarch.rpm: selinux-policy-targeted-3.13.1-252.el7.1.noarch
Marking /root/selinux-policy-targeted-3.13.1-252.el7.1.noarch.rpm as an update to selinux-policy-targeted-3.13.1-229.el7_6.15.noarch
Resolving Dependencies
--> Running transaction check
---> Package selinux-policy.noarch 0:3.13.1-229.el7_6.15 will be updated
---> Package selinux-policy.noarch 0:3.13.1-252.el7.1 will be an update
---> Package selinux-policy-targeted.noarch 0:3.13.1-229.el7_6.15 will be updated
---> Package selinux-policy-targeted.noarch 0:3.13.1-252.el7.1 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package
      Arch   Version           Repository                                  Size
================================================================================
Updating:
 selinux-policy
      noarch 3.13.1-252.el7.1  /selinux-policy-3.13.1-252.el7.1.noarch    6.7 k
 selinux-policy-targeted
      noarch 3.13.1-252.el7.1  /selinux-policy-targeted-3.13.1-252.el7.1.noarch
                                                                           19 M

Transaction Summary
================================================================================
Upgrade  2 Packages

Total size: 19 M
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : selinux-policy-3.13.1-252.el7.1.noarch                       1/4 
  Updating   : selinux-policy-targeted-3.13.1-252.el7.1.noarch              2/4 
  Cleanup    : selinux-policy-targeted-3.13.1-229.el7_6.15.noarch           3/4 
  Cleanup    : selinux-policy-3.13.1-229.el7_6.15.noarch                    4/4 
  Verifying  : selinux-policy-targeted-3.13.1-252.el7.1.noarch              1/4 
  Verifying  : selinux-policy-3.13.1-252.el7.1.noarch                       2/4 
  Verifying  : selinux-policy-3.13.1-229.el7_6.15.noarch                    3/4 
  Verifying  : selinux-policy-targeted-3.13.1-229.el7_6.15.noarch           4/4 

Updated:
  selinux-policy.noarch 0:3.13.1-252.el7.1                                      
  selinux-policy-targeted.noarch 0:3.13.1-252.el7.1                             

Complete!
Loaded plugins: product-id, search-disabled-repos, subscription-manager
Examining /root/selinux-policy-3.13.1-252.el7.1.noarch.rpm: selinux-policy-3.13.1-252.el7.1.noarch
Marking /root/selinux-policy-3.13.1-252.el7.1.noarch.rpm as an update to selinux-policy-3.13.1-229.el7_6.15.noarch
Examining /root/selinux-policy-targeted-3.13.1-252.el7.1.noarch.rpm: selinux-policy-targeted-3.13.1-252.el7.1.noarch
Marking /root/selinux-policy-targeted-3.13.1-252.el7.1.noarch.rpm as an update to selinux-policy-targeted-3.13.1-229.el7_6.15.noarch
Resolving Dependencies
--> Running transaction check
---> Package selinux-policy.noarch 0:3.13.1-229.el7_6.15 will be updated
---> Package selinux-policy.noarch 0:3.13.1-252.el7.1 will be an update
---> Package selinux-policy-targeted.noarch 0:3.13.1-229.el7_6.15 will be updated
---> Package selinux-policy-targeted.noarch 0:3.13.1-252.el7.1 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package
      Arch   Version           Repository                                  Size
================================================================================
Updating:
 selinux-policy
      noarch 3.13.1-252.el7.1  /selinux-policy-3.13.1-252.el7.1.noarch    6.7 k
 selinux-policy-targeted
      noarch 3.13.1-252.el7.1  /selinux-policy-targeted-3.13.1-252.el7.1.noarch
                                                                           19 M

Transaction Summary
================================================================================
Upgrade  2 Packages

Total size: 19 M
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : selinux-policy-3.13.1-252.el7.1.noarch                       1/4 
  Updating   : selinux-policy-targeted-3.13.1-252.el7.1.noarch              2/4 
  Cleanup    : selinux-policy-targeted-3.13.1-229.el7_6.15.noarch           3/4 
  Cleanup    : selinux-policy-3.13.1-229.el7_6.15.noarch                    4/4 
  Verifying  : selinux-policy-targeted-3.13.1-252.el7.1.noarch              1/4 
  Verifying  : selinux-policy-3.13.1-252.el7.1.noarch                       2/4 
  Verifying  : selinux-policy-3.13.1-229.el7_6.15.noarch                    3/4 
  Verifying  : selinux-policy-targeted-3.13.1-229.el7_6.15.noarch           4/4 

Updated:
  selinux-policy.noarch 0:3.13.1-252.el7.1                                      
  selinux-policy-targeted.noarch 0:3.13.1-252.el7.1                             

Complete!
Loaded plugins: product-id, search-disabled-repos, subscription-manager
Examining /root/selinux-policy-3.13.1-252.el7.1.noarch.rpm: selinux-policy-3.13.1-252.el7.1.noarch
Marking /root/selinux-policy-3.13.1-252.el7.1.noarch.rpm as an update to selinux-policy-3.13.1-229.el7_6.15.noarch
Examining /root/selinux-policy-targeted-3.13.1-252.el7.1.noarch.rpm: selinux-policy-targeted-3.13.1-252.el7.1.noarch
Marking /root/selinux-policy-targeted-3.13.1-252.el7.1.noarch.rpm as an update to selinux-policy-targeted-3.13.1-229.el7_6.15.noarch
Resolving Dependencies
--> Running transaction check
---> Package selinux-policy.noarch 0:3.13.1-229.el7_6.15 will be updated
---> Package selinux-policy.noarch 0:3.13.1-252.el7.1 will be an update
---> Package selinux-policy-targeted.noarch 0:3.13.1-229.el7_6.15 will be updated
---> Package selinux-policy-targeted.noarch 0:3.13.1-252.el7.1 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package
      Arch   Version           Repository                                  Size
================================================================================
Updating:
 selinux-policy
      noarch 3.13.1-252.el7.1  /selinux-policy-3.13.1-252.el7.1.noarch    6.7 k
 selinux-policy-targeted
      noarch 3.13.1-252.el7.1  /selinux-policy-targeted-3.13.1-252.el7.1.noarch
                                                                           19 M

Transaction Summary
================================================================================
Upgrade  2 Packages

Total size: 19 M
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : selinux-policy-3.13.1-252.el7.1.noarch                       1/4 
  Updating   : selinux-policy-targeted-3.13.1-252.el7.1.noarch              2/4 
  Cleanup    : selinux-policy-targeted-3.13.1-229.el7_6.15.noarch           3/4 
  Cleanup    : selinux-policy-3.13.1-229.el7_6.15.noarch                    4/4 
  Verifying  : selinux-policy-targeted-3.13.1-252.el7.1.noarch              1/4 
  Verifying  : selinux-policy-3.13.1-252.el7.1.noarch                       2/4 
  Verifying  : selinux-policy-3.13.1-229.el7_6.15.noarch                    3/4 
  Verifying  : selinux-policy-targeted-3.13.1-229.el7_6.15.noarch           4/4 

Updated:
  selinux-policy.noarch 0:3.13.1-252.el7.1                                      
  selinux-policy-targeted.noarch 0:3.13.1-252.el7.1                             

Complete!
[root@deployer pkg-deps]#

You will note in my architecture diagram that I am using one NIC (eno3) as the front-end NIC and plan to use the other NIC (eno4) as my Ceph cluster network and private network space within RHV. When you install RHEL, even if you configure the second NIC and activate it, it still has “ONBOOT=no”. You’ll need to change this yourself and bring up the interface before attempting to install Ceph. I recommend pinging each node from every other node on both interfaces to make sure your networking is squared away. Lastly, whether or not you have a DNS server in your home lab, I recommend updating /etc/hosts with an entry for each host in your cluster, as well as the IP you will be using for RHV Manager, just to make sure they can all resolve each other.
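A minimal way to square that away on each node might look like this (a sketch, assuming eno4 and my 172.16.210.0/24 and 192.168.210.0/24 addressing; adjust for your own):

```shell
# Enable the second NIC at boot and bring it up now
sed -i 's/^ONBOOT=no/ONBOOT=yes/' /etc/sysconfig/network-scripts/ifcfg-eno4
ifup eno4

# Confirm every node answers on both networks (run from each node)
for NDX in 50 60 70; do
    ping -c1 172.16.210.${NDX} && ping -c1 192.168.210.${NDX}
done
```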

Once the nodes come back up, we need to open a number of firewall ports for the services. The ceph-ansible deployment can supposedly be configured to do this for you; however, when I was testing it, for whatever reason it never opened the rules for me, so I just did it manually as a matter of course:

#Monitor Nodes
firewall-cmd --zone=public --add-port=6789/tcp
firewall-cmd --zone=public --add-port=6789/tcp --permanent
#Manager Nodes
firewall-cmd --zone=public --add-port=6800-7300/tcp
firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
#Metadata Nodes (Included in Manager Rules)
#firewall-cmd --zone=public --add-port=6800/tcp
#firewall-cmd --zone=public --add-port=6800/tcp --permanent
#Object Gateway Nodes
firewall-cmd --zone=public --add-port=8080/tcp
firewall-cmd --zone=public --add-port=8080/tcp --permanent
#NFS for Ganesha
firewall-cmd --zone=public --add-service=nfs
firewall-cmd --zone=public --add-service=nfs --permanent

Per the documentation, the deployment will connect to the 3 servers using Ansible as a user named “admin” that I create on the three nodes, so I go ahead and create that now:
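Done by hand, creating that user amounts to roughly the following on each node (a sketch; the playbook further down does the same thing, and the password shown is a placeholder):

```shell
# Create the deployment user and grant passwordless sudo
useradd -c 'Ansible user to install Ceph' admin
echo 'MySecretPassword' | passwd --stdin admin   # or set it interactively
echo 'admin ALL = (root) NOPASSWD:ALL' > /etc/sudoers.d/admin
chmod 0440 /etc/sudoers.d/admin
```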

In working through this, I ended up doing the installation several times, so each time I wanted to ensure that the disks I was going to use as OSD drives were cleared of any data from a previous installation. We can do this by installing the gdisk package and using sgdisk to wipe the drives:
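Done by hand, the wipe is just the following (a destructive sketch; double-check that /dev/sdb and /dev/sdc really are your OSD disks before running it):

```shell
# Install gdisk, then zap any prior partition data on the OSD disks
yum install -y gdisk
for DISK in /dev/sdb /dev/sdc; do
    sgdisk -Z ${DISK}   # destroy the GPT and MBR data structures
    sgdisk -g ${DISK}   # write a fresh, empty GPT
done
```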

Performing the prep work on the nodes multiple times resulted in the following playbook which I run from my deployer node:

[root@deployer pre-ceph]# cat prep-ceph.yaml 
- name: Configure hosts in prep to install Ceph
  hosts: all
  vars:
    osd_disks:
      - /dev/sdb
      - /dev/sdc
  tasks:
    - name: Add the ansible user
      user:
        name: admin
        password: "$6$homelab$nSgIkGCosYbRE9hPTcMmJr/ohMT7erAYdONcH/NquWS/Vq4BxdoVJb4BSnRZr6q8nmC5x0/mLGTlRGV8vHf2N0"
        comment: Ansible user to install Ceph
        generate_ssh_key: yes
        ssh_key_bits: 2048
        ssh_key_file: .ssh/id_rsa
    - name: Add user to sudoers
      copy:
        dest: /etc/sudoers.d/admin
        content: 'admin ALL = (root) NOPASSWD:ALL'
        owner: root
        group: root
        mode: 0440
    - name: Update the firewall for Ceph
      firewalld:
        port: "{{ item }}"
        permanent: true
        state: enabled
        zone: public
        immediate: true
      with_items:
        - 6789/tcp
        - 6800-7300/tcp
        - 8080/tcp
    - name: Update the firewall for NFS
      firewalld:
        service: nfs
        permanent: true
        state: enabled
        zone: public
        immediate: true
    - name: Add the gdisk package
      yum:
        name: gdisk
        state: present
    - name: Wipe OSD Disks
      shell: |
        sgdisk -Z {{ item }}
        sgdisk -g {{ item }}
      with_items: "{{ osd_disks }}"
[root@deployer pre-ceph]#                

To see how to generate the encrypted password for the admin user, see this link.
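In case that link goes stale: the hash above is a standard SHA-512 crypt string, which you can generate with Python’s crypt module (a sketch; python3 shown, and on RHEL 7 the stock python2 crypt module works the same way, with “homelab” as the salt):

```shell
# Generate a SHA-512 crypt hash suitable for the Ansible user module's password field
python3 -c 'import crypt; print(crypt.crypt("MySecretPassword", "$6$homelab$"))'
```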

Now that our nodes appear to be ready for Ceph, we log in to the Deployer node. First, push the SSH key to the three nodes. The installation instructions then have you create an SSH config file (step 3) to avoid using the -u switch when executing the Ansible playbook; I accomplish the same thing in my Ansible hosts file by specifying ansible_user there:

[root@deployer ~]# egrep -v '(^#|^$)' /etc/ansible/hosts 
[mons]
server50.homelab.net ansible_user=admin ansible_host=172.16.210.50 monitor_address=172.16.210.50
server60.homelab.net ansible_user=admin ansible_host=172.16.210.60 monitor_address=172.16.210.60
server70.homelab.net ansible_user=admin ansible_host=172.16.210.70 monitor_address=172.16.210.70
[osds]
server50.homelab.net ansible_user=admin ansible_host=172.16.210.50
server60.homelab.net ansible_user=admin ansible_host=172.16.210.60
server70.homelab.net ansible_user=admin ansible_host=172.16.210.70
[mgrs]
server50.homelab.net ansible_user=admin ansible_host=172.16.210.50
server60.homelab.net ansible_user=admin ansible_host=172.16.210.60
server70.homelab.net ansible_user=admin ansible_host=172.16.210.70
[mdss]
server50.homelab.net ansible_user=admin ansible_host=172.16.210.50
server60.homelab.net ansible_user=admin ansible_host=172.16.210.60
server70.homelab.net ansible_user=admin ansible_host=172.16.210.70
[rgws]
server50.homelab.net ansible_user=admin ansible_host=172.16.210.50
server60.homelab.net ansible_user=admin ansible_host=172.16.210.60
server70.homelab.net ansible_user=admin ansible_host=172.16.210.70
[nfss]
server50.homelab.net ansible_user=admin ansible_host=172.16.210.50
server60.homelab.net ansible_user=admin ansible_host=172.16.210.60
server70.homelab.net ansible_user=admin ansible_host=172.16.210.70
[root@deployer ~]# 
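Pushing the deployer’s key out to the admin user on each node is the usual ssh-copy-id dance; a sketch:

```shell
# Generate a key on the deployer if one doesn't exist yet
ssh-keygen -t rsa -b 2048 -N '' -f ~/.ssh/id_rsa

# Copy the public key to the admin user on each node
for NDX in 50 60 70; do
    ssh-copy-id admin@server${NDX}.homelab.net
done
```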

One thing I want to mention with regards to the prerequisites is that step 1 has you create the ~/ceph-ansible-keys directory. In the event that you have to run the purge-cluster playbook to start all over, you must re-do this prerequisite, as the purge appears to also remove this directory. If you do not create this directory before starting, you will get some very strange errors that can waste a ton of research time. Also, prior to executing the installation, not only make sure that ~/ceph-ansible-keys exists, but also make sure it’s empty.

Here’s what my configuration files look like for this deployment:

[root@deployer ceph-ansible]# egrep -v '(^#|^$)' group_vars/all.yml
---
dummy:
fetch_directory: ~/ceph-ansible-keys
configure_firewall: false
ntp_service_enabled: true
ntp_daemon_type: chronyd
ceph_repository_type: cdn
ceph_origin: repository
ceph_repository: rhcs
ceph_rhcs_version: 3
generate_fsid: true
ceph_conf_key_directory: /etc/ceph
ceph_keyring_permissions: '0600'
cephx: true
monitor_interface: eno3
ip_version: ipv4
cephfs: cephfs # name of the ceph filesystem
cephfs_data: cephfs_data # name of the data pool for a given filesystem
cephfs_metadata: cephfs_metadata # name of the metadata pool for a given filesystem
cephfs_pools:
  - { name: "{{ cephfs_data }}", pgs: "64", size: "{{ osd_pool_default_size }}" }
  - { name: "{{ cephfs_metadata }}", pgs: "64", size: "{{ osd_pool_default_size }}" }
is_hci: true
public_network: 172.16.0.0/16
cluster_network: 192.168.210.0/24
osd_objectstore: bluestore
radosgw_interface: eno3
nfs_file_gw: true
nfs_obj_gw: true
ceph_docker_image: "rhceph/rhceph-3-rhel7"
ceph_docker_image_tag: "latest"
ceph_docker_registry: "registry.access.redhat.com"
[root@deployer ceph-ansible]#

[root@deployer ceph-ansible]# egrep -v '(^#|^$)' group_vars/osds.yml
---
dummy:
devices:
  - /dev/sdb
  - /dev/sdc
osd_scenario: collocated
osd_objectstore: bluestore
[root@deployer ceph-ansible]# 

Once you’re all configured, you can go ahead and kick off the deployment. In the event your deployment fails, check the troubleshooting section at the bottom of this blog to help figure out the issue; once you do, you’ll want to purge your deployment and start over, and don’t forget to recreate ~/ceph-ansible-keys.
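Kicking off the deployment itself boils down to the following (a sketch, assuming the Red Hat ceph-ansible package layout under /usr/share/ceph-ansible):

```shell
# Copy the sample site playbook and run it against the hosts in /etc/ansible/hosts
cd /usr/share/ceph-ansible
cp site.yml.sample site.yml
ansible-playbook site.yml
```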

After 40 minutes or so, you should have a successful run:

So we can test and ensure that everything is working by mounting the filesystem both natively and as NFSv4 from the deployer node:
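The mount test amounted to something like this (a sketch; admin.secret holds the client.admin key, and the NFSv4 pseudo root comes from the Ganesha exports):

```shell
# Native CephFS mount via the kernel client
mkdir -p /mnt/cephfs
mount -t ceph 172.16.210.50:6789:/ /mnt/cephfs -o name=admin,secretfile=/root/admin.secret

# NFSv4 mount of the Ganesha pseudo root on server50
mkdir -p /mnt/cephnfs
mount -t nfs -o vers=4 172.16.210.50:/ /mnt/cephnfs
ls /mnt/cephnfs   # the exports should appear here
```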

Notice we have both a cephfile and a cephobject export being served out over NFSv4. This is because, if you go back and take a look at the all.yml file I used, I have nfs_obj_gw set to true. You will also notice that when I did my mount test, I used the IP of server50.homelab.net. That is because there is no VIP when you deploy Ganesha NFS, so you actually need to connect to each node independently. At first you’re probably thinking: we can use Linux HA and put a VIP managed by Pacemaker on the nodes. Well, I tried that first, and the problem is that the time it takes to fail over the VIP is too long for RHV; ultimately, any VMs that were running go into a paused state, which is less than ideal. So I needed a better solution.

I came to the realization that if my NFS IP went down or stopped working, there was a high likelihood that the node hosting the NFS service also had issues or was down. So I decided to use the loopback as the IP address for my NFS storage when I install RHV; this way, each node talks directly to its own Ganesha instance on top of the CephFS shared storage. And that brings me to the next hurdle: shared state. If we are using CephFS as shared storage, we need a locking mechanism to ensure that when a node goes down, its files are left in a state where they can be picked up by another node. Fortunately, I didn’t have to re-invent the wheel here, as Jeff Layton has a great write-up on setting up an active-active NFS cluster over CephFS. So let’s go ahead and implement the rados_cluster recovery backend on our deployment:

Here’s a diff of the changes

And here’s the full version of ganesha.conf. Note that in the RADOS_KV section, each one of the nodes will have a unique nodeid specified. I used the FQDN to differentiate them, but you can use whatever identifier you like; you’ll just need to ensure it matches what you populate the database with in the next step.

[root@server50 ganesha]# cat ganesha.conf
# Please do not change this file directly since it is managed by Ansible and will be overwritten


NFS_Core_Param
{
    Enable_NLM = false;
    Enable_RQUOTA = false;
    Protocols = 4;
}

EXPORT_DEFAULTS {
	Attr_Expiration_Time = 0;
}

CACHEINODE {
	Dir_Max = 1;
	Dir_Chunk = 0;

	Cache_FDs = false;

	NParts = 1;
	Cache_Size = 1;
}

EXPORT
{
	Export_id=20133;
	Path = "/";
	Pseudo = /cephfile;
	Access_Type = RW;
	Protocols = 4;
	Transports = TCP;
	SecType = sys,krb5,krb5i,krb5p;
	Squash = Root_Squash;
	Attr_Expiration_Time = 0;
	FSAL {
		Name = CEPH;
		User_Id = "admin";
	}
}
EXPORT
{
	Export_id=20134;
	Path = "/";
	Pseudo = /cephobject;
	Access_Type = RW;
	Protocols = 3,4;
	Transports = TCP;
	SecType = sys,krb5,krb5i,krb5p;
	Squash = Root_Squash;
	FSAL {
		Name = RGW;
		User_Id = "cephnfs";
		Access_Key_Id ="199A7GV226J2UL485CYB";
		Secret_Access_Key = "VQqjzRPjbMEQjSQqgMC9bVFjCJ0ta28wmwuiaqHf";
	}
}

RGW {
        ceph_conf = "/etc/ceph/ceph.conf";
        cluster = "ceph";
        name = "client.rgw.server50";
}

LOG {
        Facility {
                name = FILE;
                destination = "/var/log/ganesha/ganesha.log";
                enable = active;
        }
}

NFSv4
{
    RecoveryBackend = rados_cluster;
    Minor_Versions = 1,2;
}

RADOS_KV
{
    pool = "cephfs_metadata";
    namespace = "ganesha";
    nodeid = "server50.homelab.net";
}

After we have updated ganesha.conf on each node, we need to restart Ganesha for our changes to take effect. However, before we bring it back up, let’s populate the recovery table with each nodeid. Unfortunately, Red Hat doesn’t provide the ganesha-rados-grace package, so I downloaded it from Fedora packages here and executed the commands needed to extract it and run it:
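The extraction and database population went along these lines (a sketch; the exact Fedora RPM filename will differ, and the pool/namespace must match the RADOS_KV section of ganesha.conf):

```shell
# Extract the ganesha-rados-grace binary from the Fedora RPM without installing it
rpm2cpio nfs-ganesha-rados-grace-*.rpm | cpio -idmv

# Populate the shared recovery database with every nodeid (run on one node only)
for NODE in server50.homelab.net server60.homelab.net server70.homelab.net; do
    ./usr/bin/ganesha-rados-grace --pool cephfs_metadata --ns ganesha add ${NODE}
done

# Dump the database; each node should show the 'E' (enforcing) flag
./usr/bin/ganesha-rados-grace --pool cephfs_metadata --ns ganesha dump
```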

As it’s a shared namespace, we only need to execute these on one node. Ensure the names you’re adding match what you put in your ganesha.conf file for the nodeid.

Once we have added all of our nodes to the recovery backend, each will have an ‘E’ flag after it, meaning it is currently Enforcing the grace period. When we restart each of the NFS services, they will then get the ‘N’ flag added, meaning they Need a grace period. Documentation on ganesha-rados-grace can be found here.

It will take a minute or two before the flags clear, and then you’re ready to proceed with the RHV installation.

Red Hat Virtualization: Hosted Engine Deployment

Now we need to get ready for our RHV Hosted Engine deployment. First, we will create the directory structure in CephFS to be used for the hosted engine. To do this, we mount CephFS from our deployer node, create the necessary directories, and then change ownership to the proper UID:GID.

[root@deployer ~]# mount -t ceph 172.16.210.50:6789:/ ~/mycephfs/  -o name=admin,secretfile=/root/admin.secret
[root@deployer ~]# cd mycephfs/
[root@deployer mycephfs]# ls
testfile.txt
[root@deployer mycephfs]# rm testfile.txt 
rm: remove regular file ‘testfile.txt’? yes
[root@deployer mycephfs]# mkdir hosted_engine
[root@deployer mycephfs]# mkdir iso_domain
[root@deployer mycephfs]# mkdir data_domain
[root@deployer mycephfs]# chown -R 36.36 .
[root@deployer mycephfs]# chmod 775 *
[root@deployer mycephfs]# ls -latr
total 4
dr-xr-x---. 10 root root 4096 Sep 18 20:07 ..
drwxrwxr-x   1 vdsm kvm     0 Sep 18 20:46 hosted_engine
drwxrwxr-x   1 vdsm kvm     0 Sep 18 20:46 iso_domain
drwxrwxr-x   1 vdsm kvm     0 Sep 18 20:47 data_domain
drwxr-xr-x   1 vdsm kvm     3 Sep 18 20:47 .
[root@deployer mycephfs]# cd ..
[root@deployer ~]# umount mycephfs/
[root@deployer ~]# 

At this point we would normally start our hosted engine deployment; however, I found that the deployment installs VDSM, which causes some issues for us.

The VDSM installation pulls in multipath, and the default configuration places all devices under multipath control. The problem is that the OSD drives get pulled in too, and then your Ceph OSD services start failing.

To work around this issue, I pre-installed device-mapper-multipath, installed my own multipath.conf, and made it so that VDSM doesn’t change it. After installing device-mapper-multipath, I took the default VDSM multipath.conf file from here and placed it at /etc/multipath.conf.

Next, I edited the /etc/multipath.conf and added my OSD drives to the blacklist:

The second thing I had to do was add a second line to the configuration to mark it as PRIVATE, per this documentation.
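Put together, the top of my /etc/multipath.conf ended up looking something like this (a sketch only; the blacklist pattern is mine, and your device names may differ):

```
# VDSM REVISION 1.3
# VDSM PRIVATE

blacklist {
    # Keep the Ceph OSD drives (/dev/sdb, /dev/sdc) out of multipath
    devnode "^sd[bc][0-9]*"
}
```

The “# VDSM PRIVATE” line is what tells VDSM to leave the file alone.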

After these changes, I was able to proceed with the installation and not have it destroy my Ceph cluster. We can now start our RHV installation per the official documentation.

# Quick rundown of commands
subscription-manager repos --enable=rhel-7-server-rhv-4-mgmt-agent-rpms
yum install rhvm-appliance tmux -y
tmux # Otherwise it warns you that you should use screen.
hosted-engine --deploy

We answer the questions when prompted until we reach the storage configuration section. When we do, we provide the loopback address for the server. I've also added additional NFS options in an attempt to improve performance (noacl,nocto,rsize=32768,wsize=32768), but to be honest, I haven't verified that they improve anything since my network is the bottleneck anyway:
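For reference, the storage answers looked roughly like this. The exact prompt wording varies between RHV 4.x releases, so treat this as an approximation rather than a verbatim transcript:

```
Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: nfs
Please specify the full shared storage connection path to use (example: host:/path): 127.0.0.1:/hosted_engine
If needed, specify additional mount options for the connection []: noacl,nocto,rsize=32768,wsize=32768
```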

And with that deployment should complete cleanly:

Now that we have our hosted engine up and running on our single node, the first thing that piqued my curiosity was the disk performance:
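This was the same style of sequential-write test as the baremetal baseline from earlier, run inside the guest. The file count is illustrative, and COUNT is a knob I've added here for convenience (the original runs used 1000 MiB per file):

```shell
# Sequential-write test, mirroring the earlier baremetal dd baseline.
# COUNT controls MiB written per file.
COUNT=${COUNT:-1000}
for X in $(seq 1 3); do
    dd if=/dev/zero of=./test${X}.img bs=1M count=${COUNT} conv=fdatasync
done
ls -l test*.img
```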

So this is about what I expected using the 1Gb network. I will need to look into upgrading to 10Gb. Anyway, next we need to add the other nodes into our cluster. To do this, go ahead and add the hosts like you normally would and that's about it!

Don’t forget to deploy the Hosted Engine so it will be highly available!

The new nodes will go to installing…

And eventually complete:

Post-Publishing Update:

The last piece of the puzzle that I had missed when I published this was moving the networking configuration for eno4 into VDSM, so that when we start using eno4 for VMs, we don't knock out our Ceph cluster (which is exactly what I just did). The easiest way I found to do this is manually on each node.

cd /var/lib/vdsm/persistence/netconf/nets
cp ovirtmgmt cephcluster
vi cephcluster

Update the contents to reflect what we configured eno4 for (I also moved this to a dedicated VLAN in the process, which is why you see the VLAN tag in the configuration below):

{
    "ipv6autoconf": true, 
    "nameservers": [], 
    "nic": "eno4", 
    "vlan": 210, 
    "ipaddr": "192.168.210.50", 
    "switch": "legacy", 
    "mtu": 1500, 
    "netmask": "255.255.255.0", 
    "dhcpv6": false, 
    "stp": false, 
    "bridged": true, 
    "defaultRoute": false, 
    "bootproto": "none"
}
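VDSM expects this persistence file to be valid JSON, so it's worth a quick syntax check before rebooting. This is just a generic sanity check I'd suggest, not part of the official flow, and NETFILE is a variable I've introduced here for illustration:

```shell
# Check that the edited net persistence file parses as JSON
NETFILE=${NETFILE:-/var/lib/vdsm/persistence/netconf/nets/cephcluster}
if python -m json.tool "$NETFILE" >/dev/null 2>&1; then
    echo "valid JSON"
else
    echo "syntax error"
fi
```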

Next, update ifcfg-eno4 as such:

cat /etc/sysconfig/network-scripts/ifcfg-eno4
# Generated by VDSM version 4.30.17.1.git0de043f
DEVICE=eno4
ONBOOT=yes
MTU=9216
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=no

And then I rebooted the node. When it came back up, the networking all worked fine, so the Ceph cluster became healthy again, this time using the VDSM NIC configuration. Then it was just a matter of going into RHV-M, Compute->Hosts, clicking on the host, and selecting Management->Refresh Capabilities; cephcluster now showed as a configured network on that NIC. Finally, you just need to get that network recognized by RHV-M. Go into Networks->Networks, click New, and add the new network "cephcluster" to match what you created on each host. In my case, I added the VLAN tag to it as well. Save that, then go to each host again in RHV-M and this time do "Sync Networks". After that, everything should be all lined up and ready to be used!

Testing the Deployment

The proof is in the final result, right? So let's locate our HostedEngine (normally this would be on the server we installed on initially; however, I moved it around while doing some testing, so just for completeness' sake I am showing its location):

Now I will access the IMM and power off the server. What I noticed immediately is that the ganesha-rados-grace command stopped working once the host was powered off. I then took a look at the state of Ceph:

The hosted-engine status went into a bad state where all of the nodes were marked as not being “up to date”:

So I then started watching the ceph cluster using ceph -w and eventually the server I shutdown (server50) was evicted as a client:

2019-10-31 12:28:49.504526 mds.server60 [WRN] evicting unresponsive client server50.homelab.net (84102), after 303.649 seconds
2019-10-31 12:28:49.504536 mds.server60 [INF] Evicting (and blacklisting) client session 84102 (172.16.210.50:0/1385146810)

After this happened, I was able to execute ganesha-rados-grace again, but it showed all nodes with no flags:

But I still had no Hosted Engine anywhere. I started digging through the VDSM logs and such, but couldn't find anything. After about 10 minutes had passed, I noticed all of the nodes were now reporting their status as up to date, and the server I had powered off showed the VM as running on it?

Odd. So I let a couple more minutes go by and then the hosted-engine was brought up on server70:

hosted-engine –vm-status shows server70 is now hosting the HE.
Confirmed with virsh!

I was able to log in to the Admin Portal and everything looked good! I will look into why the failover took so long, but for now, I will mark this experiment as complete and start using my lab for other useful things. I hope. 🙂

Troubleshooting

Re-installing a failed deployment of Ceph

If you have a failed deployment, you will need to purge the existing run before re-executing ceph-ansible. To do this, on your Deployer machine:

[root@deployer ~]# cd /usr/share/ceph-ansible/
[root@deployer ceph-ansible]# cp ./infrastructure-playbooks/purge-cluster.yml .
[root@deployer ceph-ansible]# ansible-playbook purge-cluster.yml
# Answer with yes when prompted, lowercase.
# When it's purged:
[root@deployer ceph-ansible]# rm -rf ~/ceph-ansible-keys;mkdir ~/ceph-ansible-keys

Ceph installation: OSD Preparation Fails

If during the deployment, you fail on the OSD preparation:

PLAY RECAP ***********************************************************************************************************************
server50.homelab.net       : ok=383  changed=39   unreachable=0    failed=1
server60.homelab.net       : ok=368  changed=37   unreachable=0    failed=1
server70.homelab.net       : ok=370  changed=38   unreachable=0    failed=1


INSTALLER STATUS *****************************************************************************************************************
Install Ceph Monitor        : Complete (0:12:31)
Install Ceph Manager        : Complete (0:03:47)
Install Ceph OSD            : In Progress (0:02:59)
        This phase can be restarted by running: roles/ceph-osd/tasks/main.yml

Most likely it is due to a previous deployment of Ceph being picked up on the disks, or the bluestore metadata became corrupt. If you look through the large traceback, you can confirm this if you see the following messages:

2019-09-18 14:53:46.261954 7fdc79e4dd80 -1 bluestore(/var/lib/ceph/tmp/mnt.0_ooJA/block) _check_or_set_bdev_label bdev /var/lib/ceph/tmp/mnt.0_ooJA/block fsid 7ca5859b-6788-468c-961b-2db1baa49134 does not match our fsid b4acc93b-562d-4ecd-856c-63cfc1e684b0

2019-09-18 14:53:46.519128 7fdc79e4dd80 -1 bluestore(/var/lib/ceph/tmp/mnt.0_ooJA) mkfs fsck found fatal error: (5) Input/output error

2019-09-18 14:53:46.519157 7fdc79e4dd80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (5) Input/output error

If you hit this, you can use ceph-disk to zap the drives. Run the following to zap the devices with ceph-disk and then use sgdisk to also clear the partition table:

ceph-disk zap /dev/sdb
ceph-disk zap /dev/sdc
sgdisk -Z /dev/sdb
sgdisk -Z /dev/sdc
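A side note: ceph-disk was deprecated in Ceph 13 (Mimic) and removed in Ceph 14 (Nautilus), so on newer releases the equivalent cleanup is done with ceph-volume instead. These are shown against my example devices and are destructive, so double-check the device names:

```
# On Ceph releases without ceph-disk, zap with ceph-volume instead
ceph-volume lvm zap --destroy /dev/sdb
ceph-volume lvm zap --destroy /dev/sdc
```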

Additionally, ensure that after you run the purge that ~/ceph-ansible-keys exists and is empty prior to running the ceph-ansible installation.

Node shows Status Up however the HE icon states “Unavailable due to HA Score”

This happens when you do a re-install and appears to be related to this bugzilla. To fix it, log in to the host as root and run the following:

hosted-engine --set-maintenance --mode=none
Back to Top