RHHI-V Gluster Corruption: Shards and all...

March 25, 2021 ( last updated : March 25, 2021 )
rhel gluster ovirt


I was recently asked to help out with a problematic cluster. It was running Red Hat Hyperconverged Infrastructure for Virtualization (RHHI-V): essentially a 3 node oVirt cluster running on the same hardware as a 3 node Gluster cluster. Storage is local but shared, as are the compute resources. However, one of the hosts was throwing a constant “Host (host) cannot access the Storage Domain(s) (storage) attached to the Data Center (dc). Setting Host state to Non-Operational.” and a simple transition through Maintenance mode was not fixing it. This is that story…

Warning

First things first: I genuinely hope that no one ever needs this page. This is not a place of joy and unicorns. At the very least, prepare yourself with some emotional comfort, as this will get deep.

Version Information

Yes, it is 2021. Don’t judge, this is what it is:

Possibly Useful Related Links

Other Notes

In oVirt, a Storage Domain is generally represented as a name and a UUID. In the sections below I use $STORAGE_DOMAIN_UUID to refer to the UUID of the Storage Domain. This value can be found on the web UI page for the storage in question.
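For example, exporting it once in a shell session keeps the later commands copy-pasteable. A small sketch (the UUID below is made up; substitute the value from your own web UI):

```shell
# Hypothetical UUID -- replace with the value shown for your Storage Domain
# in the oVirt web UI (Storage -> Domains).
export STORAGE_DOMAIN_UUID="397c12a0-0000-0000-0000-000000000000"
echo "$STORAGE_DOMAIN_UUID"
```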

So, what is the issue?

As I mentioned earlier, in the Events section of the Host information within the UI, there was a constant stream of errors relating to why the host could not participate with the cluster, one key phrase seemed to be:

“Host (host) cannot access the Storage Domain(s) (storage) attached to the Data Center (dc). Setting Host state to Non-Operational.”

In addition to this, there had been some recent odd behaviour from the VMs using this storage domain. In fact, the main fault apparently started with some uncharacteristically heavy disk usage during a migration process.

vdsm.log

One of the main log files that can generally be useful on hosts is the vdsm log file at /var/log/vdsm/vdsm.log. This yielded the following:

    ERROR (monitor/397c12a) [storage.Monitor] Setting up monitor for ($STORAGE_DOMAIN_UUID) failed (monitor:330)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 327, in _setupLoop
    self._setupMonitor()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 349, in _setupMonitor
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 159, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 367, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
    domain.getRealDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterSD.py", line 62, in findDomain
    return GlusterStorageDomain(GlusterStorageDomain.findDomainPath(sdUUID))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterSD.py", line 58, in findDomainPath
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'$STORAGE_DOMAIN_UUID',)

Oh. This is not a great start. Needless to say, the UUID appearing in place of $STORAGE_DOMAIN_UUID above was the ID of the storage domain flagged as problematic in the host errors earlier.

Querying this log file on the functioning hosts confirmed this error was absent. It looked like this node had lost the mount point for this Storage Domain even though it was not showing any errors or warnings in the UI.
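Counting occurrences of the exception in each host's vdsm.log is a quick way to make that comparison. A sketch, with a sample log line standing in for the real file:

```shell
# On a real host this would be:
#   grep -c StorageDomainDoesNotExist /var/log/vdsm/vdsm.log
# Here a sample log line stands in for the file.
log='StorageDomainDoesNotExist: Storage domain does not exist'
printf '%s\n' "$log" | grep -c StorageDomainDoesNotExist
```

A non-zero count on exactly one host points at that host's mount rather than the volume itself.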

Querying vdsm directly using the CLI also produced the following error, only on the failing host:

[root@host]# vdsm-client StorageDomain getInfo storagedomainID="$STORAGE_DOMAIN_UUID"
vdsm-client: Command StorageDomain.getInfo with args {'storagedomainID': '$STORAGE_DOMAIN_UUID'} failed:
(code=358, message=Storage domain does not exist: (u'$STORAGE_DOMAIN_UUID',))

Gluster

To rule out a likely data loss scenario, I first wanted to confirm that the gluster volumes were apparently healthy:

gluster volume status all

The output showed that all the related volumes were online, no tasks were running, and all three nodes were up. Great!
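If you would rather script that check than eyeball it, the Online column of `gluster volume status` can be filtered with awk. A sketch, with sample output standing in for the real command (node names and brick paths are made up):

```shell
# On a real node the input would come from: gluster volume status all
# The 5th field of each Brick line is the Online flag (Y/N).
status='Brick node1:/gluster_bricks/data/data 49152 0 Y 12345
Brick node2:/gluster_bricks/data/data 49152 0 Y 12346
Brick node3:/gluster_bricks/data/data N/A N/A N N/A'
printf '%s\n' "$status" | awk '/^Brick/ && $5 != "Y" {print $2, "OFFLINE"}'
```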

At this point Gluster was apparently happy and had all nodes online. I confirmed this by checking mount points (mount), and all were present across all nodes in the cluster. However, when I attempted to list the contents of the mounted gluster volume on the failed node (ll /rhev/data-center/mnt/glusterSD/(fqdn)\:_(domain)/) I was greeted with the not uncommon error message: “Transport endpoint is not connected”.

So it turns out that whilst the mount exists, it is a long way from happy.
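This is the important lesson: a gluster mount can appear in `mount` output and still be wedged, so a presence check is not enough. A sketch of a probe that actually touches the mount (the path in the comment is illustrative):

```shell
# A stat that hangs or errors usually means the FUSE mount is wedged
# ("Transport endpoint is not connected"), even though `mount` still lists it.
check_mount() {
    if timeout 5 stat "$1" >/dev/null 2>&1; then
        echo "OK: $1"
    else
        echo "BROKEN: $1"
    fi
}
# e.g. check_mount "/rhev/data-center/mnt/glusterSD/(fqdn):_(domain)"
check_mount /tmp
```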

The original plan at this point was to wait for a window when the remaining functional VMs using this storage could be shut down, allowing the storage to be placed in Maintenance mode and reactivated using the oVirt UI. In the meantime, I decided to try and find further logs to get to the bottom of the issue.

Deeper Gluster

I eventually discovered that the gluster client keeps log files for individual volumes at /var/log/glusterfs/rhev-data-center-mnt-glusterSD-(fqdn)\:_(storage_domain).log and these yielded further information:

[datetime] W [MSGID: 109009] [dht-common.c:3072:dht_lookup_linkfile_cbk] 0-(domain)-dht: /.shard/84f6af5c-1f6a-4d95-9ac7-f909ec204cd8.10245: gfid different on data file on (domain)-replicate-1, gfid local = 00000000-0000-0000-0000-000000000000, gfid node = 0eecf260-96f1-4f55-ac26-e6dd0772e8dd 
[datetime] W [MSGID: 109009] [dht-common.c:2810:dht_lookup_everywhere_cbk] 0-(domain)-dht: /.shard/84f6af5c-1f6a-4d95-9ac7-f909ec204cd8.10245: gfid differs on subvolume (domain)-replicate-1, gfid local = 748bc912-d085-4f34-8197-8783c69dcbc5, gfid node = 0eecf260-96f1-4f55-ac26-e6dd0772e8dd
[datetime] E [MSGID: 133010] [shard.c:2299:shard_common_lookup_shards_cbk] 0-(domain)-shard: Lookup on shard 10245 failed. Base file gfid = 84f6af5c-1f6a-4d95-9ac7-f909ec204cd8 [Stale file handle]
pending frames:
frame : type(1) op(WRITE)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2021-03-25 06:18:53
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x9d)[0x7fadb0eafbdd]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fadb0eba154]
/lib64/libc.so.6(+0x363f0)[0x7fadaf4e93f0]
/lib64/libuuid.so.1(+0x2570)[0x7fadb0610570]
/lib64/libuuid.so.1(+0x2606)[0x7fadb0610606]
/lib64/libglusterfs.so.0(uuid_utoa+0x1c)[0x7fadb0eb92ec]
/usr/lib64/glusterfs/3.12.2/xlator/features/shard.so(+0xa596)[0x7fada9112596]
/usr/lib64/glusterfs/3.12.2/xlator/features/shard.so(+0x8fa5)[0x7fada9110fa5]
/usr/lib64/glusterfs/3.12.2/xlator/features/shard.so(+0xa61c)[0x7fada911261c]
/usr/lib64/glusterfs/3.12.2/xlator/features/shard.so(+0x6b47)[0x7fada910eb47]
/usr/lib64/glusterfs/3.12.2/xlator/features/shard.so(+0xa721)[0x7fada9112721]
/usr/lib64/glusterfs/3.12.2/xlator/features/shard.so(+0xad84)[0x7fada9112d84]
/usr/lib64/glusterfs/3.12.2/xlator/features/shard.so(+0xb196)[0x7fada9113196]
/usr/lib64/glusterfs/3.12.2/xlator/features/shard.so(+0xb87f)[0x7fada911387f]
/lib64/libglusterfs.so.0(synctask_wrap+0x10)[0x7fadb0ee9840]
/lib64/libc.so.6(+0x48180)[0x7fadaf4fb180]
---------

At this point it was back to ye olde internet search, with very minimal results. Though I did find one interesting related link: https://lists.gluster.org/pipermail/gluster-users/2019-January/035602.html

See the Possibly Useful Related Links section above for further lists, but none of them reported much success in recovering from this failure. As I mentioned earlier, I was hoping that dropping the storage through Maintenance mode might be enough to update some configuration and move on. This was looking increasingly unlikely.

Remediation?

Once the outage window was granted and all guest VMs using this storage domain were shut down, the Storage Domain was placed in Maintenance Mode. This is not always trivial: you have to navigate to Domains from the Data Center view to reach the appropriate management page. No, I don’t know why.

At this point, nothing worked. Even with the previously degraded host in Maintenance mode it was not possible to bring the previously functional Domain back online on any hosts. Congrats, the problem is now worse!

Checking the gluster mount logs on all three nodes showed that the shard error documented earlier on the failed node was now present on all nodes. Not part of the plan, but not entirely surprising in hindsight.

At this point I started getting creative based on the suggestions in the mailing list posts mentioned above… after all, how bad could it be?

The Remediation Plan

NOTE: This will likely result in data loss, but I have found no other way to continue. And the data may well be lost to corruption already anyway.

1) Identify the problematic shards by running the following and comparing the output on all hosts within the cluster: `grep "dht-common.c:3072:dht_lookup_linkfile_cbk" /var/log/glusterfs/rhev-data-center-mnt-glusterSD-(fqdn)\:_(domain).log`. I only had one shard file that appeared on all hosts, so I started with that one. The part you are looking for is something like `/.shard/84f6af5c-1f6a-4d95-9ac7-f909ec204cd8.10245`; the hash will likely differ in your installation.
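If the log is noisy, the grep can be narrowed to just the shard paths. A sketch, with one of the sample log lines from earlier standing in for the real file:

```shell
# grep -o pulls out only the /.shard/<base gfid>.<shard number> path.
line='[datetime] W [MSGID: 109009] [dht-common.c:3072:dht_lookup_linkfile_cbk] 0-dom-dht: /.shard/84f6af5c-1f6a-4d95-9ac7-f909ec204cd8.10245: gfid different on data file on dom-replicate-1'
printf '%s\n' "$line" | grep -o '/\.shard/[0-9a-f-]*\.[0-9]*'
```

Piping the real log file through this plus `sort -u` gives a de-duplicated list that is easy to compare across hosts.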

2) Identify the relevant gluster brick files by reading their `trusted.gfid` attribute:

```
getfattr -d -m . -e hex /gluster_bricks/(domain)/(domain)/.shard/{from above}
```

3) The `trusted.gfid` attribute from the previous step needs to be reformatted to identify the `.glusterfs` file. For example, the relevant file in my situation ended up being `/gluster_bricks/(domain)/(domain)/.glusterfs/74/8b/748bc912-d085-4f34-8197-8783c69dcbc5`. Note that the first 4 characters of the gfid form part of the path.

Using `stat` this file can be inspected. I repeated this process on all bricks and nodes of the cluster. In my case all files were empty.
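The hex-to-path conversion is mechanical, so here is a sketch of it in shell (bash substring expansion; the brick path is illustrative). It reproduces the path from my case above:

```shell
# trusted.gfid value as printed by getfattr -e hex.
hex="0x748bc912d0854f3481978783c69dcbc5"
h="${hex#0x}"
# Re-insert the UUID dashes: 8-4-4-4-12.
gfid="${h:0:8}-${h:8:4}-${h:12:4}-${h:16:4}-${h:20:12}"
# The first two byte-pairs of the gfid become directory levels under .glusterfs.
echo "/gluster_bricks/(domain)/(domain)/.glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid}"
```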

4) Delete the glusterfs file and the shard file. As I said, this may result in data loss, but I figured this data was already lost, as all these files were 0 bytes in my situation.
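Since the whole justification for deleting is that the files are empty, it is worth making the delete refuse anything non-empty. A sketch, demonstrated on temporary stand-in files rather than real bricks:

```shell
# Stand-ins for the shard and .glusterfs files; on a real node these would
# be the brick paths identified in the previous steps.
sandbox=$(mktemp -d)
: > "$sandbox/empty-shard"           # 0 bytes, like the corrupt files
echo "data" > "$sandbox/real-shard"  # non-empty: must survive
for f in "$sandbox"/*; do
    if [ -s "$f" ]; then
        echo "keeping $f (non-empty)"
    else
        rm -f -- "$f"
    fi
done
ls "$sandbox"
```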

5) To encourage oVirt to take care of the situation, I unmounted the gluster volume on all nodes with `umount /rhev/data-center/mnt/glusterSD/(fqdn)\:_(domain)`.

6) At this point I was able to re-activate the Storage Domain within the Web UI. Once this completed successfully, I was able to move the previously problematic host out of Maintenance mode and the cluster was completely green again!

This will do for now.

How did we get here?

Great question. None of the descriptions of similar issues found online match our experience. However, piecing together some info from https://bugzilla.redhat.com/show_bug.cgi?id=1553133 hints at a correlation between potential corruption and high disk utilisation in these earlier versions of Gluster. It is really important to note that the issue mentioned in this bug was fixed in a very closely following minor release of Gluster, meaning that none of this should be possible on newer versions.

And this is why I say that I hope no one ever needs this page.


Aside: Installing bonnie++ and htop on Atomic RHEL

No external dependencies are needed for these packages, just pull them down with curl and install locally.

curl https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/h/htop-2.2.0-3.el7.x86_64.rpm > htop.rpm
rpm -ivh htop.rpm
rm htop.rpm

curl https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/b/bonnie++-1.97.3-1.el7.x86_64.rpm > bonnie.rpm
rpm -ivh bonnie.rpm
rm bonnie.rpm

It goes without saying this is not really recommended… but it works!

