[WRN] evicting unresponsive client
This warning shows that a client stopped responding to messages from the MDS. Sometimes it is harmless (perhaps the client disconnected "uncleanly", e.g. after a hard reboot), but it could also indicate that the client is overloaded or deadlocked on something else.
If the same client appears repeatedly, it may be useful to get in touch with the owner of the client machine (ai-dump <hostname> on aiadm).
[WRN] clients failing to respond to cache pressure
When the MDS cache is full, it needs to clear inodes from its cache. This normally means that the MDS also needs to ask some clients to remove inodes from their own caches.
If the client fails to respond to this cache recall request, then Ceph will log this warning.
Clients stuck in this state for an extended period of time can cause issues -- follow up with the machine owner to understand the problem.
Note: Ceph-fuse v13.2.1 has a bug which triggers this issue -- users should update to a newer client release.
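When following up on cache pressure warnings, it can help to see which client sessions hold the most capabilities. The following is a minimal sketch, assuming the session list was saved as JSON from `ceph tell mds.<name> session ls` and that each session carries `id`, `num_caps` and `client_metadata.hostname` fields (field names may differ between Ceph releases). The function name and the sample data are hypothetical, not real cluster output.

```python
import json


def top_cap_holders(session_ls_json, limit=5):
    """Return (client_id, hostname, num_caps) tuples, largest num_caps first."""
    sessions = json.loads(session_ls_json)
    rows = []
    for s in sessions:
        # Assumption: hostname lives under client_metadata; fall back if absent.
        hostname = s.get("client_metadata", {}).get("hostname", "unknown")
        rows.append((s.get("id"), hostname, s.get("num_caps", 0)))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows[:limit]


# Hypothetical sample for illustration only.
SAMPLE = json.dumps([
    {"id": 1001, "num_caps": 120, "client_metadata": {"hostname": "node-a"}},
    {"id": 1002, "num_caps": 250000, "client_metadata": {"hostname": "node-b"}},
    {"id": 1003, "num_caps": 8},
])

if __name__ == "__main__":
    for client_id, hostname, caps in top_cap_holders(SAMPLE):
        print(client_id, hostname, caps)
```

The client at the top of the list is usually the first one to check with its owner.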
[WRN] client session with invalid root denied
This means that a user is trying to mount a Manila share that either doesn't exist or for which they haven't created a key yet. It is harmless, but if it repeats, get in touch with the user.
Procedure to unblock hung HPC writes
An HPC client was stuck like this for several hours:
HEALTH_WARN 1 clients failing to respond to capability release; 1 MDSs report slow requests
MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability release
    mdscephflax-mds-2a4cfd0e2c(mds.1): Client hpc070.cern.ch:hpcscid02 failing to respond to capability release client_id: 69092525
MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdscephflax-mds-2a4cfd0e2c(mds.1): 1 slow requests are blocked > 30 sec
Indeed there was a hung write on hpc070.cern.ch:
# cat /sys/kernel/debug/ceph/*/osdc
245540 osd100 1.9443e2a5 1.2a5 [100,1,75]/100 [100,1,75]/100 e74658 fsvolumens_393f2dcc-6b09-44d7-8d20-0e84b072ed26/2000b2f5905.00000001 0x400024 1 write
I restarted osd.100 and the deadlocked request went away.
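The OSD to look at can be read from the osdc output: the second column names the OSD that the pending request is waiting on. The following is a minimal sketch of extracting it, assuming each request line starts with a numeric ID followed by `osd<N>` and ends with the operation name, as in the line above. The function name is hypothetical, and other kernel versions may lay the file out differently.

```python
import re

# Assumed layout: "<id> osd<N> ... <op>", as seen in the osdc line above.
OSDC_LINE = re.compile(r"^(?P<tid>\d+)\s+osd(?P<osd>\d+)\s+.*\s(?P<op>\S+)$")


def pending_requests(osdc_text):
    """Return (request_id, osd_id, op) for every request line in osdc output."""
    reqs = []
    for line in osdc_text.splitlines():
        m = OSDC_LINE.match(line.strip())
        if m:  # lines that do not look like requests are skipped
            reqs.append((int(m["tid"]), int(m["osd"]), m["op"]))
    return reqs


# The request line from the incident described above.
SAMPLE = (
    "245540 osd100 1.9443e2a5 1.2a5 [100,1,75]/100 [100,1,75]/100 e74658 "
    "fsvolumens_393f2dcc-6b09-44d7-8d20-0e84b072ed26/2000b2f5905.00000001 "
    "0x400024 1 write"
)

if __name__ == "__main__":
    for tid, osd, op in pending_requests(SAMPLE):
        print(f"request {tid}: {op} pending on osd.{osd}")
```

For the sample line this reports a single write pending on osd.100, which is the OSD that was restarted.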