Symptoms
- Two external drives (one standard ext4, the other part of LVM to ease later expansion) mounted in a USB-connected docking bay would from time to time simply disappear. The semi-solution was to unplug and replug the USB cable, but sometimes that didn't work and I had to cycle power to the device for it to be visible again.
- I could remount the non-LVM disk, but remounting the LVM disk required rebooting the computer until I discovered the incantation of commands to get LVM's toys and its cot reunited.
Background
I have an old Dell Precision "workstation" laptop with a burnt-out GPU and broken screen which I use at home as a Proxmox server running a few infrastructure-type programs. It still idles most of the time (just checked: load average of 0.00 over the last 15 minutes), so I also use it for testing things.
Some time ago I was evaluating a potential backup server. I had set the backup agent on my wife's laptop to do an incremental backup so many times an hour, while mine was about once every hour or two. Since I had two old hard drives handy I connected them via a dual docking station / disc copier because they could not fit into the laptop.
Things seemed to be going well after I figured out how the backup software specified what (not) to back up, but then the above problem popped up and it was a real issue to solve.
Discovery of the reason
It was a long and arduous journey (uphill both ways in the snow), and after trying various driver-level tricks (none of which worked), I came to the conclusion that the drive dock decided after n+1 periods of no activity that it might as well shut down. This was borne out by the (spinning) drives going quiet and later the lights on the device going out. When the drives were powered down a cable un- and re-plug worked, but when the lights went off, not even a reboot worked; then I had to cycle the device's power.
The problem was that when the dock decided to go to sleep it never informed the OS, so the kernel was under the impression that the drives were still attached. When I un- and re-plugged the USB cable or cycled the device's power, the journal (Proxmox needs systemd) showed that new hardware was detected and, because the kernel believed that the originally assigned bus and device slots were still valid, it assigned new ones. The fact that the serial number was the same didn't help, given that many manufacturers re-use serial numbers (some even use fake product IDs so they can piggy-back on another manufacturer's drivers).
Careful reading of the journal showed that while the ext4 formatted drive was OK, the whole LVM system was unhappy, because it already had that volume with its associated IDs registered and believed it was still live, so it threw its toys out of the cot.
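To watch the re-enumeration described above, one can pull the disk-attach messages out of the kernel log. The helper below is only a sketch: the function name attached_disks is made up for illustration, and it assumes the kernel logs its usual "Attached SCSI disk" line for each drive the dock presents.

```shell
#!/bin/bash
# Print the device name from every "Attached SCSI disk" kernel message
# read from stdin, in log order.
# (attached_disks is a hypothetical helper name, not part of any tool.)
attached_disks() {
    grep 'Attached SCSI disk' | grep -o '\[sd[a-z]*]' | tr -d '[]'
}

# Typical use (needs permission to read the journal):
#   journalctl -k -b | attached_disks
```

If the same drive shows up under one name early in the output and under a different name later, the kernel has handed out a new slot while it still considered the old one taken.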
Solution 1
I had to run these commands to get LVM happy again:
Code:
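# Unmount the stale filesystem
umount /dev/vg-backup/lv-backup
# Force-remove the leftover device-mapper entry (hyphens in names are doubled)
dmsetup remove -f vg--backup-lv--backup
# Rescan physical volumes and volume groups so LVM sees the re-attached drive
pvscan --cache
vgscan
# Deactivate, then reactivate, the volume group
vgchange -an vg-backup
vgchange -ay vg-backup
# Remount using the /etc/fstab entry for this mount point
mount /mnt/backup
# Send a push notification (local script)
/usr/local/bin/send-push 1 cron remounted /mnt/backup. Check other mount points.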
This is all very well, but it is in response to a bad situation which is very difficult to discern because the device disconnected quietly. The actual solution is to ensure that it doesn't happen in the first place, which brings us to…
Solution 2
I needed the docking device equivalent of a stay-alive packet for a network connection, and this is where the fun started (not). It turned out that almost everything I tried to use was, in the name of efficiency, cached by the server, and those commands which weren't cached caused a reset. The cached commands didn't help, because they would not keep the device alive, and the resetting ones would not help either, because the device might be in the middle of a backup or a restore operation. The other option was to write data to the disc every now and then, but I'd rather it wore out because of actual data, and not as part of keeping a device connected.
In the end I discovered that querying the hard drive's temperature would do the trick as that cannot be cached, and because the dock is involved in the process its sleep timer restarts as well.
What makes it difficult is that
- the program that does the querying doesn't work when you give it a logical volume name, as that could be spread across various drives, so you need to know the actual device assignment.
- the device assignment is not fixed, so it might be "/dev/sdb" today and "/dev/sdc" tomorrow.
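On the first point: while LVM is in a sane state, "pvs" can report which physical device sits behind a volume group. A minimal sketch, assuming the volume group is called vg-backup as in the commands above; the function name pvs_for_vg is hypothetical, and the result may be a partition rather than a whole disk.

```shell
#!/bin/bash
# Print the physical volume(s) backing volume group $1, reading
# "PV VG" pairs (as printed by pvs) from stdin.
# (pvs_for_vg is a hypothetical helper name.)
pvs_for_vg() {
    awk -v vg="$1" '$2 == vg { print $1 }'
}

# Typical use (as root):
#   pvs --noheadings -o pv_name,vg_name | pvs_for_vg vg-backup
```

This only helps while LVM's view of the world is current, which is exactly what goes wrong when the dock drops out without telling anyone.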
Doing all of that leads to this rather inelegant code:
Code:
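#!/bin/bash
# Walk the kernel log backwards, take the lines logged just after the dock's
# serial number last appeared, and pick the first [sdX] name among them.
# grep -o pulls out the [sdX] token wherever it sits on the line; a fixed awk
# field number breaks once the timestamp loses its leading spaces.
DRIVE="/dev/$(dmesg \
| tac \
| grep -B 20 DD564198838DA \
| head -21 \
| grep -o '\[sd[a-z]*]' \
| tr -d '[]' \
| sort -u \
| head -1)"
# Reading the temperature cannot be answered from cache, so it reaches the dock.
hdparm -H "$DRIVE" >/dev/null 2>&1
exit 0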
Of course, there is a non-zero chance that the assignment of a previously (only just previously) discovered drive would pop up in the journal after the device serial number is recorded and before any related assignments are made, so there is room for improvement. [Edit: fixed grammar]
There you go! The distillation of many clumps of hair. Put your own device's serial number in after the "grep -B 20 " and before that line's " \", save it, then call it as often as required. In my case I use `cron`. [Edit: Sigh. Added "20 " to the "grep" bit.]
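As a rough illustration, a crontab entry for root might look like the one below. The script path and name are hypothetical, and the five-minute interval is an assumption; it only needs to be shorter than whatever idle timeout the dock uses.

```
# m    h  dom mon dow  command
*/5    *  *   *   *    /usr/local/bin/dock-keepalive
```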
Yes, I know that "hdparm" does work with drive entries in "/dev/disk/by-id/" and that using that would obviate the whole "dmesg" dance if I want to query a specific drive, but I want a solution which works with any hard drive in the dock.
This has been running successfully for about 24 hours now, so I think it's safe to say that it works.
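On the wish above for something that works with any drive in the dock: one possible alternative to parsing "dmesg" is to ask "lsblk" which disks are attached over USB and query each of them. The sketch below is untested against this dock. It assumes that "lsblk" reports the transport as "usb" for the docked drives and that it does no harm to query any other USB disk that happens to be plugged in; the function names are made up for illustration.

```shell
#!/bin/bash
# Turn "NAME TRAN" lines (as printed by lsblk) into /dev paths,
# keeping only disks whose transport is usb.
usb_disks() {
    awk '$2 == "usb" { print "/dev/" $1 }'
}

# Ask every USB-attached disk for its temperature. Failures are ignored,
# as in the script above.
keep_usb_disks_awake() {
    lsblk -d -n -o NAME,TRAN | usb_disks | while read -r dev; do
        hdparm -H "$dev" >/dev/null 2>&1
    done
    return 0
}

# Only act when asked to, so the functions can be sourced without side effects.
if [ "${1:-}" = "run" ]; then
    keep_usb_disks_awake
fi
```

Saved as a script and called with the argument "run", this needs no serial number and does not care which device letters the drives were given.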