Rescuing a Flaky JMicron USB Backup Drive That Kept Going Read-Only

My external USB backup drive started misbehaving: it took ages to mount, sometimes did not mount at all, and when it did mount it would flip itself to read-only within seconds. It turned out to be two problems feeding each other – a buggy USB-to-SATA bridge and a corrupted ext4 filesystem – and crucially they had to be fixed in the right order, or the repair simply would not stick.

The drive (/dev/sda1, ext4, label Backup, mounted at /run/media/jdyer/Backup) appeared to mount fine, then a write test would fail:

touch /run/media/jdyer/Backup/.writetest
# touch: cannot touch ...: Read-only file system
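
Since the mount table cannot be trusted here, a real write attempt is the only reliable probe. A minimal sketch of a reusable version of the test above follows; probe_write is a hypothetical helper name, and it uses mktemp so it never overwrites an existing file and cleans up after itself:

```shell
# probe_write DIR: succeed only if a file can really be created in DIR.
# Hypothetical helper; the temporary file is removed again on success.
probe_write() (
    dir=$1
    probe=$(mktemp "$dir/.writetest.XXXXXX" 2>/dev/null) || {
        echo "NOT writable: $dir"
        exit 1
    }
    rm -f -- "$probe"
    echo "writable: $dir"
)

# Example (path taken from this post):
#   probe_write /run/media/jdyer/Backup
```

The exit status mirrors the printed verdict, so the function can gate a backup script: run the backup only when the probe succeeds.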

The mount table still claimed rw, but the kernel had silently remounted the filesystem read-only to protect the data.

A look through the kernel ring buffer revealed both problems at once:

sudo dmesg | grep -iE 'sda|uas|usb-storage|EXT4'

Problem 1 – the filesystem was corrupt. Seconds after mounting, ext4 aborted its journal and went read-only:

EXT4-fs error (device sda1): ext4_validate_block_bitmap: bg 125: bad block bitmap checksum
Aborting journal on device sda1-8.
EXT4-fs (sda1): Remounting filesystem read-only

The errors=remount-ro mount option is doing its job here – detecting metadata corruption and flipping read-only rather than risk writing more garbage.

Problem 2 – the USB bridge kept dropping out. The enclosure uses a JMicron 152d:a578 bridge running in UAS (USB Attached SCSI) mode, and its firmware is buggy:

uas_eh_device_reset_handler FAILED err -19
Synchronize Cache failed: hostbyte=DID_NO_CONNECT
usb 4-2: USB disconnect

JMicron bridges are notorious for flaky UAS firmware. That disconnect-mid-write is almost certainly what corrupted the filesystem in the first place – which means repairing the filesystem without fixing the bridge would just let it happen again.
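
To tell the two failure classes apart quickly, the log can be run through a small filter. This is a rough sketch only: classify_log is a hypothetical name, and the patterns are copied from the messages quoted above, so other kernels or bridges may word theirs differently.

```shell
# classify_log: read kernel log text on stdin and count lines belonging to
# the two failure classes described in this post.
classify_log() {
    awk '
        /EXT4-fs error|Aborting journal|Remounting filesystem read-only/ { fs++ }
        /uas_eh_.*FAILED|DID_NO_CONNECT|USB disconnect/                  { link++ }
        END {
            printf "filesystem=%d transport=%d\n", fs, link
            if (link > 0)    print "verdict: fix the USB link first, then fsck"
            else if (fs > 0) print "verdict: filesystem errors only"
            else             print "verdict: nothing matched"
        }'
}

# Example:
#   sudo dmesg | classify_log
```

Any transport hit wins over filesystem hits in the verdict, which encodes the ordering argued for in this post.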

Before trusting the drive with backups, I confirmed the physical disk was healthy with SMART:

sudo pacman -S smartmontools
sudo smartctl -a /dev/sda

The numbers that matter were all pristine:

SMART overall-health self-assessment test result: PASSED
  5 Reallocated_Sector_Ct    0
197 Current_Pending_Sector   0
198 Offline_Uncorrectable    0
199 UDMA_CRC_Error_Count     0

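So the disk itself was fine: no reallocated, pending or uncorrectable sectors, and no CRC errors on the SATA side of the enclosure. The corruption came from the bridge, not from failing media – exactly the right conclusion to commit to before doing anything destructive.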
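
The same four counters can be checked mechanically. The sketch below assumes the standard ATA attribute table printed by smartctl -A, where the attribute name is column 2 and RAW_VALUE is column 10; smart_counters is a hypothetical helper name, and NVMe drives or unusual bridges print a different layout that it will not understand.

```shell
# smart_counters: read `smartctl -A` output on stdin and flag non-zero raw
# values for the four attributes discussed above.
# Assumption: name in column 2, RAW_VALUE in column 10 (standard ATA table).
smart_counters() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ {
            seen++
            if ($10 + 0 != 0) { bad++; printf "WARN %s raw=%s\n", $2, $10 }
        }
        END {
            if (seen == 0) { print "no known attributes found"; exit 2 }
            if (bad > 0)   exit 1
            print "all watched counters are zero"
        }'
}

# Example:
#   sudo smartctl -A /dev/sda | smart_counters
```

If smartctl cannot talk to the disk through the enclosure at all, adding -d sat to the smartctl command sometimes helps, depending on the bridge.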

This is the bit that caught me out. My first instinct was to unmount and run e2fsck straight away:

sudo umount /run/media/jdyer/Backup
sudo e2fsck -f -y /dev/sda1

It fixed "some things" and a second pass reported clean. But after a replug the exact same bg 125 error came straight back. Running e2fsck over the still-unstable UAS link is worse than useless: its repair writes were being dropped or never flushed, so the "clean" result was a lie – the platter still held the corruption.

Fix the bridge first. The cure for the JMicron UAS bug is a usb-storage quirk that forces the slower-but-stable BOT (Bulk-Only Transport) mode instead of UAS:

echo 'options usb-storage quirks=152d:a578:u' | sudo tee /etc/modprobe.d/usb-quirks.conf
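
For a different enclosure the line needs that device's own vendor:product ID, as shown in the ID column of lsusb. A small sketch that validates the ID before printing the line (quirk_line is a hypothetical helper name; the trailing u is the usb-storage flag for "ignore UAS"):

```shell
# quirk_line ID: print the modprobe.d line that disables UAS for one device.
# ID is vendor:product, four hex digits each, e.g. 152d:a578.
quirk_line() (
    id=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case $id in
        [0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f][0-9a-f][0-9a-f]) ;;
        *) echo "expected VVVV:PPPP in hex, got: $1" >&2; exit 1 ;;
    esac
    printf 'options usb-storage quirks=%s:u\n' "$id"
)

# Example:
#   quirk_line 152d:a578 | sudo tee /etc/modprobe.d/usb-quirks.conf
```

Rejecting malformed IDs up front is deliberate: a typo in this file fails silently, and the device simply keeps using UAS.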

After a reboot, confirm the quirk took effect:

sudo dmesg | grep -iE 'uas|quirks'
# usb 4-2: UAS is ignored for this device, using usb-storage instead
# usb-storage 4-2:1.0: Quirks match for vid 152d pid a578: 800000

No more uas_eh_device_reset or DID_NO_CONNECT lines – the connection was finally stable.
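
Another way to confirm which driver is in use is the tree view from lsusb -t, which names the driver bound to each interface. The sketch below assumes the usual "Class=Mass Storage, Driver=..." fields; the exact layout varies a little between usbutils versions, and bound_drivers is a hypothetical helper name.

```shell
# bound_drivers: read `lsusb -t` output on stdin and print the driver bound
# to each mass-storage interface, one per line.
bound_drivers() {
    grep 'Class=Mass Storage' | sed -n 's/.*Driver=\([^,]*\).*/\1/p'
}

# Example:
#   lsusb -t | bound_drivers
# With the quirk active, the enclosure should show usb-storage rather than uas.
```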

Then repair the filesystem. With a reliable link, e2fsck could finally do real work:

sudo umount /run/media/jdyer/Backup
sudo e2fsck -f -y /dev/sda1

This time it found a huge amount of damage the earlier bogus run had never touched: it salvaged corrupted directories, fixed hundreds of inode reference counts, and reconnected around 5,800 orphaned inodes to /lost+found. The file count jumped from 1,677 to 7,544. Use -f to force a full check: after a journal abort the superblock can still look "clean", and without -f e2fsck would skip the check entirely.

A repair pass that large must be followed by a confirming pass. Run e2fsck once more and check that the output contains no Fix? lines and no FILE SYSTEM WAS MODIFIED banner:

Pass 1: Checking inodes, blocks, and sizes
...
Pass 5: Checking group summary information
Backup: 7544/61054976 files (1.1% non-contiguous), 62904596/244190208 blocks

The counts match the repair pass (7,544 files) and nothing was modified – proof the repair persisted, which over the old UAS link it never did.
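
The confirming pass can also be judged by exit status instead of by eye. According to e2fsck(8) the status is a bit mask: 0 means no errors, 1 means errors were corrected, 4 means errors were left uncorrected, 8 means an operational error. A minimal decoder, with fsck_verdict as a hypothetical helper name:

```shell
# fsck_verdict CODE: decode an e2fsck exit status (a bit mask, per e2fsck(8)).
# Returns 0 only for a completely clean check.
fsck_verdict() (
    code=$1
    [ "$code" -eq 0 ] && { echo "clean: no errors found"; exit 0; }
    [ $((code & 1)) -ne 0 ] && echo "errors were corrected (filesystem modified)"
    [ $((code & 2)) -ne 0 ] && echo "errors corrected, reboot advised"
    [ $((code & 4)) -ne 0 ] && echo "errors left UNCORRECTED"
    [ $((code & 8)) -ne 0 ] && echo "operational error"
    exit 1
)

# Example: a read-only confirming pass on the unmounted filesystem.
#   sudo e2fsck -f -n /dev/sda1; fsck_verdict $?
```

The -n flag opens the filesystem read-only and answers "no" to every question, so a confirming pass run this way cannot change anything on disk; any remaining damage shows up as status 4.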

A few practical notes once everything was clean:

  • /lost+found now holds thousands of recovered files with numeric names. Because this is a backup drive the originals still live on the source, so the cleanest fix is just to re-run the backup and let it restore the proper structure, then clear lost+found afterwards.
  • That first rsync after an fsck re-copies a great deal. Most likely this is because the files orphaned into lost+found are no longer at their original paths on the backup, so rsync sends them again. It settles back to fast incremental runs afterwards – not a cause for alarm.
  • Keep the /etc/modprobe.d/usb-quirks.conf file in place permanently. It is the thing that actually fixed the root cause; without it the corruption would simply return.
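
Before clearing lost+found it is worth knowing how much is in there. A rough sketch, with lostfound_summary as a hypothetical helper name; it counts by line, so file names containing newlines would be miscounted:

```shell
# lostfound_summary DIR: count the regular files and subdirectories that
# e2fsck reconnected under DIR.
lostfound_summary() (
    dir=$1
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; exit 1; }
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    dirs=$(find "$dir" -mindepth 1 -type d | wc -l | tr -d ' ')
    echo "files=$files dirs=$dirs"
)

# Example (run from a root shell, since lost+found is normally root-only):
#   lostfound_summary /run/media/jdyer/Backup/lost+found
```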

The lesson

When a USB disk goes intermittently read-only, resist the urge to reach straight for fsck. Check dmesg for UAS resets and disconnects first, rule out the disk with SMART, and stabilise the connection before you repair the filesystem – otherwise you are just writing repairs into a void.