ext3 / ext4 online fsck

Here's a script that uses LVM snapshots to online-check your filesystems.

It helps you notice filesystem errors quickly and monitor for them using Nagios/Check_MK or whatever you prefer.
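
For the monitoring side, the one-line result the script prints can be mapped to a plugin return code. Here's a minimal sketch, assuming the usual Nagios plugin convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN). The function name `fsshot_status` is made up for illustration and is not part of fsshot.sh:

```shell
#!/bin/bash
# Hypothetical helper (not part of fsshot.sh): turn the output of one
# fsshot.sh run into a Nagios-style status line and return code.
fsshot_status() {
    local line=$1
    case "$line" in
        *"has no errors") echo "OK - $line";       return 0 ;;
        *"has errors")    echo "CRITICAL - $line"; return 2 ;;
        # anything else (e.g. snapshot creation failed) is reported as unknown
        *)                echo "UNKNOWN - $line";  return 3 ;;
    esac
}
```

A check could then call `fsshot_status "$(./fsshot.sh /dev/wheezytest/var)"` and exit with the function's return code.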

What you need is not much:

  • You need to use LVM (otherwise no snapshot, d'oh)
  • You need a little free space in the VG for the snapshot. I just reserve 100M, which is enough in almost all cases
  • An ext3/ext4 filesystem

Thin pools alleviate the space requirement, but I don't use them in prod yet.
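
Whether the VG actually has that much room can be checked before taking the snapshot. This is a rough sketch, not something fsshot.sh does itself. The function names are invented here, and the 100 MiB default just mirrors the snapshot size used in the script:

```shell
#!/bin/bash
# Hypothetical helpers, not part of fsshot.sh.

# Return 0 if free_mib (may be fractional and padded, as vgs prints it)
# is at least need_mib.
has_room() {
    awk -v free="$1" -v need="$2" 'BEGIN { exit !(free + 0 >= need + 0) }'
}

# Ask LVM for the free space of a VG in MiB and compare it.
# Needs root and a real volume group, so it is only defined here, not run.
vg_has_room() {
    local vg=$1 need=${2:-100} free
    free=$(LC_ALL=C vgs --noheadings --nosuffix --units m -o vg_free "$vg") || return 3
    has_room "$free" "$need"
}
```

Something like `vg_has_room wheezytest || echo "not enough free space for a snapshot"` would then guard the run.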

XFS still had an online fsck until recently (they broke it), but I haven't seen XFS break as often as ext. Hand me the output from a broken one and I'll add it to the script.

 

The script is run like this:

root# for lv in `egrep "/.*ext4" /etc/fstab | awk '{print $1}'`; do ./fsshot.sh $lv ; done
ext filesystem /dev/wheezytest/root has no errors
ext filesystem /dev/wheezytest/home has no errors
ext filesystem /dev/wheezytest/tmp has no errors
ext filesystem /dev/wheezytest/usr has no errors
ext filesystem /dev/wheezytest/var has no errors
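
The egrep in the loop above matches "ext4" anywhere on a line, so commented-out entries or UUID= sources can end up being passed to the script, and ext3 is skipped. A slightly stricter selection could look at the type column of fstab instead. A sketch, with a made-up function name; note it only filters on /dev/ paths and cannot tell whether a device really is an LV:

```shell
#!/bin/bash
# Hypothetical helper: print the device column of every fstab entry
# whose source is a /dev/ path and whose type is ext3 or ext4.
list_ext_devices() {
    local fstab=${1:-/etc/fstab}
    awk '$1 ~ /^\/dev\// && ($3 == "ext3" || $3 == "ext4") { print $1 }' "$fstab"
}
```

The loop then becomes `for lv in $(list_ext_devices); do ./fsshot.sh "$lv"; done`.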

Such a check takes only a few seconds. It goes via the filesystem journal and is not meant to track down deeply buried filesystem rot.

No warranties, BSD license.

 

I didn't know where to put it, so right now the script resides here on the wiki.

fsshot.sh
#!/bin/bash -eu
# hunt for the following error
# /dev/mapper/wheezytest-var: ********** WARNING: Filesystem still has errors **********
if [ $# = 1 ]; then
    lvpath=$1
else
    echo "specify fsck source lv"
    exit 1
fi
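if ! [ -b "$lvpath" ]; then
    echo "specified target $lvpath is not a block device"
    exit 1
fi

# test lvdisplay inside the condition; as a bare command its failure
# would abort the script under -e before any message is printed
if ! lvdisplay "$lvpath" > /dev/null 2>&1; then
    echo "specified target $lvpath is not a logical volume"
    exit 1
fi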
if [ "$(dirname "$lvpath")" = "/dev/mapper" ]; then
    # find the real lv name over devmapper name
    lvpath=$( lvdisplay "$lvpath" | awk '/LV Path/ {print $3}' )
fi

check_fs()
{
errstring="WARNING: Filesystem still has errors"
# fsck needs the full device path of the snapshot, not just its name
if fsck -n "$vgname/$snapname" 2>&1 | grep -q "$errstring"; then
    msg="errors"
else
    msg="no errors"
fi
echo "ext filesystem $lvpath has $msg"
}

snap_add()
{
vgname=$(dirname  "$lvpath")
lvname=$(basename "$lvpath")
snapname=snapfsck_$lvname

if [ -b "$vgname/$snapname" ]; then
    echo "Old fsck snapshot found, will be removed"
    snap_remove
fi

# redirect stdout first, then stderr, so that both are silenced
lvcreate -s --size 100m -n "$snapname" "$vgname/$lvname" > /dev/null 2>&1
if [ $? != 0 ]; then
    echo "Snapshot creation failed, aborting"
    exit 1
fi
export vgname snapname 
}

snap_remove()
{
lvremove -f "$vgname/$snapname" > /dev/null 2>&1
return $?
}
if snap_add ; then
    check_fs
    snap_remove
fi
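
Grepping for the warning text has one weak spot: if fsck cannot run at all, nothing matches and the result reads "no errors". An alternative I have not tried in prod would be to look at the exit status of fsck instead. According to the fsck man page the status is a bit mask where 0 means no errors and bit 4 means errors were left uncorrected. The sketch below assumes that, and the function name is invented:

```shell
#!/bin/bash
# Hypothetical helper: translate an fsck exit status into a verdict.
# Assumes the exit codes documented in fsck(8): 0 = no errors,
# 4 = filesystem errors left uncorrected, 8 and above = fsck itself failed.
fsck_verdict() {
    local rc=$1
    if (( rc == 0 )); then
        echo "no errors"
    elif (( rc & 4 )); then
        echo "errors"
    else
        # with -n nothing is ever corrected, so any other status is treated as a failed check
        echo "check failed (fsck exit status $rc)"
    fi
}
```

Inside check_fs this could be fed with `rc=0; fsck -n "$vgname/$snapname" > /dev/null 2>&1 || rc=$?` followed by `msg=$(fsck_verdict "$rc")`.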