LizardFS lost chunks


On reload you can lose chunks that were in transit, i.e. chunks that were being replicated at that moment.

Relevant Bugs

chunkserver: random chunk errors on restart

https://github.com/lizardfs/lizardfs/issues/324


Chunk loss on unclean shutdown of master during changes

https://github.com/lizardfs/lizardfs/issues/230

restarting master may cause read errors

https://github.com/lizardfs/lizardfs/issues/358

New undergoal chunks while chunkserver is down

https://github.com/lizardfs/lizardfs/issues/338


So we have two affected files, and two choices for dealing with them.
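How do you find the affected files in the first place? A hedged sketch, assuming a POSIX shell, that your LizardFS filesystem is mounted at /mfsmount as in the sessions below, and that mfscheckfile only prints a copy-count line for counts that actually occur:

```shell
# Sketch: list every file under the mount that has chunks with 0 copies.
# mfscheckfile prints one " chunks with N copies/copy:" line per count
# that occurs, so the presence of a "0 copies" line flags a damaged file.
find /mfsmount -type f -exec sh -c '
  mfscheckfile "$1" 2>/dev/null | grep -q "chunks with 0 copies" \
    && printf "%s\n" "$1"
' sh {} \;
```

On a large tree this runs one mfscheckfile per file, so expect it to take a while.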


1) Deleting them

root@mfsmaster /mfsmount/movies/Anime1 # mfscheckfile  some-movie.mkv
some-movie.mkv:
 chunks with 0 copies:            1
 chunks with 1 copy:              1
 chunks with 2 copies:            4
root@mfsmaster /mfsmount/movies/Anime1 # rm  some-movie.mkv

This is where you restore your backup :-)

2) Erasing the affected blocks

root@mfsmaster /mfsmount/bacula00 # mfscheckfile  zzz-FullPool-20151205-1839-28130
zzz-FullPool-20151205-1839-28130:
 chunks with 0 copies:           10
 chunks with 1 copy:            132
 chunks with 2 copies:           18
root@mfsmaster /mfsmount/bacula00 # mfsfilerepair  zzz-FullPool-20151205-1839-28130
zzz-FullPool-20151205-1839-28130:
 chunks not changed:        150
 chunks erased:              10
 chunks repaired:             0

Note that we didn't *repair* anything here. I literally had it zero out the affected data.
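If you want to confirm that an erased region now reads back as zeros, here is a rough spot-check. This is a sketch: the chunk index (skip=3) is hypothetical — pick one that mfscheckfile/mfsfileinfo reported as missing before the repair — and it assumes the usual 64MB LizardFS chunk size.

```shell
# Sketch: read one chunk-sized region of the repaired file and count
# the non-zero bytes in it. skip=3 is a hypothetical chunk index.
dd if=zzz-FullPool-20151205-1839-28130 bs=64M skip=3 count=1 2>/dev/null \
  | tr -d '\0' | wc -c
# 0 non-zero bytes means that region was zeroed out
```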

Reasoning: I'd rather have *most* of that full backup around. Bacula is smart enough to pinpoint the errors, and hopefully I'm smart enough to get along with the rest of that backup.

The amount of data affected is at most 640MB (10 erased chunks at up to 64MB each), out of a 199GB backup — about 0.3%.
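The back-of-the-envelope math, treating GB as GiB for a rough estimate and assuming full 64MB chunks:

```shell
# 10 erased chunks x 64 MiB each, out of a 199 GiB backup
awk 'BEGIN { lost = 10 * 64; total = 199 * 1024;
             printf "%d MiB lost, %.2f%% of the backup\n",
                    lost, 100 * lost / total }'
```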


The overview page (Important messages) will still display the missing files until the next check loop has finished.


How to avoid this?

Lobby for software liability laws

Get a support contract

Force the authors to do regression and fault testing

Be prepared to pay for it.