Context
I have to back up several working 4TB HDDs that I no longer update (think of them as an archive of files from my previous life). More precisely, I want to transfer the data from these many 4TB HDDs onto fewer, larger 22TB drives, destroy the 4TB drives, and not risk losing the data that will then live only on the 22TB drives. Are there better ideas than the strategy below? Do you see any obvious point of failure in it?
Question
I’m buying two 22TB HDDs. I’m planning to use good old ext4 on each of them and, for each 4TB HDD, create an .img file (using GNOME Disks, dd, or ddrescue) to store on the 22TB disk. I would checksum (md5) each 4TB .img file and note the values down somewhere. Then I would switch to a different computer (to protect against bad RAM) and repeat the process for the other 22TB disk. After that: check that the checksums from the two runs match; store the two 22TB drives in different houses; destroy the original 4TB disks; every once in a while, access this or that file read-only from one of the 22TB drives; and periodically re-verify the checksums on the 22TB drives (but never have both 22TB drives physically in the same house at the same time). All this, I think, conforms to the 3-2-1 backup strategy because, as noted above, the data will never be updated.
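To make the checksum step concrete, here is a minimal sketch in Python of what I have in mind. The names `file_digest` and `verify_images`, the 16 MiB chunk size, and the example paths are placeholders of mine, not part of any existing tool; md5 is used only because it is what I mentioned above, and any algorithm known to `hashlib` could be passed instead.

```python
import hashlib
from pathlib import Path

# Assumption: 16 MiB reads, so a 4TB image never has to fit in RAM.
CHUNK_SIZE = 16 * 1024 * 1024


def file_digest(path, algo="md5", chunk_size=CHUNK_SIZE):
    """Return the hex digest of the file at `path`, read in fixed-size chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()


def verify_images(expected, image_dir, algo="md5"):
    """Compare recorded digests against the images stored in `image_dir`.

    `expected` maps an image file name to the hex digest noted down when
    the image was created. Returns the sorted names of images that are
    missing or whose current digest differs from the recorded one.
    """
    bad = []
    for name, digest in expected.items():
        p = Path(image_dir) / name
        if not p.is_file() or file_digest(p, algo) != digest.lower():
            bad.append(name)
    return sorted(bad)
```

The periodic check would then be something like `verify_images(recorded, "/mnt/archive22tb")`, where `recorded` is the dictionary of digests I noted down and the mount point is hypothetical; an empty list would mean every image still matches.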
Considerations I’ve made towards an answer to this question.
One of the things I’m wary of is the following. There is a reason why RAID 5 is generally considered a bad idea: when one drive fails, a second drive sometimes fails during the rebuild of the array (because of the stress that rebuilding puts on the devices), causing unrecoverable data loss. I never understood what really causes this ‘stress’, and I don’t understand why rebuilding RAID 5 would be more stressful than rebuilding, say, RAID 1 (that is, cloning the surviving RAID 1 drive). In fact, I have read that the stress in the RAID 5 case comes from continuous, prolonged reading, which applies in the RAID 1 case too. So I wonder whether any of the following produces the same stress that makes RAID 5 prone to data loss, which would suggest that my strategy is prone to data loss as well: A. creating the .img files by reading the 4TB HDDs and writing to the 22TB HDD; B. checksumming the few massive 4TB .img files stored on the 22TB drive every once in a while; or C. cloning the surviving 22TB drive if one of the two fails.
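To put a rough number on "prolonged read" for cases A, B and C, here is a back-of-the-envelope sketch. The helper name `full_read_hours` is hypothetical, and the sustained throughput of 150 MB/s is purely my assumption (real drives vary, and are slower on the inner tracks); it also assumes decimal terabytes and ignores any time spent writing.

```python
def full_read_hours(capacity_tb, throughput_mb_s):
    """Hours needed to read `capacity_tb` decimal terabytes sequentially
    at a constant `throughput_mb_s` megabytes per second."""
    total_bytes = capacity_tb * 1e12
    bytes_per_second = throughput_mb_s * 1e6
    return total_bytes / bytes_per_second / 3600


ASSUMED_MB_S = 150  # assumption, not a measured value

for label, size_tb in [("one 4TB source disk", 4), ("one full 22TB drive", 22)]:
    hours = full_read_hours(size_tb, ASSUMED_MB_S)
    print(f"{label}: about {hours:.1f} hours of continuous reading")
```

Under that assumed rate, imaging one 4TB disk means roughly 7.4 hours of continuous reading, and a full checksum pass or clone of a 22TB drive roughly 40.7 hours, which is the kind of sustained load I am asking about.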
Is Super User the appropriate SE community for this question? Could someone add suitable tags?