Saturday, February 24, 2018

Data backup and simple duplication

Disclaimer: RAID arrays with duplication protect against most drive failures, and can also protect against data corruption depending on the setup. However, if all your data is physically located in one building, you can still lose everything. If your PSU decides it's tired of life and wants to take your system with it, you'll lose everything. If someone finds a bedbug in a different apartment and decides to burn the whole building to the ground, you'll lose everything. A safe backup of your local data lives somewhere else: a remotely located NAS that you manage over the network, or a cloud storage service.

On the opposite side of the spectrum, you can maximize your backup safety and longevity at the expense of near total inaccessibility. Buy an LTO tape cartridge, marvel at the $14/TB price ratio, then cry when you see that a tape drive will set you back over $3k. Transfer your data and put the cartridges in an airtight stainless steel time capsule. Add some "DO NOT EAT" packets and seal it in a low humidity room. Bury it in the Los Angeles area for optimal storage temperatures. You can return any time within the next two to three decades, retrieve your data, and cry again as you realize your original tape drive has failed and you need a functioning piece of vintage hardware to retrieve your dog pictures and logs of high school IRC/AIM chats.

I've dreamed of having a full-on standalone NAS system for a long time. I wanted a very expandable RAID-Z2 setup, which tolerates two drive failures, running on FreeNAS or something comparable. The hardware requirements for FreeNAS seem reasonable at first glance, but after a little digging you will realize that a smoothly-performing system is going to require a LOT of ECC RAM. You'll also need enough SATA ports for all your drives. I recommend an expansion card with internal SAS ports, each of which can connect four SATA drives via a breakout cable. You'll need an Ethernet controller that can handle the speeds you want. Throw in the case, PSU, CPU, a motherboard that can support all of the above, and the drives themselves, and you have a very expensive system. Before buying anything, check the documentation for your NAS software to make sure all components are supported.

I couldn't justify all that to back up my meager 5TB of data, so I spent $30 on a license for DrivePool and $190 on a 6TB drive.
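For a rough sense of when tape would start to pay off, here is a back-of-the-envelope sketch in Python using only the prices quoted in this post ($14/TB for cartridges, $3k for a tape drive, $190 for a 6TB hard drive). It assumes those prices hold at any quantity and ignores everything else (enclosures, duplication, replacement drives), so treat the result as an illustration, not a buying guide.

```python
# Prices taken from this post; everything else is ignored on purpose.
TAPE_COST_PER_TB = 14.0   # LTO cartridge cost per TB
TAPE_DRIVE_COST = 3000.0  # one-time cost of the tape drive
HDD_COST = 190.0          # price paid for the 6TB hard drive
HDD_CAPACITY_TB = 6.0


def break_even_tb(
    tape_per_tb: float = TAPE_COST_PER_TB,
    tape_drive: float = TAPE_DRIVE_COST,
    hdd_cost: float = HDD_COST,
    hdd_tb: float = HDD_CAPACITY_TB,
) -> float:
    """Return the amount of data (in TB) at which tape and hard drives cost the same."""
    hdd_per_tb = hdd_cost / hdd_tb
    # Tape total: tape_drive + tape_per_tb * x; hard drive total: hdd_per_tb * x.
    return tape_drive / (hdd_per_tb - tape_per_tb)


print(f"Hard drive cost per TB: ${HDD_COST / HDD_CAPACITY_TB:.2f}")
print(f"Tape breaks even at about {break_even_tb():.0f} TB")
```

With these numbers the break-even point lands at roughly 170 TB, which is a long way from my 5TB.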

I had two 3TB drives I wanted to duplicate. The goal was protecting against a single drive failure. I assumed I could just turn on duplication and DrivePool would mirror them on the 6TB drive, but that's not how it works. DrivePool combines the disks into a pool that shows up as a single lettered logical drive, and data has to be copied to the pool before it can be duplicated. I originally decided to split the 6TB drive into two partitions and create two pools, each containing a 3TB drive and a new 3TB partition. After spending a day or two doing a completely unnecessary full format on these partitions, I realized that the partitions were pointless and it was simpler to leave each drive with a single partition and add all three drives to one pool. I made sure duplication was turned off, copied the files to the pool, verified that all the data was on the pool, deleted the original non-pool data, then turned on duplication and let DrivePool do its thing.
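The "verified that all the data was on the pool" step is the one you really don't want to get wrong, since the next step is deleting the originals. This is not what I ran at the time, just a minimal Python sketch of one way to do that check: it walks the original folder and confirms every file exists on the pool with the same SHA-256 hash. The function names and the example paths are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_unverified(source_root: Path, pool_root: Path) -> list[str]:
    """Return relative paths of source files that are missing or different on the pool."""
    problems = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = pool_root / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            problems.append(rel.as_posix())
    return problems


# Example call (hypothetical paths: D:\ is an original drive, P:\ is the pool):
#   leftovers = find_unverified(Path("D:/Photos"), Path("P:/Photos"))
#   An empty list means every file was found on the pool with matching contents.
```

Only delete the originals if the list comes back empty. Hashing 3TB takes a while, but so does re-downloading your life.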

Afterward, since all my files were accessible from the new lettered pool drive, I removed the letters of my 3TB drives since I no longer needed to directly access them. It uncluttered my drive list and DrivePool doesn't care if a drive is hidden or not.

The UI is simple and user-friendly, but I will note the few things that initially confused me. First, DrivePool sorts data into categories that are not self-explanatory at first glance. This simple and helpful post was the clearest and most useful one I could find.

Helpful tip: When you are copying files, DrivePool will automatically limit the transfer rate to 40-50% of a disk's maximum I/O speed so that the drive will still be usable during the process. If you don't need this usability, there will be a ⏩ symbol to the right of the progress bar at the bottom of the window. Click it to allow DrivePool to remove the limiter and copy the files twice as fast.
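To put the limiter in perspective, here is a quick Python estimate of how long a 3TB copy takes with and without it. The 150 MB/s sustained speed is purely an assumption for illustration (I did not measure my drives), and 45% is just the midpoint of the 40-50% range above.

```python
def copy_hours(size_tb: float, disk_mb_per_s: float, throttle_fraction: float = 1.0) -> float:
    """Estimate copy time in hours, using decimal units (1 TB = 1,000,000 MB)."""
    size_mb = size_tb * 1_000_000
    effective_speed = disk_mb_per_s * throttle_fraction
    return size_mb / effective_speed / 3600


ASSUMED_SPEED_MB_S = 150.0  # hypothetical sustained speed, not a measurement
THROTTLE = 0.45             # midpoint of the 40-50% range mentioned above

full = copy_hours(3, ASSUMED_SPEED_MB_S)
limited = copy_hours(3, ASSUMED_SPEED_MB_S, THROTTLE)
print(f"Unthrottled: {full:.1f} h, throttled: {limited:.1f} h")
```

Under those assumptions the throttled copy takes a bit more than twice as long, which matches the "twice as fast" you get from clicking the button.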

Here is what my current pool looks like: