ZFS on FreeBSD

I have been seriously impressed with the systems posted in the Storage Showoff Thread over at [H]ard|Forum. 10TB is the minimum amount of storage required to even participate, and some people are exceeding 100TB.

Interestingly, the top list consists mostly of Windows and Linux machines: just one entry for OpenSolaris and none for FreeBSD. Given the market share of server operating systems in general, Windows and Linux are of course expected to show up often. Still, I wonder why so many people with an obvious interest in reliable storage miss out on the impressive features ZFS offers.

The only operating systems with a good ZFS implementation so far are OpenSolaris, Solaris and FreeBSD. I tested the ZFS for Linux module a long time ago and found its performance lacking. Since Oracle bought Sun, the future of OpenSolaris is highly questionable: official OpenSolaris is dead, Solaris 10 is no longer free, and we’ll have to wait and see what becomes of the OpenSolaris fork Illumos.

Enter FreeBSD

So I think FreeBSD is the best alternative if you want to use ZFS. It has been around for years and is actively maintained. I downloaded the tiny bootonly ISO and had a FreeBSD installation running in no time. I set up sshd and connected to the machine with PuTTY. For people with no prior BSD experience like me, I recommend installing the bash shell; you’ll feel a lot more at home. It’s as easy as

pkg_add -r bash

That is, if you have installed the FreeBSD ports, as the installer recommended. I set up a VMware virtual machine with 10 disks. Because I encrypt all my disks with dm-crypt on Linux, I wanted something similar in FreeBSD.
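
If you want bash as your login shell too, chsh should do the trick; the path below assumes the package installed bash to /usr/local/bin, which is the default:

chsh -s /usr/local/bin/bash root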

Encrypting the disks

I used GELI for disk encryption. First I created a keyfile:

dd if=/dev/random of=/root/keyfile bs=64 count=1
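
That keyfile unlocks every disk, so make sure only root can read it:

chmod 600 /root/keyfile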

Then I fired up bash to encrypt all devices at once, with the same keyfile. I’d never do this in a real system, but for testing purposes this is the quickest way:

for i in {1..8}; do geli init -s 4096 -P -K /root/keyfile /dev/da$i; done
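
If you want to double-check that the providers were initialized, geli can dump the metadata it wrote to a disk, for example:

geli dump /dev/da1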

Then we need to “attach” the devices, so we can put them into a ZFS pool.

for i in {1..8}; do geli attach -p -k /root/keyfile /dev/da$i; done

This creates decrypted counterparts of the drives under /dev, with names ending in .eli; these .eli devices are what we hand to ZFS.
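
A quick way to verify which providers are attached:

geli status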

Creating ZFS pools

Now we can finally create ZFS pools. I recommend taking a look at the ZFS Best Practices Guide. ZFS really is as easy as 1-2-3:

zpool create fileserver raidz2 /dev/da1.eli /dev/da2.eli /dev/da3.eli /dev/da4.eli
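
From there you can carve the pool into datasets and set properties per dataset; a quick sketch (the dataset name and properties are just examples):

zfs create fileserver/media
zfs set compression=lzjb fileserver/media
zfs list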

I played around with the commands, creating various pools and configurations. I’ll spare you this and instead talk about

Why I won’t use ZFS

You cannot mix differently sized disks

I usually don’t buy hard drives by the dozen. I buy whatever is cheapest in terms of €/GiB, because I’ll fill that space eventually, and I buy two disks at once, because I need a backup.

Not being able to mix differently sized disks in one zpool led me to the next problem:

You cannot extend a RAID-Z

It is really not possible to add a new disk to an existing raidz. You can, however, put several raidz vdevs into one zpool. To do that, you just add to an already existing zpool:

root@bsdbox ~ #  zpool add fileserver raidz2 /dev/da9 /dev/da10 /dev/da11 /dev/da12 /dev/da13 /dev/da14
root@bsdbox ~ #  zpool status
  pool: fileserver
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        fileserver   ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            da1.eli  ONLINE       0     0     0
            da2.eli  ONLINE       0     0     0
            da3.eli  ONLINE       0     0     0
            da4.eli  ONLINE       0     0     0
            da5.eli  ONLINE       0     0     0
            da6.eli  ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            da9      ONLINE       0     0     0
            da10     ONLINE       0     0     0
            da11     ONLINE       0     0     0
            da12     ONLINE       0     0     0
            da13     ONLINE       0     0     0
            da14     ONLINE       0     0     0

errors: No known data errors

Several restrictions apply however (see the sketch after this list):

  • Your new vdev must employ the same RAID strategy as the ones already in the pool
  • You need to use the exact same number of devices
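
If you try to violate the first rule, zpool add refuses unless you force it. Roughly like this (the exact warning text differs between ZFS versions, and -f is a bad idea here):

zpool add fileserver mirror /dev/da9 /dev/da10
# refuses, warning about a mismatched replication level; zpool add -f would override it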

This leaves me in a very uncomfortable position: I never liked RAID5 anyway (of course ZFS’ copy-on-write alleviates the performance problems). I will not trust RAID5 with my data, and its time seems to be over anyway. RAIDZ2 (RAID6) on the other hand is only really usable with 5 disks or more. That’s not very convenient when you’ve got a chassis that can hold 8 disks in total. As Adam Leventhal put it:

It’s common for a home user to want to increase his total storage capacity by a disk or two at a time, but enterprise customers typically want to grow by multiple terabytes at once so adding on a new RAID-Z stripe isn’t an issue.

I simply don’t want to make such a large investment at once. One solution would be to put two equally sized disks together as a mirror and keep adding these pairs to an existing zpool, creating what is basically a RAID10. This allows for easy storage upgrades.
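
Built up pair by pair, that would look roughly like this (plain device names for brevity; in my setup these would be the .eli providers), ending up with a pool like the one below:

zpool create fileserver mirror /dev/da1 /dev/da2
zpool add fileserver mirror /dev/da3 /dev/da4
zpool add fileserver mirror /dev/da5 /dev/da6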

root@bsdbox ~ #  zpool status
  pool: fileserver
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        fileserver  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0

Another method is to replace every disk in a raidz with a larger model, one at a time, letting it resilver in between. Once the last one has been replaced, the pool will eventually grow.
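
For each disk, that cycle would look something like this (da9.eli standing in for the new, larger disk; depending on the ZFS version you may also need to export and re-import the pool, or set the autoexpand property, before the extra space appears):

zpool replace fileserver da1.eli da9.eli
zpool status fileserver    # wait until resilvering has completed before the next disk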

You cannot remove a drive from RAID-Z

Srsly. Guys. I need to be able to upgrade storage. I can’t just keep putting new disks into the tower until the end of time; I need to be able to remove old disks. If you try, ZFS will tell you that

cannot remove da1: only inactive hot spares or cache devices can be removed
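
For reference, that message comes from an attempt along these lines (da1 standing in for whichever top-level raidz member you try to pull):

zpool remove fileserver da1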


BTRFS is better

While BTRFS currently only supports RAID0/1/10, I can already use differently sized disks and remove and add them as I like. A look at the Project Ideas page shows promising features which might be implemented some day:

Raid allocation options need to be configurable per snapshot, per subvolume and per file. It should also be possible to set some IO parameters on a directory and have all files inside that directory inherit the config.

btrfs-vol -b should take a parameter to change the raid config as it rebalances.

I ordered two 2000GB Samsung EcoGreen F3 HD203WI disks, and I plan on using them in a BTRFS RAID10. Right now this doesn’t leave me in any better position than if I’d just used ZFS, probably a worse one because of BTRFS’ experimental status. However, I don’t feel like ZFS is worth the hassle. I’d rather stay with my Linux file server and see how BTRFS progresses.
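
For completeness, growing a BTRFS RAID10 a disk at a time would look roughly like this with the btrfs tools of that generation (device names and mount point are just examples; a RAID10 filesystem needs at least four devices to start with):

mkfs.btrfs -d raid10 -m raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde
mount /dev/sdb /mnt/fileserver    # a device scan (e.g. btrfsctl -a) may be needed first
btrfs-vol -a /dev/sdf /mnt/fileserver    # add another disk later, any size
btrfs-vol -b /mnt/fileserver             # rebalance across all devices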

As for the guys in the [H]ard|Forum thread: I can’t understand why more of them don’t use ZFS. They obviously have the money and the storage requirements to buy lots of disks at once, which makes all the ZFS problems I ran into moot. It’s a great file system – if you can afford it.
