QGL Forum :: ZFS Dedup ... I'm excited!

stinky

Posts: 3309

Location: USA

In early November a Sun blog announced that deduplication for ZFS was complete. Shortly thereafter it was added to the development branch of OpenSolaris.

It just so happened that I'd started setting up a bunch of blade servers for LAMP stacks and various *nix utilities, so I decided to set up an iSCSI server to host some nice-to-have but non-critical data, like CentOS yum repositories.

I connected a spare Dell 1950 to an MD1000 (12 x 400GB 15k disks) and was ready to go.

Installing OpenSolaris from the LiveCD is a breeze (which means you can test hardware compatibility at the same time), and setting up a ZFS pool with dedup takes three simple CLI commands. Add a few more commands and you're sharing block volumes out of the pool via iSCSI.

Copying our current yum mirror gave me a dedup factor of 1.13x. We run two yum servers for redundancy, and, not surprisingly, after setting up our second yum repository the dedup factor jumped to 2.35x. That means we've got two mirrors for a little less storage than a single mirror would take without dedup (about 2 / 2.35 ≈ 85% of it)!
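
To make the 1.13x and 2.35x numbers a bit more concrete, here's a toy model of block-level dedup: split the data into fixed-size blocks, hash each block, and only store blocks whose hash hasn't been seen before. It's a sketch, not ZFS's actual code; the 4K block size and the sample "mirror" are made up for illustration, although ZFS does use SHA-256 checksums for dedup by default.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative only; real ZFS dedups whole records (up to the recordsize)

def dedup_ratio(*datasets, block_size=BLOCK_SIZE):
    """Return logical blocks written / unique blocks stored across all datasets."""
    refcounts = {}  # digest -> reference count, a toy stand-in for the dedup table
    logical = 0
    for data in datasets:
        for off in range(0, len(data), block_size):
            digest = hashlib.sha256(data[off:off + block_size]).digest()
            refcounts[digest] = refcounts.get(digest, 0) + 1
            logical += 1
    return logical / len(refcounts)

# A made-up "mirror": 10 distinct blocks plus one block that repeats inside it.
mirror = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10)) + bytes([0]) * BLOCK_SIZE
print(dedup_ratio(mirror))          # 1.1  (11 blocks written, 10 stored)
print(dedup_ratio(mirror, mirror))  # 2.2  (22 blocks written, still 10 stored)
```

A second identical copy only adds references, not new blocks, so the ratio roughly doubles, which is the same shape as the jump from 1.13x to 2.35x when the second yum repository went in.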

I would love to see this combined with something along the lines of Backblaze pods (http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/). While it's not quite what I'd use for an enterprise storage solution, I think it's a brilliant way to get cheap storage for a cloud-based backup solution, or even a server farm hosting web applications, etc.

What I'm really keen to do is use this for backups: go back to old-school tarballs or a solid Linux backup tool like Zmanda (hell, even robocopy/rsync), and then just do a monthly archive to tape. I hate the direction VTL is going, and I think this will give us a brilliant alternative.

HerbalLizard

Posts: 3460

Location: Queenstown, New Zealand

This post gave me wood this morning. Cheers, you've made my day.

trog

AGN Admin

Posts: 28613

Location: Brisbane, Queensland

That sounds pretty awesome. If I ever get some spare time I really want to set up a ZFS-based file server at home.

My main issue atm is that I want RAID-esque redundancy: I want to know that if a drive dies, all I have to do is pull it out and replace it with another drive. I don't want to have to think about matching drive sizes, brands, etc.; I just want to be able to throw in random drives.

We talked about it a bit on IRC and couldn't figure out the answer: if you're using RAID-Z, can you use drives of different sizes (e.g. 500GB, 500GB, 500GB, 1TB), as long as you're happy to "lose" the extra space on the 1TB drive?

Jim

Posts: 10896

Location: Brisbane, Queensland

The Wikipedia article on ZFS talks about being able to swap in larger drives one at a time, letting parity be restored after each swap. So, yes.
It's the same thing we did with our NAS in the office using RAID5 and LVM, incidentally.

trog

AGN Admin

Posts: 28615

Location: Brisbane, Queensland

Jim wrote:
"The Wikipedia article on ZFS talks about being able to swap in larger drives one at a time, letting parity be restored after each swap. So, yes.
It's the same thing we did with our NAS in the office using RAID5 and LVM, incidentally."

I know it's possible with "regular ZFS", but Dewi was saying he wasn't sure whether you could do it if you were running ZFS in RAID-Z.

TicMan

Posts: 5475

Location: Melbourne, Victoria

F***ing hell, this is awesome. Now to find something to apply it to.

Jim

Posts: 10897

Location: Brisbane, Queensland

Without RAID there isn't parity or 'healing', only integrity checking.

From the Wikipedia article on ZFS:
"Capacity expansion is normally achieved by adding groups of disks as a top-level vdev: simple device, RAID-Z, RAID-Z2, RAID-Z3, or mirrored. Newly written data will dynamically start to use all available vdevs. It is also possible to expand the array by iteratively swapping each drive in the array with a bigger drive and waiting for ZFS to heal itself; the heal time will depend on the amount of stored information, not the disk size. The new free space will not be available until all the disks have been swapped."
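
Trog's mixed-drive question and the swap-one-at-a-time expansion both come down to simple arithmetic. A rough sketch, assuming a single RAID-Z vdev whose usable space is (number of drives minus parity drives) times the smallest drive, and ignoring metadata and padding overhead; the function name is made up for illustration:

```python
def raidz_usable_gb(drive_sizes_gb, parity=1):
    """Approximate usable space of one RAID-Z vdev (parity=1 for RAID-Z, 2 for RAID-Z2, ...).

    Every member only contributes as much as the smallest drive, so the extra
    space on a bigger drive stays unused until all drives are at least that big.
    """
    if len(drive_sizes_gb) <= parity:
        raise ValueError("need more drives than parity devices")
    return (len(drive_sizes_gb) - parity) * min(drive_sizes_gb)

# Mixed sizes: the 1TB drive is treated as if it were a 500GB one.
print(raidz_usable_gb([500, 500, 500, 1000]))  # 1500

# Growing a 4 x 750GB set by swapping in 1500GB drives one at a time.
drives = [750, 750, 750, 750]
capacities = [raidz_usable_gb(drives)]
for i in range(len(drives)):
    drives[i] = 1500  # replace one drive, then let the vdev resilver ("heal")
    capacities.append(raidz_usable_gb(drives))
print(capacities)  # [2250, 2250, 2250, 2250, 4500]
```

Capacity only jumps after the last swap, which matches the note that the new free space isn't available until every disk has been replaced. Depending on the ZFS version, the pool may also need to be told to grow into the new space (for example via the `autoexpand` pool property).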

Hogfather

Posts: 4356

Location: Cairns, Queensland

Jim - I've been wondering about this with my Thecus!

Can I do as suggested above and 'heal' my array to a larger size by replacing the drives one by one?

For example, I have 4 x 750GB drives at the moment. If I swap in, say, 1.5TB drives and let the array heal after each swap, will I finally double my storage when the last drive is rebuilt?

XaartaX

Posts: 328

Location: Adelaide, South Australia

Pity it doesn't support variable-length dedupe.

Jim

Posts: 10898

Location: Brisbane, Queensland

Yeah, you can do that with your Thecus, Hogfather; at least I know you can if you chose LVM/ext3. I'm not sure what it would let you do if you had chosen ZFS. Not because ZFS can't do it, but because I'm not sure how they configure the underlying ZFS, or whether zfs-fuse has limitations in that area.

stinky

Posts: 3310

Location: USA

XaartaX wrote:
"Pity it doesn't support variable-length dedupe."

Variable-length dedup is a performance hit. On something like Data Domain (say goodbye to the better part of a million dollars), which is predominantly used for backups, variable-length dedup makes sense because performance on backup data isn't that important (magnetic tape is the benchmark). ZFS dedup, on the other hand, is intended for more active data, which means performance matters.
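
The trade-off is easiest to see with a one-byte insertion. A rough sketch, assuming a simple polynomial rolling hash for the variable-length (content-defined) side; this isn't how Data Domain or ZFS implement anything, and the window, average chunk size and 4K block size are arbitrary illustrative values. Fixed-size blocks lose all their matches when a byte is inserted at the front, while content-defined chunk boundaries re-synchronise once the window has moved past the edit, at the cost of updating a hash at every byte position (the extra work behind the performance hit).

```python
import hashlib
import random

def fixed_chunks(data, size=4096):
    """Split data into fixed-size blocks, as block-level dedup does."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def cdc_chunks(data, window=16, avg=256, base=257, mod=1_000_000_007):
    """Content-defined chunking: cut wherever the rolling hash of the last
    `window` bytes is divisible by `avg`, so boundaries follow the content."""
    drop = pow(base, window - 1, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * drop) % mod  # remove the oldest byte
        h = (h * base + b) % mod                     # add the newest byte
        if h % avg == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def digests(chunks):
    return {hashlib.sha256(c).digest() for c in chunks}

def shared(a, b):
    """Count distinct chunk digests present in both chunk lists."""
    return len(digests(a) & digests(b))

rng = random.Random(42)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
edited = b"!" + original  # one byte inserted at the front

print("fixed blocks shared:", shared(fixed_chunks(original), fixed_chunks(edited)))
print("CDC chunks shared:", shared(cdc_chunks(original), cdc_chunks(edited)),
      "of", len(cdc_chunks(original)))
```

With random data like this, none of the fixed blocks survive the insert, while nearly every content-defined chunk still matches.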

stinky

Posts: 3338

Location: USA

Thought I'd post a follow-up here. My pool is currently at 2.39x dedup.


NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank01  4.72T  52.7G  4.67T     1%  2.39x  ONLINE  -


Filesystem            Size  Used Avail Use% Mounted on
tank01/util01
                      1.5T   63G  1.5T   5% /tank01/util01
tank01/util02
                      1.5T   63G  1.5T   5% /tank01/util02

You can see the effect of dedup really well in the output above. I've set a 1.5T quota from tank01 on each of the util filesystems, and both hold pretty much the same data set (CentOS repos for network installs, etc.). 'df' sees them as using 63G each, but 'zpool list' shows only 52.7G actually allocated across the whole pool.
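
A quick sanity check of that, taking the rounded 'df' and 'zpool list' figures at face value (they include rounding and some metadata, so this is only an approximation of how ZFS computes its DEDUP column):

```python
referenced_gb = 63 + 63  # 'Used' reported by df for tank01/util01 and tank01/util02
allocated_gb = 52.7      # ALLOC reported by zpool list for tank01

ratio = referenced_gb / allocated_gb
print(f"{ratio:.2f}x")  # 2.39x
```

The data referenced by the two filesystems divided by what the pool actually allocated lands on the same 2.39x that 'zpool list' reports.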

I've also been reading up on some other cool features, like read and write caching on SSD (L2ARC for reads, a separate ZIL log device for synchronous writes), which lets you use SSDs as a cache for your ZFS filesystem:

http://blogs.sun.com/brendan/entry/test

It can also do snapshots (not uncommon in filesystems these days) and data replication (locally, or piped over ssh: http://www.markround.com/archives/38-ZFS-Replication.html).
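
Replication "piped over ssh" amounts to a `zfs send` of a snapshot on one machine feeding a `zfs receive` on the other. A hedged Python sketch of that pipeline, assuming the snapshot already exists and ssh keys are set up; the snapshot `tank01/util01@monthly`, the host `backup-host` and the target dataset `backup/util01` are hypothetical names, and the example only prints the pipeline rather than running it:

```python
import shlex
import subprocess

def replication_commands(snapshot, host, target):
    """Build the two halves of a 'zfs send | ssh host zfs receive' pipeline."""
    send = ["zfs", "send", snapshot]
    receive = ["ssh", host, "zfs", "receive", target]
    return send, receive

def replicate(snapshot, host, target):
    """Stream a snapshot to a remote pool; needs zfs locally and ssh access to host."""
    send_cmd, recv_cmd = replication_commands(snapshot, host, target)
    sender = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv_cmd, stdin=sender.stdout)
    sender.stdout.close()  # so the sender gets SIGPIPE if the receiver exits early
    receiver.wait()
    sender.wait()
    return sender.returncode, receiver.returncode

send_cmd, recv_cmd = replication_commands("tank01/util01@monthly", "backup-host", "backup/util01")
print(shlex.join(send_cmd), "|", shlex.join(recv_cmd))
```

Building the argument lists separately keeps the pipeline easy to inspect (or log) before `replicate()` actually touches a pool.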

Not to mention ZFS's native support for NFS/iSCSI/SMB sharing.

Apart from the lack of a fancy GUI to manage this stuff, it's getting super close to being a full-blown SAN/NAS solution that rivals those of many of the large vendors.
