btrfs max number of hardlinks gotcha

Here’s a surprising “gotcha” for the btrfs filesystem:

The maximum number of hard links to a single file is very limited when the hard links and the target file are all in the same directory. For hard links where the source and target are in different directories, the btrfs limit is a more reasonable 2^32. Admittedly, having many hard links to a file in the same directory is a strange corner case, but it is one that real-world users have tripped over, and it could be a killer for certain applications.

See the kernel bug report: Number of supported hard links is very low – breaks real world software

Comments about this on the btrfs devs list include:

… the max link count on btrfs is 2^32. The lower limit is only in place on links to the same file in the same directory.

… the link count on subdirs being unrelated. The link count on btrfs directories is always one.

And this comment gives a good summary and comparison with other filesystems:

… I’ve made a quick test and managed to create many more links to the same file in the *same* directory on other filesystems:

XFS can do at least 100000, probably more;
Reiserfs did 64535;
ext3 managed to do 32000;
ext4 did 65000.

While I agree it might be a bit stupid to create so many hardlinks to the same file on the same directory, this issue can be seen as one of “backward compatibility” with other widely used and established Linux filesystems.

Despite it being stupid or not, the fact is that I’ve seen some crazy stuff along the years working with Unix, so people will expect this kind of things to *not* break when they switch from their old filesystems to shiny new btrfs.

The fact being that this limit is way lower than on other filesystems (we’re talking 2 orders of magnitude, at best!), I too suggest that the limit should be increased. Not being critical, it might be done when some other features require a format change but, nonetheless, should be done for the sake of avoiding breakage on existing systems. …

The problem is:

… The max number of hard link is depend on total length of hard link names.

… The limit is imposed by the format of inode back references. We can get rid of the limit, but it requires a disk format change.

One (extreme) test fail case is:

… The number of links depends on the length of a filename.

Is _13_ (yes, thirteen) hardlinks in a directory a big number? I don’t think so. …
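To see how the limit depends on filename length on a given system, a small probe along the following lines can help. This is only a sketch and not the test used in the thread: the function name `count_links_until_limit`, the zero-padded link names and the `cap` safety stop are all my own assumptions. It creates hard links to one file inside a single directory until the filesystem answers with EMLINK.

```python
import errno
import os
import tempfile


def count_links_until_limit(directory, name_len=255, cap=100_000):
    """Create hard links to one file inside `directory` until EMLINK or `cap`.

    Hypothetical helper: returns the number of links successfully created.
    `name_len` should be at least as wide as the largest counter value.
    """
    target = os.path.join(directory, "target")
    with open(target, "w") as f:
        f.write("x")

    made = 0
    for i in range(cap):
        # Zero-pad the counter so every link name is exactly name_len characters.
        link_name = f"{i:0{name_len}d}"
        try:
            os.link(target, os.path.join(directory, link_name))
        except OSError as e:
            if e.errno == errno.EMLINK:
                break  # the filesystem refuses any more links to this file
            raise
        made += 1
    return made
```

Running it in a scratch directory on the filesystem under test, once with `name_len=8` and once with `name_len=255`, shows whether the count changes with the name length. On filesystems with a fixed link limit the two counts should be the same.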

As of this post, the “last modified” date for the bug report is “2010-07-27 07:16”. The mailing list thread ended on 2011-10-15.

What’s the latest?…

This is certainly one to watch out for with any software that might make lots of repeated hard links to a file in the same directory. Real-world problem examples noted in that thread include BackupPC (backups), the nnmaildir mail backend in Gnus (an Emacs package for reading news and email), and a web archiver. Bacula (backups) and Mutt (email client) are also given as problem examples in a summary of the present status of btrfs in the Ubuntu 12.10 release.

My own interest was tickled when considering a simple example of file deduplication for repeated identical data files collected from a scientific instrument (filenames differ, the contents change occasionally, from a proprietary data capture system)… Having found out about this btrfs limit, I now know to write my deduplication script differently from how I would naturally have written it. That is slightly messier but should work fine.

However, to my mind this is still a very unexpected limit when compared to other commonly used filesystems. It would be no problem if it were either fixed or publicized well enough that everyone knows to program around it.
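One way of programming around the limit is to fall back to a plain copy when the filesystem refuses another link. The sketch below is only an illustration of that idea, not the approach taken in my own script; `link_or_copy` is a hypothetical helper name, and a copy of course gives up the space saving for that one file.

```python
import errno
import os
import shutil


def link_or_copy(src, dst):
    """Hard-link src to dst; if the link limit is hit (EMLINK), copy instead.

    Hypothetical helper: returns "linked" or "copied" so the caller can log it.
    """
    try:
        os.link(src, dst)
        return "linked"
    except OSError as e:
        if e.errno != errno.EMLINK:
            raise  # any other failure is a real error, do not hide it
    # Too many links to src: store an independent copy with the same metadata.
    shutil.copy2(src, dst)
    return "copied"
```

Only EMLINK triggers the fallback, so permission problems or a missing source file still surface as errors.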

The simplest solution should be to fix the problem at source, even if it only arises in a very rare set of circumstances… (For example, it has cost me some time and has forced me to check that the way I use rsync for backups on multiple systems is OK!)

I wish good luck with the further development of btrfs!

Regards,
Martin

 

Update:

Patches to fix this were posted on 21/05/2012: “btrfs: extended inode refs”

How soon before they are tested and included?

 

3 comments to btrfs max number of hardlinks gotcha

  • Martin L

    Some further patches 07/06/2012 from Mark Fasheh:

    btrfs-progs: Support for extended inode refs

    The following three patches add support to btrfs-progs for extended inode refs. The kernel patch set can be found:

    mail-archive.com/linux-btrfs@vger.kernel.org/msg16567

    the userspace patches have been tested alongside the kernel patches and seem to be in pretty good order. These patches get us support for mkfs, btrfs-debug-tree and importantly, fsck. The mkfs patch is last so that it can easily be taken in and out of the patch series in case we wish to test these changes without actually enabling the disk feature yet.
    –Mark

    ** For reference, I will include my description of the extended inode ref design:

    Currently btrfs has a limitation on the maximum number of hard links an inode can have. Specifically, links are stored in an array of ref items:

    struct btrfs_inode_ref {
        __le64 index;
        __le16 name_len;
        /* name goes here */
    } __attribute__ ((__packed__));

    The ref arrays are found via key triple:

    (inode objectid, BTRFS_INODE_REF_KEY, parent dir objectid)

    Since items can not exceed the size of a leaf, the total number of links that can be stored for a given inode / parent dir pair is limited to under 4k. This works fine for the most common case of few to only a handful of links. Once the link count gets higher however, we begin to return [the error] EMLINK.
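A rough back-of-envelope check of why the name length matters, in Python. Each entry in the ref array costs the 10 packed bytes of `struct btrfs_inode_ref` (8 for `index`, 2 for `name_len`) plus the name itself. The 4096-byte budget is an assumption taken from the “under 4k” figure above; the space really available is somewhat smaller because of leaf and item headers, so the results are upper bounds rather than exact limits. The function name `estimate_max_links` is hypothetical.

```python
REF_HEADER_BYTES = 8 + 2  # __le64 index + __le16 name_len, packed


def estimate_max_links(name_len, item_budget=4096):
    """Upper-bound estimate of links per (inode, parent dir) pair.

    item_budget is an assumed figure ("under 4k"), not the exact on-disk limit.
    """
    if name_len < 1:
        raise ValueError("name_len must be at least 1")
    return item_budget // (REF_HEADER_BYTES + name_len)


for name_len in (8, 32, 255):
    print(name_len, estimate_max_links(name_len))
```

With 255-byte names the estimate comes out at 15 links, which is in the same region as the 13 reported in the thread, while 8-byte names allow a couple of hundred.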

    The following patches fix this situation by introducing a new ref item:

    struct btrfs_inode_extref {
        __le64 parent_objectid;
        __le64 index;
        __le16 name_len;
        __u8 name[0];
        /* name goes here */
    } __attribute__ ((__packed__));

    Extended refs use a different addressing scheme. Extended ref keys look like:

    (inode objectid, BTRFS_INODE_EXTREF_KEY, hash)

    Where hash is defined as a function of the parent objectid and link name.

    This effectively fixes the limitation, though we have a slightly less efficient packing of link data. To keep the best of both worlds then, I implemented the following behavior:

    Extended refs don’t replace the existing ref array. An inode gets an extended ref for a given link _only_ after the ref array has been filled. So the most common cases shouldn’t actually see any difference in performance or disk usage as they’ll never get to the point where we’re using an extended ref.

    It’s important while reading the patches however that there’s still the possibility that we can have a set of operations that grow out an inode ref array (adding some extended refs) and then remove only the refs in the array. I don’t really see this being common but it’s a case we always have to consider when coding these changes.

    Extended refs handle the case of a hash collision by storing items with the same key in an array just like the dir item code. This means we have to search an array on rare occasion.
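The two-tier behaviour described above can be mimicked with a toy model in Python. This is purely illustrative and is not the kernel code: `ToyInode`, `toy_hash` and the 4096-byte budget are assumptions of the sketch, and `zlib.crc32` merely stands in for whatever hash the real implementation uses. Links go into the per-directory ref array until it is full, and only then into extended refs keyed by a hash, with colliding entries kept together in a list.

```python
import zlib
from collections import defaultdict

REF_HEADER = 8 + 2   # index + name_len, as in struct btrfs_inode_ref
ITEM_BUDGET = 4096   # assumed space for one ref array ("under 4k")


def toy_hash(parent_objectid, name):
    # Stand-in hash over parent objectid and link name; NOT the real kernel hash.
    return zlib.crc32(name, parent_objectid & 0xFFFFFFFF)


class ToyInode:
    def __init__(self):
        self.ref_arrays = defaultdict(list)  # parent dir -> [(index, name)]
        self.extrefs = defaultdict(list)     # hash -> [(parent, index, name)]

    def _array_bytes(self, parent):
        return sum(REF_HEADER + len(name) for _, name in self.ref_arrays[parent])

    def add_link(self, parent, index, name):
        """Record one link; returns which structure it landed in."""
        if self._array_bytes(parent) + REF_HEADER + len(name) <= ITEM_BUDGET:
            self.ref_arrays[parent].append((index, name))
            return "ref"
        # Array is full: store an extended ref; collisions share one bucket.
        self.extrefs[toy_hash(parent, name)].append((parent, index, name))
        return "extref"


inode = ToyInode()
placed = [inode.add_link(256, i, f"{i:0255d}".encode()) for i in range(20)]
print(placed.count("ref"), placed.count("extref"))
```

In this model the common case of a few links never touches `extrefs`, which matches the stated goal of leaving performance and disk usage unchanged for ordinary files.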


    Mark Fasheh

  • Martin L

    The patches have been reviewed and look to be getting submitted:

    btrfs: extended inode refs

    From: Mark Fasheh …
    Subject: [PATCH 0/3] btrfs: extended inode refs
    Newsgroups: gmane.comp.file-systems.btrfs
    Date: 2012-08-08 18:55:44 GMT

    Currently btrfs has a limitation on the maximum number of hard links an
    inode can have. …

    … Testing wise, the basic namespace operations work well (link, unlink, etc).
    The rest has gotten less debugging (and I really don’t have a great way of
    testing the code in tree-log.c) Attached to this e-mail are btrfs-progs
    patches which make testing of the changes possible.

    Finally, these patches are based off Linux v3.5.
    –Mark

    Most recent review for this series can be found at:
    https://thread.gmane.org/gmane.comp.file-systems.btrfs/17480

    Thanks to Jan Schmidt for giving the patches thorough review. Most of the
    changes are from his suggestions.

    Changelog:

    – rebased against 3.5 …

    … – I am actually including a patch to btrfs-progs with this drop. 🙂

    From: Mark Fasheh

    [PATCH] btrfs-progs: basic support for extended inode refs

    This patch adds enough mkfs support to turn on the superblock flag and
    btrfs-debug-tree support so that we can visualize the state of extended refs
    on disk. …

  • Martin L

    The “btrfs extended inode refs” were patched nearly a year ago now (already!) and I’ve certainly given them a thorough testing for my rsync backups. And to think they were thought to be an “obscure corner case”… Filesystems seem to be full of surprises…

    This maillist posting nicely summarizes the present state of play: some feedbacks seen on btrfs

    All looks to be developing well but there is still some way to go yet for the developers to complete the features list. Note the “EXPERIMENTAL – under heavy development”. However, noting that, I’m using btrfs on various non-critical systems and for some of my backups and it all works very well for my usage case. Even so, always have a backup to hand!
