218

I am using Ubuntu 12.04. When I try to create a hard link to a directory, it fails. I can create hard links to files within the same file system, and I know why hard links cannot cross file system boundaries.

I tried these commands:

$ ln /Some/Directory /home/nischay/Hard-Directory
hard link not allowed for directory
$ sudo ln /Some/Directory /home/nischay/Hard-Directory
[sudo] password for nischay: 
hard link not allowed for directory

I just want to know the reason behind this. Is it the same on all GNU/Linux distros and Unix flavours (BSD, Solaris, HP-UX, IBM AIX), or only on Ubuntu or Linux?

Nischay

6 Answers

250

Directory hardlinks break the filesystem in multiple ways

They allow you to create loops

A hard link to a directory can link to a parent of itself, which creates a file system loop. For example, these commands could create a loop with the back link l:

mkdir -p /tmp/a/b
cd /tmp/a/b
ln -d /tmp/a l

A filesystem with a directory loop has infinite depth:

cd /tmp/a/b/l/b/l/b/l/b/l/b

Avoiding an infinite loop when traversing such a directory structure is somewhat difficult (though for example POSIX requires find to avoid this).

A file system with this kind of hard link is no longer a tree, because a tree must not, by definition, contain a loop.
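On Linux the kernel rejects the attempt, so the loop above cannot actually be built there. A minimal sketch of how one might confirm this, assuming GNU ln and a scratch directory from mktemp (the variable names tmp and result are only illustrative):

```shell
#!/bin/sh
# Try to hard-link a directory inside a throwaway scratch directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"

# -d asks ln to attempt the directory hard link anyway. On Linux the
# underlying link() call is rejected for directories, even for root.
if ln -d "$tmp/a" "$tmp/a/b/l" 2>/dev/null; then
    result="created"
    unlink "$tmp/a/b/l"    # undo the loop before cleaning up
else
    result="refused"
fi
echo "hard link to directory: $result"

rm -rf "$tmp"
```

The error output is discarded on purpose, because the exact wording of the message differs between ln implementations.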

They break the unambiguity of parent directories

With a filesystem loop, multiple parent directories exist:

cd /tmp/a/b
cd /tmp/a/b/l/b

In the first case, /tmp/a is the parent directory of /tmp/a/b.
In the second case, /tmp/a/b/l is the parent directory of /tmp/a/b/l/b, which is the same as /tmp/a/b.
So it has two parent directories.
Even without loops, multiple hardlinks to the same directory will create ambiguous parent directories.
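The single parent is visible on disk: every directory has exactly one .. entry, and it refers to one inode. A small sketch, assuming a scratch directory from mktemp and a hypothetical helper named inode_of, that compares the inode reached through b/.. with the inode of a:

```shell
#!/bin/sh
# Show that b/.. and a are the same directory by comparing inode numbers.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"

# ls -di prints "<inode> <name>"; keep only the inode number.
inode_of() {
    ls -di "$1" | awk '{print $1}'
}

parent_via_dotdot=$(inode_of "$tmp/a/b/..")
parent_direct=$(inode_of "$tmp/a")
echo "b/.. has inode $parent_via_dotdot, a has inode $parent_direct"

rm -rf "$tmp"
```

Since there is only one .. entry per directory, a second hard link to b from some other directory would have no place to record its own parent.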

They multiply files

Files are identified by paths, after resolving symlinks. So

/tmp/a/b/foo.txt
/tmp/a/b/l/b/foo.txt

are treated as different files.
There are infinitely many further paths to the file. All of them share one inode number, of course, but a program that does not explicitly expect loops has no reason to check for that.

A directory hardlink can also point to a child directory, or to a directory that is neither a descendant nor an ancestor. In this case, a file that is a child of the link would appear as two files, identified by two paths.
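A program that wants to know whether two paths name the same file has to compare device and inode numbers rather than path strings. A hedged sketch using ordinary file hard links (which are allowed), assuming GNU stat with its -c format option; file_id and same_file are hypothetical helper names:

```shell
#!/bin/sh
tmp=$(mktemp -d)
echo foo > "$tmp/foo.txt"
ln "$tmp/foo.txt" "$tmp/alias.txt"    # hard link: second name, same inode
cp "$tmp/foo.txt" "$tmp/copy.txt"     # copy: same content, new inode

# Device number plus inode number identifies a file on one system.
file_id() {
    stat -L -c '%d:%i' "$1"
}
same_file() {
    [ "$(file_id "$1")" = "$(file_id "$2")" ]
}

if same_file "$tmp/foo.txt" "$tmp/alias.txt"; then link_result=same; else link_result=different; fi
if same_file "$tmp/foo.txt" "$tmp/copy.txt"; then copy_result=same; else copy_result=different; fi
echo "foo.txt vs alias.txt: $link_result"
echo "foo.txt vs copy.txt:  $copy_result"

rm -rf "$tmp"
```

Tools that only look at paths skip this check, which is why a directory hard link would make them process the same file twice.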

Your example

$ ln /Some/Directory /home/nischay/Hard-Directory
$ echo foo > /home/nischay/Hard-Directory/foobar.txt
$ diff -s /Some/Directory/foobar.txt /home/nischay/Hard-Directory/foobar.txt
Files /Some/Directory/foobar.txt and /home/nischay/Hard-Directory/foobar.txt are identical
$ echo bar >> /Some/Directory/foobar.txt
$ diff -s /Some/Directory/foobar.txt /home/nischay/Hard-Directory/foobar.txt
Files /Some/Directory/foobar.txt and /home/nischay/Hard-Directory/foobar.txt are identical
$ cat /Some/Directory/foobar.txt
foo
bar

How can soft links to directories work then?

A path that may contain softlinks and even soft linked directory loops is often used just to identify and open a file. It can be used as a normal, linear path.

But there are other situations in which paths are used to compare files. In this case, symbolic links in the path can be resolved first, converting it to a minimal, commonly agreed upon representation: the canonical path.

This is possible because the soft links can all be expanded to paths without the link. After doing that with all soft links in a path, the remaining path is part of a tree, where a path is always unambiguous.

The command readlink can resolve a path to its canonical path:

$ readlink -f /some/symlinked/path
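To see the canonicalization at work on a loop, here is a small sketch that builds the same back link as in the first example, but as a symlink. It assumes GNU readlink and a scratch directory from mktemp; the names root and canon are only illustrative:

```shell
#!/bin/sh
tmp=$(mktemp -d)
root=$(readlink -f "$tmp")       # canonical form of the scratch directory itself
mkdir -p "$root/a/b"
ln -s "$root/a" "$root/a/b/l"    # symlink back to an ancestor: a loop

# However often the path goes around the loop, it resolves to a/b.
canon=$(readlink -f "$root/a/b/l/b/l/b")
echo "$canon"

rm -rf "$tmp"
```

rm -rf can clean this up safely because it removes the symlink itself and does not follow it.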

Soft links are different from what the filesystem uses

A soft link does not cause all this trouble because it is different from the links inside the filesystem. It can be distinguished from hard links, and resolved to a path without symlinks if needed.
In some sense, adding symlinks does not alter the basic file system structure: it keeps the tree intact and adds more structure on top, like an application layer.


From man readlink:

NAME
       readlink - print resolved symbolic links or canonical file names

SYNOPSIS
       readlink [OPTION]... FILE...

DESCRIPTION
       Print value of a symbolic link or canonical file name

       -f, --canonicalize
              canonicalize by following every symlink in every component
              of the given name recursively; all but the last component
              must exist

       [ ... ]

Volker Siegel
87

"You generally should not use hard links anyway" is over-broad. You need to understand the difference between hard links and symlinks, and use each as appropriate. Each comes with its own set of advantages and disadvantages:

Symlinks can:

  • Point to directories
  • Point to non-existent objects
  • Point to files and directories outside the same filesystem

Hard links can:

  • Keep the file that they reference from being deleted

Hard links are especially useful for "copy on write" schemes. They allow you to keep a backup copy of a directory structure while only using space for the files that change between two versions. Note that the implementation must first break the link before modifying a file (or the modifications will apply to the original file, too!).

The command cp -al is especially useful in this regard. It makes a complete copy of a directory structure, where all the files are represented by hard links to the original files. You can then proceed to update files in the structure (after creating actual copies of only these files), and only the files that you update will take up additional space. This is especially useful when maintaining multigenerational backups.
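A minimal sketch of that workflow, assuming GNU cp (for -al) and made-up file names; the update step writes a new file and renames it over the old name, which is one common way to break the link:

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir "$tmp/current"
echo "version 1" > "$tmp/current/notes.txt"
echo "unchanged" > "$tmp/current/static.txt"

# Snapshot: -a keeps structure and attributes, -l hard-links the files
# instead of copying their data.
cp -al "$tmp/current" "$tmp/snapshot"

# Update notes.txt without touching the snapshot: write a new file, then
# rename it over the old name. The rename breaks the hard link.
echo "version 2" > "$tmp/current/notes.txt.new"
mv "$tmp/current/notes.txt.new" "$tmp/current/notes.txt"

snapshot_notes=$(cat "$tmp/snapshot/notes.txt")
current_notes=$(cat "$tmp/current/notes.txt")
echo "snapshot: $snapshot_notes"
echo "current:  $current_notes"

rm -rf "$tmp"
```

Editing notes.txt in place (for example with echo "version 2" > current/notes.txt) would have changed the snapshot as well, because both names would still refer to the same file.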

72

FYI, you can achieve much the same effect as a hard link to a directory by using a bind mount (this requires root):

sudo mount --bind /var/www /home/user/workspace/www

This is very dangerous because most tools and programs will not be aware of the binding. I once did something like the example above and then proceeded to rm -rf /home/user. Luckily, there was nothing relevant in /var/www.
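One way to guard against that kind of accident is to check for mount points below a directory before deleting it recursively. The following is only a sketch under several assumptions: it is Linux-specific (it reads /proc/self/mounts), it relies on GNU readlink -f, it does not handle mount points whose names contain spaces, and has_mount_below is a hypothetical helper, not a standard tool:

```shell
#!/bin/sh
# has_mount_below DIR: succeed if some mount point lies underneath DIR.
has_mount_below() {
    dir=$(readlink -f "$1") || return 2
    [ "$dir" = "/" ] && dir=""
    # The second field of each line in /proc/self/mounts is the mount point.
    while read -r dev mnt rest; do
        case "$mnt" in
            "$dir"/?*) return 0 ;;
        esac
    done < /proc/self/mounts
    return 1
}

# Usage: check a fresh scratch directory before removing it.
target=$(mktemp -d)
if has_mount_below "$target"; then
    verdict="has mounts below, do not rm -rf"
else
    verdict="no mounts below"
fi
echo "$target: $verdict"
rmdir "$target"
```

In the situation described above, calling has_mount_below /home/user would have succeeded because of the bind mount at /home/user/workspace/www, which is the signal to unmount first.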

stackount
23

The reason hard-linking directories is not allowed is a little technical. Essentially, they break the file-system structure. You should generally not use hard links anyway. Symbolic links provide most of the same functionality without causing problems (e.g., ln -s target link).

astex
0

It would not be a hard link.

A directory is essentially a list of entries that map names to inodes: an entry for the parent directory (..), an entry for the directory itself (.), and entries for its child directories and files.

Hard links are not supposed to change the data in the data block(s) that the inode indexes.

A hard link to a directory would have to do one of two things. Either it adds another parent entry (a second ..) to the directory, which breaks the rule above because it alters the data block indexed by the linked-to inode. Or it leaves the directory unchanged, in which case the hard-linked directory differs from the original from the user's perspective: its .. entry points to a directory that is not its apparent parent.

Clarification

Say you had a directory, dir-1a, with / as its parent directory. Then you hard linked it inside one of /'s child directories, dir-1b, but you did not add a second .. to the linked directory (as this would change the data block).

Now if you looked at the hard link inside dir-1b, its parent would not be dir-1b but the root, /. This means that you could tell which is the hard link and which is the original.

Hard links are not intended to be differentiable from the originals.

This would have a knock-on effect on processes that expect hard links to be indistinguishable from the original files. Directories are designed to contain an entry for themselves (.) and an entry for the one directory that links to them, their parent (..).

Artur Meinild
0

The other answers here are good, but here's a simple practical reason. Let's say you have a directory named fred which has a bunch of files and subdirectories, such as:

  • fred/barney/wilma
  • fred/betty

Now you create a hardlink to fred named homer. Then you execute rm -rf fred. You then try cd homer/barney/wilma only to be told that homer/barney/wilma does not exist. Then you try to open homer/betty, and that's gone, too! In fact, homer is completely empty!

Why? Well, what does rm -rf do? It deletes not only the directory, but the contents of the directory. So the first thing it does is delete fred/barney/wilma, then it deletes fred/barney, and then it deletes fred/betty. Since fred and homer are two names for the same object, another way of putting it is it deletes homer/barney/wilma, then homer/barney, then homer/betty. Deleting the contents of fred means deleting the contents of homer.
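Directory hard links cannot be created to try this out, but ordinary file hard links show the same principle. The sketch below is only an analogy with made-up names: emptying the contents through one name empties them for every name, and removing one name afterwards does not bring the contents back.

```shell
#!/bin/sh
tmp=$(mktemp -d)
echo "wilma" > "$tmp/fred"
ln "$tmp/fred" "$tmp/homer"    # two names, one file

: > "$tmp/fred"                # delete the contents through one name
rm "$tmp/fred"                 # then remove that name

if [ -e "$tmp/homer" ]; then homer_exists=yes; else homer_exists=no; fi
homer_size=$(wc -c < "$tmp/homer" | tr -d ' ')
echo "homer exists: $homer_exists, size in bytes: $homer_size"

rm -rf "$tmp"
```

The other name survives, just as homer would, but what it refers to has already been emptied.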

Why does rm -rf delete all the files like this? Imagine what would happen if fred didn't have a hardlink before you deleted it. Then the files barney/wilma and betty would become orphaned and take up space with no way to remove them except to run a filesystem cleanup tool that looks for orphaned files.

Now, a hardlink-aware version of rm could be written that does not delete a directory's contents if a hardlink to that directory exists, but you'd also have to make sure that any other program that can delete directories is hardlink aware. I think that's a disaster waiting to happen, and the benefits are rather questionable. It also means that hardlinks no longer Just Work™ the way they do in a standard Unix filesystem.