vfs: Allow filesystems with foreign owner IDs to override UID checks
Several ownership checks made by the VFS rely on a number of assumptions:
(1) that it is meaningful to compare inode->i_uid to a second ->i_uid or
to current_fsuid(),
(2) that current_fsuid() represents the subject of the action,
(3) that the number in ->i_uid belongs to the system's ID space and
(4) that the IDs can be represented by 32-bit integers.
Network filesystems, however, may violate all four of these assumptions.
Indeed, a network filesystem may not even have an actual concept of a UNIX
integer UID (cifs without POSIX extensions, for example). Plug-in block
filesystems (e.g. USB drives) may also violate these assumptions.
In particular, AFS implements its own ACL security model with its own
per-cell user ID space with 64-bit IDs for some server variants. The
subject is represented by a token in a key, not current_fsuid(). The AFS
user IDs and the system user IDs for a cell may be numerically equivalent,
but that's a matter of administrative policy and should perhaps be noted in
the cell definition or by mount option. A subsequent patch will address
AFS.
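To illustrate assumption (4), here is a minimal userspace sketch of what can go wrong if a 64-bit foreign ID is squeezed into a 32-bit slot. The truncation step, the helper names and the ID values are all made up for illustration; nothing here is taken from AFS or from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: keep only the low 32 bits of a foreign 64-bit user ID. */
static uint32_t truncate_foreign_id(uint64_t foreign_id)
{
	return (uint32_t)foreign_id;
}

/* Comparison on the truncated values, as a 32-bit-only check would see them. */
static bool truncated_ids_match(uint64_t a, uint64_t b)
{
	return truncate_foreign_id(a) == truncate_foreign_id(b);
}

/* Comparison on the full IDs, as the filesystem itself could do. */
static bool full_ids_match(uint64_t a, uint64_t b)
{
	return a == b;
}

/* Two made-up IDs that differ only in their upper 32 bits. */
void example_truncation(void)
{
	uint64_t user_a = 0x0000000100000064ULL;
	uint64_t user_b = 0x0000000200000064ULL;

	assert(truncated_ids_match(user_a, user_b));	/* false match */
	assert(!full_ids_match(user_a, user_b));	/* distinct users */
}
```

Only the filesystem can make the second comparison, which is one reason to let it answer the ownership question itself.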
To help fix this, three functions are defined to perform UID comparison
within the VFS:
(1) vfs_inode_is_owned_by_me(). This defaults to comparing i_uid to
current_fsuid(), with appropriate namespace mapping, assuming that the
fsuid identifies the subject of the action. The filesystem may
override it by implementing an inode op:
int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode);
This should return 0 if owned, 1 if not or an error if there's some
sort of lookup failure. It may use a means of identifying the subject
of the action other than fsuid, for example by using an authentication
token stored in a key.
(2) vfs_inodes_have_same_owner(). This defaults to comparing the i_uids
of two different inodes with appropriate namespace mapping. The
filesystem may override it by implementing another inode op:
int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode1,
struct inode *inode2);
Again, this should return 0 if matching, 1 if not or an error if
there's some sort of lookup failure.
(3) vfs_inode_and_dir_have_same_owner(). This is similar to (2), but
assumes that the second inode is the parent directory to the first and
takes a nameidata struct instead of a second inode pointer.
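The dispatch described in (1) and (2) can be sketched as a userspace model. This is not the kernel code: struct mnt_idmap is reduced to a fixed offset, model_fsuid stands in for current_fsuid(), and struct netfs_inode, model_token_id and netfs_is_owned_by_me() are hypothetical names showing how a filesystem might identify the subject by something other than fsuid. Helper (3) is left out because it needs a nameidata struct.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; the kernel's real types are richer. */
struct mnt_idmap { uint32_t offset; };	/* hypothetical: map by fixed offset */
struct inode;

struct inode_operations {
	int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode);
	int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode1,
			       struct inode *inode2);
};

struct inode {
	uint32_t i_uid;
	const struct inode_operations *i_op;
};

static uint32_t model_fsuid;	/* stand-in for current_fsuid() */

/* Stand-in for the namespace mapping step. */
static uint32_t model_map_uid(const struct mnt_idmap *idmap, uint32_t uid)
{
	return uid + idmap->offset;
}

/* Returns 0 if owned by the caller, 1 if not, negative errno on failure. */
int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode)
{
	assert(idmap && inode && inode->i_op);
	if (inode->i_op->is_owned_by_me)
		return inode->i_op->is_owned_by_me(idmap, inode);
	return model_map_uid(idmap, inode->i_uid) == model_fsuid ? 0 : 1;
}

/* Returns 0 if the owners match, 1 if not, negative errno on failure. */
int vfs_inodes_have_same_owner(struct mnt_idmap *idmap, struct inode *inode1,
			       struct inode *inode2)
{
	assert(idmap && inode1 && inode2 && inode1->i_op);
	if (inode1->i_op->have_same_owner)
		return inode1->i_op->have_same_owner(idmap, inode1, inode2);
	return model_map_uid(idmap, inode1->i_uid) ==
	       model_map_uid(idmap, inode2->i_uid) ? 0 : 1;
}

/* Hypothetical filesystem with a 64-bit owner ID and a token as subject. */
struct netfs_inode {
	struct inode vfs;	/* must stay first so the cast below is valid */
	uint64_t owner_id;
};

static uint64_t model_token_id;	/* 0 means "no token" in this model */

static int netfs_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode)
{
	struct netfs_inode *ni = (struct netfs_inode *)inode;

	(void)idmap;		/* this model ignores the idmap */
	if (model_token_id == 0)
		return -ENOENT;	/* lookup failure: no token available */
	return ni->owner_id == model_token_id ? 0 : 1;
}

static const struct inode_operations netfs_ops = {
	.is_owned_by_me = netfs_is_owned_by_me,
};

static const struct inode_operations plain_ops = { NULL, NULL };
```

With plain_ops the helpers fall back to comparing mapped i_uid values; with netfs_ops the i_uid field and model_fsuid are never consulted.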
Fix a number of places within the VFS where such UID checks are made that
should be deferring interpretation to the filesystem.
(*) chown_ok()
(*) chgrp_ok()
Call vfs_inode_is_owned_by_me(). Possibly these need to defer all
their checks to the network filesystem as the interpretation of the
new UID/GID depends on the netfs too, but the ->setattr() method gets
a chance to deal with that.
(*) coredump_file()
Call vfs_inode_is_owned_by_me() to check that the file created is owned
by the caller - but the check that's there might be sufficient.
(*) inode_owner_or_capable()
Call vfs_inode_is_owned_by_me(). I'm not sure whether the namespace
mapping makes sense in such a case, but it probably could be used.
(*) vfs_setlease()
Call vfs_inode_is_owned_by_me(). Actually, it should query if leasing
is permitted.
Also, setting locks could perhaps do with a permission call to the
filesystem driver as AFS, for example, has a lock permission bit in
the ACL, but since the AFS server checks that when the RPC call is
made, it's probably unnecessary.
(*) acl_permission_check()
(*) posix_acl_permission()
Unchanged. These functions are only used by generic_permission()
which is overridden if ->permission() is supplied, and when evaluating
a POSIX ACL, it should arguably be checking the UID anyway.
AFS, for example, implements its own ACLs and evaluates them in
->permission() and on the server.
(*) may_follow_link()
Call vfs_inode_and_dir_have_same_owner() and vfs_inode_is_owned_by_me()
on the link and its parent dir.
(*) may_create_in_sticky()
Call both vfs_inode_is_owned_by_me() and
vfs_inode_and_dir_have_same_owner().
[?] Should this return ok immediately if the open call we're in
created the file being checked?
(*) __check_sticky()
Call vfs_inode_is_owned_by_me() on both the dir and the inode, but for
AFS vfs_inode_is_owned_by_me() on a directory doesn't work, so call
vfs_inodes_have_same_owner() instead to check the directory (as is
done in may_create_in_sticky()).
(*) may_dedupe_file()
Call vfs_inode_is_owned_by_me().
(*) IMA policy ops.
Unchanged for now. I'm not sure what the best way to deal with this
is - if, indeed, it needs any changes.
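Because the helpers return 0, 1 or a negative error rather than a boolean, converted callers have to propagate errors as well as test for a match. The following userspace sketch shows one way a sticky-bit style check might do that. It is only one reading of the __check_sticky() description above, not the patch's code: struct model_results and model_check_sticky() are hypothetical, and the helper results and the capability check are passed in as preset values.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: preset results standing in for the real helper calls. */
struct model_results {
	int inode_owned_by_me;	/* result of the "is owned by me" helper */
	int same_owner_as_dir;	/* result of the "have same owner" helper */
	bool has_cap_fowner;	/* stand-in for the capability check */
};

/*
 * Returns 0 if the operation may proceed, 1 if the sticky bit blocks it,
 * or a negative errno if an ownership lookup failed.
 */
int model_check_sticky(const struct model_results *r)
{
	int ret;

	assert(r);
	ret = r->inode_owned_by_me;
	if (ret <= 0)
		return ret;	/* owned (0) or lookup error (< 0) */
	ret = r->same_owner_as_dir;
	if (ret <= 0)
		return ret;	/* same owner (0) or lookup error (< 0) */
	return r->has_cap_fowner ? 0 : 1;
}
```

The "ret <= 0" test folds the match and error cases together, so a lookup failure reaches the caller instead of being treated as "not the owner".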
Note that wrapping stuff up into vfs_inode_is_owned_by_me() isn't
necessarily the most efficient as it means we may end up doing the uid
idmapping an extra time - though this is only done in three places, all to
do with world-writable sticky dir checks.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Etienne Champetier <champetier.etienne@gmail.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Chet Ramey <chet.ramey@case.edu>
cc: Cheyenne Wills <cwills@sinenomine.net>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christian Brauner <brauner@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Mimi Zohar <zohar@linux.ibm.com>
cc: linux-afs@lists.infradead.org
cc: openafs-devel@openafs.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-integrity@vger.kernel.org
Link: https://groups.google.com/g/gnu.bash.bug/c/6PPTfOgFdL4/m/2AQU-S1N76UJ
Link: https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=bash-5.3-rc1#n733
9 files changed