Consistent ACL and EA semantics across BSD and Linux

Marius Bendiksen mbendiks at eunet.no
Tue Jun 27 23:17:49 GMT 2000


I have replied to this mail previously, but, at the time, got a phone
call, so I could not write up the MyFS description. Therefore, I am again
replying, this time CC:ing the other participants, who somehow got lost
along the way. The MyFS description will follow at the end. If anyone has
not gotten my reply to Mr. Gruenbacher's mail, but would like to, please
mail me about it.

> > > So you want to be able to stat each individual attribute, keeping
> > > timestamps etc.
> > 
> > Actually, my primary interest is to be able to obtain the size of the
> > attribute, really. The remainder of the attributes would probably not
> > be filled in. But you might still want to use an ordinary stat struct
> > to hold the data.
> 
> Alright. Then I don't see using struct stat as superior to any other
> mechanism capable of returning an integer value. Noting that that's a
> system call issue, and that we will never have identical system calls
> between FreeBSD and Linux, we don't need to discuss that here.
> 
> > > That is exactly the problem. You probably won't have a KDE icon stored
> > > as an extended attribute. Rather, you would store the icon's name. Then,
> > > the benefit of sharing the icon name doesn't pay off.
> > 
> > First off. You are quite likely to have the entire icon stored within the
> > extended attribute in a number of cases as the mechanism gains widespread
> > use in the community. You may not see the icon stored in there initially,
> > but once people start realizing the potential of EAs, and using it fully,
> > you will see this happening, especially once archivers start bringing EAs
> > into the archive. ZIP and RAR already do this on OS/2. Also, name sharing
> > still pays off, as u_int16_t+u_int32_t < strlen( "/usr/share/..." );
> 
> In a new filesystem (your filesystem), you may be able to allocate some
> space for extended attributes in the inodes themselves. For the ext2
> filesystem, I only have a 32-bit value per inode as a pointer. So my
> only option is to allocate some other space on disk and store my
> extended attributes there. This makes one additional disk seek.
> 
> Now if you're able to store the EA names and values on this same disk
> block, you don't need any additional disk seeks. That's what I've
> implemented so far. Extended attributes are allocated on that same disk
> block if possible.
> 
> As for the sharing argument above, consider that space isn't the only
> measure. You're better off keeping copies of EA values if that saves you
> another disk seek. On the other hand, sharing big extended attributes
> may make sense.
> 
> > As to the former point, foresight and planning are your friends. Offer up
> > a mechanism, and it will be used. And at some point in the future it will
> > be insufficient. We must strive to push that point as far into the future
> > as possible. As to the latter, bear in mind the importance of maintaining
> > a high node density. This is easier to achieve in my model.
> > 
> > > You are right about sharing all extended attributes of an inode, though.
> > 
> > I doubt this will make much sense in the typical case. I think it will be
> > much more beneficial to stick, say 16, pairs of attrid/bodyid directly in
> > the inode and share the translated contents.
> 
> Not an option for ext2, unfortunately.
> 
> I don't think it's a good choice to store attribute ids instead of
> their names. This can be done for a clearly-defined set of attribute
> names (say, ACL, MAC, ...) but there will also be others. A combined
> scheme that either stores a name or an index in a predefined table may
> be a good compromise.
> 
> > > In a very early implementation of Access Control Lists for Linux, I was
> > > sharing ACLs among inodes. ACLs were not hard-linked manually, though.
> > > That was performed automatically using a cache. The on-disk format was
> > > something like:
> > 
> > ACLs are a different matter. However, bear in mind that an ACL extension,
> > performed through the use of EAs would still have this advantage under my
> > model, as the individual EA bodies would be shared; and an ACL block will
> > be a single EA body.
> 
> This is fine assuming that you can store the EA pointer in the inode
> directly.
> 
> > > Keeping a full ACL for each extended attribute sounds ridiculous. Also
> > > keep
> > 
> > First off, no, this does not sound ridiculous, if you are willing to
> > sacrifice space and bandwidth.
> > 
> > Second off, I was not suggesting a full ACL for each EA. I was suggesting
> > that you maintain extra bits in the object-ACL representing EA permission
> > sets. So, you maintain only *one* ACL, which applies for the entire inode
> > and data pointed to by it; however, rather than having each entry reflect
> > just R/W/X, you have it reflect R/W/X/[GS]ET[US]EA. Since you would be in
> > need of at least a byte anyway, you aren't using extra space at all.
> 
> So you would have seven bits instead of three. But that wouldn't give
> you per-attribute permissions (different attributes would still share
> the same permissions).
> 
> > > in mind that you would have to manage the extended attribute permissions
> > > (including utilities to manipulate them, etc.). I agree you need to
> > > separate system attributes from user attributes, which I'm doing based
> > > on attribute namespaces (as described in one of the previous messages).
> > > I don't think you should implement permissions for user attributes,
> > > though. The mechanism Irix implements makes sense, is very simple, and
> > > understandable for users.
> > 
> > I think the idea of using namespaces is shortchanging the options of such
> > a mechanism. It should be possible for exactly identical names to coexist
> > if one is a system attribute and the other is a user attribute, without a
> > collision.
> 
> Sorry, but I can't see any need for this. There's no intrinsic
> difference between storing the namespace (say, 'U' and 'S') separately
> from the attribute name (say, "acl", "mime") versus storing the
> namespace as part of the name itself ("Sacl", "Umime", or even "Uacl").
> 
> > Besides, you might open potential for DoS'ing of system EAs that
> > have been dynamically assigned.
> 
> True. That's similar to a user filling up a filesystem.
> 
> > > I'd rather like to hear about your ideas first. Would you mind
> > > elaborating on them, so that we can discuss their effects?
> > 
> > Certainly. Do you want a verbal description, or code?
> 
> A technical description would be best I guess.
> 
> > > I don't think this would work, unless it's performed automatically and
> > > hidden from the user. That's because basic operations such as chmod
> > > modify at least the ACL of a file; perhaps there will be other basic
> > > operations that have an effect on system attributes.
> > 
> > First off, I think you want ACLs natively implemented anyhow.
> 
> What's your idea of native? I'm thinking of using extended system
> attributes for storing ACLs. You could of course have a dedicated
> pointer to ACL, Default ACL, CAP, MAC, ... for each inode, but that
> wouldn't be space efficient.
> 
> > Secondly, I agree that you must be able to do transparent collapsing of
> > EA bodies somehow, if you are to achieve maximum performance. I will in
> > either case note that you should *also* allow manual hardlinking if the
> > user ever requests it. I can imagine cases where this would be useful.
> 
> That /may/ be useful, right. I'm not at all sure about favorable
> semantics.
> 
> Maybe you'd even want to refer to another inode (e.g., by inode number)
> as the attribute value. On the other hand, that would be the same as
> streams (aka forks etc.). You would then have the opportunity to do
> hard-linking, etc.
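
The R/W/X/[GS]ET[US]EA suggestion quoted above (one ACL per inode, each
entry carrying EA rights next to the usual three bits) could be sketched
roughly as follows. The bit names, the bit order, and the 32-bit id field
are assumptions made for illustration; only the set of seven rights comes
from the discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names and bit positions; the thread only lists the rights. */
enum acl_perm {
    ACL_READ   = 1u << 0,
    ACL_WRITE  = 1u << 1,
    ACL_EXEC   = 1u << 2,
    ACL_GETUEA = 1u << 3,  /* read user EAs    */
    ACL_SETUEA = 1u << 4,  /* modify user EAs  */
    ACL_GETSEA = 1u << 5,  /* read system EAs  */
    ACL_SETSEA = 1u << 6   /* modify system EAs */
};

struct acl_entry {
    uint32_t id;     /* uid or gid; the width is an assumption */
    uint8_t  perms;  /* seven bits used, one spare */
};

/* True only if every right in `wanted` is granted by the entry. */
bool acl_entry_allows(const struct acl_entry *e, uint8_t wanted)
{
    return (e->perms & wanted) == wanted;
}
```

As noted in the quoted reply, this gives one permission set for all EAs of
an inode, not per-attribute permissions.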

MyFS description:
-----------------
Metadata contained in files. Three position-locked files, /.boot, /.config
and /.config.journal, in that order from the start of the disk. The first
holds boot code, the second holds the textual equivalent of a superblock,
and the third holds a journal for the config file.

The disk is logically subdivided into 31 block groups, to approximate what
FFS uses cylinder groups for, though the concept is used quite differently
here. I mostly use this to maintain some minimal associativeness between an
object and its metadata, as well as the containment hierarchy.

Freespace is recorded in /.freelog, which tracks alloc/free operations, as
well as being occasionally packed. To avoid lag, the filesystem will alloc
large chunks from this space based on alloc statistics. This space will be
recorded as "possibly allocated" and any waste due to a crash is recovered
upon the next mount, by scanning the inode file asynchronously.
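
To make the freelog idea a bit more concrete, here is a minimal sketch of
replaying such a log and then reclaiming "possibly allocated" space after
a crash. The record layout, the names, and the one-byte-per-block state
array are all assumptions for illustration, not the actual on-disk format.
The recovery pass here is synchronous and takes a precomputed "referenced"
map standing in for the inode file scan.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log record: one alloc, free, or chunk reservation. */
enum freelog_op { FL_FREE, FL_ALLOC, FL_PREALLOC };

struct freelog_rec {
    enum freelog_op op;
    uint32_t start;   /* first block of the range */
    uint32_t count;   /* number of blocks */
};

enum blk_state { BLK_FREE, BLK_USED, BLK_MAYBE /* "possibly allocated" */ };

/* Apply records in order; out-of-range records are skipped.
 * Returns the number of records applied. */
size_t freelog_replay(const struct freelog_rec *log, size_t nrec,
                      uint8_t *state, uint32_t nblocks)
{
    size_t applied = 0;
    for (size_t i = 0; i < nrec; i++) {
        const struct freelog_rec *r = &log[i];
        if (r->start > nblocks || r->count > nblocks - r->start)
            continue;
        uint8_t s = (r->op == FL_FREE)  ? BLK_FREE
                  : (r->op == FL_ALLOC) ? BLK_USED : BLK_MAYBE;
        for (uint32_t b = r->start; b < r->start + r->count; b++)
            state[b] = s;
        applied++;
    }
    return applied;
}

/* After a crash: blocks still marked "possibly allocated" become used if
 * the inode scan found a reference to them, and free otherwise.
 * Returns the number of blocks reclaimed. */
uint32_t freelog_recover(uint8_t *state, const uint8_t *referenced,
                         uint32_t nblocks)
{
    uint32_t reclaimed = 0;
    for (uint32_t b = 0; b < nblocks; b++) {
        if (state[b] != BLK_MAYBE)
            continue;
        if (referenced[b]) {
            state[b] = BLK_USED;
        } else {
            state[b] = BLK_FREE;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Packing the log would then amount to writing out one record per run of
equal states and discarding the older records.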

Inodes are located in /.inodefile, and contain mostly POSIX data. However,
group ownership is optional, and the mode field now only holds IFMT. Also,
a DACL has been added, which can hold from 9 to 32 entries. A MAC label is
present, though I have need for further input from you guys as to how this
should be handled. Also, a file type specifier, intended for use with some
planned MIME expansions, is present, as is a type specific flag. Lastly, I
have 16 EA entries, as well as a pointer into the /.eaexternal chain. This
pointer can optionally be null. Further, a backlink to a directory can, if
desired, be set, and this will be considered in layout optimizations. Last
in the inode, you will find an extent list. There is no indirection. As to
managing the allocation and freeing of inodes, the entry for inode 0 is in
this case abused for freelist info.
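
For those who prefer to see a layout, the inode described above might look
roughly like the struct below. Every field width, the number of inline
extents, the "no group" sentinel, and the use of 0 for an empty EA slot or
a null pointer are assumptions for the sake of the sketch; only the set of
fields and the 16 EA slots are taken from the description. The 16-bit
attribute id and 32-bit body id follow the u_int16_t+u_int32_t remark
earlier in the thread.

```c
#include <stdint.h>

#define MYFS_EA_SLOTS  16          /* from the description */
#define MYFS_DACL_MAX  32          /* DACL holds 9 to 32 entries */
#define MYFS_EXTENTS    8          /* assumed inline extent count */
#define MYFS_NO_GROUP  UINT32_MAX  /* hypothetical "no group owner" marker */

struct myfs_dacl_entry { uint32_t id; uint8_t perms; };
struct myfs_ea_slot    { uint16_t attr_id; uint32_t body_id; };
struct myfs_extent     { uint64_t start; uint32_t length; };

struct myfs_inode {
    uint32_t uid;
    uint32_t gid;                 /* optional: MYFS_NO_GROUP if unset */
    uint16_t fmt;                 /* file type bits only (IFMT) */
    uint32_t nlink;
    uint64_t size;
    int64_t  atime, mtime, ctime;
    uint8_t  dacl_count;          /* number of DACL entries in use */
    struct myfs_dacl_entry dacl[MYFS_DACL_MAX];
    uint32_t mac_label;           /* representation still undecided */
    uint32_t file_type;           /* for the planned MIME expansions */
    uint32_t type_flags;          /* type-specific flag */
    struct myfs_ea_slot ea[MYFS_EA_SLOTS];
    uint64_t ea_external;         /* into the /.eaexternal chain, 0 = none */
    uint64_t dir_backlink;        /* optional directory backlink, 0 = none */
    uint16_t extent_count;
    struct myfs_extent extents[MYFS_EXTENTS];  /* no indirection */
};

/* Look up an attribute among the inline slots. Returns 1 and stores the
 * body id on a hit; returns 0 if the caller would have to follow the
 * external chain instead. Id 0 is treated as "empty slot" (assumption). */
int myfs_inode_find_ea(const struct myfs_inode *ino, uint16_t attr_id,
                       uint32_t *body_id)
{
    if (attr_id == 0)
        return 0;
    for (int i = 0; i < MYFS_EA_SLOTS; i++) {
        if (ino->ea[i].attr_id == attr_id) {
            *body_id = ino->ea[i].body_id;
            return 1;
        }
    }
    return 0;
}
```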

As to extended attributes, their names are stored in /.eanames, and bodies
are stored in /.eabody, except when an external block is used. When such a
thing is the case, the body is stored directly in the external block if it
will fit. The names are cached. System and user EAs are distinguished by a
bit (the high bit) in their ID.
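
The high-bit distinction can be sketched in a few lines. This assumes the
ID is the 16-bit value implied by the u_int16_t+u_int32_t remark earlier in
the thread; the macro and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define EA_ID_SYSTEM_BIT 0x8000u   /* high bit of an assumed 16-bit ID */
#define EA_ID_INDEX_MASK 0x7FFFu

/* Build an ID from a 15-bit index and a system/user flag. */
uint16_t ea_id_make(uint16_t index, bool is_system)
{
    return (uint16_t)((index & EA_ID_INDEX_MASK) |
                      (is_system ? EA_ID_SYSTEM_BIT : 0u));
}

/* True for system EAs, false for user EAs. */
bool ea_id_is_system(uint16_t id)
{
    return (id & EA_ID_SYSTEM_BIT) != 0;
}

/* The ID with the system/user bit stripped. */
uint16_t ea_id_index(uint16_t id)
{
    return (uint16_t)(id & EA_ID_INDEX_MASK);
}
```

With this encoding a permission check on system EAs needs only the ID, not
a name lookup.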

The backing store and namespace are clearly separated. The number of those
inodes used for holding metadata is tracked in the config file. I note the
varying needs of users, and therefore try to abstract a bit here. However,
a namespace is necessary, and thus I provide certain default spaces. These
are:
	1) Flat inode-number namespace
	2) Ordinary namespace, outperforms UFS

As to CAPabilities, these will go into a system EA.

Any questions?

Marius



