Re: git: 9404c479946c - main - pci_user: Report NUMA domain
- In reply to: Jake Freeland: "Re: git: 9404c479946c - main - pci_user: Report NUMA domain"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Mon, 03 Nov 2025 15:10:43 UTC
On 11/2/25 21:01, Jake Freeland wrote:
>> For future reference: you didn't need to bump to a new number for the ioctl
>> since the size of the argument is encoded in the ioctl as well. This
>> feature of ioctls also means that we generally don't add padding (as you did
>> in the followup commit) as you will naturally get a new ioctl cmd value
>> anytime you add more fields in the future. This avoids the one point Warner
>> raised in the review about how do you define the semantics of the padding
>> as that approach defers reserving space in the structure until the semantics
>> of that space is known.
>
> Hey John,
>
> I just took another look at this and understand your point about not
> needing to bump the ioctl number, but am struggling to see why that
> would remove the need for padding.
>
> The point of the padding was to minimize the in-kernel churn needed
> when adding new members to struct pci_conf. Without it, adding a new
> member means we need to account for the old ioctl (with the old length)
> in every switch statement in pci_user.c and there are a lot.
>
> The only solution I can think of to get around this is to switch on the
> IOCGROUP of every command and copy out IOCPARM_LEN number of bytes.
> Then modifying pci_conf wouldn't require an additional case for every
> switch statement. There may be a reason, that I'm not aware of, for why
> we don't already do this. Perhaps this is what you were thinking?
Hummmm. So after reading more, you did indeed need to add a new number.
First, a general thought about adding padding in structures used as ioctl
arguments. With padding you have to be sure that all calling code follows
good sanitation of the padding (e.g. all client code know needs to know
that it has to memset/bzero the entire structure before assigning members
to ensure the padding is zeroed) as otherwise you can't depend on using
zeroes in the padding to determine which API the calling application
believes it is using. Most of the userspace ioctl code I've seen in
general doesn't use this, it is often:
struct foo;
foo.field_one = x;
foo.field_two = y;
ioctl(fd, SIOCSFOO, &foo);
So that is the origin of the general rule that we don't tend to use padding
for ioctl structures.
However, this case is a bit more unusual. Here, `struct pci_conf_io`
is the structure we pass to the ioctl, and it's size did not in fact change.
Rather, this structure contains a pointer to an array of `struct pci_conf`
objects. So, you had to bump the number to deal with the alternate ABI of
`struct pci_conf`. Hmm, but you didn't actually have to. `pci_conf_io` already has
these fields:
u_int32_t match_buf_len; /* match buffer length */
u_int32_t num_matches; /* number of matches returned */
struct pci_conf *matches; /* match buffer */
So you can infer the caller's size of `struct pci_conf` by doing
match_buf_len / num_matches. We really should just be passing that size
as an argument to `pci_conf_for_copyout` so that only in that one place do
you need to deal with the different layouts of `struct pci_conf`. It does
mean you would need to switch on the structure size inside of the current
switches for the ioctl command, but you wouldn't need the ioctl command,
instead your prior patch would look something like:
static void
pci_conf_for_copyout(const struct pci_conf *pcp, union pci_conf_union *pcup,
u_long cmd, size_t conf_size)
{
...
switch (cmd) {
case PCIOCGETCONF:
switch (conf_size) {
#ifdef COMPAT_FREEBSD14
case sizeof(pcup->pc14):
...
#endif
case sizeof(pcpu->pc):
pcup->pc = *pcp;
return;
default:
__assert_unreachable();
}
I see that we currently do some lazy-ish handling for the size of pci_conf
now though:
/*
* Determine how much room we have for pci_conf structures.
* Round the user's buffer size down to the nearest
* multiple of sizeof(struct pci_conf) in case the user
* didn't specify a multiple of that size.
*/
confsz = pci_conf_size(cmd);
iolen = min(cio->match_buf_len - (cio->match_buf_len % confsz),
pci_numdevs * confsz);
I think in practice the user always provides an exact array and we should depend
on that, so this should be rewritten more like:
if (cio->match_buf_len % cio-num_matches != 0) {
error = EINVAL;
goto confexit;
}
confsz = cio->match_buf_len / cio->num_matches;
error = pci_validate_conf_size(cmd, confsz);
if (error != 0)
goto confexit;
ionum = min(cio->num_matches, pci_numdevs);
And pci_validate_conf_size would use a similar structure of a switch on cmd with
nested commands on the size to check for supported ABIs of pci_conf.
--
John Baldwin