getdirentries_args and other kernel syscall structures

Wed Nov 23 18:35:46 GMT 2005

At about the time of 11/23/2005 3:11 AM, Stefan Farfeleder stated the
following:

> On Tue, Nov 22, 2005 at 08:32:10PM -0800, Daniel Rudy wrote:
> 
>>Ok, I've got a little question here.  In the structure
>>getdirentries_args, there seems to be duplicated fields that I'm not
>>entirely sure what they do.  Here's the definition of a structure
>>verbatim from sys/sysproto.h:
>>
>>struct getdirentries_args {
>>        char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
>>        char buf_l_[PADL_(char *)]; char * buf; char buf_r_[PADR_(char *)];
>>        char count_l_[PADL_(u_int)]; u_int count; char
>>count_r_[PADR_(u_int)];
>>        char basep_l_[PADL_(long *)]; long * basep; char
>>basep_r_[PADR_(long *)];
>>};
>>
>>Now my question is what does the l and r variables do?  It seems that
>>they do something with padding the data based on the endian of the
>>machine?  I look through this header file, and I see all the structures
>>have similar constructs.  Is it something that can be safely ignored?
> 
> 
> This file is automatically generated by makesyscalls.sh.  The l and r
> variables are a hack to correctly align the member between them.  One of PADL_
> or PADR_ always evaluates to 0, the other one to the needed padding,
> depending on the passed type.  This is unfortunate because it relies on
> the GCC extension to accept 0-sized arrays.

The file is automatically generated?  Now that I think about it, I
remember seeing something to that effect in the header file...

> I'd love to fix that but couldn't come up with something that
> isn't very involved.

I have an idea about that...

Why not do something like this:

#define PAD_(t) (sizeof(register_t) <= sizeof(t) ? \
                0 : sizeof(register_t) - sizeof(t))

struct getdirentries_args {
#if BYTE_ORDER == LITTLE_ENDIAN
	int fd; char fd_r_[PAD_(int)];
	char * buf; char buf_r_[PAD_(char *)];
	u_int count; char count_r_[PAD_(u_int)];
	long * basep; char basep_r_[PAD_(long *)];
#else
	char fd_l_[PAD_(int)]; int fd;
	char buf_l_[PAD_(char *)]; char * buf;
	char count_l_[PAD_(u_int)]; u_int count;
	char basep_l_[PAD_(long *)]; long * basep;
#endif
};

This way, you only pad on the side that the machine's byte order requires,
and the unused variant is stripped out by the preprocessor.  As a bonus,
it's much clearer what is going on.  I haven't looked at the script that
generates this file, but I don't see how it could be hard to modify it to
emit something like this.  Yes, the header file will be larger, but only
half of it will be used.
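
To make the layout concrete, here is a small user-space sketch of the same
idea.  The names reg_t, demo_args_le and demo_args_be are made up for
illustration, and reg_t is fixed at 64 bits as an assumption (the real
register_t depends on the platform).  PAD_ still evaluates to 0 for any
type as wide as the register, so the sketch sticks to narrower members
and never declares a 0-sized array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's register_t, fixed at 64 bits. */
typedef int64_t reg_t;

/* Same shape as the PAD_ macro above, with reg_t in place of register_t. */
#define PAD_(t) (sizeof(reg_t) <= sizeof(t) ? \
                0 : sizeof(reg_t) - sizeof(t))

/* Little-endian variant: the value comes first, the padding follows it. */
struct demo_args_le {
	int32_t fd;    char fd_r_[PAD_(int32_t)];
	int16_t flags; char flags_r_[PAD_(int16_t)];
};

/* Big-endian variant: the padding comes first, the value follows it. */
struct demo_args_be {
	char fd_l_[PAD_(int32_t)];    int32_t fd;
	char flags_l_[PAD_(int16_t)]; int16_t flags;
};
```

Under these assumptions both variants occupy one 8-byte slot per argument;
only the position of the value inside its slot differs.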

Now, on to the actual problem that I have been having with the
getdirentries syscall.  It has to do with the buf member.  The buffer
seems to be 4K, which we all know is the size of 1 page on IA32
hardware.  The problem is that I can only get data for the first struct
dirent record.  Subsequent records are null.  The first record
represents the '.' directory.  According to the man page, the d_reclen
member specifies the length of the current record, so the following code
should be able to walk from one record to another:

static int new_getdirentries(struct thread *t,
    struct getdirentries_args *uap)
  {
    unsigned int buffsize;
    unsigned int n;
    unsigned int r;   /* used by the debug printfs below */
    int flag;
    struct dirent *dirp_start;
    struct dirent *dirp_current;
    struct dirent *dirp2;
    struct dirent *dirp3;
    int result;
    int status;

    /* issue the syscall */
    result = getdirentries(t, uap);
    printf("getdirentries result %d\n", result);
    if (result != 0) return(result);

    /* and check the buffer */
    dirp_start = (struct dirent *)uap->buf;
    if (dirp_start == NULL) return(result);

    /* do we have work to do? */
    if (dirp_start->d_namlen > 0)
      {
        /* get our buffer size */
        buffsize = uap->count;

        /* allocate memory buffer in kernel space */
        MALLOC(dirp2, struct dirent *, buffsize, M_DIRP2, M_NOWAIT);
        if (dirp2 == NULL) return(result);

        /* copy data into kernel space */
        copyin(uap->buf, dirp2, buffsize);

        /* setup pointers */
        dirp_start = dirp2;
        dirp_current = dirp_start;

        /* set the flag */
        flag = 0;
        n = buffsize;
        r = 0;

        printf("buffsize %u; dirent size %d; getdirent %d\n", buffsize,
          sizeof(struct dirent), sizeof(struct getdirentries_args));
        /* now walk through the dirent structures */
        while (flag == 0)
          {
            /* decrement the used buffer length */
            n -= dirp_current->d_reclen;

            printf("**BEFORE\n");
            printf("n %u; r %u\n", n, r);
            printf("name %s; len %hhu; reclen %hu\n", dirp_current->d_name,
              dirp_current->d_namlen, dirp_current->d_reclen);
            /* set next pointer */
            dirp3 = dirp_current + dirp_current->d_reclen;
            dirp_current = dirp3;
            printf("**AFTER\n");
            printf("n %u; r %u\n", n, r);
            printf("name %s; len %hhu; reclen %hu\n", dirp_current->d_name,
              dirp_current->d_namlen, dirp_current->d_reclen);
            if (dirp_current->d_namlen == 0) flag = 1;
          }

        /* free kernel buffer space */
        FREE(dirp2, M_DIRP2);
      }

    /* return the function call result */
    return(result);
  }

But it doesn't seem to be working correctly and I'm not sure why.  The
documentation, what little of it I could find, is not exactly
forthcoming about this.  I was wondering if you could give me some
advice as to how I should be doing this.  I tried looking through the
source code, but I cannot find exactly where the actual syscall is
being made.  It seems that there are a multitude of wrapper functions
and #define layers, which makes it quite difficult to figure out how
something works.
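
One thing worth checking in the loop above is the pointer arithmetic:
adding d_reclen to a struct dirent * advances by d_reclen *
sizeof(struct dirent) bytes, not by d_reclen bytes.  Below is a minimal
user-space sketch of stepping in bytes instead.  It assumes a
hypothetical struct rec in place of struct dirent; put_rec, count_recs
and demo_count are illustrative names, not kernel interfaces.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record type, loosely modeled on struct dirent. */
struct rec {
	uint32_t fileno;
	uint16_t reclen;    /* total length of this record, in bytes */
	uint8_t  type;
	uint8_t  namlen;    /* length of name, not counting the NUL */
	char     name[256];
};

#define REC_HDR offsetof(struct rec, name)

/* Append one record at byte offset off; returns the offset after it.
 * Assumes name is at most 255 characters and buf is large enough. */
static size_t
put_rec(char *buf, size_t off, uint32_t fileno, const char *name)
{
	struct rec r;
	size_t namlen = strlen(name);
	/* header + name + NUL, rounded up to a multiple of 4 */
	size_t reclen = (REC_HDR + namlen + 1 + 3) & ~(size_t)3;

	memset(&r, 0, sizeof(r));
	r.fileno = fileno;
	r.reclen = (uint16_t)reclen;
	r.namlen = (uint8_t)namlen;
	memcpy(r.name, name, namlen + 1);
	memcpy(buf + off, &r, reclen);
	return (off + reclen);
}

/* Count the records in the first len bytes of buf.  The offset is kept
 * in bytes, so "off += reclen" lands exactly on the next record. */
static int
count_recs(const char *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (off + REC_HDR <= len) {
		struct rec hdr;

		memcpy(&hdr, buf + off, REC_HDR);  /* fixed header only */
		if (hdr.reclen < REC_HDR || off + hdr.reclen > len)
			break;                     /* empty or truncated */
		count++;
		off += hdr.reclen;
	}
	return (count);
}

/* Build three records in a zeroed buffer and walk the whole buffer. */
int
demo_count(void)
{
	char buf[128];
	size_t end = 0;

	memset(buf, 0, sizeof(buf));
	end = put_rec(buf, end, 1, ".");
	end = put_rec(buf, end, 2, "..");
	end = put_rec(buf, end, 3, "file.txt");
	return (count_recs(buf, sizeof(buf)));
}
```

The walk stops at the first record whose length is smaller than the fixed
header, which covers the zeroed tail of the buffer.  With the real
syscall, the return value (the number of bytes actually transferred) is
the natural upper bound for the walk rather than the full buffer size.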

> Stefan
>

-- 
Daniel Rudy