apache 2.x + php 5.x http post temporary file name non-randomness

Erik Stian Tefre erik at tefre.com
Tue Nov 13 07:04:25 PST 2007


Jeremy Chadwick wrote:
> On Mon, Nov 12, 2007 at 09:21:56PM +0100, Erik Stian Tefre wrote:
>> There seems to be a bug (or feature?) somewhere that limits the number of 
>> unique temporary file names used when storing temporary files that are 
>> uploaded by posting a form. Looking through my webserver logs of 110000 
>> file uploads, I find no more than 495 unique temporary file names which are 
>> being reused again and again.
>> (File name example: /var/tmp/phpzzJuIt)
>>
>> I think PHP is supposed to use mkstemp(). From the mkstemp(3) manual:
>> "The number of unique file names mktemp() can return depends on the number 
>> of `Xs' provided; six `Xs' will result in mktemp() selecting one of 
>> 56800235584 (62 ** 6) possible temporary file names."
>>
>> PHP uses 6 Xs. This makes the low number of observed unique file names 
>> (495) a bit disappointing.
> 
> It sounds as if the limitation in range (56800235584 vs. 495) may be due
> to what's considered a permittable character in a filename.  I'm betting
> the function ANDs the per-byte results, requiring them to be within
> [0-9A-Za-z].  That's (26+26+10)^6.

(26+26+10)^6 = 62^6 = 56800235584. So I guess the restricted character 
set is already accounted for in the manual's figure...?
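
As a rough check of that arithmetic, here is a minimal sketch that computes the size of the name space and a birthday-style estimate of how many collisions 110000 uploads should produce if names really were drawn uniformly from it. The function names are made up for illustration, and the uniform-draw model is an assumption, not a description of what mkstemp() does internally.

```c
/* 62^nx: number of names a template with nx X's can take, assuming a
 * 62-character alphabet [0-9A-Za-z]. Fits in 64 bits for nx <= 10. */
unsigned long long name_space(int nx) {
    unsigned long long total = 1;
    int i;
    for (i = 0; i < nx; i++)
        total *= 62ULL;
    return total;
}

/* Expected number of colliding pairs among n uniform draws from a space
 * of the given size: n(n-1)/2 pairs, each colliding with probability
 * 1/space. This is an estimate under the uniform-draw assumption. */
double expected_colliding_pairs(unsigned long long n, unsigned long long space) {
    return (double)n * (double)(n - 1) / (2.0 * (double)space);
}
```

With n = 110000 and six Xs, expected_colliding_pairs() comes out at roughly 0.1, so under this model practically every one of the 110000 names should have been unique. Seeing only 495 distinct names is therefore far outside what the documented name space predicts.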

> Based on that, it sounds as if there's no "easy" way to increase the
> entropy.
> 
> I'm not really sure I'd use gettimeofday() for extending this, though.
> If I remember correctly (someone please correct me if I'm wrong):
> 
> * The clock is not a good source of randomness because it's predictable
>   (although in this case it's not the sole source of entropy)

My main concern is random file name collisions, not the predictability 
of file names. The clock fixes the collision problem. But I guess 
predictable file names may be a security problem for some applications.
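
To make the clock idea concrete, here is a minimal sketch of a wrapper that puts the gettimeofday() result into the template before handing it to mkstemp(). This is not PHP's actual code: the helper name, the "php." prefix and the name layout are all hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: writes "<dir>/php.<sec>.<usec>.XXXXXX" into buf and
 * opens it with mkstemp(). Returns the open file descriptor, or -1 if the
 * clock cannot be read, the name does not fit in buf, or mkstemp() fails.
 * On success buf holds the name that was actually created. */
int timestamped_mkstemp(const char *dir, char *buf, size_t buflen) {
    struct timeval tv;
    int n;

    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    n = snprintf(buf, buflen, "%s/php.%ld.%06ld.XXXXXX",
                 dir, (long)tv.tv_sec, (long)tv.tv_usec);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    /* The template still ends in six Xs, as mkstemp() requires. */
    return mkstemp(buf);
}
```

Called as timestamped_mkstemp("/var/tmp", buf, sizeof buf), two uploads can only end up with the same name if they hit the same microsecond and also get the same random suffix. The names do become partly predictable, which is the security trade-off mentioned above.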

> * gettimeofday() is an expensive call due to communication with the RTC.

Probably not too expensive when compared to the time and resources used 
for handling the uploaded file in the filesystem etc.

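#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main (int argc, char ** argv) {
         struct timeval tval;
         int i;
         /* Call gettimeofday() one million times to estimate its cost. */
         for (i = 0; i < 1000000; i++) {
                 gettimeofday(&tval, NULL);
                 /* Cast to long: the exact types of tv_sec/tv_usec vary. */
                 printf("%ld %ld\n", (long)tval.tv_sec, (long)tval.tv_usec);
         }
         exit(0);
}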

%time ./gettime > /dev/null
0.492u 5.824s 0:08.06 78.2%     5+190k 0+0io 0pf+0w

Which is about 125k gettimeofday()s per second (1000000 calls in 8.06 
seconds of wall-clock time, including the useless printf()).

> I'm left believing that adding more X's to the path passed to mkstemp()
> would be a better solution, and a more compatible one.

If mkstemp() were behaving as expected and according to the docs, I would 
agree. But it isn't, so I would not be surprised if I found no more than 
495 longer filenames being reused after adding more Xs. ;-)

I'd like to find the real reason for the limited number of unique 
filenames. Maybe it's related to how mkstemp() or its random number 
generator arc4random(3) is used by php and/or apache?
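
One way to start narrowing this down would be to call mkstemp() in a loop outside of php and apache and count the distinct names it hands out. The following is only a sketch under that assumption: the function name and NAME_LEN are hypothetical, and each file is removed right after creation so that names are free to be reused, as they are once an upload has been handled.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NAME_LEN 64  /* hypothetical upper bound on the template length */

static int cmp_names(const void *a, const void *b) {
    return strcmp((const char *)a, (const char *)b);
}

/* Creates `count` temporary files from the template `tmpl` (which must end
 * in Xs, e.g. "/var/tmp/phpXXXXXX"), removing each one immediately.
 * Returns the number of distinct names seen, or -1 on error. */
long count_unique_names(const char *tmpl, long count) {
    char (*names)[NAME_LEN];
    long i, unique;

    if (count <= 0 || strlen(tmpl) >= NAME_LEN)
        return -1;
    names = malloc((size_t)count * sizeof *names);
    if (names == NULL)
        return -1;
    for (i = 0; i < count; i++) {
        int fd;
        strcpy(names[i], tmpl);
        fd = mkstemp(names[i]);      /* fills in the Xs in place */
        if (fd < 0) {
            free(names);
            return -1;
        }
        close(fd);
        unlink(names[i]);            /* free the name for possible reuse */
    }
    /* Sort, then count positions where the name changes. */
    qsort(names, (size_t)count, sizeof *names, cmp_names);
    unique = 1;
    for (i = 1; i < count; i++)
        if (strcmp(names[i], names[i - 1]) != 0)
            unique++;
    free(names);
    return unique;
}
```

If count_unique_names("/var/tmp/phpXXXXXX", 110000) returns something close to 110000 in a standalone program, the libc implementation by itself is probably fine, and the way php and/or apache set up or share the generator state becomes the more likely suspect. A single-process test like this cannot reproduce whatever happens inside the webserver's child processes.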

--
Erik


More information about the freebsd-ports mailing list