Where to define HTTP_ACCEPT_LANGUAGE=fr-fr ???
Jeremy Chadwick
freebsd at jdc.parodius.com
Fri May 20 10:50:37 UTC 2011
On Fri, May 20, 2011 at 11:38:32AM +0200, Frank Bonnet wrote:
> On 05/20/2011 11:27 AM, Jeremy Chadwick wrote:
> >On Fri, May 20, 2011 at 10:23:00AM +0200, Frank Bonnet wrote:
> >>How and WHERE to define this variable in apache22 configuration ???
> >>I need the web server to understand French characters in filenames
> >I haven't worked with this before, but what does "need the webserver to
> >understand French characters in filenames" mean exactly? More details
> >are needed, particularly technical ones. How is Apache "not working"
> >with French characters in filenames?
>
>
> Apache is working BUT if a filename contains a "french" character
> I get a 404 error from apache ( file not found)
>
> here is such error message
>
> xxx.xxx.xxx.xxx - - [20/May/2011:10:55:06 +0200] "GET /cv/ESIEE_ENGINEERING/CV_electronique/11_EE_APP_FE_CV_CISSE_Kaliss%C3%A9.docx
> HTTP/1.1" 404 1221
>
> in fact the file do exists
>
> -rw-r--r-- 1 www-data www-data 15494 20 mai 03:00
> 11_EE_APP_FE_CV_CISSE_Kaliss?.docx
> ^^^^^
> here is the problem
This looks like a character set issue of the browser vs. the filename on
the server. Specifically: the browser is requesting to download a
filename that's in utf-8 (Unicode), while what's on the actual server is
a filename encoded in iso-8859-1.
I'm also assuming that the letter shown in your email above is actually
the "é" character (Latin small letter e with an acute (rising) accent
above it). I hope the examples below therefore render correctly for you.
Let me explain the difference between the two:
utf-8
=======
- Filename (visually): 11_EE_APP_FE_CV_CISSE_Kalissé.docx
- Filename (literally): 11_EE_APP_FE_CV_CISSE_Kaliss<0xc3><0xa9>.docx
- Filename (as URL): 11_EE_APP_FE_CV_CISSE_Kaliss%C3%A9.docx
iso-8859-1
============
- Filename (visually): 11_EE_APP_FE_CV_CISSE_Kalissé.docx
- Filename (literally): 11_EE_APP_FE_CV_CISSE_Kaliss<0xe9>.docx
- Filename (as URL): 11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx
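To make the byte-level difference concrete, here is a minimal Python sketch (Python is only my choice for illustration; nothing in your setup requires it). It encodes the same visible filename in both character sets and percent-encodes the resulting bytes the way they would appear in a URL. The variable names are made up for the example.

```python
from urllib.parse import quote

# The filename as a human sees it; \u00e9 is "e with acute accent".
name = "11_EE_APP_FE_CV_CISSE_Kaliss\u00e9.docx"

# The same visible name, stored as bytes in each character set.
utf8_bytes = name.encode("utf-8")         # e-acute becomes 0xc3 0xa9
latin1_bytes = name.encode("iso-8859-1")  # e-acute becomes 0xe9

# Percent-encode the raw bytes, as they would appear in a URL.
utf8_url = quote(utf8_bytes)
latin1_url = quote(latin1_bytes)

print(utf8_bytes)
print(latin1_bytes)
print(utf8_url)
print(latin1_url)
```

The two URL forms differ only in the escaped bytes for the accented letter, which is exactly the difference between the request in your access log and the name on disk.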
URLs, per RFC 1738, do not permit bytes above 0x7f (which is where the
accented characters of iso-8859-1 live) to appear literally in the URL;
they have to be percent-encoded. So, technically speaking, the URL of:
http://somesite/11_EE_APP_FE_CV_CISSE_Kalissé.docx
should not work. Some browsers may try to "be smart" and turn
the accented small e character into %E9, which would then become:
http://somesite/11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx
Which would work just fine.
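As a rough illustration of why one URL matches and the other gets a 404, the sketch below decodes each percent-encoded name back to raw bytes and compares it with the bytes of the on-disk name. This is a simplified model that assumes the server compares the decoded bytes against the filename as-is, with no character set conversion, and that the file on disk is named in iso-8859-1 as I guessed above. The function name `matches_on_disk` is hypothetical.

```python
from urllib.parse import unquote_to_bytes

# Assumption: the file on the server is named using iso-8859-1.
on_disk = "11_EE_APP_FE_CV_CISSE_Kaliss\u00e9.docx".encode("iso-8859-1")


def matches_on_disk(url_name: str) -> bool:
    """Decode the %XX escapes and compare the raw bytes with the on-disk name."""
    return unquote_to_bytes(url_name) == on_disk


# What the browser actually requested (utf-8 bytes, percent-encoded).
utf8_request_matches = matches_on_disk("11_EE_APP_FE_CV_CISSE_Kaliss%C3%A9.docx")

# What a request for the iso-8859-1 name would look like.
latin1_request_matches = matches_on_disk("11_EE_APP_FE_CV_CISSE_Kaliss%E9.docx")

print(utf8_request_matches)
print(latin1_request_matches)
```

Under those assumptions only the %E9 form decodes to the bytes that are actually on disk.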
I'm not sure that HTTP_ACCEPT_LANGUAGE would fix this problem.
If you have a CGI, PHP script, web software, etc. which is generating
filenames and things like that, and is using utf-8 as its character set
(meaning either via an HTTP header or via HTML <meta http-equiv> tag),
then that's going to mess things up. You need to be using the
iso-8859-1 character set instead. A good browser will be able to show
you what character set the page shows up as.
What's the alternative? Simple: you start using utf-8 in your
filenames. I should note, however, that FreeBSD (including 8.2-STABLE)
does not have very good Unicode support. It's hit-or-miss, and using
things like LANG/LC_CTYPE results in some serious problems with utilities
that rely on locale(7). So, I would be very careful going this route on
FreeBSD.
The short version is this: if you're going to use utf-8, you need to use
it absolutely 100% of the time. You cannot reliably mix and match character
sets like that.
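If you do decide to go the all-utf-8 route, something along these lines could help find the filenames that would need renaming. It is only a sketch: it assumes that any name which is not valid utf-8 is iso-8859-1 (which may not hold for your files), the helper names `to_utf8_name` and `planned_renames` are made up, and it only reports what it would rename rather than touching anything.

```python
import os


def to_utf8_name(raw: bytes) -> bytes:
    """Return the utf-8 form of a filename given as raw bytes.

    Names that already decode as utf-8 (plain ASCII included) are returned
    unchanged. Anything else is ASSUMED to be iso-8859-1 and is re-encoded.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1").encode("utf-8")


def planned_renames(directory: str) -> list[tuple[bytes, bytes]]:
    """Dry run: list (old, new) name pairs for entries that would change."""
    pairs = []
    # Passing a bytes path makes os.listdir return the raw bytes of each name.
    for raw in os.listdir(os.fsencode(directory)):
        new = to_utf8_name(raw)
        if new != raw:
            pairs.append((raw, new))
    return pairs
```

Calling `planned_renames` on the directory from your log would show which entries still carry iso-8859-1 names. Review that list by hand before renaming anything, and remember that every link or script that refers to the old names has to change at the same time.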
Hope this helps.
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP 4BD6C0CB |