Re: pkgs contain non URL safe characters

From: Ronald Klop <ronald-lists_at_klop.ws>
Date: Tue, 01 Mar 2022 12:06:42 UTC
On 3/1/22 12:57, Ronald Klop wrote:
> On 2/17/22 03:05, Aristedes Maniatis wrote:
>> Just to check this behaviour, I used tcpdump to see what the request looked like from pkg-fetch.
>>
>>
>>      123.ish.com.au.15580 > pkg0.twn.freebsd.org.http: Flags [P.], cksum 0x80e0 (incorrect -> 0xfc82), seq 1:184, ack 1, win 1027, options [nop,nop,TS val 975600196 ecr 3136747760], length 183: HTTP, length: 183
>>      GET /FreeBSD:13:amd64/quarterly/All/openjdk11-11.0.13+8.1.pkg HTTP/1.1
>>      Host: pkgmir.geo.freebsd.org
>>      Accept: */*
>>      User-Agent: pkg/1.17.5
>>      Range: bytes=6733824-
>>      Connection: close
>>
>>
>> You can see in there that the + is not URL encoded. Is it expected that pkg uses URL standards for its repository? If not, any advice on how to host a repository on a commercial service like AWS cloudfront?
>>
>> Should we rewrite all our files with + symbols to spaces? Should pkg names only contain URL safe characters? Or should pkg-fetch be fixed to encode URLs?
>>
>>
>> I took a quick look at the source for pkg.c and where it calls fetchXGet but I can't understand where any URL encoding might happen.
>>
>>
>> Ari
>>
>>
>> On 14/2/2022 11:18am, Aristedes Maniatis wrote:
>>> Some package names contain a "+" symbol, which is one way of encoding a space in a URL. This means that I'm having trouble hosting our pkg repository behind cloudfront/S3.
>>>
>>> I wasn't sure where to post this issue, so I put more details here: https://github.com/freebsd/poudriere/issues/976
>>>
>>>
>>> Is there a workaround for this issue? Could pkg-fetch escape such characters when interacting with a http repository?
>>>
>>>
>>> Cheers
>>>
>>> Ari
>>>
>>>
> 
> 
> Hi,
> 
> I looked into this a bit and did not see another answer yet on the ML.
> 
> I think this describes it pretty clearly and also points to official HTTP specifications.
> https://stackoverflow.com/questions/2678551/when-should-space-be-encoded-to-plus-or-20
> 
> TL;DR:
> The + character is not special in this part of the URL (the path). The request sent by pkg complies with the specs.
> 
> I'm aware that there are the specs and then there is what browsers and servers do in real life.
> Why does Cloudfront decode a + to a space in this part of the URL?
> 
> Regards,
> Ronald.
> 


Ah, I have now looked at the linked GitHub issue. I think you should check the authentication to the CloudFront server, as the error you get is "403 Forbidden".
It might have nothing to do with the plus/+ in the name.
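
For reference, the difference between path decoding and form/query decoding of "+" can be sketched with Python's standard urllib.parse. This is only an illustration of the two decoding rules, using the file name from the tcpdump output quoted above; it says nothing about what pkg or CloudFront do internally.

```python
from urllib.parse import quote, unquote, unquote_plus

# File name taken from the GET request in the quoted tcpdump output.
name = "openjdk11-11.0.13+8.1.pkg"

# Path-style decoding: '+' is an ordinary character and is left alone.
path_decoded = unquote(name)

# Form/query-style decoding: '+' is treated as an encoded space.
form_decoded = unquote_plus(name)

# Percent-encoding the '+' as %2B is unambiguous under both rules.
encoded = quote(name, safe="")

print(path_decoded)
print(form_decoded)
print(encoded)
```

A server that applies the form/query rule to the path would look for a file with a space in its name, while a client that sends %2B avoids the ambiguity either way.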

Ronald.