Re: BSD-awk print() Behavior

From: jin guojun <jguojun_at_gmail.com>
Date: Tue, 21 Feb 2023 01:13:27 UTC

Without knowing which hidden character(s) are in those files, one can only
guess at what happened.

hexdump -C file_{1,2} will show the real difference between the two files,
which may help explain what awk's print is doing.
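
As a minimal sketch of that check, the commands below build two sample files
(hypothetical stand-ins for file_1 and file_2, kept in a temporary directory)
and dump them with od -c, a portable alternative to hexdump -C. The trailing
carriage return in the first sample is only an assumption about what the dump
might reveal; the real files may contain something else. If it does turn out
to be a CR, the strip_cr helper (a name made up for this example) removes it
before awk appends text.

```shell
# Hypothetical stand-ins for file_1 and file_2, kept in a temporary directory.
dir=$(mktemp -d)
printf 'https://github.com/\r\n' > "$dir/file_cr"   # assumed: line ends in CR LF
printf 'https://github.com/\n'   > "$dir/file_lf"   # plain LF ending

# od -c prints control characters as escapes, so a hidden CR appears as \r.
od -c "$dir/file_cr"
od -c "$dir/file_lf"

# Illustrative helper (not from the original post): drop one trailing CR per line.
strip_cr() {
    awk '{ sub(/\r$/, ""); print }' "$@"
}

# With the CR removed, the appended text lands at the end of the line.
strip_cr "$dir/file_cr" | awk '{ print $0 " abc" }'
```

A terminal moves the cursor back to column 0 when it sees a CR, so a CR at the
end of $0 would be consistent with " abc " overwriting the start of each URL.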

-Jin

On Mon, Feb 20, 2023 at 4:25 PM Sysadmin Lists <sysadmin.lists@mailfence.com>
wrote:

> Trying to wrap my head around what BSD awk is doing here. Although the
> behavior is unwanted for this exercise, it seems like a possibly useful
> feature or hack for future projects. Either way I'd like to understand
> what's going on.
>
> I extracted a list of URLs from my browser's history sql file, and when
> iterating over the list with awk got some strange results.
>
> file_1 has the sql-extracted URLs, and file_2 is a copy-paste of that
> file's contents using vim's yank-and-paste.
>
> $ cat file_{1,2}
> https://github.com/
> https://github.com/
> https://github.com/
> https://github.com/
>
> $ diff file_{1,2}
> 1,2c1,2
> < https://github.com/
> < https://github.com/
> ---
> > https://github.com/
> > https://github.com/
>
> $ awk '{ print $0 " abc " }' file_{1,2}
>  abc ://github.com/
>  abc ://github.com/
> https://github.com/ abc
> https://github.com/ abc
>
> The sql-extracted URLs in file_1 cause awk's print() to replace the front
> of the string with the text following $0. The URLs in file_2 do not. I
> used vim's `:set list' option to view hidden chars, but there's no apparent
> difference between the two files, although `diff' clearly thinks there is
> one. Both files show this when `list' is set:
>
> https://github.com/$
> https://github.com/$
>
>
> Here's more background if needed:
>
> I extracted the URLs using sqlite3 like so:
> for f in History-16768665*
> do
>         sqlite3 --bail "$f" <<-HEREDOC
>                 .mode csv
>                 .output ${f}.csv
>                 select * from urls where url like '%github%';
> HEREDOC
> done
>
> Then I tried to create a list of unique URLs using `sort -u', but it broke
> because of special chars in the extracted lines (so it claimed). I used awk
> to get a unique list instead:
>
> for f in *.csv; do [[ -s $f ]] && list="${list} $f"; done; echo $list
> awk '{ u[$0] } END { for (e in u) print e > "file_1" }' $list