Re: BSD-awk print() Behavior
- Reply: Sysadmin Lists : "Re: BSD-awk print() Behavior"
- In reply to: Sysadmin Lists : "BSD-awk print() Behavior"
Date: Tue, 21 Feb 2023 01:13:27 UTC
Without knowing what hidden character(s) are in those files, one can only guess at what happened. hexdump -C file_{1,2} will show the real difference, which may help explain what is going on with awk's print.

-Jin

On Mon, Feb 20, 2023 at 4:25 PM Sysadmin Lists <sysadmin.lists@mailfence.com> wrote:
> Trying to wrap my head around what BSD awk is doing here. Although the
> behavior is unwanted for this exercise, it seems like a possibly useful
> feature or hack for future projects. Either way I'd like to understand
> what's going on.
>
> I extracted a list of URLs from my browser's history sql file, and when
> iterating over the list with awk got some strange results.
>
> file_1 has the sql-extracted URLs, and file_2 is a copy-paste of that
> file's contents using vim's yank-and-paste.
>
> $ cat file_{1,2}
> https://github.com/
> https://github.com/
> https://github.com/
> https://github.com/
>
> $ diff file_{1,2}
> 1,2c1,2
> < https://github.com/
> < https://github.com/
> ---
> > https://github.com/
> > https://github.com/
>
> $ awk '{ print $0 " abc " }' file_{1,2}
>  abc ://github.com/
>  abc ://github.com/
> https://github.com/ abc
> https://github.com/ abc
>
> The sql-extracted URLs cause awk's print to replace the front of the
> string with the text following $0; file_2 does not. I used vim's
> `:set list' option to view hidden chars, but there's no apparent
> difference between the two -- although `diff' clearly thinks so. Both
> files show this when `list' is set:
>
> https://github.com/$
> https://github.com/$
>
> Here's more background if needed:
>
> I extracted the URLs using sqlite3 like so:
>
> for f in History-16768665*
> do
> 	sqlite3 --bail $f <<-HEREDOC
> 	.mode csv
> 	.output ${f}.csv
> 	select * from urls where url like '%github%';
> 	HEREDOC
> done
>
> Then I tried to create a list of unique URLs using `sort -u' but it
> broke because of special chars in the extracted lines (so it claimed).
> I used awk to get a unique list instead:
>
> for f in *.csv; do [[ -s $f ]] && list="${list} $f"; done; echo $list
> awk '{ u[$0] } END { for (e in u) print e > "file_1" }' $list
>
> --
> Sent with https://mailfence.com
> Secure and private email
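[Archive note: a minimal sketch of one hypothesis consistent with the symptoms above. It assumes the sql-extracted lines end in CRLF -- sqlite3's .mode csv writes \r\n line endings -- which the thread has not yet confirmed; hexdump would verify it. The file name file_1 is from the thread; everything else here is illustrative.]

```shell
# Assumption: the extracted lines end in \r\n. awk strips only the \n,
# so $0 still ends in \r. When printed, the carriage return moves the
# terminal cursor back to column 1 and " abc " overwrites the start of
# the URL -- producing " abc ://github.com/" instead of appending.
printf 'https://github.com/\r\n' > file_1

# hexdump -C exposes the hidden byte: 0d (\r) just before 0a (\n)
hexdump -C file_1

# Reproduces the reported symptom on a terminal
awk '{ print $0 " abc " }' file_1

# One possible fix: strip a trailing \r from each record first
awk '{ sub(/\r$/, ""); print $0 " abc " }' file_1
```

If the hexdump does show 0d 0a at each line end, piping the files through tr -d '\r' (or using the sub() above) before sort -u or the uniquifying awk script would likely avoid the problem entirely.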