Re: BSD-awk print() Behavior

From: Sysadmin Lists <sysadmin.lists_at_mailfence.com>
Date: Tue, 21 Feb 2023 11:53:14 UTC
> ----------------------------------------
> From: Andreas Kusalananda Kähäri <andreas.kahari@abc.se>
> Date: Feb 21, 2023, 2:14:21 AM
> To: Sysadmin Lists <sysadmin.lists@mailfence.com>
> Cc: Freebsd Questions <freebsd-questions@freebsd.org>
> Subject: Re: BSD-awk print() Behavior
> 
> 
> On Tue, Feb 21, 2023 at 01:24:41AM +0100, Sysadmin Lists wrote:
> >
> > $ cat file_{1,2}
> > https://github.com/
> > https://github.com/
> > https://github.com/
> > https://github.com/
> > 
> > $ diff file_{1,2}  
> > 1,2c1,2
> > < https://github.com/
> > < https://github.com/
> > ---
> > > https://github.com/
> > > https://github.com/
> > 
> > $ awk '{ print $0 " abc " }' file_{1,2}  
> >  abc ://github.com/
> >  abc ://github.com/
> > https://github.com/ abc 
> > https://github.com/ abc 
> 
> file_1 is a DOS text file, while file_2 is a Unix text file.  The DOS
> text file, when interpreted by tools expecting Unix text, has an extra
> carriage-return character at the end of each line.  This carriage-return
> character will be part of $0 in the awk code; when it is printed, the
> terminal moves the cursor back to the start of the line, giving the
> effect that you are seeing.
> 
> This has nothing to do with awk's print keyword.  You would get similarly
> strange results if you simply pasted the data side by side:
> 
> 	$ paste file_{1,2}
> 	https://https://github.com/
> 	https://https://github.com/
> 
> Here, "https://github.com/" is first printed from the DOS text file,
> after which the cursor is returned to the start of the line.  Then,
> paste inserts a tab character, which moves the cursor to the next tab
> stop and so "steps over" the first eight characters that had already
> been output ("https://"), and then outputs "https://github.com/" from
> the Unix text file.
> 
> 
> > 
> > The SQL-extracted URLs (file_1) cause awk's print() to overwrite the
> > start of the line with the text following $0; those in file_2 do not.
> > I used vim's `:set list' option to view hidden chars, but there's no
> > apparent difference between the two -- although `diff' clearly thinks
> > so. Both files show this when `list' is set:
> > 
> > https://github.com/$
> > https://github.com/$
> 
> Yes, because Vim automatically recognizes DOS text files and treats
> their CR-LF line endings as ordinary line breaks, so the carriage returns
> are not shown.  I'm assuming that while editing file_1 in Vim, you see
> "[dos]" at the bottom of the screen?
> 
> 

Good explanation. I found the hidden character before reading your email,
using `cat -e', which printed it as ^M, but I didn't know awk could move the
cursor around like that. Sounds like a useful (and dangerous) hack.

$ cat -e file_{1,2} 
https://github.com/^M$
https://github.com/^M$
https://github.com/$
https://github.com/$
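
The same check can also be done with od(1), which prints the raw bytes that cat -e renders as ^M. Below is a minimal sketch. It assumes file_1 and file_2 contain exactly the two lines shown above, and it recreates them with printf so it runs on its own (this overwrites any files with those names in the current directory):

```shell
#!/bin/sh
# Recreate the sample files (assumed contents: two URLs each;
# file_1 with DOS CRLF line endings, file_2 with Unix LF endings).
printf 'https://github.com/\r\nhttps://github.com/\r\n' > file_1
printf 'https://github.com/\nhttps://github.com/\n' > file_2

# od -c shows a carriage return as \r, so the difference that diff
# reported becomes visible byte by byte.
od -c file_1
od -c file_2

# Count the lines that end in a carriage return.
cr=$(printf '\r')
grep -c "${cr}\$" file_1
# grep exits 1 when nothing matches, so keep the script going.
grep -c "${cr}\$" file_2 || true
```

The grep counts should come out as 2 for file_1 and 0 for file_2.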

vim does indeed say [dos] at the bottom of file_1. Now I know sqlite3 creates
dos files even on unix-like systems.
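
If the sqlite3 output itself can't be changed, the CRs can be removed before awk sees them. Here is a hedged sketch of two options, again assuming the CRLF file_1 shown above and recreating it with printf. tr(1) deletes every CR and writes a Unix-format copy (file_1.unix is only an illustrative name). An awk sub() strips a trailing CR from each record, so the original print statement then works unchanged:

```shell
#!/bin/sh
# Assumed input: file_1 with DOS (CRLF) line endings, as shown earlier.
printf 'https://github.com/\r\nhttps://github.com/\r\n' > file_1

# Option 1: delete every carriage return, writing a Unix-format copy.
# (file_1.unix is an illustrative output name.)
tr -d '\r' < file_1 > file_1.unix

# Option 2: strip only a trailing CR from each record inside awk,
# so the appended text no longer overwrites the start of the line.
awk '{ sub(/\r$/, ""); print $0 " abc " }' file_1
```

Option 2 prints "https://github.com/ abc " on both lines, the same as the file_2 output in the original transcript.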

Thank you both.
