sed - remove nul lines from file
freebsd at edvax.de
Tue Nov 7 18:37:05 UTC 2017
On Tue, 7 Nov 2017 12:12:55 -0500, James B. Byrne via freebsd-questions wrote:
> I have a data file created by an ancient proprietary scripting
> language called QTP. There is a bug in this program which, on
> occasion, manifests itself by inserting output records consisting
> entirely of nul (^@) (\x00) bytes at regular intervals. In the
> present case every 47th. record consists entirely of nuls.
If you know exactly which line is to be removed, for example the
7th, awk can do it:
$ awk '(NR != 7)' < infile.txt > outfile.txt
This prints all lines except the 7th one, the one with the NULs.
If the position is not fixed, you need a more flexible solution.
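The quoted report says that every 47th record is affected. If the bad
records really do fall on exact multiples of 47 (an assumption; the
real file would have to be checked), the same NR test can be written
with a modulo. A minimal sketch on a made-up sample, with placeholder
file names:

```shell
# Hypothetical sample: 100 numbered lines standing in for the real records.
workdir=$(mktemp -d)
seq 1 100 > "$workdir/infile.txt"

# Keep every line whose number is NOT a multiple of 47
# (for this sample that drops lines 47 and 94).
awk 'NR % 47 != 0' < "$workdir/infile.txt" > "$workdir/outfile.txt"

# Count what is left.
kept=$(wc -l < "$workdir/outfile.txt" | tr -d ' ')
echo "$kept lines kept"
```

This only looks at line numbers, not at content, so it silently removes
good records if the interval ever shifts.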
> The purpose of this data file is to feed a psql COPY statement for
> loading into a PostgreSQL database. The presence of the NUL
> characters prevents this. I have previously used the tr utility to
> remove the NUL characters but this requires me to manually remove the
> residual empty lines.
In this case, awk can also help:
$ awk '(length > 0)' < infile.txt > outfile.txt
This prints only the lines that are longer than 0 characters, so
the empty lines left behind by tr are dropped.
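Since the question asked about sed: once tr has stripped the NULs,
deleting the leftover empty lines is something sed can do without any
special escapes. A minimal sketch on a made-up sample; the file names
and the record text are placeholders:

```shell
# Hypothetical sample: two records with one empty line between them,
# as tr -d '\000' would leave it behind.
workdir=$(mktemp -d)
printf 'first record\n\nsecond record\n' > "$workdir/stripped.txt"

# Delete every line that is completely empty.
sed '/^$/d' < "$workdir/stripped.txt" > "$workdir/outfile.txt"

cat "$workdir/outfile.txt"
```

Like the awk command, this also removes lines that were already empty
in the original data.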
> I have tried various permutations of the sed invocation reproduced
> below to remove these lines directly but without success. The
> examples that I have found on StackExchange and various other
> self-help sites do not give the results claimed, at least not for me
> on FreeBSD. So, I would appreciate if anyone here can point out what I
> am doing wrong or how the sed on FreeBSD differs in behaviour for that
> used in the examples I have found.
> Given a file INFILE with records containing the following:
> . . .
> *93566000008166*,*CCTL*,*3072 49534494 *
> *93566000008166*,*CCTL*,*3072 49534493 *
> *93566000008166*,*CCTL*,*3072 49534497 *
> *93566000015962*,*CCTL*,*8156 4171000541 *
> ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ . . .
> *93566000198850*,*CCTL*,*417 1003874 *
> *93566000010320*,*CCTL*,*8084 2601553853102 *
> . . .
> I wish to remove (all) the line(s) with the nul (^@) characters. I
> have tried this:
> sed '/^\x00*$/d' INFILE > INFILE.sed
> and this:
> sed -E '/^\x00*$/d' INFILE > INFILE.sed
> but neither these nor the many other combinations that I have tried
> remove the lines. What is the method of accomplishing this in sed or
> is it not possible?
I'd suggest using the tr utility with the -d option, which
deletes characters instead of translating them:
$ tr -d '\000' < infile.txt > outfile.txt
This of course leaves an empty line (as the trailing \n will not
be deleted), so using the awk step in combination would help:
$ tr -d '\000' < infile.txt | awk '(length > 0)' > outfile.txt
This removes every line that consists only of NULs, no matter
where it appears in the input file.
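To try the combined command without touching the real data, a small
test file can be built with printf. This is only a sketch: the record
text, the count of 8 NULs and the file names are made up for
illustration.

```shell
# Hypothetical sample: two records with a line of 8 NUL bytes between them.
workdir=$(mktemp -d)
printf 'record one\n\000\000\000\000\000\000\000\000\nrecord two\n' \
    > "$workdir/infile.txt"

# Strip the NULs, then drop the empty line that remains.
tr -d '\000' < "$workdir/infile.txt" | awk 'length > 0' > "$workdir/outfile.txt"

# Count the NUL bytes left in the result by deleting every other byte.
nuls=$(tr -cd '\000' < "$workdir/outfile.txt" | wc -c | tr -d ' ')
lines=$(wc -l < "$workdir/outfile.txt" | tr -d ' ')
echo "NUL bytes left: $nuls, lines: $lines"
```

Two caveats: lines that were already empty in the input are removed as
well, and a line that mixes NULs with real data keeps the data with
only the NULs stripped.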
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...