Sed, shell and hexadecimal character codes

Tue May 27 06:27:29 UTC 2008

Oliver Fromme wrote:
> Karel Miklav wrote:
>  > There's a tip in the FreeBSD fortunes database that says:
>  > 
>  > > Want to strip UTF-8 BOM (Byte Order Mark) from given files?
>  > > 
>  > > sed -e '1s/^\xef\xbb\xbf//' < bomfile > newfile
> 
> FreeBSD's sed(1) doesn't support hexadecimal or octal
> sequences.  I think even GNU sed doesn't support them, but
> you might try it yourself (/usr/ports/textproc/gsed).
> 
> I don't know why that fortunes entry exists.  It's wrong.

That's what I thought. Maybe we should replace the recipe with
the awk version Oliver proposed below?

>  > I can't make it work, and I can't find any other method to
>  > work with hex codes in scripts or on the command line, so
>  > I'm kinda depressed :) I help myself with xxd now, but if
>  > it is possible to avoid it, I'd like to hear about it.
> 
> There is no standard for handling octal and hexadecimal
> sequences, unfortunately, so you have to consult the
> manual page to find out.  For example, tr(1) supports
> octal sequences only (no hexadecimal), while awk(1)
> supports both.  So the above line could be rewritten
> with awk:
> 
> awk '{if(NR==1)sub(/^\xef\xbb\xbf/, "");print}' < bomfile > newfile
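
If sed is preferred after all, one possible workaround is to let the
shell's printf(1) produce the three BOM bytes from octal escapes (which
POSIX printf does understand) and paste them into the sed expression.
This is only a sketch: the function name strip_bom is made up for
illustration, and LC_ALL=C is my assumption, so that sed treats the
input as plain bytes regardless of the current locale.

```shell
# strip_bom: hypothetical helper, reads stdin and writes stdout.
# Removes a UTF-8 BOM (EF BB BF) from the start of the first line only.
strip_bom() {
    # Octal 357 273 277 == hex EF BB BF; printf expands the escapes,
    # so sed receives the raw bytes and needs no \x support of its own.
    bom=$(printf '\357\273\277')
    LC_ALL=C sed -e "1s/^${bom}//"
}
```

Usage would be the same as with the original recipe:
strip_bom < bomfile > newfile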