editing a binary file

Fri Dec 18 09:13:16 UTC 2009

On Fri, 18 Dec 2009 01:29:18 +0000, Anton Shterenlikht <mexas at bristol.ac.uk> wrote:
>> My bet would be /usr/ports/editors/hexedit. Been a while since I've
>> used it, but AFAIR, it has a curses or a curses like interface, and
>> it's fairly simple to use, yet sufficiently powerful for most normal
>> binary editing. If you want a GUI, I believe gnome (and probably KDE
>> as well) has its own hex editor.
>
> thank you. hexedit does the job on small files, but is quite
> clunky. If I've a xGB file and I need to delete the first and the last
> record, this becomes quite hard, if at all possible.
>
> I didn't appreciate it's not that simple.
>
> Perhaps I can read a file with C and write back? I can't remember if C
> supports binary files, and whether it also writes some record
> delimiters.

Yes, C supports binary files and does not insert spurious 'record
delimiters' unless you instruct it to do so.  To be safe, open the file
in binary mode ("rb" or "wb"), because on some systems text mode
translates line endings.  It may even be possible to use one of the
scripting languages (Perl or Python) to do the same work.  It's often
easier to hack together a solution if you don't have to worry about some
of the details C will require.

I don't know what your record delimiters look like, but here's a small
sample of how Python can read a binary file of 32 bytes and strip the
last 2 bytes of each 16-byte record.

A binary file of two 16-byte records may look like this:

  keramida@kobe:/tmp$ hd binfile
  00000000  b6 b0 fc 58 96 48 56 d5  e9 10 f0 55 55 67 87 5d  |...X.HV....UUg.]|
  00000010  b0 c9 8b 49 db 53 26 28  57 d6 62 0d d5 1b c4 dc  |...I.S&(W.b.....|
  00000020

Reading the file in chunks of 16 bytes and stripping the last 2 bytes of
each record takes only a few lines of Python:

  keramida@kobe:/tmp$ python
  Python 2.6.4 (r264:75706, Dec  3 2009, 23:31:07)
  [GCC 4.2.1 20070719  [FreeBSD]] on freebsd9
  Type "help", "copyright", "credits" or "license" for more information.
  >>> ifp = open('binfile', 'rb')         # open input file for binary reading
  >>> ofp = open('outfile', 'wb')         # open output file for binary writing
  >>> for rec in range(2):                # we'll transfer 2 records
  ...     data = ifp.read(16)             # of 16 bytes each
  ...     obytes = data[0:14]             # strip the last two bytes of each record
  ...     ofp.write(obytes)               # push to the output file
  ...
  >>> ifp.close()                         # close input
  >>> ofp.close()                         # close output
  >>>

The output file now looks like this:

  keramida@kobe:/tmp$ hd outfile
  00000000  b6 b0 fc 58 96 48 56 d5  e9 10 f0 55 55 67 b0 c9  |...X.HV....UUg..|
  00000010  8b 49 db 53 26 28 57 d6  62 0d d5 1b              |.I.S&(W.b...|
  0000001c

This is 4 bytes smaller than the original file, and the last two bytes
of each 16-byte record are gone.  Bingo!
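
For a file with more than two records, the same loop can be wrapped in a
function that runs until end of file.  This is only a sketch, written
for Python 3.  The name strip_record_tails and its parameters are made
up for illustration, and it assumes fixed-size records with nothing else
(no header, no delimiters) in the file:

```python
def strip_record_tails(src_path, dst_path, record_size=16, strip=2):
    """Copy src_path to dst_path, dropping the last `strip` bytes of
    every `record_size`-byte record.  Returns the number of records."""
    count = 0
    # Binary mode on both sides, so no bytes are translated or added.
    with open(src_path, 'rb') as ifp, open(dst_path, 'wb') as ofp:
        while True:
            rec = ifp.read(record_size)
            if not rec:                      # end of file
                break
            if len(rec) < record_size:       # incomplete trailing record
                raise ValueError('short record at end of %s' % src_path)
            ofp.write(rec[:record_size - strip])
            count += 1
    return count
```

Calling strip_record_tails('binfile', 'outfile') on the 32-byte file
shown earlier should give the same 28-byte result as the interactive
session.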

Now this example is really a very small and contrived sample of what you
can do.  This script lacks serious error-checking too, and it may be
slightly more involved if you have variable record sizes.  But the
general idea is that it *is* possible to hack together something that
loads and processes binary data.  As long as you know the on-disk format
of the records you are reading, anything goes.
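
For the case mentioned in the quoted message (dropping the first and the
last record of a multi-GB file), something along these lines might work.
It is a sketch only, for Python 3, and it assumes that every record has
the same known size and that the file contains nothing but records.  The
function name, the record_size argument and the 1 MB chunk size are all
invented for this example.  The file is copied in chunks, so it never
has to fit in memory:

```python
import os

def drop_first_and_last(src_path, dst_path, record_size, chunk_size=1 << 20):
    """Copy src_path to dst_path without its first and last record."""
    total = os.path.getsize(src_path)
    if total < 2 * record_size or total % record_size != 0:
        raise ValueError('file size %d does not fit %d-byte records'
                         % (total, record_size))
    remaining = total - 2 * record_size      # bytes between the two records
    with open(src_path, 'rb') as ifp, open(dst_path, 'wb') as ofp:
        ifp.seek(record_size)                # skip the first record
        while remaining > 0:
            chunk = ifp.read(min(chunk_size, remaining))
            if not chunk:                    # file got shorter while copying
                raise IOError('unexpected end of file')
            ofp.write(chunk)
            remaining -= len(chunk)          # stop before the last record
```

If the records are not fixed-size, the seek offset and the number of
bytes to leave off at the end have to come from whatever the on-disk
format uses to mark record boundaries.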