Fast diff command for large files?

Kirk Strauser kirk at
Fri Nov 4 17:29:21 GMT 2005

On Friday 04 November 2005 10:22, Chuck Swiger wrote:

> Multigigabyte?  Find another approach to solving the problem; a text-based
> diff is going to require excessive resources and time.  A 64-bit platform
> with 2 GB of RAM & 3GB of swap requires ~1000 seconds to diff ~400MB.

There really aren't many options.  For the patient, here's what's happening:

Our legacy application runs on FoxPro.  Our web application runs on a 
PostgreSQL database that's a mirror of the FoxPro tables.

We do the mirroring by running a program that dumps the FoxPro tables out as 
tab-delimited files.  Until now, we've been using PostgreSQL's "copy from" 
command to read those files into the database.  In reality, though, only a 
very small percentage of rows in those tables actually change between dumps.  
So, I wrote a program that takes the output of diff and converts it into a 
series of "delete" and "insert" commands; benchmarking shows that this is 
roughly 300 times faster in our use.
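
For anyone curious what that conversion might look like, here is a minimal
sketch in Python.  It is not the actual program; it assumes unified diff
output ("diff -u") for a single pair of files, a key in the first
tab-delimited field, and hypothetical table and column names (customers, id).
Every field is quoted as a string literal, and COPY-style escapes such as \N
for NULL are not handled.

```python
import difflib


def sql_literal(value):
    """Quote a value as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def diff_to_sql(diff_lines, table, key_column):
    """Turn unified-diff lines into DELETE and INSERT statements.

    Assumption: each row is tab-delimited and its first field is the key.
    Deletes are emitted before inserts so that a changed row (one "-" line
    and one "+" line with the same key) never collides with its old copy.
    """
    deletes, inserts = [], []
    in_hunk = False
    for line in diff_lines:
        line = line.rstrip("\n")
        if line.startswith("@@"):
            in_hunk = True          # hunk header; data lines follow
            continue
        if not in_hunk:
            continue                # skip the "---" / "+++" file headers
        if line.startswith("-"):
            key = line[1:].split("\t", 1)[0]
            deletes.append(
                f"DELETE FROM {table} WHERE {key_column} = {sql_literal(key)};"
            )
        elif line.startswith("+"):
            values = ", ".join(sql_literal(f) for f in line[1:].split("\t"))
            inserts.append(f"INSERT INTO {table} VALUES ({values});")
    return deletes + inserts


# Tiny demonstration: difflib stands in for the external diff command.
old_dump = ["1\talice\t10", "2\tbob\t20", "3\tcarol\t30"]
new_dump = ["1\talice\t10", "2\tbob\t25", "4\tdave\t40"]
diff = difflib.unified_diff(old_dump, new_dump, "old", "new", n=0, lineterm="")
statements = diff_to_sql(diff, "customers", "id")
for statement in statements:
    print(statement)
```

In practice the statements would be wrapped in a single transaction, so the
web application never sees a half-applied update.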

And that's why I need a fast diff.  Even if the diff takes as long as the 
database bulk loads do, we can run it on another server and use 20 seconds 
of CPU for PostgreSQL instead of 45 minutes.  The practical upshot is that 
the database will never get sluggish, even if the other "diff server" is 
loaded to the gills.

Kirk Strauser

More information about the freebsd-questions mailing list