Fast diff command for large files?
kirk at strauser.com
Mon Nov 7 15:48:41 GMT 2005
On Sunday 06 November 2005 07:39, Andrew P. wrote:
> Note, that the difference must be kept in RAM, so it won't work if there
> are multi-gig diffs, but it will work very fast if the diffs are only
> 10-100Mb, it will work at close to I/O speed if the diff is under 10Mb.
Thanks, Andrew! My Python script runs that algorithm in 17 seconds on a
400MB file with 10% CPU.
For anyone interested, here's my implementation. Note that the readline()
method in Python always returns something, even at EOF (at which point you
get an empty string). Also, empty strings evaluate as "false", which is
why the "if not (oldline or newline): break" code exits at the end.
    old_records = 
    new_records = 
    while True:
        oldline, newline = oldfile.readline(), newfile.readline()
        if not (oldline or newline):
            break
        if oldline == newline:
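The listing above was cut off in the archive, so here is a minimal sketch of how the whole loop might look. It is a reconstruction, not the original script: the use of sets for old_records and new_records, the function name diff_records, and everything after the "if oldline == newline" test are assumptions. It also assumes each line is unique within its file (as with dumped database records) and that the files are mostly in the same order, so the two sets stay small and only the differences are held in RAM.

```python
import io


def diff_records(oldfile, newfile):
    """Read two files in lockstep and return (removed, added) sets of lines.

    Hypothetical reconstruction: assumes lines are unique within each file.
    """
    old_records = set()  # lines seen so far only in the old file
    new_records = set()  # lines seen so far only in the new file
    while True:
        oldline, newline = oldfile.readline(), newfile.readline()
        # readline() returns '' at EOF and '' is false, so this is true
        # only once both files are exhausted.
        if not (oldline or newline):
            break
        if oldline == newline:
            continue  # identical lines need no bookkeeping
        if oldline:
            if oldline in new_records:
                new_records.discard(oldline)  # matched a pending new line
            else:
                old_records.add(oldline)
        if newline:
            if newline in old_records:
                old_records.discard(newline)  # matched a pending old line
            else:
                new_records.add(newline)
    return old_records, new_records


# Small in-memory demonstration; real use would pass open file objects.
removed, added = diff_records(io.StringIO("a\nb\nc\n"), io.StringIO("a\nc\nd\n"))
print(sorted(removed), sorted(added))
```

When the loop ends, whatever is left in old_records exists only in the old file and whatever is left in new_records exists only in the new one. A line that merely moved is cancelled out as soon as its counterpart turns up in the other file.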
> Hope this gives you some idea.
It did. It must've been a long work week, because that all seems so obvious
in retrospect but was completely opaque at the time. Thanks again!