Re: diff(1) goes into cpu-hogging endless loop

From: Tom Jones <thj_at_freebsd.org>
Date: Sat, 25 Mar 2023 22:29:47 UTC
On Sat, Mar 25, 2023 at 09:55:14PM +0000, Jamie Landeg-Jones wrote:
> Hi, A "diff" of 2 files:
> 
> 1  77,933,904 bytes
> 2  63,013,818 bytes
> 
> , goes into an endless loop, whilst "gdiff" completes the operation in
> about 5 seconds.
> 
> I tested using the latest "diff" from current, and get the same result.
> 
> Splitting both files into 10Mb chunks, and diffing these was successful.
> 
> A ktrace of the "diff" actually stops producing any output after about
> 5 seconds, whilst the cpu looping continues.
> 
> Any ideas on what to do next? Does anyone else get the same result?
> 
> The files are just utf-8 freebsd git logs, and are available here if
> anyone would like to test:
> 
> http://www.catflap.org/jamie/1.xz (13,282,864 bytes)
> http://www.catflap.org/jamie/2.xz (12,221,164 bytes)
> 
> Cheers, Jamie

My guess is that you are hitting a worst case in the stone algorithm. I
have a WIP review to integrate the Myers algorithm from libdiff here:

https://reviews.freebsd.org/D36860

- Tom