OK. This clog really seems to work. x*x + y*y - 1 is computed with an error of less than 0.8 ULP. The remaining error seems to come from the implementation of log1p. The error in the final answer never seems to exceed a little over 2 ULP.
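For illustration only, here is a rough Python sketch of the idea: evaluate x*x + y*y - 1 without losing the low-order bits, then feed the result to log1p. It assumes IEEE double arithmetic and uses Dekker-style error-free products and sums, which is a stand-in technique and not necessarily what this clog does internally. All function names are hypothetical, and no ULP bound is claimed for the sketch.

```python
import math
from fractions import Fraction


def two_prod(a, b):
    """Return (p, e) with p = fl(a*b) and p + e == a*b (Dekker/Veltkamp split).

    Assumes IEEE doubles and no overflow or underflow in the products.
    """
    p = a * b
    split = 134217729.0  # 2**27 + 1
    t = split * a
    a_hi = t - (t - a)
    a_lo = a - a_hi
    t = split * b
    b_hi = t - (t - b)
    b_lo = b - b_hi
    e = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, e


def two_sum(a, b):
    """Return (s, e) with s = fl(a+b) and s + e == a + b (Knuth's two-sum)."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def norm2_minus_one(x, y):
    """Hypothetical helper: x*x + y*y - 1, keeping the rounding errors."""
    xx, xx_err = two_prod(x, x)
    yy, yy_err = two_prod(y, y)
    s, s_err = two_sum(xx, -1.0)
    t, t_err = two_sum(s, yy)
    # Add the small correction terms last so they are not swamped.
    return t + (s_err + t_err + xx_err + yy_err)


def clog_real_near_unit_circle(x, y):
    """Hypothetical helper: real part of log(x + iy), meant for |x + iy| near 1."""
    return 0.5 * math.log1p(norm2_minus_one(x, y))


def exact_norm2_minus_one(x, y):
    """Exact rational reference value, useful for measuring the error."""
    return Fraction(x) ** 2 + Fraction(y) ** 2 - 1
```

Comparing norm2_minus_one(0.6, 0.8) against exact_norm2_minus_one(0.6, 0.8) is one way to see how much of the final error is left to log1p itself.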