git: f887d0215fb4 - main - msun: fix cbrt iterations from Newton to Halley method

Go to: [ bottom of page ] [ top of archives ] [ this month ]
From: Warner Losh <imp_at_FreeBSD.org>
Date: Mon, 05 May 2025 05:06:02 UTC
The branch main has been updated by imp:

URL: https://cgit.FreeBSD.org/src/commit/?id=f887d0215fb48e682acccf4cb95f3794974e1a9d

commit f887d0215fb48e682acccf4cb95f3794974e1a9d
Author:     Clément Bœsch <u@pkh.me>
AuthorDate: 2025-05-01 17:19:36 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2025-05-05 04:52:49 +0000

    msun: fix cbrt iterations from Newton to Halley method
    
    Since we're inverting a cube, we have:
        f(Tₙ)=Tₙ³-x    (1)
    
    Its first and second derivatives are:
        f'(Tₙ)=3Tₙ²    (2)
        f"(Tₙ)=6Tₙ     (3)
    
    Halley iteration[1] uses:
        Tₙ₊₁=Tₙ-2f(Tₙ)f'(Tₙ)/(2f'(Tₙ)²-f(Tₙ)f"(Tₙ))    (4)
    
    Replacing the terms of (4) using (1), (2) and (3):
    
    Tₙ₊₁ = Tₙ-2f(Tₙ)f'(Tₙ)/(2f'(Tₙ)²-f(Tₙ)f"(Tₙ))
         = Tₙ-2(Tₙ³-x)3Tₙ²/(2(3Tₙ²)²-(Tₙ³-x)6Tₙ)
         = <snip, see WolframAlpha[2] alternate forms>
         = Tₙ(2x+Tₙ³)/(x+2Tₙ³)
    
    This formula corresponds to the exact expression used in the code.
    
    Newton formula is Tₙ-f(Tₙ)/f'(Tₙ) which would have simplified to
    (2Tₙ³+x)/(3Tₙ²) instead.
    
    [1] https://en.wikipedia.org/wiki/Halley's_method
    [2] https://www.wolframalpha.com/input?i=T-2%28T%5E3-x%293T%5E2%2F%282%283T%5E2%29%5E2-%28T%5E3-x%296T%29
    
    Note: UTF8 in commit message due to the heavy math being hard to
    recreate w/o it. -- imp
    
    Signed-off-by: Clément Bœsch <u@pkh.me>
    Reviewed by: imp
    Pull Request: https://github.com/freebsd/freebsd-src/pull/1684
---
 lib/msun/src/s_cbrt.c  | 4 ++--
 lib/msun/src/s_cbrtf.c | 4 ++--
 lib/msun/src/s_cbrtl.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/msun/src/s_cbrt.c b/lib/msun/src/s_cbrt.c
index 6bf84243adcd..568a36545216 100644
--- a/lib/msun/src/s_cbrt.c
+++ b/lib/msun/src/s_cbrt.c
@@ -90,7 +90,7 @@ cbrt(double x)
      * the result is larger in magnitude than cbrt(x) but not much more than
      * 2 23-bit ulps larger).  With rounding towards zero, the error bound
      * would be ~5/6 instead of ~4/6.  With a maximum error of 2 23-bit ulps
-     * in the rounded t, the infinite-precision error in the Newton
+     * in the rounded t, the infinite-precision error in the Halley
      * approximation barely affects third digit in the final error
      * 0.667; the error in the rounded t can be up to about 3 23-bit ulps
      * before the final error is larger than 0.667 ulps.
@@ -99,7 +99,7 @@ cbrt(double x)
 	u.bits=(u.bits+0x80000000)&0xffffffffc0000000ULL;
 	t=u.value;
 
-    /* one step Newton iteration to 53 bits with error < 0.667 ulps */
+    /* one step Halley iteration to 53 bits with error < 0.667 ulps */
 	s=t*t;				/* t*t is exact */
 	r=x/s;				/* error <= 0.5 ulps; |r| < |t| */
 	w=t+t;				/* t+t is exact */
diff --git a/lib/msun/src/s_cbrtf.c b/lib/msun/src/s_cbrtf.c
index a225d3edb982..c69e0fa5be12 100644
--- a/lib/msun/src/s_cbrtf.c
+++ b/lib/msun/src/s_cbrtf.c
@@ -50,7 +50,7 @@ cbrtf(float x)
 	    SET_FLOAT_WORD(t,sign|(hx/3+B1));
 
     /*
-     * First step Newton iteration (solving t*t-x/t == 0) to 16 bits.  In
+     * First step Halley iteration (solving t*t-x/t == 0) to 16 bits.  In
      * double precision so that its terms can be arranged for efficiency
      * without causing overflow or underflow.
      */
@@ -59,7 +59,7 @@ cbrtf(float x)
 	T=T*((double)x+x+r)/(x+r+r);
 
     /*
-     * Second step Newton iteration to 47 bits.  In double precision for
+     * Second step Halley iteration to 47 bits.  In double precision for
      * efficiency and accuracy.
      */
 	r=T*T*T;
diff --git a/lib/msun/src/s_cbrtl.c b/lib/msun/src/s_cbrtl.c
index f1950e2d4cef..ff527cc5e5e7 100644
--- a/lib/msun/src/s_cbrtl.c
+++ b/lib/msun/src/s_cbrtl.c
@@ -126,7 +126,7 @@ cbrtl(long double x)
 #endif
 
 	/*
-     	 * Final step Newton iteration to 64 or 113 bits with
+	 * Final step Halley iteration to 64 or 113 bits with
 	 * error < 0.667 ulps
 	 */
 	s=t*t;				/* t*t is exact */