LAProof.accuracy_proofs.float_acc_lems
Floating-Point Operation Accuracy
t .
The mutual exclusion condition \(\delta \cdot \varepsilon = 0\) reflects
the standard decomposition: in the normal range only relative error is incurred, while in the
subnormal range only absolute error is incurred.
Structure
- Signed zero lemmas: behavior of operations when one argument is zero.
- Signed zero facts: FT2R evaluations at signed zeros.
- No-overflow predicates: fma_no_overflow, Bmult_no_overflow, Bplus_no_overflow, Bminus_no_overflow.
- Generic rounding lemma: generic_round_property, which extracts the \((\delta, \varepsilon)\) decomposition from the Flocq library for an arbitrary real number.
- Per-operation accuracy and finiteness lemmas for BFMA, BMULT,
BPLUS, and BMINUS, each in two forms:
- a form assuming a no-overflow hypothesis on real-valued arguments;
- a primed form (fma_accurate', BMULT_accurate', etc.) assuming only that the floating-point result is finite, from which the no-overflow condition is derived automatically.
- Correctness lemmas (BFMA_correct, BMULT_correct, BPLUS_correct): these combine the finiteness of inputs and the rounding identity into a single conclusion.
From LAProof.accuracy_proofs Require Import preamble common.
Section GenFloat.
Context {NAN : FPCore.Nans} {t : type}.
Signed Zero Lemmas
Lemma Bmult_0R (a f : ftype t) :
Binary.is_finite (BMULT a f) →
FT2R a = 0 →
(BMULT a f) = neg_zero ∨ (BMULT a f) = pos_zero.
Lemma Bplus_0R (a f : ftype t) :
Binary.is_finite (BPLUS a f) →
FT2R f = 0 →
FT2R (BPLUS a f) = FT2R a.
Lemma Bfma_mult_0R (a f s : ftype t) :
Binary.is_finite (BFMA a f s) →
FT2R a = 0 →
FT2R (BFMA a f s) = FT2R s.
Signed Zero Values
Fact neg_zero_is_finite :
Binary.is_finite (@neg_zero t).
Fact FT2R_neg_zero :
FT2R (@neg_zero t) = 0.
Fact FT2R_pos_zero :
FT2R (@pos_zero t) = 0.
Fact FT2R_Zconst_0 :
FT2R (Zconst t 0) = 0.
Notation fmax := (@fmax t).
No-Overflow Predicates
Definition fma_no_overflow (x y z : R) : Prop :=
(Rabs (rounded t (x × y + z)) < fmax)%R.
Definition Bmult_no_overflow (x y : R) : Prop :=
(Rabs (rounded t (x × y)) < fmax)%R.
Definition Bplus_no_overflow (x y : R) : Prop :=
(Rabs (Generic_fmt.round Zaux.radix2
(SpecFloat.fexp (fprec t) (femax t))
(BinarySingleNaN.round_mode BinarySingleNaN.mode_NE)
(x + y)) < fmax)%R.
Definition Bminus_no_overflow (x y : R) : Prop :=
(Rabs (Generic_fmt.round Zaux.radix2
(SpecFloat.fexp (fprec t) (femax t))
(BinarySingleNaN.round_mode BinarySingleNaN.mode_NE)
(x - y)) < fmax)%R.
Notation D := (@default_rel t).
Notation E := (@default_abs t).
Generic Rounding Error Decomposition
- In the normal range, error is purely relative: \(\varepsilon = 0\).
- In the subnormal range, error is purely absolute: \(\delta = 0\)
Lemma generic_round_property :
∀ (x : R),
∃ delta epsilon : R,
delta × epsilon = 0 ∧
(Rabs delta ≤ D)%R ∧
(Rabs epsilon ≤ E)%R ∧
Generic_fmt.round Zaux.radix2
(SpecFloat.fexp (fprec t) (femax t))
(BinarySingleNaN.round_mode BinarySingleNaN.mode_NE)
x = (x × (1 + delta) + epsilon)%Re.
Fused Multiply-Add
Lemma fma_accurate :
∀ (x : ftype t) (FINx : Binary.is_finite x)
(y : ftype t) (FINy : Binary.is_finite y)
(z : ftype t) (FINz : Binary.is_finite z)
(FIN : fma_no_overflow (FT2R x) (FT2R y) (FT2R z)),
∃ delta, ∃ epsilon,
delta × epsilon = 0 ∧
Rabs delta ≤ D ∧
Rabs epsilon ≤ E ∧
(FT2R (BFMA x y z) = (FT2R x × FT2R y + FT2R z) × (1 + delta) + epsilon)%Re.
is_finite_fma_no_overflow shows that finiteness of the result of an FMA
implies the no-overflow condition on the exact value \(x \cdot y + z\).
This allows the primed form fma_accurate' to work directly from
a finiteness hypothesis.
Lemma is_finite_fma_no_overflow :
∀ (x y z : ftype t)
(HFINb : Binary.is_finite (BFMA x y z)),
fma_no_overflow (FT2R x) (FT2R y) (FT2R z).
BFMA_finite_e extracts finiteness of each individual argument from
finiteness of the FMA result. This is a standard regularity
property: the IEEE 754 FMA result is non-finite if any input is
non-finite (with the exception of a zero-times-infinity addend, which
produces a NaN).
Lemma BFMA_finite_e :
∀ (a f u : ftype t)
(Hfin : Binary.is_finite (BFMA a f u)),
Binary.is_finite a ∧
Binary.is_finite f ∧
Binary.is_finite u.
is_finite_fma_no_overflow': If all three inputs to an FMA are finite
and no overflow occurs, then the FMA result is finite.
Lemma is_finite_fma_no_overflow' :
∀ (x y z : ftype t)
(Hfinx : Binary.is_finite x = true)
(Hfiny : Binary.is_finite y = true)
(Hfinz : Binary.is_finite z = true)
(Hov : fma_no_overflow (FT2R x) (FT2R y) (FT2R z)),
Binary.is_finite (BFMA x y z) = true.
fma_accurate' is the finiteness-hypothesis form of fma_accurate:
it requires only that the result is finite, deriving the
no-overflow condition and input finiteness internally.
Lemma fma_accurate' :
∀ (x y z : ftype t)
(FIN : Binary.is_finite (BFMA x y z)),
∃ delta, ∃ epsilon,
delta × epsilon = 0 ∧
Rabs delta ≤ @default_rel t ∧
Rabs epsilon ≤ @default_abs t ∧
(FT2R (BFMA x y z) = (FT2R x × FT2R y + FT2R z) × (1 + delta) + epsilon)%Re.
BFMA_correct combines BFMA_finite_e and is_finite_fma_no_overflow
to give the full correctness statement: finiteness of the result
implies finiteness of all inputs and a rounding identity.
Lemma BFMA_correct (a b s : ftype t) :
Binary.is_finite (BFMA a b s) →
(Binary.is_finite a ∧ Binary.is_finite b ∧ Binary.is_finite s) ∧
FT2R (BFMA a b s) =
Generic_fmt.round Zaux.radix2 (SpecFloat.fexp (fprec t) (femax t))
(BinarySingleNaN.round_mode BinarySingleNaN.mode_NE)
(FT2R a × FT2R b + FT2R s).
Floating-Point Multiplication
Lemma BMULT_accurate :
∀ (x y : ftype t)
(FIN : Bmult_no_overflow (FT2R x) (FT2R y)),
∃ delta, ∃ epsilon,
delta × epsilon = 0 ∧
Rabs delta ≤ @default_rel t ∧
Rabs epsilon ≤ @default_abs t ∧
(FT2R (BMULT x y) = (FT2R x × FT2R y) × (1 + delta) + epsilon)%Re.
is_finite_BMULT_no_overflow shows that finiteness of BMULT x y
implies that the exact product \(x \cdot y\) does not overflow.
Lemma is_finite_BMULT_no_overflow :
∀ (x y : ftype t)
(HFINb : Binary.is_finite (BMULT x y)),
Bmult_no_overflow (FT2R x) (FT2R y).
BMULT_accurate' is the finiteness-hypothesis form of
BMULT_accurate.
Lemma BMULT_accurate' :
∀ (x y : ftype t)
(FIN : Binary.is_finite (BMULT x y)),
∃ delta, ∃ epsilon,
delta × epsilon = 0 ∧
Rabs delta ≤ @default_rel t ∧
Rabs epsilon ≤ @default_abs t ∧
(FT2R (BMULT x y) = (FT2R x × FT2R y) × (1 + delta) + epsilon)%Re.
BMULT_finite_e extracts finiteness of each factor from finiteness
of the product.
Lemma BMULT_finite_e :
∀ (a b : ftype t)
(Hfin : Binary.is_finite (BMULT a b)),
Binary.is_finite a ∧ Binary.is_finite b.
BMULT_correct gives the full correctness statement for multiplication:
finiteness of the result implies finiteness of the operands and a
rounding identity.
Lemma BMULT_correct (a b : ftype t) :
Binary.is_finite (BMULT a b) →
(Binary.is_finite a ∧ Binary.is_finite b) ∧
FT2R (BMULT a b) =
Generic_fmt.round Zaux.radix2 (SpecFloat.fexp (fprec t) (femax t))
(BinarySingleNaN.round_mode BinarySingleNaN.mode_NE)
(FT2R a × FT2R b).
Floating-Point Addition
BPLUS_accurate establishes the standard relative rounding error
model for floating-point addition BPLUS x y.
Lemma BPLUS_accurate :
∀ (x : ftype t) (FINx : Binary.is_finite x)
(y : ftype t) (FINy : Binary.is_finite y)
(FIN : Bplus_no_overflow (FT2R x) (FT2R y)),
∃ delta,
Rabs delta ≤ @default_rel t ∧
(FT2R (BPLUS x y) = (FT2R x + FT2R y) × (1 + delta))%Re.
is_finite_sum_no_overflow shows that finiteness of the result
implies the no-overflow condition on the exact sum \(x + y\).
Lemma is_finite_sum_no_overflow :
∀ (x y : ftype t)
(HFINb : Binary.is_finite (BPLUS x y)),
Bplus_no_overflow (FT2R x) (FT2R y).
no_overflow_sum_is_finite is the converse direction: if both
summands are finite and the exact sum does not overflow, then
the result is finite.
Lemma no_overflow_sum_is_finite :
∀ (x y : ftype t)
(H1 : Binary.is_finite x)
(H2 : Binary.is_finite y)
(Hov : Bplus_no_overflow (FT2R x) (FT2R y)),
Binary.is_finite (BPLUS x y).
BPLUS_accurate' is the finiteness-hypothesis form of
BPLUS_accurate.
Lemma BPLUS_accurate' :
∀ (x y : ftype t)
(FIN : Binary.is_finite (BPLUS x y)),
∃ delta,
Rabs delta ≤ @default_rel t ∧
(FT2R (BPLUS x y) = (FT2R x + FT2R y) × (1 + delta))%Re.
BPLUS_finite_e extracts finiteness of each summand from finiteness
of the sum.
Lemma BPLUS_finite_e :
∀ (a b : ftype t)
(Hfin : Binary.is_finite (BPLUS a b)),
Binary.is_finite a ∧ Binary.is_finite b.
BPLUS_correct gives the full correctness statement for
floating-point addition:
finiteness of the result implies finiteness of each summand and
relates the floating-point result to the rounded real result.
Lemma BPLUS_correct (a b : ftype t) :
Binary.is_finite (BPLUS a b) →
(Binary.is_finite a ∧ Binary.is_finite b) ∧
FT2R (BPLUS a b) =
Generic_fmt.round Zaux.radix2 (SpecFloat.fexp (fprec t) (femax t))
(BinarySingleNaN.round_mode BinarySingleNaN.mode_NE)
(FT2R a + FT2R b).
Floating-Point Subtraction
Lemma BMINUS_accurate :
∀ (x : ftype t) (FINx : Binary.is_finite x)
(y : ftype t) (FINy : Binary.is_finite y)
(FIN : Bminus_no_overflow (FT2R x) (FT2R y)),
∃ delta,
Rabs delta ≤ @default_rel t ∧
(FT2R (BMINUS x y) = (FT2R x - FT2R y) × (1 + delta))%Re.
is_finite_minus_no_overflow shows that finiteness of the floating-point
result implies that the exact difference does not overflow.
Lemma is_finite_minus_no_overflow :
∀ (x y : ftype t)
(HFINb : Binary.is_finite (BMINUS x y)),
Bminus_no_overflow (FT2R x) (FT2R y).
no_overflow_minus_is_finite is the converse direction for
subtraction: if both operands are finite and the exact difference
does not overflow, then the result is finite.
Lemma no_overflow_minus_is_finite :
∀ (x y : ftype t)
(H1 : Binary.is_finite x)
(H2 : Binary.is_finite y)
(Hov : Bminus_no_overflow (FT2R x) (FT2R y)),
Binary.is_finite (BMINUS x y).
BMINUS_accurate' is the finiteness-hypothesis form of
BMINUS_accurate.
Lemma BMINUS_accurate' :
∀ (x y : ftype t)
(FIN : Binary.is_finite (BMINUS x y)),
∃ delta,
Rabs delta ≤ @default_rel t ∧
(FT2R (BMINUS x y) = (FT2R x - FT2R y) × (1 + delta))%Re.
BMINUS_finite_sub extracts finiteness of each operand from
finiteness of the difference.