LAProof.accuracy_proofs.vecnorm_acc
Floating-Point Vector Norm Accuracy Bounds
Error Factors
Main Results
- bfVNRM2: Shows that the floating-point two-norm can be expressed as
the square root of a mixed-error dot product: one copy of the input
appears componentwise perturbed, and a small absolute residual accounts
for underflow.
- bfVNRM1: Shows that the floating-point one-norm equals the exact one-norm of a slightly perturbed input vector.
Dependencies
- preamble, common: basic setup and shared definitions
- dotprod_model, sum_model: relational models of dot product and summation
- float_acc_lems: elementary floating-point accuracy lemmas
- dot_acc, sum_acc: dot product and summation accuracy theorems
From LAProof.accuracy_proofs Require Import
preamble
common
dotprod_model
sum_model
float_acc_lems
dot_acc
sum_acc.
The floating-point two-norm: the square root of the floating-point dot
product of a vector with itself, coerced to a real number.
The exact real-valued two-norm: the square root of the real dot product
of a vector with itself.
Definition two_normR (x : list R) : R :=
sqrt (dotprodR x x).
Variable (x : list (ftype t)).
Notation xR := (map FT2R x).
Notation n := (size x).
Notation g := (@g t).
Notation g1 := (@g1 t).
Notation neg_zero := (@common.neg_zero t).
We assume the floating-point dot product dotprodF x x is finite,
which is necessary for two_normF x to be well-defined.
bfVNRM2 expresses the floating-point two-norm as the square root of
a mixed-error dot product. One copy of the input appears componentwise
perturbed by a relative factor bounded by \(g(n)\), and an
absolute residual bounded by \(g_1(n,n)\) accounts for
underflow.
Lemma bfVNRM2 :
∃ (x' : list R) (eta : R),
two_normF x = sqrt (dotprodR x' xR + eta) ∧
(∀ m, (m < n)%nat →
∃ delta,
nth 0 x' m = FT2R (nth neg_zero x m) × (1 + delta) ∧
Rabs delta ≤ g n) ∧
Rabs eta ≤ g1 n n.
End TwoNorm.
The floating-point one-norm: the real value of the floating-point
left-to-right accumulation of the vector entries.
The exact real-valued one-norm: the sum of a list of real numbers.
We assume the floating-point sum sumF x is finite, which is necessary
for one_normF x to be well-defined.
Hypothesis Hfin : Binary.is_finite (sumF x) = true.
Notation xR := (map FT2R x).
Notation n := (size x).
Notation g := (@g t).
Notation neg_zero := (@common.neg_zero t).
bfVNRM1 shows that the floating-point one-norm equals the exact
one-norm of a perturbed input vector, where each component is perturbed
by a relative factor bounded by \(g(n-1)\).