FPBench Community Meetings

April 4, 2024

Rigorous Error Analysis for Logarithmic Number Systems

Thanh Son Nguyen, University of Utah

Logarithmic Number Systems (LNS) hold considerable promise in helping reduce the number of bits needed to represent a high dynamic range of real-numbers with finite precision, and also efficiently support multiplication and division. However, under LNS, addition and subtraction turn into non-linear functions that must be approximated—typically using precomputed table-based functions. Additionally, multiple layers of error correction are typically needed to improve result accuracy. Unfortunately, previous efforts have not characterized the resulting error bound. We provide the first rigorous analysis of LNS, covering detailed techniques such as co-transformation that are crucial to implementing subtraction with reasonable accuracy. We provide theorems capturing the error due to table interpolations, the finite precision of pre-computed values in the tables, and the error introduced by fix-point multiplications involved in LNS implementations. We empirically validate our analysis using a Python implementation, showing that our analytical bounds are tight, and that our testing campaign generates inputs diverse-enough to almost match (but not exceed) the analytical bounds. We close with discussions on how to adapt our analysis to LNS systems with different bases and also discuss many pragmatic ramifications of our work in the broader arena of scientific computing and machine learning.

March 7, 2024

Low-bit Quantization for Efficient and Accurate LLM Serving

Chien-Yu Lin

The growing demand for Large Language Models (LLMs) in applications such as content generation, intelligent chatbots, and sentiment analysis poses considerable challenges for LLM service providers. To efficiently use GPU resources and boost throughput, batching multiple requests has emerged as a popular paradigm; to further speed up batching, LLM quantization techniques reduce memory consumption and increase computing capacity. However, prevalent quantization schemes (e.g., 8-bit weight-activation quantization) cannot fully leverage the capabilities of modern GPUs, such as 4-bit integer operators, resulting in sub-optimal performance. To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss. Atom significantly boosts serving throughput by using low-bit operators and considerably reduces memory consumption via low-bit quantization. It attains high accuracy by applying a novel mixed-precision and fine-grained quantization process. We evaluate Atom on 4-bit weight-activation quantization setups in the serving context. Atom improves end-to-end throughput by up to 7.73× compared to the FP16 and by 2.53× compared to INT8 quantization, while maintaining the same latency target.

February 1, 2024

Towards optimized multiplierless arithmetic circuits

Dr. Rémi Garcia, University of Rennes

Multiplication is a basic operator used in many applications. When implemented in embedded systems, e.g. FPGAs, these algorithms require highly optimized hardware. Improving the multiplication implementation is an important part of the final hardware cost reduction. This talk will expose the recent advances on the automatic design of multiplication of a variable with multiple a priori known constants, this problem is called the Multiple Constant Multiplication (MCM) problem. Using Integer Linear Programming (ILP) permits to significantly reduce the hardware cost when MCM is implemented using additions and bit-shifts. The ILP approach has been efficiently used for various hardware design problems but other approaches exists, such as SAT/SMT, Constraint Programming, etc. This presentation will introduce a few concepts relating to this ecosystem.

December 7, 2023

Application-specific arithmetic

Florent de Dinechin (INSA-Lyon/Inria) and Martin Kumm (Fulda University of Applied Sciences)

A software designer has to carefully select the arithmetic that best suits each step of her application, but the choice is constrained: only a handful of operators are supported in hardware, in only in a handful of precisions. Conversely, when designing for hardware or FPGAs, these constraints become degrees of freedom. Any precision can be designed, any number encoding, but also any operation and function, provided you are clever enough to implement it efficiently as a circuit.
This talk will expose some of the opportunities offered by this freedom, and some of the methodologies and tools that allow a designer to build and compose application-specific operators. It may include some inadvertent advertising for our upcoming 800-page book dedicated to these topics.

November 2, 2023

Overview of numerics on the Java Platform: past, present, and possible futures

Joseph D. Darcy, Oracle

Come hear an overview of numeric support of the Java platform, an on-going story ranging over more than a quarter century and spanning standards, specifications, virtual machines, libraries, and compilers, including resolving details as small as 2^-1074. The talk does not assume audience members will already be familiar with the details of the Java platform; attendees are welcome to ask questions about such details.
Joe is a long-time member on the JDK engineering team, first at Sun and later Oracle. As “Java Floating-Point Czar” he has looked after Java numerics including contributing to the design of strictfp, adding numerous floating-point math library methods, and adding hexadecimal floating-point literals to the Java language and library. Joe was a participant in and interim editor of the 2008 revision to the IEEE 754 floating-point standard. Outside of numerics, Joe has done other foundational work on the Java platform including core libraries development, Java language changes including Project Coin, infrastructure improvements, and reviewing platform interface updates, together with a smattering of project management and release management along the way.

October 5, 2023

Customizing Elementary Function Approximations for Hardware Accelerators

Benjamin Carleton and Adrian Sampson, Cornell University

This talk is a work-in-progress report about our efforts to generate customized hardware implementations of fixed-point function approximations for application-specific accelerators. We built a new compiler from FPCore to Calyx, our lab’s IR and compiler for translating high-level descriptions to hardware designs. The interesting part is the need to generate efficient polynomial approximations for elementary functions—and the freedom to customize these implementations for the specific accelerator’s context. We have preliminary (but encouraging) comparisons to a commercial high-level synthesis (HLS) compiler. We will seek feedback on the many possibilities for next steps.

September 7, 2023

A PVS Formalization of Taylor Error Expressions

Jonas Katona, Yale University

Due to the limits of finite precision arithmetic, as one works with floating-point representations of real numbers in computer programs, round-off errors tend to accumulate under mathematical operations and may become unacceptably large. Symbolic Taylor Error Expressions is a technique used in the tool FPTaylor to bound the round-off errors accumulated under differentiable mathematical operations which provides a good trade-off between efficiency and accuracy. In this presentation, I will present a formally verified implementation of Symbolic Taylor Error Expressions in a specification language, automated theorem prover, and typechecker called Prototype Verification System (PVS). I will also go over how this formal specification in PVS will enable the integration of Symbolic Taylor Expressions in PRECiSA, a prototype static analysis tool that can compute verified round-off error bounds for floating-point programs.

August 3, 2023

Automated Reasoning over the Reals

Soonho Kong, Amazon Web Services

In cyber-physical systems such as autonomous driving systems and robotics components, ordinary differential equations and nonlinear functions such as the exponential function and the trigonometric functions appear almost universally. To adopt automated reasoning techniques in these domains, it is necessary to develop a tool that can handle logical encodings with those functions. In this talk, I will present the design and implementation of a delta-decision procedure, dReal. First, we will discuss the theory of delta-decidability briefly. Then I will explain 1) how the tool handles nonlinear functions and ordinary differential equations, 2) how it can solve optimization and synthesis problems (∃∀-formulas), and 3) how the tool utilizes multiple cores to speed up the solving process in parallel. I will show that the tool helps tackle instances from real-world applications including bounded model-checking for cyber-physical systems, nonlinear global optimization, and Lyapunov-stability analysis in robotics.

July 6, 2023

FPTalks 2023

June 1, 2023

Floating-Point Education Roundtable

Mike Lam, James Madison University

Members of the FPBench community are well-aware of the pitfalls of floating-point representation, but awareness among the wider population of software authors is often painfully inadequate. There is a surprising dearth of quality educational materials and techniques for teaching these issues to students (both undergraduate and graduate). In this roundtable discussion, we’ll look at some of the current approaches and brainstorm possible approaches to improve the state of the art.

May 4, 2023

Floating Point Accuracy Anecdotes from Real-world Products

Shoaib Kamil, Adobe Research

Floating point issues must often be surmounted when building software products that rely on precise computations while still demanding performance. In this talk, I describe two scenarios where existing tools fall short. First, I will describe our efforts to utilize interval computation to better bound quantities representing real numbers, and the rabbit hole we encountered when trying to find an existing off-the-shelf cross-platform interval computation library. From this exploration, we built a cross-platform interval benchmark suite using hand-crafted expressions from real-world code and from FPBench, and evaluated four existing interval libraries for correctness, interval width, and speed. Second, I will describe our experiences using GPU floating point math to emulate integer computations, and the pitfalls even when computing with exactly-representable floating point numbers.

April 6, 2023

Formal and Semi-Formal Verification of Floating-Point Computations in C Programs

Samuel D. Pollard and Ariel Kellison, Sandia National Laboratories

For better or worse, the world runs on C, and to a lesser extent, C with IEEE 754 floating point. At Sandia Labs, we often are tasked with the formal verification of high-consequence programs in C. Two methods we use at Sandia to accomplish this verification are: fully constructive proofs of correctness (in the Coq theorem prover), and deductive verification (using Frama-C). In both cases, the addition of numerical computations adds complexity to the programs, via numerical error (e.g., discretization or roundoff error), or run-time errors (such as floating-point exceptions). We discuss our efforts at Sandia Labs to both make numerical models of C programs using Frama-C and its ANSI C specification language (ACSL), as well as functional models in Coq, and how to check whether the relevant C code implements these specifications.

March 2, 2023

Automatically generating numerical PDE solvers with Finch

Eric Heisler, University of Utah

Finch is a domain-specific language for numerically solving differential equations. It employs a variety of numerical techniques, generates code for a number of different target frameworks and languages, and allows almost arbitrary equation input. In addition, it accepts unrestricted modification of the generated code to provide maximal flexibility and hand-optimization. One downside to this (perhaps excessive) flexibility is that numerical instability and the occasional typo can easily turn a solution into a pile of NaNs.

Eric will introduce Finch, and demonstrate some of the features relevant to the numerics. We will look at some examples where things turn out just the way we hoped, and some where they don't.

Dec 8, 2022

Low Precision FP8 Floating-Point Formats for Machine Learning – Background and Industry Status

Marius Cornea, Intel

Marius Cornea, a senior principal engineer at Intel working on math libraries and floating-point (and one of the coauthors of the IEEE Standard 754-2008), will be presenting about new industry standards for low-precision formats. The presentation will examine the reasons for which FP8 formats have emerged, definition details, and variations proposed by industry and academia. We will enumerate a few known (mostly planned) implementations of FP8. We will also look at current standardization work for FP8 formats.

November 3, 2022

A Function-Based Approach to Interactive High-Precision Volumetric Design and Fabrication

Chris Uchytil, Mechanical Engineering, University of Washington

We present a novel computer-aided design (CAD) modeling system designed to support a modeling range that matches the fabrication range of modern additive manufacturing (AM) systems that are capable of producing large-scale, high-resolution parts. To be useful to designers and fabricators, a modeling system must perform essential functions (such as execution of modeling operations, visualization, and slicing) at interactive rates, and achieving the necessary performance depends on efficient use of both memory and computing capacity. Our approach to achieving necessary performance levels is to implement an implicit function-based representation (f-rep) modeling system, not using a computer algebra system, but instead using just-in-time (JIT) compilation of user-defined functions. Efficient memory usage is achieved via a sparse volume data structure that builds on previous work by Hoetzlein [Hoetzlein 2016]. Computational efficiency is achieved through a combination of interval arithmetic (IA) and massively parallel evaluation on the GPU. We follow [Keeter 2020] and employ IA as the basis for local pruning of the function evaluation tree to minimize the required number of function evaluations, and we take advantage of GPU-parallelism to significantly enhance computational throughput.

October 6, 2022

Finding Inputs that Trigger Floating-Point Exceptions in GPUs via Bayesian Optimization

Ignacio Laguna, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory

Testing code for floating-point exceptions is crucial as exceptions can quickly propagate and produce unreliable numerical answers. The state-of-the-art to test for floating-point exceptions in GPUs is quite limited and solutions require the application's source code, which precludes their use in accelerated libraries where the source is not publicly available. We present an approach to find inputs that trigger floating-point exceptions in black-box GPU functions, i.e., functions where the source code and information about input bounds are unavailable. Our approach is the first to use Bayesian optimization (BO) to identify such inputs and uses novel strategies to overcome the challenges that arise in applying BO to this problem. We implement our approach in the Xscope framework and demonstrate it on 58 functions from the CUDA Math Library and functions from ten HPC programs. Xscope is able to identify inputs that trigger exceptions in about 72% of the tested functions.

September 2, 2022

Birds-of-a-feather

In this session, community members discussed challenges connecting various floating point analysis and optimization tools.

August 4, 2022

Database Workbench: an Interface for Mixed-initiative Design Space Search

Edward Misback, Computer Science & Engineering, University of Washington

The Herbie web demo is intended as an accessible tool for making a rough guess at the best way to improve the accuracy of a floating-point expression. It aims to support students learning about floating point error, casual users who just want a high-accuracy replacement expression, and expert users who are comfortable interpreting its results. However, it is designed as a single-step tool and leaves little room for users to iterate or adapt it to numerical analysis tasks, even though its underlying functions have broader applications. We describe how an alternative “database workbench” interface will allow users to leverage Herbie’s power while exploring the design space of replacement expressions more flexibly, including support for the development of third-party expression analysis plugins. We would like to discuss with the FPBench community whether the workflow we describe is compatible with other numerical analysis tools and needs, either as plugins or as separate workbenches with a similar interface. Slides

July 6, 2022

FPTalks 2022!

A great day of talks from folks across the broader community!

May 5, 2022

Automatic Datapath Optimization using E-Graphs

Samuel Coward, Intel & Imperial College London

RTL development is hampered by low design space exploration and long debug times. High Level Synthesis and ML techniques are not tackling the nature of optimization RTL teams are doing in complex Graphics type IP. This talk provides initial results on: automatically transforming RTL , exploring the design space, how Intel GFx knowhow is exploited, associated formal verification and how the foundation of the system is using the latest e-graph technology.

April 7, 2022

Choosing Mathematical Function Implementations for Speed and Accuracy

Ian Briggs, Computer Science, University of Utah

Standard implementations of functions like sin and exp optimize for accuracy, not speed, because they are intended for general-purpose use. But just like many applications tolerate inaccuracy from cancellation, rounding error, and singularities, many applications could also tolerate less-accurate function implementations. This raises an intriguing possibility: speeding up numerical code by using different function implementations. Enter OpTuner, an automated tool for selecting the best implementation for each mathematical function call site. OpTuner uses error Taylor series and integer linear programming to compute optimal assignments of 297 function implementations to call sites and presents the user with a speed-accuracy Pareto curve. In a case study on the POV-Ray ray tracer, OpTuner speeds up a critical computation by 2.48×, leading to a whole program speedup of 1.09× with no change in the program output; human efforts result in slower code and lower-quality output. On a broader study of 36 standard benchmarks, OpTuner demonstrates speedups of 2.05× for negligible decreases in accuracy and up to 5.37× for error-tolerant applications.

May 3, 2022

CORE-MATH Correctly Rounded Mathematical Functions ... Up to the End User

Paul Zimmerman, Research Fellow at INRIA

The mission of the CORE-MATH project is to provide off-the-shelf open-source mathematical functions with correct rounding that can be integrated into current mathematical libraries (GNU libc, Intel Math Library, AMD Libm, Newlib, OpenLibm, Musl, Apple Libm, llvm-libc, CUDA libm, ROCm). This presentation covers first results in single-precision (binary32). Slides

February 3, 2022

Formal Verification of Numerical Methods for Hamiltonian Systems

Ariel Kellison, Computer Science, Cornell University

January 6, 2022

Birds-of-a-feather

In this session, community members discussed challenges in training folks to tackle numerical challenges, both in university classrooms and in engineering contexts.

December 2, 2021

Automated Discovery of Invertibility Conditions via Syntax-Guided Synthesis

Andrew Reynolds, Computer Science, University of Iowa

Satisfiability Modulo Theories (SMT) solvers have gained widespread popularity as reasoning engines in numerous formal methods applications, including in syntax-guided synthesis (SyGuS) applications. In this talk, we focus on an emerging class of applications for syntax-guided synthesis, namely, the use of a SyGuS solver to (partially) automate further development and improvements to the SMT solver itself. We describe several recent features in the SMT solver cvc5 that follow this paradigm. These include syntax-guided enumeration of rewrite rules, selection of test inputs, and discovery of solved forms for quantified constraints. For the latter, we describe how syntax-guided synthesis recently played a pivotal role in the development of a new algorithm for quantified bit-vector constraints based on invertibility conditions, and how this technique can be extended for similar algorithms that target quantified constraints in other theories, including the theory of floating points.

November 4, 2021

Birds-of-a-feather

In this session, community members discussed the challenge of porting applications across diverse number systems: "I have software written in number system X, and hardware with accelerator calls that use number system Y. How should I handle adapting my workload, from all implemented in number system X, to a mixed format with some operations in number system Y?"

October 7, 2021

Lazy Exact Real Arithmetic Using Floating Point Operations

Ryan McCleeary, Research Engineer, Numerica

Exact real arithmetic systems can specify any amount of precision on the output of the computations. They are used in a wide variety of applications when a high degree of precision is necessary. Some of these applications include: differential equation solvers, linear equation solvers, large scale mathematical models, and SMT solvers. This talk describes a new exact real arithmetic system which uses lazy list of floating point numbers to represent the real numbers. It details algorithms for basic arithmetic computations on these structures and proves their correctness. This proposed system has the advantage of algorithms which can be supported by modern floating point hardware, while still being a lazy exact real arithmetic system

August 5, 2021

State of FPBench

In this meeting, the community discussed what the community might most productively focus on in the coming year. The discussion was lively and landed on three major thrusts: (1) categorizing benchmarks by domain and improving licensing, (2) developing "FP course modules" that instructors can easily incorporate into their courses, and (3) increasing industrial outreach to encourage FPCore adoption as a formal, vendor-neutral specification language.

July 1, 2021

Scaling Error Analysis to Billions of FLOPs

Sam Pollard, Sandia National Laboratories

Sam Pollard shared his work on floating-point error analysis of kernels with many FLOPs (10^9 or more), by combining FPTaylor and analytical error bounds on inner products.

May 6, 2021

RLIBM-32: Fast and Correct 32-bit Math Libraries

Jay Lim, Computer Science, Rutgers University

RLIBM-32 provides a set of techniques to develop correctly rounded math libraries for 32-bit float and posit types. It enhances our RLibm approach that frames the problem of generating correctly rounded libraries as a linear programming problem in the context of 16-bit types to scale to 32-bit types. Specifically, this talk with detail new algorithms to (1) generate polynomials that produce correctly rounded outputs for all inputs using counterexample guided polynomial generation, (2) generate efficient piecewise polynomials with bit-pattern based domain splitting, and (3) deduce the amount of freedom available to produce correct results when range reduction involves multiple elementary functions. The resultant math library for the 32-bit float type is faster than state-of-the-art math libraries while producing the correct output for all inputs. We have also developed a set of correctly rounded elementary functions for 32-bit posits.

April 1, 2021

SHAMAN: Evaluating Numerical Error for Real C++ Code

Nestor Demeure, Computer Science, University Paris Saclay

Shaman is a C++11 library that use operator overloading and a novel method to evaluate the numerical accuracy of an application. It has been designed to target high-performance simulations and, thus, we insured that it is not only accurate but also: is fast enough to be tested on very large simulations, is compatible with all of C++ mathematical functions, and is threadsafe and compatible with both OpenMP and MPI. In this talk, Nestor gave a live demo of Shaman's capabilities and discuss its novel design features.

March 4, 2021

POP: Fast and Efficient Bit-level Precision Tuning

Dorra Ben Khalifa, Informatics, Perpignan University

Dorra gave a great demo of POP, a tool from her thesis work on precision tuning. Summarizing from her abstract: POP establishes novel techniques for precision tuning. The main idea of our approach is based on semantic modeling of the propagation of the numerical errors throughout the program source. This yields a system of constraints whose minimal solution gives the best tuning of the program. Based on a static analysis approach, we formulate the problem of precision tuning with two different methods. The first method combines a forward and a backward error analysis which are two popular paradigms of error analysis. Next, our analysis is expressed as a set of linear constraints, made of propositional logic formulas and relations between integer elements only, checked by a SMT solver. The second method consists of generating an Integer Linear Problem (ILP) from the program. This is done by reasoning on the most significant bit and the number of significant bits of the values which are integer quantities. The integer solution to this problem, computed in polynomial time by a classical linear programming solver, gives the optimal data types at bit-level. A finer set of semantic equations is also proposed which does not reduce directly to an ILP problem. We use the policy iteration technique to find a solution. The POP tool implements both methods, and in this session Dorra demonstrateed it across several benchmarks coming from various application domains such as embedded systems, Internet of Things (IoT), physics, etc.

February 4, 2021

Keeping Science on Keel When Software Moves

Allison Baker, National Center for Atmospheric Research

Allison lead a great discussion of her recent CACM article Keeping Science on Keel When Software Moves.

High performance computing (HPC) is central to solving large problems in science and engineering through the deployment of massive amounts of computational power. The development of important pieces of HPC software spans years or even decades, involving dozens of computer and domain scientists. During this period, the core functionality of the software is made more efficient, new features are added, and the software is ported across multiple platforms. Porting of software in general involves the change of compilers, optimization levels, arithmetic libraries, and many other aspects that determine the machine instructions that actually get executed. Unfortunately, such changes do affect the computed results to a significant (and often worrisome) extent. In a majority of cases, there are not easily definable a priori answers one can check against. A programmer ends up comparing the new answer against a trusted baseline previously established or checks for indirect confirmations such as whether physical properties such as energy are conserved. However, such non-systematic efforts might miss underlying issues, and the code may keep misbehaving until these are fixed.

In this session, Allison presented real-world evidence to show that ignoring numerical result changes can lead to misleading scientific conclusions. She presented techniques and tools that can help computational scientists understand and analyze compiler effects on their scientific code. These techniques are applicable across a wide range of examples to narrow down the root-causes to single files, functions within files, and even computational expressions that affect specific variables. The developer may then rewrite the code selectively and/or suppress the application of certain optimizations to regain more predictable behavior.

January 7, 2021

Community Demos

Tanmay Tirpankar https://github.com/arnabd88/Satire

Mike Lam https://github.com/crafthpc/floatsmith

December 3, 2020

Rival: An Interval Arithmetic for Robust Error Estimation

Oliver Flatt, Computer Science, University of Utah

Interval arithmetic is a simple way to compute a mathematical expression to an arbitrary accuracy, widely used for verifying floating-point computations. Yet this simplicity belies challenges. Some inputs violate preconditions or cause domain errors. Others cause the algorithm to enter an infinite loop and fail to compute a ground truth. Plus, finding valid inputs is itself a challenge when invalid and unsamplable points make up the vast majority of the input space. These issues can make interval arithmetic brittle and temperamental.

Rival introduces three extensions to interval arithmetic to address these challenges. Error intervals express rich notions of input validity and indicate whether all or some points in an interval violate implicit or explicit preconditions. Movability flags detect futile recomputations and prevent timeouts by indicating whether a higher-precision recomputation will yield a more accurate result. And input search restricts sampling to valid, samplable points, so they are easier to find. We compare these extensions to the state-of-the-art technical computing software Mathematica, and demonstrate that our extensions are able to resolve 60.3% more challenging inputs, return 10.2× fewer completely indeterminate results, and avoid 64 cases of fatal error.

November 2, 2020

FPY: The future of FPCore

Bill Zorn, Computer Science & Engineering, University of Washington

FPY is a new language for specifying benchmarks that is meant to be easier to write for larger programs while retaining compatibility with existing FPCore-based tools. FPY repackages the precise semantics of FPCore under more familiar and convenient Python-like syntax. This talk will discuss the prototype design and highlight new conveniences FPY affords over FPCore. Slides