Data format fundamentals: single precision (FP32) and half precision (FP16)
Now, let us take a closer look at the FP32 and FP16 formats. FP32 and FP16 are IEEE formats that represent floating-point numbers using 32-bit and 16-bit binary storage respectively. Both formats consist of three parts: a) a sign bit, b) exponent bits, and c) mantissa bits. The difference between FP32 and FP16 is the number of bits allocated to the exponent and the mantissa: FP32 uses 8 exponent bits and 23 mantissa bits, while FP16 uses 5 exponent bits and 10 mantissa bits. As a result, the range and precision of the representable values differ.
How do you convert FP16 and FP32 bit patterns to real values? According to the IEEE-754 standard, FP32 decimal value = (-1)^(sign) × 2^(decimal exponent - 127) × (implied leading 1 + decimal mantissa), where 127 is the exponent bias. For FP16, the formula is (-1)^(sign) × 2^(decimal exponent - 15) × (implied leading 1 + decimal mantissa), where 15 is the corresponding exponent bias. For more information about biased exponent values, see here.
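To make the formula concrete, here is a minimal Python sketch that decodes a normalized bit pattern field by field and cross-checks the result against the standard library's `struct` module. The helper name `decode_normal` and the two sample bit patterns are illustrative choices, not part of any library, and the sketch deliberately ignores zeros, denormalized numbers, infinities and NaN.

```python
import struct

def decode_normal(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode a normalized IEEE-754 bit pattern with the formula above."""
    bias = (1 << (exp_bits - 1)) - 1                 # 127 for FP32, 15 for FP16
    sign = bits >> (exp_bits + man_bits)             # top bit
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = (bits & ((1 << man_bits) - 1)) / (1 << man_bits)  # fraction in [0, 1)
    # Only normalized numbers: exponent field neither all zeros nor all ones.
    if not 0 < exponent < (1 << exp_bits) - 1:
        raise ValueError("not a normalized number")
    return (-1) ** sign * 2.0 ** (exponent - bias) * (1 + mantissa)

fp32_bits = 0xC0490FDB   # FP32 pattern closest to -pi
fp16_bits = 0x3555       # FP16 pattern closest to 1/3

fp32_value = decode_normal(fp32_bits, exp_bits=8, man_bits=23)
fp16_value = decode_normal(fp16_bits, exp_bits=5, man_bits=10)

# Cross-check: let Python reinterpret the same bits as float ("f") and half ("e").
fp32_ref = struct.unpack("<f", struct.pack("<I", fp32_bits))[0]
fp16_ref = struct.unpack("<e", struct.pack("<H", fp16_bits))[0]

print(fp32_value, fp32_ref)
print(fp16_value, fp16_ref)
```

The same function handles both formats because only the field widths change; the bias follows from the exponent width as 2^(exponent bits - 1) - 1.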
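In this sense, counting the exponent alone, the range of values for FP32 is roughly [-2¹²⁷, 2¹²⁷] ≈ [-1.7e38, 1.7e38], and the FP16 value range is roughly [-2¹⁵, 2¹⁵] = [-32768, 32768]. With every mantissa bit set the limit almost doubles, so the largest finite values are about ±3.4e38 for FP32 and ±65504 for FP16. Note that the FP32 exponent field ranges from 0 to 255, but the maximum value 0xFF is excluded because it is reserved for infinity and NaN, so the maximum decimal exponent is 254 - 127 = 127. Similar rules apply to FP16.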
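As a rough check of these limits, the sketch below evaluates the conversion formula with the largest usable exponent field and all mantissa bits set, then compares the result with the corresponding bit patterns decoded by `struct`. The helper `max_finite` is a hypothetical name introduced for illustration. Note that the true maximum is nearly twice the power-of-two bound, because the mantissa contributes a factor just below 2.

```python
import struct

def max_finite(exp_bits: int, man_bits: int) -> float:
    """Largest finite value for a format with the given field widths."""
    bias = (1 << (exp_bits - 1)) - 1
    max_exp_field = (1 << exp_bits) - 2      # the all-ones field is reserved
    max_mantissa = 1 - 2.0 ** -man_bits      # every mantissa bit set
    return 2.0 ** (max_exp_field - bias) * (1 + max_mantissa)

fp32_max = max_finite(8, 23)    # about 3.4e38
fp16_max = max_finite(5, 10)    # 65504.0

# Reference values: sign 0, exponent field at its largest usable value, mantissa all ones.
fp32_max_ref = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]
fp16_max_ref = struct.unpack("<e", struct.pack("<H", 0x7BFF))[0]

# Values beyond the FP16 range cannot be stored; struct refuses to pack them.
try:
    struct.pack("<e", 70000.0)
    overflowed = False
except (OverflowError, struct.error):
    overflowed = True

print(fp32_max, fp16_max, overflowed)
```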
For precision, the smallest representable step is reached when both the exponent and the mantissa are at their lower limits (that is, in the denormalized range; see the more detailed discussion here), so FP32 can represent a precision down to 2^(-23)*2^(-126)=2^(-149), while FP16 can represent a precision down to 2^(-10)*2^(-14)=2^(-24).
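The smallest positive values can be reproduced in a few lines. This sketch multiplies the weight of the lowest mantissa bit by the smallest normal exponent and compares the product with the bit pattern whose only set bit is the last mantissa bit. The helper name `min_subnormal` is an assumption made for this example.

```python
import struct

def min_subnormal(exp_bits: int, man_bits: int) -> float:
    """Smallest positive (denormalized) value for the given field widths."""
    bias = (1 << (exp_bits - 1)) - 1
    min_exponent = 1 - bias                  # -126 for FP32, -14 for FP16
    return 2.0 ** -man_bits * 2.0 ** min_exponent

fp32_tiny = min_subnormal(8, 23)             # 2**-149
fp16_tiny = min_subnormal(5, 10)             # 2**-24

# Reference values: exponent field zero, only the lowest mantissa bit set.
fp32_tiny_ref = struct.unpack("<f", struct.pack("<I", 0x00000001))[0]
fp16_tiny_ref = struct.unpack("<e", struct.pack("<H", 0x0001))[0]

print(fp32_tiny, fp16_tiny)
```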
The difference between the FP32 and FP16 representations is a major concern for mixed precision training. Different layers and operations of a deep learning model are either sensitive or insensitive to value range and precision, and need to be handled separately.
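To illustrate why this matters, the following sketch rounds the same numbers to FP16 and to FP32 and compares the results. The helpers `to_fp16` and `to_fp32` and the sample values (a tiny gradient-like number and a small weight update) are hypothetical and chosen only to expose the two failure modes, underflow and lost precision. It is not a model of any particular training framework.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest FP16 value and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round x to the nearest FP32 value and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Range: a very small value (below half of 2**-24) underflows to zero in FP16.
small_value = 1e-8
small_fp16 = to_fp16(small_value)        # 0.0
small_fp32 = to_fp32(small_value)        # still non-zero

# Precision: near 1.0 the FP16 spacing is 2**-10, so a step of 2**-12 is lost.
weight = 1.0
step = 2.0 ** -12
updated_fp16 = to_fp16(weight + step)    # rounds back to 1.0
updated_fp32 = to_fp32(weight + step)    # keeps the update

print(small_fp16, small_fp32)
print(updated_fp16, updated_fp32)
```

Operations that produce very small values or accumulate small increments are the ones that tend to need FP32, while operations whose values stay comfortably inside the FP16 range can usually tolerate the narrower format.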

