FP32
Memory requirement: 4 bytes
$\left(-1\right)^{S} \times 2^{E-127} \times \left(1 + \sum\limits_{i=1}^{23} b_{23-i} 2^{-i}\right)$
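A minimal Python sketch (an added illustration, not from the slides) that pulls the three bit fields out of an FP32 value and evaluates this formula; decode_fp32 is a hypothetical helper name:

```python
import struct

def decode_fp32(x: float):
    # Pack x as big-endian IEEE 754 single precision and read the raw bits.
    bits = int.from_bytes(struct.pack(">f", x), "big")
    s = bits >> 31                  # 1 sign bit
    e = (bits >> 23) & 0xFF         # 8 exponent bits, stored with a +127 bias
    m = bits & 0x7FFFFF             # 23 mantissa bits b_22 ... b_0
    # The slide's formula (valid for normalized values, 0 < e < 255);
    # the summation term equals m / 2^23.
    value = (-1) ** s * 2.0 ** (e - 127) * (1 + m / 2 ** 23)
    return s, e, m, value

print(decode_fp32(6.5))  # (0, 129, 5242880, 6.5): 6.5 = 2^2 * 1.625
```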
[Figure: the exponent bits determine the range; the mantissa bits determine the precision]
Can represent $[2^k, 2^k(1+\varepsilon), 2^k(1+2\varepsilon), \dots, 2^{k+1}]$, where $\varepsilon = 2^{-23}$
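To see that spacing concretely, here is a small numpy check (an added illustration, taking k = 0, i.e., values in [1, 2)):

```python
import numpy as np

one = np.float32(1.0)
eps = np.float32(2.0 ** -23)               # spacing between FP32 values in [1, 2)

print(one + eps)                           # 1.0000001: the next representable value
print(one + np.float32(2.0 ** -24))        # 1.0: half the spacing rounds back to 1
print(np.nextafter(one, np.float32(2.0)))  # same neighbor, via the library
```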
Possible solution: Use FP16!
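The catch with pure FP16 (an added illustration, not from the slides): with only 10 mantissa bits, the spacing near 1.0 is 2^-10 ≈ 0.001, so a small gradient update can round away entirely:

```python
import numpy as np

w = np.float16(1.0)
g = np.float16(1e-4)    # a typical small gradient-step magnitude
# 1e-4 is below half the FP16 spacing at 1.0, so the sum rounds back to w.
print(w + g == w)                                   # True in FP16
print(np.float32(1.0) + np.float32(1e-4) == 1.0)    # False in FP32
```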
Mixed precision: still use FP16 for the forward and backward passes, but keep an FP32 copy of the weights and apply the updates there!
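A minimal numpy sketch of that idea, with a toy gradient standing in for a real backward pass (grad_fp16 is a placeholder I've added):

```python
import numpy as np

# Toy objective L(w) = 0.5 * ||w||^2, so grad(w) = w; this stands in
# for a real FP16 backward pass.
def grad_fp16(w16):
    return w16

master_w = np.random.randn(1000).astype(np.float32)  # FP32 master weights
lr = np.float32(1e-4)

for _ in range(10):
    w16 = master_w.astype(np.float16)    # cast down: FP16 copy for compute
    g16 = grad_fp16(w16)                 # gradients come back in FP16
    # The update itself happens in FP32: lr * g can be ~1e-7, which would
    # be rounded away if accumulated directly into FP16 weights.
    master_w -= lr * g16.astype(np.float32)
```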
High-level sketch:
- Option 1: update all model parameters (full fine-tuning)
- Option 2: update only a small subset of model parameters (parameter-efficient fine-tuning, e.g., LoRA; see the sketch below)
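A minimal sketch of the second option in the LoRA style described in the article below: freeze the pretrained weight W and train only a low-rank update B @ A (the dimensions and rank here are illustrative assumptions):

```python
import numpy as np

d, k, r = 768, 768, 8   # layer dims; rank r << d (illustrative values)

W = np.random.randn(d, k).astype(np.float32)           # pretrained, frozen
A = (0.01 * np.random.randn(r, k)).astype(np.float32)  # trainable
B = np.zeros((d, r), dtype=np.float32)                 # trainable, starts at 0

def forward(x):
    # Adapted layer: h = x @ (W + B @ A)^T. Only A and B are updated,
    # i.e., 2*d*r = 12,288 parameters instead of all d*k = 589,824 in W.
    return x @ (W + B @ A).T

x = np.random.randn(4, k).astype(np.float32)
print(forward(x).shape)  # (4, 768)
```

Because B starts at zero, the adapted layer is exactly the pretrained layer at initialization, and only the small A and B matrices need gradients and optimizer state.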
Source: Strubell et al. 2019, Energy and Policy Considerations for Deep Learning in NLP
Source: https://lightning.ai/pages/community/article/lora-llm/