
Computing -ln(ε) in NumPy returns relatively small values like this:

import numpy as np

print(-np.log(np.finfo(np.float32).eps))
print(-np.log(np.finfo(np.float64).eps))

Output:

15.942385
36.04365338911715

Compare this with -log2(ε), which has a greater range than the base-e version:

print(-np.log2(np.finfo(np.float16).eps))
print(-np.log2(np.finfo(np.float32).eps))
print(-np.log2(np.finfo(np.float64).eps))

Output:

10.0
23.0
52.0

So, why don't softmax/sigmoid functions use 2 instead of e as the base for the exponential function? Using base 2 for the logarithm when converting back to logits would give more precision than base e in IEEE 754 representation.

3 Answers


The base of the logarithm changes the range, but it does not necessarily affect the precision.

There is the change-of-base formula:

$\log_b(x) = \log(x) \cdot \frac{1}{\log(b)}$

where $b$ is the base.

So the choice of base affects the range: larger bases produce smaller values.
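
A quick NumPy check of the change-of-base identity (variable names here are just illustrative):

import numpy as np

eps32 = np.float64(np.finfo(np.float32).eps)   # 2**-23

# Change of base: log2(x) = ln(x) / ln(2)
print(np.log2(eps32))                # -23.0
print(np.log(eps32) / np.log(2.0))   # ≈ -23.0, computed via base e

# The base-e and base-2 results differ only by the constant factor ln(2):
print(np.log(eps32))                 # ≈ -15.9424
print(np.log2(eps32) * np.log(2.0))  # same value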

Also, computing gradients with base-e logarithms and exponentials is easier than with other bases, and gradients are needed for backpropagation.
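
Concretely, for a general base $b$ the exponential picks up an extra constant factor in its derivative, which disappears only for $b = e$:

$\frac{d}{dx} b^x = b^x \ln b, \qquad \frac{d}{dx} e^x = e^x$

so with base $e$ the factor $\ln b = 1$ drops out of every gradient expression.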

allo

IEEE 754 provides sufficient precision for most ML tasks even with base $e$, and the choice in softmax and sigmoid functions is rooted in mathematical and practical considerations.

Notice that the function $e^x$ has the unique property that its derivative is itself, which simplifies gradient calculations during backpropagation of the common cross-entropy (CE) loss. Changing the base to $2$ doesn't gain much precision, but it complicates existing mathematical formulations and optimization computations in machine learning, potentially in many places. Therefore the softmax and sigmoid functions, often used in probability and information theory, typically use the natural logarithm and the exponential function with base $e$ in practice, as in the CE loss implemented in PyTorch. In information-theoretic discussions, base $2$ is often preferred because it yields entropy in bits, as in Shannon's original paper on entropy.
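
As a small NumPy sketch (the function names are just for illustration), a "base-2 softmax" is the ordinary softmax applied to logits scaled by $\ln 2$, since $2^z = e^{z \ln 2}$, so changing the base only rescales the logits:

import numpy as np

def softmax_base_e(z):
    # standard softmax, shifted by the max for numerical stability
    z = z - np.max(z)
    ez = np.exp(z)
    return ez / ez.sum()

def softmax_base_2(z):
    # hypothetical variant using 2**z instead of e**z
    z = z - np.max(z)
    tz = np.exp2(z)
    return tz / tz.sum()

z = np.array([1.5, -0.3, 2.0, 0.1])
print(softmax_base_2(z))
print(softmax_base_e(z * np.log(2.0)))  # same probabilities, up to rounding

That constant rescaling of the logits is something the preceding layer's weights can absorb during training, so nothing is gained by switching bases.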

cinch

$\log_2$ results in larger numbers than $\ln$, but that doesn't make them more precise. Floating point arithmetic has the same relative precision for numbers of any size.
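
A quick way to see this in NumPy: np.spacing(x) gives the gap to the next representable float (one ULP), and that gap relative to the value stays around machine epsilon regardless of magnitude:

import numpy as np

for x in [15.942385, 36.04365338911715, 23.0, 52.0]:
    x = np.float64(x)
    ulp = np.spacing(x)        # distance to the next float64 value
    print(x, ulp, ulp / x)     # relative spacing ~1e-16 in every case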

Tomek Czajka