Computing -ln(ε) for the machine epsilon in NumPy returns relatively small values:
import numpy as np

print(-np.log(np.finfo(np.float32).eps))
print(-np.log(np.finfo(np.float64).eps))
Output:
15.942385
36.04365338911715
Compare this to -log2(ε), which yields noticeably larger values than base e:
print(-np.log2(np.finfo(np.float16).eps))
print(-np.log2(np.finfo(np.float32).eps))
print(-np.log2(np.finfo(np.float64).eps))
Output:
10.0
23.0
52.0
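These base-2 values are no accident: ε is defined as 2^(-p) where p is the number of mantissa bits, so -log2(ε) recovers p exactly. A quick sanity check (using NumPy's `finfo.nmant` attribute, which reports the mantissa bit count):

```python
import numpy as np

# eps = 2**(-nmant), so -log2(eps) should equal the mantissa bit count
for dtype in (np.float16, np.float32, np.float64):
    fi = np.finfo(dtype)
    assert round(-np.log2(fi.eps)) == fi.nmant
```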
So why don't softmax/sigmoid functions use 2 instead of e as the base of the exponential? Using base-2 logarithms when converting probabilities back to logits would seem to give more precision than base e, given how IEEE 754 represents floats.
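One observation worth making: since 2^z = e^(z·ln 2), a base-2 softmax is mathematically identical to a base-e softmax applied to logits scaled by ln 2, so the choice of base only rescales the logits rather than changing what the function can express. A minimal sketch illustrating this (the function names here are my own, not from any library):

```python
import numpy as np

def softmax_e(z):
    """Standard base-e softmax with the usual max-subtraction for stability."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def softmax_2(z):
    """Same shape of computation, but with 2**z instead of e**z."""
    z = z - z.max()
    e = np.exp2(z)
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])

# Base-2 softmax == base-e softmax on logits scaled by ln(2)
print(np.allclose(softmax_2(x), softmax_e(x * np.log(2.0))))
```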