I have noticed that the most commonly used activation functions are continuous. Is there a specific reason for this? Work such as this paper has trained networks with discontinuous activations, yet the approach does not seem to have taken off. Does anybody have insight into why, or better yet, an article discussing this?
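To make concrete what "discontinuous activation" refers to here, below is a minimal illustrative sketch (my own example, not taken from the linked paper): a hard-threshold activation that jumps at zero, shown next to the continuous ReLU.

```python
import numpy as np

def relu(x):
    # Continuous everywhere (though not differentiable at 0)
    return np.maximum(0.0, x)

def hard_threshold(x):
    # Discontinuous at 0: output jumps from 0 to 1
    return (x > 0).astype(x.dtype)

x = np.linspace(-1.0, 1.0, 5)
print("x:              ", x)
print("relu(x):        ", relu(x))
print("hard_threshold: ", hard_threshold(x))
```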