Questions tagged [tanh]
Use this tag for questions related to hyperbolic tangent activation functions (tanh) used in neural networks.
6 questions
9
votes
1 answer
When to use Tanh?
When and why would you not use Tanh?
I just replaced ReLU with Tanh, and my model trains about 2x faster, reaching 90% accuracy within 500 steps.
With ReLU it took more than 1,000 training steps to reach 90% accuracy.
I believe the reason it trained faster was due…
vxnuaj
- 125
- 1
- 6
3
votes
3 answers
Why is there tanh(x)*sigmoid(x) in an LSTM cell?
CONTEXT
I was wondering why there are sigmoid and tanh activation functions in an LSTM cell.
My intuition was based on the flow of tanh(x)*sigmoid(x)
and on the derivative of tanh(x)*sigmoid(x).
It seems to me that the authors wanted to choose such a…
MASTER OF CODE
- 242
- 2
- 9
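A minimal pure-Python sketch (illustrative only, not from the question or its answers) of the gating product tanh(x)*sigmoid(x) the question asks about, together with its derivative by the product rule:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate(x):
    # The LSTM-style product the question refers to: tanh(x) * sigmoid(x)
    return math.tanh(x) * sigmoid(x)

def gate_derivative(x):
    # d/dx [tanh(x) * sigmoid(x)] by the product rule:
    # (1 - tanh(x)^2) * sigmoid(x) + tanh(x) * sigmoid(x) * (1 - sigmoid(x))
    t, s = math.tanh(x), sigmoid(x)
    return (1.0 - t * t) * s + t * s * (1.0 - s)

# The product stays bounded in (-1, 1) and saturates on both sides,
# which is the "flow" the asker's intuition is built on.
for x in (-4.0, 0.0, 4.0):
    print(f"x={x:+.1f}  gate={gate(x):+.4f}  d/dx={gate_derivative(x):+.4f}")
```

At x = 0 the product is exactly 0 with slope 0.5, since tanh(0) = 0 and sigmoid(0) = 0.5.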
3
votes
1 answer
Why is tanh a "smoothly" differentiable function?
The sigmoid, tanh, and ReLU are popular and useful activation functions in the literature.
The following excerpt, taken from p. 4 of Neural Networks and Neural Language Models, says that tanh has a couple of interesting properties.
For example, the…
hanugm
- 4,102
- 3
- 29
- 63
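A small pure-Python illustration (an assumption about which property the excerpt means) of the "smoothly differentiable" point: tanh's derivative, 1 - tanh(x)^2, is continuous everywhere, whereas ReLU's derivative jumps at 0:

```python
import math

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2: defined and continuous for every x
    t = math.tanh(x)
    return 1.0 - t * t

def relu_grad(x):
    # ReLU's derivative is 0 for x < 0 and 1 for x > 0: it jumps at x = 0
    return 1.0 if x > 0 else 0.0

eps = 1e-9
# tanh's gradient barely changes across 0; ReLU's flips from 0 to 1.
print(tanh_grad(-eps), tanh_grad(eps))   # both ~1.0
print(relu_grad(-eps), relu_grad(eps))   # 0.0 vs 1.0
```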
2
votes
1 answer
Why and when do we use ReLU over tanh activation function?
I was reading LeCun's Efficient Backprop, and the author repeatedly stressed the importance of averaging the input patterns at 0, thus justifying the usage of the tanh sigmoid. But if tanh is good, then how come ReLU is very popular in most NNs (which is even…
Struggling_In_Final
- 21
- 2
1
vote
0 answers
Could we add clipping in the output layer of the actor in DDPG?
I have a doubt about how clipping affects the training of RL agents.
In particular, I have come across code for training DDPG agents; the pseudo-code is the following:
for i in training iterations
    action = clip(ddpg.prediction(state)…
Leibniz
- 69
- 5
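A minimal pure-Python sketch (names and the action bound are hypothetical) of the two ways to bound an actor's action that the question contrasts: hard-clipping the prediction versus squashing with tanh in the output layer:

```python
import math

ACTION_LIMIT = 2.0  # hypothetical action bound, e.g. a [-2, 2] action space

def clip_action(raw):
    # Hard clip: constant outside the bounds, so saturated predictions
    # receive no learning signal through the clip.
    return max(-ACTION_LIMIT, min(ACTION_LIMIT, raw))

def tanh_action(raw):
    # Squash inside the actor's output layer: smooth and differentiable
    # everywhere, which is why many DDPG/SAC actors end with tanh.
    return ACTION_LIMIT * math.tanh(raw)

for raw in (-5.0, 0.5, 5.0):
    print(raw, clip_action(raw), tanh_action(raw))
```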
0
votes
1 answer
Producing nan when calculating log probability of sampled action from tanh distribution
policy.eval(); critic.eval()  # BN eval mode for rollout
with torch.no_grad():
    mean, std = policy(actor_critic_input)
    dist = TransformedDistribution(Normal(mean, std), [TanhTransform()])
…
Khushal Badhan
- 29
- 5
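A pure-Python sketch (not the asker's code) of why this log-probability can become NaN: the change-of-variables formula needs atanh(action), which hits its domain edge once tanh saturates to exactly ±1 in floating point (PyTorch's TanhTransform(cache_size=1) sidesteps the inverse by caching the pre-tanh sample). Computing the log-prob from the pre-tanh sample stays finite:

```python
import math

def log_prob_tanh_normal(u, mean=0.0, std=1.0):
    # log-prob of a = tanh(u) under tanh(Normal), via change of variables:
    # log p(a) = log N(u; mean, std) - log(1 - tanh(u)^2)
    log_normal = -0.5 * ((u - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))
    # log(1 - tanh(u)^2) computed stably as 2 * (log 2 - u - log(1 + e^(-2u)))
    log_det = 2.0 * (math.log(2.0) - u - math.log1p(math.exp(-2.0 * u)))
    return log_normal - log_det

u = 20.0          # a large pre-tanh sample
a = math.tanh(u)  # rounds to exactly 1.0 in float64
# Inverting the squashed action fails: atanh(1.0) is a domain error here
# (torch returns inf), and log-probs computed from `a` go inf/nan.
# Using the cached pre-tanh sample keeps the log-prob finite:
print(a, log_prob_tanh_normal(u))
```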