Backpropagation on a network with dropout works just as it normally does: it calculates the gradients and updates the weights.
Longer explanation
Dropout is a regularization technique which randomly drops (zeroes out) nodes in the forward pass.
In the backward pass, the 'influence' of every weight on the end result is calculated (the gradient). If a node was dropped by the Dropout layer, its output is 0, so the gradients of its outgoing weights are also 0 (as 0 * weight = 0) and those weights receive no update in that step. In short, backpropagation works just as it always does, as the sketch below illustrates.
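To make that concrete, here is a minimal NumPy sketch of a dropout layer's forward and backward pass. It is an illustrative toy, not the implementation of any particular framework; the function names and the inverted-dropout scaling are my own choices:

```python
import numpy as np

rng = np.random.default_rng()

def dropout_forward(x, p_drop=0.5):
    # Sample a random binary mask: 0 = dropped node, 1 = kept node.
    mask = (rng.random(x.shape) >= p_drop).astype(x.dtype)
    # Inverted dropout: scale kept activations so the expected value is unchanged.
    out = x * mask / (1.0 - p_drop)
    return out, mask

def dropout_backward(grad_out, mask, p_drop=0.5):
    # The same mask is applied to the incoming gradient, so dropped nodes
    # pass back a gradient of exactly 0 and contribute nothing to the updates.
    return grad_out * mask / (1.0 - p_drop)
```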
So how does the network learn if those nodes are not updated? The key point is that the dropped nodes are chosen at random on every forward pass. Hence, different nodes are dropped each time, and as a result, different nodes are updated in each backward pass (see the short example below).
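Running the sketch above for a few steps shows this directly: a new mask is drawn on every pass, so a different subset of gradients is zeroed each time:

```python
x = np.ones(6)
for step in range(3):
    out, mask = dropout_forward(x, p_drop=0.5)
    grad_in = dropout_backward(np.ones_like(out), mask, p_drop=0.5)
    # A different subset of positions is zeroed on each iteration.
    print(step, mask.astype(int), grad_in)
```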
Because nodes are dropped, the network cannot rely solely on specific nodes to make predictions, and hence it (should) generalize better. But I think you already understood that part.
For an even more elaborate explanation.