Reading papers on stereo SLAM and visual odometry, it appears that a dense stereo algorithm is run first, and its output is then used to calculate depth for trackable features.
Why not run a sparse feature detector directly on each image and match up the points? It seems a dense stereo computation would always predict depth worse than features that have a strong response.
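
To be concrete, this is a minimal sketch of the sparse-only approach I have in mind, assuming OpenCV with ORB features and a calibrated, rectified stereo pair; the focal length and baseline values are placeholders, not from any particular dataset:

```python
import cv2

# Assumed rectified, calibrated stereo pair (file names are placeholders).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

focal_px = 700.0    # focal length in pixels (assumed calibration value)
baseline_m = 0.12   # stereo baseline in metres (assumed calibration value)

# Detect and describe sparse features in each image independently.
orb = cv2.ORB_create(nfeatures=1000)
kp_l, des_l = orb.detectAndCompute(left, None)
kp_r, des_r = orb.detectAndCompute(right, None)

# Match descriptors left-to-right with cross-checking.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des_l, des_r)

depths = []
for m in matches:
    xl, yl = kp_l[m.queryIdx].pt
    xr, yr = kp_r[m.trainIdx].pt
    # In a rectified pair, a correct match lies on (nearly) the same row;
    # reject obvious violations of the epipolar constraint.
    if abs(yl - yr) > 1.0:
        continue
    disparity = xl - xr
    if disparity <= 0:
        continue
    # Depth from disparity: Z = f * B / d
    depths.append(focal_px * baseline_m / disparity)
```

Depth is only computed at the matched keypoints, skipping the dense disparity map entirely, so I don't see why the dense step is needed for the tracked features.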