0

Since pose estimation is a task where spatio-temporal context should help in locating key points in subsequent frames, I expected to find many papers on it. However, I could not find any work that deals with 2D pose estimation in videos.

Am I missing something, or are the relevant results simply buried under the large number of papers on 3D pose estimation?

nbro
Bert Gayus

2 Answers

2

Update:

I misread the question; it seems OP is more interested in pose tracking. So, I'll point OP to papers on that, like this one. Using multiple frames becomes especially important when there are multiple people in the frame and you want to track which pose belongs to which person.

For more papers on pose tracking, look here.
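
To make the "which pose belongs to which person" part concrete, here is a minimal sketch of frame-to-frame association. It is not taken from any of the linked papers: the function names, the greedy matching strategy, and the `max_dist` threshold are assumptions chosen for illustration, and it assumes some 2D pose estimator already gives you a list of (x, y) keypoints per person per frame.

```python
import math

def pose_distance(pose_a, pose_b):
    """Mean Euclidean distance between corresponding 2D keypoints."""
    return sum(math.dist(p, q) for p, q in zip(pose_a, pose_b)) / len(pose_a)

def match_poses(prev_tracks, new_poses, max_dist=50.0):
    """Greedily assign each new pose to the closest unclaimed previous track.

    prev_tracks: dict mapping track id -> pose (list of (x, y) keypoints)
    new_poses:   list of poses detected in the current frame
    Returns a dict mapping track id -> pose for the current frame.
    max_dist is a hypothetical pixel threshold; tune it for your data.
    """
    # All candidate (distance, track id, pose index) triples, closest first.
    candidates = sorted(
        (pose_distance(pose, new_pose), track_id, idx)
        for track_id, pose in prev_tracks.items()
        for idx, new_pose in enumerate(new_poses)
    )
    tracks, used = {}, set()
    for dist, track_id, idx in candidates:
        if dist > max_dist:
            break  # everything after this is even farther away
        if track_id in tracks or idx in used:
            continue  # this track or this detection is already matched
        tracks[track_id] = new_poses[idx]
        used.add(idx)
    # Detections that matched no existing track start new tracks.
    next_id = max(list(prev_tracks) + [-1]) + 1
    for idx, pose in enumerate(new_poses):
        if idx not in used:
            tracks[next_id] = pose
            next_id += 1
    return tracks
```

Calling `match_poses` once per frame, feeding its result back in as `prev_tracks`, keeps identities stable as long as people do not move too far between frames. Published trackers typically use stronger similarity measures and optimal assignment, so treat this only as an illustration of the idea.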


Reading this immediately reminded me of OpenPose. The 2D poses are based on this paper; they also refer to a few other papers and code repositories. There are several other 2D pose models at ModelZoo as well. They do refer to the papers they are based on (or, at least, the ones I have seen do).

Later work focuses on extracting even more 3D data from 2D images, like this one by Facebook. You can find much more on Papers with Code.

Avatrin
1

I don't think there is a great difference between pose estimation in pictures and in videos. Do you know MediaPipe? https://google.github.io/mediapipe/ MediaPipe does track the keypoints through time, so the temporal information is used. Apart from that, it is the same problem as estimating keypoints in static images.
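
As a rough illustration of how temporal information can be layered on top of a per-frame estimator, here is a minimal sketch that smooths keypoints across frames with an exponential moving average. This is not MediaPipe's implementation; the function name `smooth_keypoints` and the `alpha` parameter are assumptions made for the example, and the input is assumed to be one list of (x, y) keypoints per frame, in a fixed keypoint order.

```python
def smooth_keypoints(frames, alpha=0.5):
    """Exponentially smooth 2D keypoints over time.

    frames: list of frames, each a list of (x, y) keypoints in a fixed order
    alpha:  weight of the current frame (1.0 means no smoothing)
    Returns a list of smoothed frames with the same shape as the input.
    """
    smoothed = []
    prev = None
    for keypoints in frames:
        if prev is None:
            # First frame: nothing to blend with yet.
            current = [(float(x), float(y)) for x, y in keypoints]
        else:
            # Blend each keypoint with its smoothed position in the previous frame.
            current = [
                (alpha * x + (1 - alpha) * px, alpha * y + (1 - alpha) * py)
                for (x, y), (px, py) in zip(keypoints, prev)
            ]
        smoothed.append(current)
        prev = current
    return smoothed
```

The per-frame estimation is untouched here; the temporal part only reduces jitter between frames, which matches the point that the core problem is the same as for static images.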