I have an ESP32 with an OV2640 camera and a GY-MAX4466 amplified electret microphone module. I would like to connect the microphone to the ESP32 and stream voice and video in real time when I connect to its IP address, ideally both at the same time if the hardware is capable enough, or alternatively switching between voice and video if it cannot handle both at once.
I have been able to stream video with the OV2640 camera and the ESP32, but unfortunately I haven't found any resources on adding real-time audio streaming.