
I was thinking: what if we could combine artificial intelligence (a neural network for image recognition), computer hardware, and a security camera to identify anyone breaking into our backyard between 12:00 am and 8:00 am? My current knowledge only lets me ask some basic questions. So, to get a general idea:

  • Has this already been solved by commercial or free software?
  • Can this be done using TensorFlow?
  • Is there any free set of millions of images for teaching an AI to distinguish a person from other moving objects?
  • What are the approximate hardware requirements for doing this?

If this question is silly, please mark it as off-topic. I based this idea on autonomous cars, which can recognize images and drive at the same time. Unless they carry a supercomputer inside, I guess the idea above could be fulfilled.

Update 1: I found this: Can ConvNets be used for real-time object recognition from a video feed?, but I guess it could be outdated. Right now I'm in the land of "maybe" (lack of knowledge).

1 Answer

Here is some general info on how NNs work in relation to this specific problem. Hopefully it will provide some insight:

To identify the target object, you can train an NN to perform image classification. The potential problem is that the object can be located anywhere in a given frame of input from the camera.

Let's say you train a person-detector NN that acts on 60 px × 120 px crops (so, for a grayscale image, you will have 60 × 120 = 7,200 inputs in the input layer). Assuming you can train the NN to identify whether or not the target object appears in the provided image, you can use this trained NN to locate the object in an input frame.
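Since the question mentions TensorFlow, here is a minimal sketch of what such a classifier could look like in Keras. The layer sizes, the grayscale 60 × 120 crops, and the training data (person vs. non-person crops) are illustrative assumptions, not a tuned architecture.

```python
import tensorflow as tf

def build_person_classifier():
    # Small binary classifier over fixed-size grayscale crops:
    # output near 1 = person present, near 0 = no person.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(120, 60, 1)),       # height x width x channels
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_person_classifier()
# model.fit(train_crops, train_labels, epochs=10)   # labelled 60x120 crops (assumed dataset)
```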

Let's say the input frame from the camera is 640 px × 480 px. You will run into two potential problems: 1) an object closer to the camera appears larger, while an object further away appears smaller; 2) the object may be located anywhere in the frame.

To overcome this with your already-trained NN, I would scan the frame with masks of different sizes. For example, the first mask might cover 240 px × 480 px: capture the subset of pixels in the region between (0, 0) and (240, 480), scale it down to 60 px × 120 px (a 4× reduction in each dimension), and run it through the NN. If it yields true, draw a box around that region to indicate the target object has been found.

Next, shift the mask over by 1-3 pixels and rerun it. For example, if we choose to move by 3 px, then the region we are testing becomes (3, 0) to (243, 480).

Once you finish this scale, choose a smaller mask, maybe something like 200 px × 400 px, and do the same thing. When finished, go smaller still, until the regions no longer have sufficient resolution for the NN. A rough sketch of this multi-scale search is given below.
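The following sketch of that multi-scale sliding-window search assumes the classifier from the earlier sketch, a grayscale 480 × 640 frame, and illustrative mask sizes, stride, and score threshold. It is deliberately naive; in practice you would batch the windows rather than call `predict` once per window.

```python
import numpy as np
import tensorflow as tf

def find_person(frame, model, stride=3, threshold=0.5):
    """frame: grayscale array of shape (480, 640).
    Returns a list of (x, y, width, height) boxes where the classifier fired."""
    hits = []
    # Start with a large mask and shrink it on each pass (a simple image pyramid).
    for mask_w, mask_h in [(240, 480), (200, 400), (120, 240), (60, 120)]:
        for y in range(0, frame.shape[0] - mask_h + 1, stride):
            for x in range(0, frame.shape[1] - mask_w + 1, stride):
                window = frame[y:y + mask_h, x:x + mask_w]
                # Scale the window down to the classifier's 60x120 input size.
                resized = tf.image.resize(window[..., np.newaxis], (120, 60))
                score = model.predict(tf.expand_dims(resized, 0), verbose=0)[0, 0]
                if score > threshold:
                    hits.append((x, y, mask_w, mask_h))  # draw a box around this region
    return hits
```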

That's just my thought; maybe there's a more efficient algorithm. Ultimately, I'm sure there's a lot more that can be optimized beyond what I've mentioned above!
