Classification is the process of analyzing an image and returning the probabilities that it contains particular objects. Which objects can be recognized depends on the classes the machine learning model was trained on. The picture shows an Inception v3 model.
Inception v3 machine learning model
An image, consisting of pixels with different color values, is used as input. Inside the model there are different layers whose weights are calculated during training. These weights are used to compute the classification result. The Inception v3 model distinguishes 1000 classes.
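To illustrate how a classifier turns raw layer outputs into class probabilities, here is a minimal softmax sketch in Python. The three example logits are invented for brevity; a real Inception v3 head produces 1000 of them, one per class.

```python
import math

def softmax(logits):
    # Shift by the maximum for numerical stability, then normalize
    # the exponentials so the results form a probability distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Invented raw scores for three classes (a real model would emit 1000).
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

print(round(sum(probs), 6))  # -> 1.0 (probabilities sum to one)
```

The class with the highest probability is the model's prediction; its value is the confidence reported for the image.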
By training the machine learning model, the different layers are tuned to “recognize” different things within an image. For example, the first layers recognize only dots and dashes, the next layers recognize circles, arcs and boxes, and at the end whole objects are recognized.
The last layers in the machine learning model are responsible for classification and probability. Feature extraction, on the other hand, reads values from within the model. Unlike classification, these values are not directly related to a class or result. Instead, similar images (or documents) yield similar values.
For our Guardian prototype we first use PoseNet to detect a person and their head within a video stream. The head is then cropped out, and we determine whether the person is wearing a helmet or not. For this we use the image classification of the MobileNet model, which has been trained with different types of helmets.
My idea is to use feature extraction for this instead: I read the values of the Conv_1/BatchNorm/FusedBatchNorm layer within the MobileNet model for 25 images with and without a helmet and save them. To classify a new head image, the cosine distance between its feature values and each of the saved values is calculated. The saved picture with the greatest similarity indicates whether the person is wearing a helmet or not.
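This nearest-neighbor idea can be sketched in a few lines of Python. The three-dimensional feature vectors below are invented stand-ins; the actual values read from the MobileNet layer are much larger, and maximizing cosine similarity is equivalent to minimizing cosine distance.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Invented reference features; in the prototype these would be the saved
# values of the Conv_1/BatchNorm/FusedBatchNorm layer for 25 images.
references = {
    "helmet": [0.9, 0.1, 0.3],
    "no_helmet": [0.1, 0.8, 0.2],
}

def classify(features):
    # Return the label of the reference with the greatest similarity.
    return max(references, key=lambda label: cosine_similarity(features, references[label]))

print(classify([0.85, 0.15, 0.25]))  # -> helmet
```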
Classification vs. Feature Extraction
For classification, the yellow helmet must be positioned in such a way that it is not mistaken for a lemon or banana. Lighting conditions can also influence the result.
With feature extraction, the values for the images with and without a helmet are read first. These reference values must then be loaded in addition to the model.
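Saving and loading the reference values could look like the following sketch, assuming a simple JSON file; the file name and vectors are invented for illustration.

```python
import json
import os
import tempfile

# Invented reference features, extracted once (here only 2 instead of 25).
references = {"helmet_01": [0.9, 0.1, 0.3], "no_helmet_01": [0.1, 0.8, 0.2]}

path = os.path.join(tempfile.gettempdir(), "helmet_features.json")

# Save once after feature extraction ...
with open(path, "w") as f:
    json.dump(references, f)

# ... and load in addition to the model at startup.
with open(path) as f:
    loaded = json.load(f)

print(loaded == references)  # -> True
```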
Overall, feature extraction has proven to be more stable in tests, especially in darker rooms. Furthermore, the 25 reference values could be reduced even further: the picture shows that groups of 4 images already have a very high similarity.
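One way to exploit that similarity, assuming the vectors of a group can simply be averaged into a single representative, is sketched below; the vectors are invented for illustration.

```python
def average_vectors(vectors):
    # Element-wise mean of a group of equally sized feature vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# A group of 4 similar images, each reduced to a tiny invented vector.
group = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.95, 0.05]]
representative = average_vectors(group)

print([round(x, 3) for x in representative])  # -> [0.875, 0.125]
```

The 25 stored vectors could then shrink to a handful of group representatives, reducing both memory and the number of cosine comparisons per frame.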