Classification, Localization and Captioning of Dangerous Situations using Inception-v3 Network and CAM
Abstract
Early situation assessment is an important part of emergency missions and provides useful information for fast decision making. However, many situations are dangerous and visually hard to analyze because of their complexity. Recent developments in artificial intelligence and computer vision have opened up a wide range of applications, including automatic situation detection. Many related works, however, have focused either on event captioning or on dangerous object detection. This paper therefore proposes a novel approach for the simultaneous recognition and localization of dangerous situations. Two different CNN architectures are used, one of which, Inception-v3, is modified to generate Class Activation Maps (CAMs). With CAM, bounding boxes can be generated for recognized objects without the network being explicitly trained for localization, which eliminates the need for a large image dataset with manually annotated boxes. The objects detected by both networks, their spatial relationships, and the severity of the situation are then analyzed in the situation detection module. Finally, the detected situation is summarized in a short description and made available to emergency managers to support fast decision making.
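The CAM-based localization described in the abstract can be illustrated with a minimal sketch. It assumes the standard CAM formulation, in which the map for a class is the sum of the last convolutional feature maps weighted by that class's weights in the final fully connected layer, and it derives a box by thresholding the normalized map. The function names, the 0.2 threshold, and the toy arrays are assumptions made for illustration and are not taken from the paper. A real pipeline would read the feature maps and weights from the modified Inception-v3 and would typically keep only the largest connected region rather than all cells above the threshold.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Compute a normalized CAM for one class.

    features:   (H, W, K) feature maps of the last convolutional layer
    fc_weights: (K, C) weights of the final fully connected layer
    """
    # Weighted sum over the K channels gives an (H, W) map.
    cam = features @ fc_weights[:, class_idx]
    cam = cam - cam.min()
    peak = cam.max()
    # Scale to [0, 1]; a constant map stays all zeros.
    return cam / peak if peak > 0 else cam


def cam_to_box(cam, threshold=0.2):
    """Return (x_min, y_min, x_max, y_max) in grid cells, or None.

    Simplification (assumption): the box encloses every cell at or above
    the threshold instead of only the largest connected component.
    """
    mask = cam >= threshold
    if not mask.any():
        return None
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return (int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1]))


# Toy example (hypothetical data): an 8x8 grid with two channels,
# each channel responding to a different region of the image.
features = np.zeros((8, 8, 2))
features[2:5, 3:7, 0] = 1.0   # channel 0 fires on rows 2-4, columns 3-6
features[6:8, 0:2, 1] = 1.0   # channel 1 fires on rows 6-7, columns 0-1
weights = np.eye(2)           # class 0 uses channel 0, class 1 uses channel 1

box = cam_to_box(class_activation_map(features, weights, class_idx=0))
print(box)
```

The box is expressed in feature-grid coordinates and would still need to be scaled to the resolution of the input image before it is drawn or passed on for further analysis.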