Guess Where? Actor-Supervision for Spatiotemporal Video Action Localization
Under review for Computer Vision and Pattern Recognition (CVPR) 2018.
This paper strives for spatiotemporal localization of actions in video. Different from the leading approaches, which all learn to localize based on carefully annotated boxes on training video frames, we adhere to a weakly-supervised solution that only requires a video class label. We introduce an actor-supervised architecture that exploits the inherent compositionality of actions in terms of actor transformations to localize actions. We make two contributions. First, we propose actor proposals derived from a detector for human and non-human actors intended for images, which are linked over time by Siamese similarity matching to account for actor deformations. Second, we propose an actor-based attention mechanism that enables the localization of actions from action class labels and actor proposals and is end-to-end trainable. Experiments on three human and non-human action datasets show actor supervision is state-of-the-art for weakly-supervised action localization and is even competitive with some fully-supervised alternatives.
A Novel Framework for Robustness Analysis of Visual QA Models
Under review for The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018)
Visual Question Answering (VQA) models should be both robust and accurate. Unfortunately, most current research focuses only on accuracy because there is a lack of proper methods to measure the robustness of VQA models. In this work, we propose a new framework which uses semantically relevant questions, called basic questions, acting as noise to evaluate the robustness of VQA models. Additionally, we exploit LASSO modeling to rank the basic question candidates and show that the LASSO basic question ranking outperforms most popular text similarity metrics. Finally, we propose a novel robustness measure, the R score, and two large-scale datasets, the General Basic Question Dataset and the Yes/No Basic Question Dataset, in order to analyze the robustness of VQA models. We believe that our proposed framework will serve as a benchmark for measuring the robustness of VQA models, so as to help the community build accurate and robust VQA models.
Factors Influencing The Performance of Image Captioning Model: An Evaluation
In Proceedings of the 14th International Conference on Advances in Mobile Computing and Multimedia (MoMM), Pages 235-243, Singapore, November 28-30, 2016.
Recently, neural network-based methods have shown impressive performance on the image captioning task. There have been numerous attempts, with many proposed architectures, to solve this captioning problem. In this paper, we present an evaluation of different alternatives in architecture and optimization algorithms for a neural image captioning model. First, we present a study of an image captioning model comprised of two modules: a convolutional neural network which encodes the input image into a fixed-dimensional feature vector, and a recurrent neural network which decodes that representation into a sequence of words describing the input image. After that, we consider different alternatives regarding the architecture and the optimization algorithm used to train the model. We conduct a set of experiments on standard benchmark datasets to evaluate different aspects of the captioning system, using the standard evaluation methods from the image captioning literature. Based on the results of those experiments, we offer several suggestions on the architecture and optimization algorithm of an image captioning model that balances performance against the feasibility of deployment on real-world problems with commodity hardware.
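The two-module design described in the abstract (CNN encoder producing a fixed-dimensional feature vector, RNN decoder emitting words) can be illustrated with a minimal sketch. This is not the paper's actual model: the weights are random, the "CNN" is a stand-in linear projection, the decoder is a single vanilla RNN cell with greedy decoding, and all names (`encode_image`, `decode_greedy`, the toy vocabulary) are hypothetical, introduced only to show how the encoder output primes the decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary; <start>/<end> bracket every caption.
vocab = ["<start>", "<end>", "a", "boat", "on", "water"]
V, D, H = len(vocab), 8, 16  # vocab size, embedding dim, hidden dim

# "CNN encoder" stand-in: projects a flattened image to a feature vector.
W_cnn = rng.standard_normal((D, 32)) * 0.1

def encode_image(image):
    """Map a flattened image (32,) to a fixed-dimensional feature (D,)."""
    return np.tanh(W_cnn @ image)

# "RNN decoder": one vanilla RNN cell plus a softmax-style output layer.
W_e = rng.standard_normal((V, D)) * 0.1   # word embeddings
W_xh = rng.standard_normal((H, D)) * 0.1  # input-to-hidden
W_hh = rng.standard_normal((H, H)) * 0.1  # hidden-to-hidden
W_hy = rng.standard_normal((V, H)) * 0.1  # hidden-to-vocab logits

def decode_greedy(feature, max_len=5):
    """Greedily decode a word sequence: the image feature is the first
    input, then the embedding of each emitted word is fed back in."""
    h = np.zeros(H)
    x = feature                      # image feature primes the RNN
    words = []
    for _ in range(max_len):
        h = np.tanh(W_xh @ x + W_hh @ h)
        logits = W_hy @ h
        token = vocab[int(np.argmax(logits))]
        if token == "<end>":
            break
        words.append(token)
        x = W_e[vocab.index(token)]  # previous word becomes next input
    return words

caption = decode_greedy(encode_image(rng.standard_normal(32)))
```

With trained weights, the same loop would be run after optimizing a cross-entropy loss over reference captions; the paper's experiments vary exactly these architectural pieces (encoder network, decoder cell, optimizer).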
Maritime Vessel Images Classification Using Deep Convolutional Neural Networks
In Proceedings of the Sixth International Symposium on Information and Communication Technology (SoICT), Pages 276-281, Hue City, Viet Nam, December 3-4, 2015.
The ability to identify maritime vessels and their type is an important component of modern maritime safety and security. In this work, we present the application of deep convolutional neural networks to the classification of maritime vessel images. We use the AlexNet deep convolutional neural network as our base model and propose a new model that is half the size of AlexNet. We conduct experiments on different configurations of the model on commodity hardware, comparatively evaluating and analysing their performance in terms of top-1 and top-5 accuracy rates. The contribution of this work is the implementation, tuning and evaluation of an automatic image classifier for the specific domain of maritime vessels with deep convolutional neural networks, under the constraints imposed by commodity hardware and the size of the image collection.