World Scientific
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×

System Upgrade on Tue, May 28th, 2024 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at customercare@wspc.com for any enquiries.
Special Issue — Selected Papers from the 19th IEEE International Symposium on Multimedia (ISM 2017); Guest Editors: Han C. W. Hsiao and Robert MertensNo Access

Convolutional Nonlinear Differential Recurrent Neural Networks for Crowd Scene Understanding

    https://doi.org/10.1142/S1793351X18400196Cited by:3 (Source: Crossref)

    With the growth of crowd phenomena in the real world, crowd scene understanding is becoming an important task in anomaly detection and public security. Visual ambiguities and occlusions, high density, low mobility, and scene semantics, however, make this problem a great challenge. In this paper, we propose an end-to-end deep architecture, convolutional nonlinear differential recurrent neural networks (CNDRNNs), for crowd scene understanding. CNDRNNs consist of GoogleNet Inception V3 convolutional neural networks (CNNs) and nonlinear differential recurrent neural networks (RNNs). Different from traditional non-end-to-end solutions which separate the steps of feature extraction and parameter learning, CNDRNN utilizes a unified deep model to optimize the parameters of CNN and RNN hand in hand. It thus has the potential of generating a more harmonious model. The proposed architecture takes sequential raw image data as input, and does not rely on tracklet or trajectory detection. It thus has clear advantages over the traditional flow-based and trajectory-based methods, especially in challenging crowd scenarios of high density and low mobility. Taking advantage of CNN and RNN, CNDRNN can effectively analyze the crowd semantics. Specifically, CNN is good at modeling the semantic crowd scene information. On the other hand, nonlinear differential RNN models the motion information. The individual and increasing orders of derivative of states (DoS) in differential RNN can progressively build up the ability of the long short-term memory (LSTM) gates to detect different levels of salient dynamical patterns in deeper stacked layers modeling higher orders of DoS. Lastly, existing LSTM-based crowd scene solutions explore deep temporal information and are claimed to be “deep in time.” Our proposed method CNDRNN, however, models the spatial and temporal information in a unified architecture and achieves “deep in space and time.” Extensive performance studies on the Violent-Flows, CUHK Crowd, and NUS-HGA datasets show that the proposed technique significantly outperforms state-of-the-art methods.

    Remember to check out the Most Cited Articles!

    Check out our titles in Semantic Computing!