E-Locus - Institutional Repository of the University of Crete

Identifier

000462988

Title

Human action prediction and forecasting based on visual data

Alternative Title

Prediction and forecasting of human activities based on visual information (Greek: Πρόβλεψη και πρόγνωση ανθρώπινων δραστηριοτήτων με βάση οπτική πληροφορία)

Author

Μανουσάκη, Βικτωρία Ε

Thesis advisor

Αργυρός, Αντώνης

Reviewer

Κοσμόπουλος, Δημήτριος
Ρούσσος, Αναστάσιος
Πλεξουσάκης, Δημήτριος
Τραχανιάς, Παναγιώτης
Στεφανίδης, Κωνσταντίνος
Παναγιωτάκης, Κώστας

Abstract

The ability to observe human movements and predict their actions is a developmental skill that humans acquire early in life. When watching a person perform a task, we can readily forecast their subsequent actions from contextual cues and past experience. In this work, we aim to develop such abilities for machines, focusing on three vision-based tasks: action prediction, action anticipation and next-active-object prediction. Action prediction is defined as inferring an action label while the action is still ongoing, a capability that is useful for early response and for planning further actions. We consider action prediction in scenarios where humans interact with objects. Our approach builds time-series representations that capture how the humans and the objects evolve as the action is performed, and compares the representation of an ongoing action to prototype actions. The comparison is carried out by a Dynamic Time Warping (DTW)-based time-series alignment framework that identifies the best match between the ongoing action and the prototypes. We use DTW to predict actions in both trimmed and untrimmed action sequences. In the same vein, we propose two new alignment algorithms, OBE-S-DTW and OE-S-DTW, which outperform DTW on the task of action prediction. Next, we propose a graph-based methodology for the visual prediction of human-object interactions in videos. Rather than forecasting human and object motion, we aim to predict (a) the class of the ongoing human-object interaction and (b) the class(es) of the next active object(s) (NAOs), i.e., the object(s) that will be involved in the interaction in the near future, together with the time at which the interaction will occur. Finally, we address action anticipation by taking into account the history of all actions executed throughout long, procedural activities.
We propose a novel approach, Visual-Linguistic Modeling of Action History (VLMAH), which fuses the immediate past, represented by visual features, with the distant past, represented in a cost-effective form as linguistic constructs (semantic labels of the nouns, verbs, or actions). By leveraging information on both the long- and short-term past, our approach generates accurate near-future action predictions during procedural activities. Together, the proposed methods address action prediction, action anticipation and next-active-object prediction. They have been evaluated on challenging datasets and achieve results superior to the current state of the art.
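The DTW-based matching of an ongoing action against prototype actions can be illustrated with a short sketch. This is a minimal, hypothetical example and not the method of the thesis: it uses classic DTW plus a standard open-end variant (the partial, ongoing action must be aligned in full, but may stop anywhere inside a prototype), assumes every frame is a plain feature vector compared with Euclidean distance, and uses illustrative names (`dtw`, `open_end_dtw`, `predict_action`). It does not implement OBE-S-DTW or OE-S-DTW, whose definitions are not given in this record.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # assumption: one feature vector per video frame


def _dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two per-frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _accumulated_cost(query: Sequence[Frame], reference: Sequence[Frame]) -> List[List[float]]:
    """Fill the standard DTW accumulated-cost matrix D (1-indexed, D[0][0] = 0)."""
    n, m = len(query), len(reference)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = _dist(query[i - 1], reference[j - 1])
            # Step pattern: insertion, deletion or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D


def dtw(query: Sequence[Frame], reference: Sequence[Frame]) -> float:
    """Classic DTW: both sequences are aligned from start to end."""
    D = _accumulated_cost(query, reference)
    return D[len(query)][len(reference)]


def open_end_dtw(query: Sequence[Frame], reference: Sequence[Frame]) -> float:
    """Open-end DTW: the whole query is aligned, but it may end at any
    reference frame, so a partial action can match a prefix of a prototype."""
    D = _accumulated_cost(query, reference)
    return min(D[len(query)][1:])


def predict_action(ongoing: Sequence[Frame], prototypes: Dict[str, Sequence[Frame]]) -> str:
    """Return the label of the prototype with the lowest open-end alignment cost."""
    return min(prototypes, key=lambda label: open_end_dtw(ongoing, prototypes[label]))


if __name__ == "__main__":
    # Hypothetical 1-D toy prototypes; real features would be multi-dimensional.
    prototypes = {
        "reach": [[0.0], [1.0], [2.0], [3.0]],
        "retract": [[3.0], [2.0], [1.0], [0.0]],
    }
    ongoing = [[0.0], [1.0]]  # only the first half of the action is observed
    print(predict_action(ongoing, prototypes))  # reach
```

In the toy run, the observed half of the action matches the first two frames of the "reach" prototype exactly, so its open-end cost is 0.0, whereas classic DTW against the same prototype costs 3.0 because it is forced to account for the unobserved remainder. This is why an open-end formulation suits prediction while the action is still ongoing.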

Language

English

Subject

Action forecasting

Activity forecasting

Deep neural networks

Graphs

Next-active-object prediction

Temporal alignment

Graphs (Greek: Γράφοι)

Neural networks (Greek: Νευρωνικά δίκτυα)

Action prediction (Greek: Πρόβλεψη δράσης)