Doctoral theses

Identifier 000463783
Title Deception detection from text in a multilingual and multicultural context
Alternative Title Ανίχνευση εξαπάτησης από κείμενο σε πολυγλωσσικό και πολυπολιτισμικό πλαίσιο
Author Παπαντωνίου, Αικατερίνη Χ.
Thesis advisor Πλεξουσάκης, Δημήτριος
Reviewer Φλουρής, Γεώργιος
Τζίτζικας, Γιάννης
Αργυρός, Αντώνιος
Κομοδάκης, Νίκος
Ανδρουτσόπουλος, Ίων
Σταματάτος, Ευστάθιος
Abstract Automatic deception detection is a crucial task with many applications in both direct physical and computer-mediated human communication. In this thesis, we focus on automatic deception detection in text across cultures and in different languages. We view culture through the prism of the individualism/collectivism dimension and approximate it by using country as a proxy. Taking recent conclusions from social psychology as a starting point, we explore whether differences in the usage of specific linguistic deception cues across cultures can be confirmed and attributed to cultural norms with respect to the individualism/collectivism divide. In addition, we investigate whether a universal feature set for cross-cultural text deception detection tasks exists. To this end, we performed a thorough statistical analysis (Mann-Whitney tests and multiple logistic regression) over eleven datasets in five languages (English, Dutch, Russian, Spanish and Romanian) from six countries (United States of America, Belgium, India, Russia, Mexico and Romania). The analysis showed the absence of a universal feature set, as well as the volatility and sensitivity of the deception cues even across domains and genres within the same culture/language. Furthermore, the analysis revealed differences in some deception cues across cultures and languages, e.g., in the expression of sentiment, and at the same time the cross-cultural validity of others. To evaluate the predictive power of different feature sets and approaches, we created culture/language-aware classifiers by experimenting with a wide range of n-gram features from several levels of linguistic analysis, namely phonology, morphology and syntax; other linguistic cues such as word and phoneme counts, pronoun use, etc.; and token embeddings. We also experimented with combinations of these features, using the aforementioned datasets for training and testing.
We applied two classification methods, namely logistic regression and fine-tuned BERT models, both monolingual and cross-lingual. Overall, fine-tuning the BERT model outperforms the other approaches, but interestingly there are cases in which combining BERT embeddings with linguistic features is beneficial. The experimentation with multilingual embeddings, as a case of zero-shot transfer learning, also showed promising results. We introduce a new dataset of April Fools’ Day articles in Greek. To the best of our knowledge, this is the first publicly available deception dataset for Greek. The conclusions, based on an analysis similar to the above and on a comparison with an English April Fools’ Day dataset, mainly align with the results of the first part of the thesis. Lastly, we focus on how well various automatic deception detection models generalize to unseen distributions and domains. Using a rich set of diverse testing data in English and Spanish, we explore the performance gap between cue-based models, BERT-type models and their combination. Generalization techniques from the literature are also considered in an effort to enhance the generalization capabilities of the models. Transformer-based approaches overall outperform cue-only approaches, but both the infusion of explicit deception cues and the generalization techniques are beneficial.
Language English
Subject Culture
Machine learning
NLP
Επεξεργασία φυσικής γλώσσας
Κουλτούρα
Μηχανική μάθηση
Issue date 2024-03-22
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Doctoral theses
  Type of Work--Doctoral theses
Permanent Link https://elocus.lib.uoc.gr//dlib/2/5/3/metadata-dlib-1712296616-817333-1577.tkl

Digital Documents
No permission to view document.
It won't be available until: 2027-03-22