TY - JOUR
T1 - Multilingual stance detection in social media political debates
AU - Lai, Mirko
AU - Cignarella, Alessandra Teresa
AU - Hernández Farías, Delia Irazú
AU - Bosco, Cristina
AU - Patti, Viviana
AU - Rosso, Paolo
N1 - Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2020/9
Y1 - 2020/9
N2 - Stance Detection is the task of automatically determining whether the author of a text is in favor, against, or neutral towards a given target. In this paper we investigate the portability of tools performing this task across different languages, by analyzing the results achieved by a Stance Detection system (i.e. MultiTACOS) trained and tested in a multilingual setting. First of all, a set of resources on topics related to politics for English, French, Italian, Spanish and Catalan is provided which includes: novel corpora collected for the purpose of this study, and benchmark corpora exploited in Stance Detection tasks and evaluation exercises known in literature. We focus in particular on the novel corpora by describing their development and by comparing them with the benchmarks. Second, MultiTACOS is applied with different sets of features especially designed for Stance Detection, with a specific focus to exploring and combining both features based on the textual content of the tweet (e.g., style and affective load) and features based on contextual information that do not emerge directly from the text. Finally, for better highlighting the contribution of the features that most positively affect system performance in the multilingual setting, a features analysis is provided, together with a qualitative analysis of the misclassified tweets for each of the observed languages, devoted to reflect on the open challenges.
AB - Stance Detection is the task of automatically determining whether the author of a text is in favor, against, or neutral towards a given target. In this paper we investigate the portability of tools performing this task across different languages, by analyzing the results achieved by a Stance Detection system (i.e. MultiTACOS) trained and tested in a multilingual setting. First of all, a set of resources on topics related to politics for English, French, Italian, Spanish and Catalan is provided which includes: novel corpora collected for the purpose of this study, and benchmark corpora exploited in Stance Detection tasks and evaluation exercises known in literature. We focus in particular on the novel corpora by describing their development and by comparing them with the benchmarks. Second, MultiTACOS is applied with different sets of features especially designed for Stance Detection, with a specific focus to exploring and combining both features based on the textual content of the tweet (e.g., style and affective load) and features based on contextual information that do not emerge directly from the text. Finally, for better highlighting the contribution of the features that most positively affect system performance in the multilingual setting, a features analysis is provided, together with a qualitative analysis of the misclassified tweets for each of the observed languages, devoted to reflect on the open challenges.
KW - Contextual features
KW - Multilingual
KW - Political debates
KW - Stance detection
KW - Twitter
UR - https://www.scopus.com/pages/publications/85083681815
U2 - 10.1016/j.csl.2020.101075
DO - 10.1016/j.csl.2020.101075
M3 - Article
SN - 0885-2308
VL - 63
JO - Computer Speech and Language
JF - Computer Speech and Language
M1 - 101075
ER -