TY - GEN
T1 - Voices of the great war
T2 - 12th International Conference on Language Resources and Evaluation, LREC 2020
AU - Lenci, Alessandro
AU - Montemagni, Simonetta
AU - Boschetti, Federico
AU - de Felice, Irene
AU - dei Rossi, Stefano
AU - Dell'Orletta, Felice
AU - Giorgio, Michele Di
AU - Miliani, Martina
AU - Passaro, Lucia C.
AU - Puddu, Angelica
AU - Venturi, Giulia
AU - Labanca, Nicola
N1 - Publisher Copyright:
© European Language Resources Association (ELRA), licensed under CC-BY-NC
PY - 2020
Y1 - 2020
N2 - Voci della Grande Guerra (“Voices of the Great War”) is the first large corpus of Italian historical texts dating back to the period of First World War. This corpus differs from other existing resources in several respects. First, from the linguistic point of view it gives account of the wide range of varieties in which Italian was articulated in that period, namely from a diastratic (educated vs. uneducated writers), diaphasic (low/informal vs. high/formal registers) and diatopic (regional varieties, dialects) points of view. From the historical perspective, through a collection of texts belonging to different genres it represents different views on the war and the various styles of narrating war events and experiences. The final corpus is balanced along various dimensions, corresponding to the textual genre, the language variety used, the author type and the typology of conveyed contents. The corpus is annotated with lemmas, part-of-speech, terminology, and named entities. Significant corpus samples representative of the different “voices” have also been enriched with meta-linguistic and syntactic information. The layer of syntactic annotation forms the first nucleus of an Italian historical treebank complying with the Universal Dependencies standard. The paper illustrates the final resource, the methodology and tools used to build it, and the Web Interface for navigating it.
AB - Voci della Grande Guerra (“Voices of the Great War”) is the first large corpus of Italian historical texts dating back to the period of First World War. This corpus differs from other existing resources in several respects. First, from the linguistic point of view it gives account of the wide range of varieties in which Italian was articulated in that period, namely from a diastratic (educated vs. uneducated writers), diaphasic (low/informal vs. high/formal registers) and diatopic (regional varieties, dialects) points of view. From the historical perspective, through a collection of texts belonging to different genres it represents different views on the war and the various styles of narrating war events and experiences. The final corpus is balanced along various dimensions, corresponding to the textual genre, the language variety used, the author type and the typology of conveyed contents. The corpus is annotated with lemmas, part-of-speech, terminology, and named entities. Significant corpus samples representative of the different “voices” have also been enriched with meta-linguistic and syntactic information. The layer of syntactic annotation forms the first nucleus of an Italian historical treebank complying with the Universal Dependencies standard. The paper illustrates the final resource, the methodology and tools used to build it, and the Web Interface for navigating it.
KW - Historical Corpora
KW - Information Extraction
KW - Linguistic and Meta-linguistic Annotation
UR - https://www.scopus.com/pages/publications/85096527776
M3 - Conference contribution
AN - SCOPUS:85096527776
T3 - LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
SP - 911
EP - 918
BT - LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
A2 - Calzolari, Nicoletta
A2 - Bechet, Frederic
A2 - Blache, Philippe
A2 - Choukri, Khalid
A2 - Cieri, Christopher
A2 - Declerck, Thierry
A2 - Goggi, Sara
A2 - Isahara, Hitoshi
A2 - Maegaard, Bente
A2 - Mariani, Joseph
A2 - Mazo, Helene
A2 - Moreno, Asuncion
A2 - Odijk, Jan
A2 - Piperidis, Stelios
PB - European Language Resources Association (ELRA)
Y2 - 11 May 2020 through 16 May 2020
ER -