Digitisation of the TIGR participant questionnaires

ShareTIGR

21 March 2024

One of the first steps in the ShareTIGR project has been to digitise 115 questionnaires which include data about the participants in the TIGR recordings, specifically their age, sex, place of primary school, place of residence, place of work in Switzerland, language skills, profession, and educational qualifications. They do not contain any personal names, addresses or dates of birth and refer to the recordings by means of informant identifiers and event identifiers. The questionnaires had been filled in on paper right before the recordings were made and the first step of digitisation has been to manually transfer the data into an Excel table - a structured digital format that will facilitate further processing. While transferring the data, we made some decisions and adjustments that we will reflect upon in this blog post.

Let's start with two basic categories that future users of the TIGR corpus may want to know about, since they are relevant for interpreting the recorded discourse at multiple levels: età ('age') and sesso ('sex').

The age of interlocutors is important metadata as it may influence the language variety spoken, the content and progression of the talk as well as the demeanour and participant role of a speaker. It is also relevant for quantitative research questions. In the questionnaires, the InfinIta project team asked for the age in years at the time of the recording. However, this format will probably not be seen in the published FAIR version of the corpus, as it is safer to give an age range rather than a specific number for data protection reasons.
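The conversion from an exact age to an age range could be sketched as follows. This is a minimal illustration only: the function name and the ten-year bin width are assumptions, since the range format for the published corpus has not been fixed here.

```python
def age_to_range(age: int, width: int = 10) -> str:
    """Map an exact age in years to a range label such as '20-29'.

    The default bin width of 10 years is an assumption made for this sketch.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if width < 1:
        raise ValueError("width must be at least 1")
    lower = (age // width) * width  # lower bound of the bin containing `age`
    return f"{lower}-{lower + width - 1}"
```

A wider bin gives stronger protection against re-identification but coarser data for quantitative questions, so the width would need to be weighed against the research uses of the corpus.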

When asking the participants to indicate their sex, the InfinIta team had offered the options “F” for female / femminile and “M” for male / maschile. While the majority of participants answered the question straightforwardly by ticking one of the two boxes, two people added a third box, “altro” (other), by hand. Interestingly, neither of them actually ticked the box they had added. This suggests that no one needed the third option for self-categorisation, and that it served instead to criticise the binary system we had employed. Engaging with the questionnaire in a way that was not explicitly requested is a form of negotiating normative social categories on a larger social level. The category “sex” was also the only one that received manual corrections: here the participants exposed and repaired the set of social categories that were relevant to them. Since no one had ticked the third option, the additions had no technical or conceptual impact on our work, so they were not included in the Excel table (and hence in the metadata of the corpus). However, to do justice to their social relevance, they could be mentioned in the corpus description.

The questionnaire further requested information about places, in particular the municipality of the participants’ primary school and the municipalities of their current residence and work or study. Such geographical information might be relevant for interpreting the corpus data because language can be expected to vary depending on where people were brought up and live. The information given by the participants made up only a few words in the paper questionnaire but was expanded to multiple columns in the Excel table. We decided to enrich the place names with their province or canton, their region and their country so that they could be filtered and grouped according to various factors. The result is three columns for Swiss locations (place, canton, country) and four for Italian locations (place, province, region, country). The Excel sheet shows a relatively high number of primary school and residence locations in Italy, which is due to the fact that many participants in TIGR are either cross-border commuters or Swiss residents of Italian origin.
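As a rough sketch of this enrichment step, a place name can be looked up in a small gazetteer and expanded into the three or four columns described above. The lookup table, its two entries and the function name are hypothetical and chosen for illustration; they are not taken from the questionnaires, and in the project the enrichment was done in the Excel table itself.

```python
from typing import Dict

# Hypothetical gazetteer; the entries are illustrative only.
GAZETTEER: Dict[str, Dict[str, str]] = {
    "Lugano": {"canton": "Ticino", "country": "Switzerland"},
    "Como": {"province": "Como", "region": "Lombardia", "country": "Italy"},
}


def enrich_place(place: str) -> Dict[str, str]:
    """Expand a place name into the columns used for its country.

    Swiss places yield place, canton, country; Italian places yield
    place, province, region, country. Unknown places are returned
    with the place column only, so that they can be checked by hand.
    """
    return {"place": place, **GAZETTEER.get(place, {})}
```

Keeping each administrative level in its own column is what makes later filtering and grouping possible, for example selecting all participants who attended primary school in a given region.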

Another item of the questionnaire concerned language skills. Like other items, it had been phrased quite succinctly (lingue conosciute 'languages known') and gave the participants some liberty as to possible responses. Some informants who had declared themselves fluent Italian speakers before agreeing to participate in the research did not mention Italian in the questionnaire, thereby displaying an understanding of the prompt as referring to foreign language skills only. In these cases, we added Italian to the list of known languages in the table, since this information is essential metadata for TIGR as a corpus of spoken Italian. Further variation arose around the interpretation of the category lingue: most informants named standard varieties only, while some included both standard languages and regional varieties (dialetto ticinese, Svizzero tedesco) or local varieties (dialetto grosino). In addition, a small number of participants used parentheses or an explicit mention of comprehension skills to indicate less than full competence in some variety. In all these cases, we kept the original statements at this stage of data processing.
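The rule for Italian can be expressed in a few lines. This is a sketch under assumptions: the function name is hypothetical, and it assumes that answers are recorded in Italian, so that the language is labelled "italiano". All other statements, including dialect names and parenthesised remarks, are passed through unchanged.

```python
from typing import List


def add_italian(languages: List[str]) -> List[str]:
    """Return the declared languages, adding 'italiano' if it is missing.

    The original statements are kept exactly as written.
    """
    already_listed = any(
        lang.strip().lower() == "italiano" for lang in languages
    )
    if already_listed:
        return list(languages)
    return ["italiano", *languages]
```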

Next came the answers about profession, which showed some variation in the case of apprentice and student participants. Some, but not all, indicated their branch of study or professional field; a few students mentioned a part-time job as their profession. To increase uniformity, and in the interest of data protection, we decided to keep only the most generic information about the type of training (apprendista, studente/studentessa), which was given by virtually all student participants, and to ignore any additional information provided by only some (apprendista muratore).
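A minimal sketch of this generalisation is given below. It assumes that the generic training term is the first word of the answer, as in apprendista muratore; the function name is hypothetical. Answers in which a student named only a part-time job cannot be recognised this way and would still need manual review.

```python
# Generic training terms retained in the table, as named in the text.
GENERIC_TRAINING = ("apprendista", "studente", "studentessa")


def generalise_profession(answer: str) -> str:
    """Reduce an apprentice or student answer to its generic term.

    Other professions are returned unchanged, apart from trimming
    surrounding whitespace.
    """
    cleaned = answer.strip()
    words = cleaned.split()
    if words and words[0].lower() in GENERIC_TRAINING:
        return words[0].lower()
    return cleaned
```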

The last column of the questionnaire was istruzione (education). It offered some options to be ticked (scuole elementari, licenza di scuola media, formazione professionale, diploma di scuola media superiore, laurea triennale, laurea magistrale, dottorato) and a free text field (altro). Since the one-word prompt did not specify whether to indicate all qualifications or only the highest, we obtained both types of answers. As a general rule, we retained only the highest educational qualification in the table. In the case of a skilled trade, however, people could have completed vocational training, obtained a high school diploma, or both, so we decided to list both qualifications when so declared. Also, some participants obtained their educational qualifications before the so-called Bologna reform of higher education in Europe was completed and, accordingly, provided answers under altro such as laurea vecchio ordinamento, which is formally equivalent to today's laurea magistrale. We decided to keep their original answers to avoid anachronistic categories. On the other hand, in a few cases someone provided an answer under altro for which our scheme offered a matching option that had been overlooked or misunderstood (e.g., istituto tecnico instead of diploma di scuola media superiore). In these cases, we changed the original answers to make them fit our scheme.
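The "highest qualification" rule, together with the exception for vocational training and high school diploma, could be sketched as below. The numeric ranking is an assumption introduced for this illustration: placing formazione professionale and diploma di scuola media superiore on the same level is simply one way to make both survive when both were ticked. The function only handles the tick-box options; free-text answers under altro are assumed to be stored separately and left as written.

```python
from typing import Dict, List

# Assumed ranking of the tick-box options (higher number = higher level).
RANK: Dict[str, int] = {
    "scuole elementari": 0,
    "licenza di scuola media": 1,
    "formazione professionale": 2,
    "diploma di scuola media superiore": 2,
    "laurea triennale": 3,
    "laurea magistrale": 4,
    "dottorato": 5,
}


def highest_qualifications(ticked: List[str]) -> List[str]:
    """Keep only the highest-ranked option(s) among those ticked.

    Options that share the top rank are all kept, in their original order.
    Unknown labels are ignored here and should be handled by hand.
    """
    known = [q for q in ticked if q in RANK]
    if not known:
        return []
    top = max(RANK[q] for q in known)
    return [q for q in known if RANK[q] == top]
```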

The supposedly simple task of digitising paper questionnaires confronted us with a number of unexpected issues. None of them were serious or difficult to overcome, but they showed that even “simple” tasks in the endeavour of making FAIR spoken language data and metadata available require careful, forward-looking consideration.

Nina Profazi & Johanna Miecznikowski
