Computer-mediated communication corpora

Item

This is a list of computer-mediated communication corpora that are available as part of the CLARIN Resource Families initiative.

Computer-mediated communication (CMC) constitutes public and private communication on-line, such as posts on blogs, forums, comments on online news sites, social media and networking sites such as Twitter and Facebook, instant chat rooms such as, mobile phone applications such as WhatsApp and e-mail. Because corpora that compile computer-mediated communication often include very informal styles of writing, they are interesting for a wide range of research fields, such as language variation, pragmatics, media and communication studies, etc. They are also very important for the development of robust NLP tools that can deal with non-standard spelling, vocabulary and grammar. Compilation and dissemination of such corpora are hindered by the unclear legal status of CMC data when distributed as resource to the scientific community, which is further exacerbated by the rapidly changing terms of service by content providers.

Format
Language
License
Created
Last updated
Source of item
Collections
Training Discovery Toolkit