The SSH Training Discovery Toolkit provides an inventory of training materials relevant for the Social Sciences and Humanities.

Use the search bar to discover materials or browse through the collections. The filters will help you identify your area of interest.


Digital humanities

All the domains of the Humanities, from literature to heritage science, including history, social sciences, linguistics, etc.
Introduction to Digital Humanities

Through a series of videos featuring a variety of voices and perspectives and discussing a range of methodologies and theoretical approaches, this course aims to explore the history, practice and people involved in the evolving, highly diverse, and interdisciplinary field of Digital Humanities.

Voices of the Parliament: A Corpus Approach to Parliamentary Discourse Research

While corpus methods are widely used in linguistics, including gender analysis, this tutorial shows the potential of richly annotated language corpora for research into the socio-cultural context and the changes over time that are reflected in language use. The tutorial encourages students and scholars of modern languages, as well as users from other fields of digital humanities and social sciences who are interested in studying socio-cultural phenomena through language, to engage with user-friendly digital tools for the analysis of large text collections. The tutorial is designed to take full advantage of both the linguistic annotations and the available speaker and text metadata to formulate powerful quantitative queries, which are then extended with manual qualitative analysis to ensure adequate framing and interpretation of the results.

The tutorial demonstrates the potential of parliamentary corpora research via concordancers without the need for programming skills. No prior experience in using language corpora and corpus querying tools is required in order to follow this tutorial. While the same analysis could be carried out on any parliamentary corpus with similar annotations and metadata, in this tutorial we will use the siParl 2.0 corpus which contains parliamentary debates of the National Assembly of the Republic of Slovenia from 1990 to 2018. Knowledge of Slovenian is not required to follow the tutorial. To reproduce the analyses in other languages, we invite you to explore a parliamentary corpus of your choice from those available through CLARIN.


Taken from: Teaching with CLARIN

Introduction to Digital Humanities

The aim of the course is to introduce digital humanities and to describe various aspects of digital content processing. The practical aims consist of introducing current data sources, annotation, pre-processing methods, software tools for data analysis and visualisation, and evaluation methods.

We have found that students are somewhat aware of digital humanities, but that it is difficult for them to dive in and, above all, to anticipate what they should learn for their future research. A more detailed goal of this course is to present some current projects, show the datasets and technologies behind them, and encourage students to explore the datasets and use the technologies on data they already know. A high-level goal is to place knowledge of the technologies and available datasets within the research iteration loop (create hypotheses -> design instruments -> collect data -> analyze and evaluate).


Taken from: Teaching with CLARIN

GATE Training Course

The training materials are all based around teaching the use of GATE, a freely available open-source toolkit for Natural Language Processing that has been widely used in both academia and industry for many different tasks.

The modules provide instruction on how to get to grips with the GATE toolkit for basic language processing, as well as more advanced techniques, and cover a number of different scenarios, such as processing social media and detecting hate speech and misinformation. They include modules both for programmers who want to further develop their own tools within the toolkit and for non-programmers who just want to make use of existing tools. The modules teach not only the use of GATE itself, but also how to adapt it to one’s own needs (for example, how to adapt English tools to a different language or customise existing tools). They also cover the basic concepts behind a number of language processing tasks, ranging from low-level (tokenisation, POS tagging, parsing) to more sophisticated (information extraction, social media analysis, hate speech detection, misinformation detection), as well as how to interpret and integrate the results of the processing. Finally, the course teaches programmers how to extend the toolkit itself, by adding new tools or integrating it into other systems.


Taken from: Teaching with CLARIN

Copyright & Related rights

This section is an introduction to the notions of copyright and related rights.

CLARIN Centre Vienna

ARCHE (A Resource Centre for the HumanitiEs) is a service that offers stable and persistent hosting as well as the dissemination of digital research data and resources for the Austrian humanities community. ARCHE welcomes data from all humanities fields.

Newspaper corpora

This is a list of newspaper corpora that are available as part of the CLARIN Resource Families initiative.

Collections of newspapers in digital form are a rich source of information for researchers in a number of disciplines in the Humanities and Social Sciences. They are especially valuable for synchronic as well as diachronic studies in fields ranging from history and media and communication studies to lexicography, for which newspapers are a rich source of neologisms and other lexicographic phenomena.

Literary corpora

This is a list of literary corpora that are available as part of the CLARIN Resource Families initiative.

Literary corpora comprise poetry and fictional prose texts, such as novels, short stories and plays. They bring together the collected works of a single author or representative from a specific literary period. Since the literary corpora are often available through powerful concordancers, they are especially well suited for a quantitative and qualitative approach to comparative literary analysis, within or across different genres and historical periods.

Historical corpora

This is a list of historical corpora that are available as part of the CLARIN Resource Families initiative.

The CLARIN ERIC infrastructure offers access to historical corpora that cover almost all of the languages spoken in countries that are either members or observers in CLARIN ERIC. In the vast majority of cases, the corpora can be directly downloaded from the national repositories or queried through easy-to-use online search environments. They are also richly tagged and mostly available under public licences.



edX is a global nonprofit platform for education and learning. Fulfilling the demand for people to learn on their own terms, edX delivers courses on topics ranging from data and computer science to leadership and communications.