The SSH Training Discovery Toolkit provides an inventory of training materials relevant for the Social Sciences and Humanities.

Use the search bar to discover materials or browse through the collections. The filters will help you identify your area of interest.


Natural Language Processing

GATE Training Course

The training materials all teach the use of GATE, a freely available open-source toolkit for Natural Language Processing that has been widely used in both academia and industry for many different tasks.

The modules show how to get to grips with the GATE toolkit for basic language processing as well as more advanced techniques, and cover several scenarios, such as social media processing, hate speech detection and misinformation detection. They include modules both for programmers who want to further develop their own tools within the toolkit, and for non-programmers who just want to make use of existing tools. The modules teach not only the use of GATE itself, but also how to adapt it to one’s own needs (for example, how to adapt English tools to a different language or customise existing tools). They also cover the basic concepts behind a number of language processing tasks, ranging from low-level (tokenisation, POS tagging, parsing) to more sophisticated (information extraction, social media analysis, hate speech detection, misinformation detection), as well as how to interpret and integrate the results of the processing. Finally, they teach programmers how to extend the toolkit itself, by adding new tools or integrating it into other systems.


Taken from Teaching with CLARIN: 

Copyright & Related rights

This section is an introduction to copyright notions and related rights:

Tools for named entity recognition

This is a list of tools for named entity recognition that are available as part of the CLARIN Resource Families initiative.

Named entity recognition (NER) is an information extraction task which identifies mentions of various named entities in unstructured text and classifies them into predetermined categories, such as person names, organisations, locations, date/time, monetary values, and so forth. NER tools can, for example, help with the classification of news content, content recommendations and search algorithms.
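As a rough illustration of the task itself (not of how any of the listed tools work), the sketch below finds mentions with a tiny lookup list and two regular expressions, then labels each one with a category. The gazetteer entries, the patterns, and the function name `find_entities` are all assumptions made for this example; real NER tools typically rely on trained statistical or neural models.

```python
import re

# Hypothetical gazetteer: a tiny hand-made lookup list, invented for this example.
GAZETTEER = {
    "Marie Curie": "PERSON",
    "CLARIN": "ORGANISATION",
    "Paris": "LOCATION",
}

# Simple illustrative patterns for ISO-style dates and monetary values.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("MONEY", re.compile(r"(?:€|\$)\d+(?:\.\d{2})?")),
]


def find_entities(text):
    """Return (start, end, mention, label) tuples sorted by position in the text."""
    found = []
    # Look up every gazetteer name as a whole-word match.
    for name, label in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((m.start(), m.end(), m.group(), label))
    # Apply the pattern-based recognisers.
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), m.group(), label))
    return sorted(found)


if __name__ == "__main__":
    sentence = "Marie Curie visited Paris on 1921-05-20 and donated $100.50 to CLARIN."
    for start, end, mention, label in find_entities(sentence):
        print(f"{label:13} {mention!r} [{start}:{end}]")
```

A lookup-based approach like this only recognises names it already knows, which is one reason dedicated NER tools are preferred for unrestricted text.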

Tools for normalization

This is a list of tools for text normalization that are available as part of the CLARIN Resource Families initiative.

Text normalization is the process of transforming parts of a text into a single canonical form. It represents one of the key stages of linguistic processing for texts in which spelling variation abounds or deviates from the contemporary norm, such as historical documents or social media posts. After text normalization, standard tools can be used for all further stages of text processing. Another important advantage of text normalization is improved search: a query for a single, standard variant also takes into account all its spelling variants, be they historical, dialectal, colloquial or slang.
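The idea can be sketched in a few lines, assuming a hand-made variant lexicon. The `VARIANTS` table and the functions `normalize` and `search` are hypothetical names invented for this example and are not taken from any of the listed tools, which generally use far richer rules or learned models.

```python
import re

# Hypothetical variant lexicon: maps spelling variants to one canonical form.
VARIANTS = {
    "olde": "old",
    "shoppe": "shop",
    "u": "you",
    "gr8": "great",
}

TOKEN = re.compile(r"\w+")


def normalize(text):
    """Lowercase the text and replace each known variant with its canonical form."""
    return [VARIANTS.get(tok, tok) for tok in TOKEN.findall(text.lower())]


def search(query, documents):
    """Return indices of documents whose normalized tokens contain every query token."""
    wanted = normalize(query)
    hits = []
    for i, doc in enumerate(documents):
        tokens = set(normalize(doc))
        if all(w in tokens for w in wanted):
            hits.append(i)
    return hits


if __name__ == "__main__":
    docs = ["Ye olde shoppe", "The old shop is gr8", "A new store"]
    print(normalize(docs[0]))
    print(search("old shop", docs))
```

Because both the query and the documents pass through the same `normalize` step, a search for the standard spelling also retrieves documents that contain only the variant spellings.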

CLARIN Knowledge Sharing

The aim of the CLARIN Knowledge Sharing Initiative is to ensure that the available knowledge and expertise provided by CLARIN consortia does not exist as a fragmented collection of unconnected bits and pieces, but is made accessible in an organized way to the CLARIN community and to the Social Sciences and Humanities research community at large.

One central step in building the Knowledge Sharing Infrastructure is the establishment of Knowledge Centres. Most existing CLARIN centres are able to obtain the status of a Knowledge Centre right away. K-Centre status formalizes and centrally registers existing expertise and does not usually require much additional effort from an institute, except that the knowledge-sharing services have to be reliable and their scope has to be made explicit on a dedicated web page of the respective institute(s).

The list of CLARIN Knowledge Centres is available here: