The SSH Training Discovery Toolkit provides an inventory of training materials relevant for the Social Sciences and Humanities.


Manually annotated corpora


This is a list of manually annotated corpora that are available as part of the CLARIN Resource Families initiative.

Manually annotated corpora are collections of texts containing manually validated or manually assigned linguistic information, such as morphosyntactic tags, lemmas, syntactic parses, and named entities. These corpora can be used to train new language annotation tools and to test the accuracy of existing ones.
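As a rough illustration of the testing use case, the sketch below scores a tool's output against a manually annotated gold standard using token-level accuracy. The function name `tagging_accuracy`, the sample sentence, and the tag labels are all hypothetical and not taken from any particular corpus; the sketch also assumes the tool and the corpus share the same tokenization.

```python
def tagging_accuracy(gold, predicted):
    """Return the fraction of tokens whose (token, tag) pair matches the gold standard.

    Both arguments are lists of (token, tag) tuples over the same tokens.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty corpus")
    # A token counts as correct only if both the token and its tag match.
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Hypothetical gold-standard annotation (manually assigned tags).
gold = [("The", "DET"), ("cat", "NOUN"), ("sleeps", "VERB"), (".", "PUNCT")]

# Hypothetical output of an automatic tagger, with one error on "sleeps".
predicted = [("The", "DET"), ("cat", "NOUN"), ("sleeps", "NOUN"), (".", "PUNCT")]

accuracy = tagging_accuracy(gold, predicted)
print(f"Token-level accuracy: {accuracy:.2f}")
```

Token-level accuracy is only one possible measure; tasks such as named entity recognition or parsing are usually scored with other metrics (for example precision, recall, or attachment scores).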

The corpora and corpus collections are classified into six categories based on the type of manual annotation.

Source of item: Training Discovery Toolkit