The SSH Training Discovery Toolkit provides an inventory of training materials relevant for the Social Sciences and Humanities.

Use the search bar to discover materials or browse through the collections. The filters will help you identify your area of interest.

 

Computational Linguistics

Item
Title Body
Newspaper corpora

This is a list of newspaper corpora that are available as part of the CLARIN Resource Families initiative.

Collections of newspapers in digital form are a rich source of information for researchers in a number of disciplines in the Humanities and Social Sciences and are especially valuable for synchronic as well as diachronic studies, ranging from history, media and communication studies to lexicography for which newspapers are a rich source of neologisms and other lexicographic phenomena.

Literary corpora

This is a list of literary corpora that are available as part of the CLARIN Resource Families initiative.

Literary corpora comprise poetry and fictional prose texts, such as novels, short stories and plays. They bring together the collected works of a single author or representative from a specific literary period. Since the literary corpora are often available through powerful concordancers, they are especially well suited for a quantitative and qualitative approach to comparative literary analysis, within or across different genres and historical periods.

Historical corpora

This is a list of historical corpora that are available as part of the CLARIN Resource Families initiative.

The CLARIN ERIC infrastructure offers access to historical corpora that cover almost all of the languages spoken in countries that are either members or observers in CLARIN ERIC. In the vast majority of cases, the corpora can be directly downloaded from the national repositories or queried through easy-to-use online search environments. They are also richly tagged and mostly available under public licences.

 

Corpora of academic texts

This is a list of academic corpora that are available as part of the CLARIN Resource Families initiative.

Corpora of academic texts contain scholarly writing, which includes research papers, essays and abstracts published in academic journals, conference proceedings, and edited volumes, theses written by students at the undergraduate and graduate levels, and scientific monographs.

 

Source
Title Body
CLARIN Legal Information Platform

The platform aims to introduce researchers with basic notions related to the legislative and licensing framework in Europe on Copyright and Data Protection:

  • Introduction to Copyright and Related Rights
  • Licensing Practice
  • Personal Data Protection

It also includes proposals for:

  • Further reading/Bibliography on Legal and Ethical Issues
  • Useful links on Legal and Ethical Issues
CLARIN Depositing Services

One of the fundamental services of the CLARIN infrastructure is making sure that language resources can be archived and made available to the community in a reliable manner. To help researchers to store their resources (e.g. corpora, lexica, audio and video recordings, annotations, grammars, etc.) in a sustainable way, many of the CLARIN centres offer a depositing service. They are willing to store the resources in their repository and assist with the technical and organisational details. This has a wide range of advantages:

  • Long-term archiving: a storage guarantee can be given for a long period (up to 50 years in some cases)
  • Resources can be cited easily with a persistent identifier.
  • The resources and their metadata will be integrated into the infrastructure, making it possibe to search them efficiently.
  • Password-protected resources can be made available via an institutional login.
  • Once resources are integrated in the CLARIN infrastructure, they can be analyzed and enriched more easily with various linguistic tools (e.g. automated part-of-speech taggingphonetic alignment or audio/video analysis).
CLARIN Resource Families

The aim of the CLARIN Resource Families initiative is to provide a user-friendly overview of the available language resources in the CLARIN infrastructure for researchers from digital humanities, social sciences and human language technologies. The overviews are organized according to the types of data in the resources and include listings sorted by language.

The listings include the most important metadata and brief descriptions, such as resource size, text sources, time periods, annotations and licences as well as links to download pages and concordancers, whenever available. In addition to the resources found in the CLARIN infrastructure, CLARIN Resource Families provides an overview of other existing valuable language resources which have not yet been integrated in the infrastructure.

CLARIN Resource Families also provides hyperlinks to other relevant materials such as the thematic CLARIN workshops and tutorials and their accompanying videolectures, as well as a list of key publications on the resources surveyed.

CLARIN Knowledge Sharing

The aim of the CLARIN Knowledge Sharing Initiative is to ensure ensure that the available knowledge and expertise provided by CLARIN consortia does not exist as a fragmented collection of unconnected bits and pieces, but is made accessible in an organized way to the CLARIN community and to the Social Sciences and Humanities research community at large. 

One central step in building the Knowledge Sharing Infrastructure is the establishment of Knowledge Centres. Most existing CLARIN centres are able to get the status of a Knowledge Centre right away; the K-Centres rather formalize and centrally register the existing expertise but does usually not require much additional effort from an institute except that the knowledge-sharing services have to be reliable and their skope has to be made explicit on a dedicated web-page of the respective institute(s).

The list of CLARIN Knowledge Centres is available here: 

https://www.clarin.eu/content/knowledge-centres