The SSH Training Discovery Toolkit provides an inventory of training materials relevant for the Social Sciences and Humanities.

Use the search bar to discover materials or browse through the collections. The filters will help you identify your area of interest.

Text encoding

Source
#dariahTeach

#dariahTeach is a platform for Open Educational Resources (OER) for educators and students in the Digital Arts and Humanities, and beyond that for Higher Education across a spectrum of disciplines, addressing teachers and trainers engaged in the digital transformation of programme content and learning methods. #dariahTeach has two key objectives, sharing and reuse: it provides a place for people to publish their teaching material and for others to use it in their own teaching.

WikiWho

The core functionality of WikiWho is to parse the complete set of historical revisions (versions) of a Wikipedia article in order to find out who wrote and/or removed which exact text at what point in time. This means that, given a specific revision of an article (e.g., the current one), WikiWho can determine for each word and special character which user first introduced it, and if and how it was deleted or reintroduced afterwards. This functionality is not offered by Wikipedia itself; WikiWho has been shown to perform the task with very high accuracy (~95%) and very efficiently, and is the only tool scientifically proven to perform it that well.

Item
Voices of the Parliament: A Corpus Approach to Parliamentary Discourse Research

While corpus methods are widely used in linguistics, including gender analysis, this tutorial shows the potential of richly annotated language corpora for research of the socio-cultural context and changes over time that are reflected through language use. The tutorial encourages students and scholars of modern languages, as well as users from other fields of digital humanities and social sciences who are interested in the study of socio-cultural phenomena through language, to engage with user-friendly digital tools for the analysis of large text collections. The tutorial is designed in such a way that it takes full advantage of both linguistic annotations and the available speaker and text metadata to formulate powerful quantitative queries that are then further extended with manual qualitative analysis in order to ensure adequate framing and interpretation of the results.

The tutorial demonstrates the potential of parliamentary corpora research via concordancers without the need for programming skills. No prior experience in using language corpora and corpus querying tools is required in order to follow this tutorial. While the same analysis could be carried out on any parliamentary corpus with similar annotations and metadata, in this tutorial we will use the siParl 2.0 corpus which contains parliamentary debates of the National Assembly of the Republic of Slovenia from 1990 to 2018. Knowledge of Slovenian is not required to follow the tutorial. To reproduce the analyses in other languages, we invite you to explore a parliamentary corpus of your choice from those available through CLARIN.

Taken from: Teaching with CLARIN.

WikiWho API

For each word or token, you can find out in which revision the content originated (and thereby which editor originally authored it), as well as all changes that token was ever subject to.
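As a sketch of how such a token-level authorship query might look: the base URL, endpoint path, and parameter names below are assumptions modeled on the publicly documented WikiWho REST API, not taken from this page, so check the live API documentation before relying on them.

```python
# Sketch: building a WikiWho API request for token-level authorship of an
# article's current revision. Endpoint layout is an assumption (modeled on
# the public WikiWho REST API).

BASE = "https://wikiwho-api.wmcloud.org/en/api/v1.0.0-beta"

def all_content_url(article_title: str) -> str:
    """Build the (assumed) URL that returns, for every token in the
    article's current revision, the revision in which it originated
    and the editor who authored it."""
    return f"{BASE}/all_content/{article_title}/?o_rev_id=true&editor=true"

# Example usage (requires network access, so it is left commented out;
# the response field names are likewise assumptions):
# import requests
# data = requests.get(all_content_url("Cologne")).json()
# for token in data["all_tokens"][:5]:
#     print(token["str"], token["o_rev_id"], token["editor"])

print(all_content_url("Cologne"))
```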

Oxford Text Archive

The Oxford Text Archive (OTA) provides repository services for literary and linguistic datasets. In that role the OTA collects, catalogues, preserves and distributes high-quality digital resources for research and teaching. We currently hold thousands of texts in more than 25 languages, and are actively working to extend our catalogue of holdings. The OTA relies upon deposits from the wider community as the primary source of materials. The OTA is part of the CLARIN European Research Infrastructure; it is registered as a CLARIN centre, and OTA services are part of the University of Oxford's contribution to the CLARIN-UK Consortium.

CLARIN:EL

CLARIN:EL is the Greek national network of language resources: a nation-wide Research Infrastructure devoted to their sustainable storage, sharing, dissemination and preservation.

The Central Aggregator is the central repository of the clarin:el infrastructure. It harvests metadata from the local repositories, organises and presents the metadata descriptions in a uniform catalogue, and provides access to the Language Resources both to network members and to the public.
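CLARIN centres commonly expose their metadata for harvesting over OAI-PMH. Assuming the clarin:el local repositories do the same (an assumption; the page does not name the protocol, and the repository names below are invented for illustration), the aggregator's first harvesting step can be sketched as:

```python
# Sketch: extracting record identifiers from an OAI-PMH ListRecords
# response, as a central aggregator would before fetching and indexing
# the full metadata descriptions. The sample XML below is fabricated
# for illustration only.
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example:res-1</identifier></header>
    </record>
    <record>
      <header><identifier>oai:repo.example:res-2</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>"""

def harvested_identifiers(xml_text: str) -> list[str]:
    """Return the record identifiers found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(OAI_NS + "identifier")]

print(harvested_identifiers(SAMPLE_RESPONSE))
# → ['oai:repo.example:res-1', 'oai:repo.example:res-2']
```

In a real harvest the aggregator would page through `resumptionToken`s and then fetch each record's full metadata for the uniform catalogue.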

ORTOLANG

ORTOLANG is a depositing service for spoken and linguistic data.

CLARIN.SI

Depositing service for any linguistic/NLP data and tools.

CLARIN-PL

Depositing service for Polish language resources and services.

Dutch Language Institute

The INT is one of the four CLARIN B Centres in the Netherlands and serves as the exclusive CLARIN B Centre for Flanders (Belgium). In this role, the INT provides researchers, (assistant) professors and students with data and tools for linguistic research, as well as advice on using them.

The INT also offers assistance and an infrastructure to researchers or institutions that want to share data or tools that were developed in research projects in the social sciences and humanities.

More data and tools for Dutch can be found at CLAPOP, the portal of the Dutch CLARIN community, and through the CLARIN Virtual Language Observatory, a metadata-based portal for all CLARIN language resources and tools.