The SSH Training Discovery Toolkit provides an inventory of training materials relevant for the Social Sciences and Humanities.
Use the search bar to discover materials or browse through the collections. The filters will help you identify your area of interest.
Text encoding and TEI
For each word/token, you can retrieve the revision in which it originated (and thereby the editor who originally authored it), as well as every change the token has ever undergone.
Oxford Text Archive
The Oxford Text Archive (OTA) provides repository services for literary and linguistic datasets. In that role the OTA collects, catalogues, preserves and distributes high-quality digital resources for research and teaching. We currently hold thousands of texts in more than 25 languages, and are actively working to extend our catalogue of holdings. The OTA relies upon deposits from the wider community as the primary source of materials. The OTA is part of the CLARIN European Research Infrastructure; it is registered as a CLARIN centre, and OTA services are part of the University of Oxford's contribution to the CLARIN-UK Consortium.
clarin:el is the Greek national network of language resources, a nation-wide Research Infrastructure devoted to the sustainable storage, sharing, dissemination and preservation of language resources.
The Central Aggregator is the central repository of the clarin:el infrastructure. It harvests metadata from the local repositories, organises and presents the metadata descriptions in a uniform catalogue, and provides access to the language resources for network members and the public.
This is a depositing service for spoken and linguistic data.
Depositing service for any linguistic/NLP data and tools.
Depositing service for Polish language resources and services.
Dutch Language Institute
The INT is one of the four CLARIN B Centres in the Netherlands and the exclusive CLARIN B Centre for Flanders (Belgium). In this role, the INT provides researchers, (assistant) professors and students with data and tools for linguistic research, as well as advice on their use.
The INT also offers assistance and an infrastructure to researchers or institutions that want to share data or tools that were developed in research projects in the social sciences and humanities.
More data and tools for Dutch can be found at CLAPOP, the portal of the Dutch CLARIN community, and via the CLARIN Virtual Language Observatory, a metadata-based portal for all CLARIN language resources and tools.
University of Tübingen repository
The CLARIN repository at the University of Tübingen offers long-term preservation of digital resources, along with their descriptive metadata.
The mission of the repository is to ensure the availability and long-term preservation of resources, to preserve knowledge gained in research, to aid the transfer of knowledge into new contexts, and to integrate new methods and resources into university curricula.
The repository is part of the eScience infrastructure of the University of Tübingen, a core facility that cooperates closely with the university's library and computing center.
Integration of the repository into the national CLARIN-D and international CLARIN infrastructures gives it wide exposure, increasing the likelihood that the resources will be used and further developed beyond the lifetime of the projects in which they were developed.
Among the resources currently available in the Tübingen Center Repository, researchers can find widely used treebanks of German (e.g. TüBa-D/Z), the German wordnet (GermaNet), the first manually annotated digital treebank (Index Thomisticus), as well as descriptions of the tools used by the WebLicht ecosystem for natural language processing.
The HZSK is a CLARIN centre that accepts corpora and other linguistic resources from research projects and other contexts in order to make them available, mainly to the academic community, for research and teaching purposes. The focus of the HZSK is on spoken, multilingual and multimodal corpora, as well as (spoken) corpora in languages other than German, especially of lesser-resourced or endangered languages.
The core functionality of WikiWho is to parse the complete set of historical revisions (versions) of a Wikipedia article in order to find out who wrote and/or removed which exact text at what point in time. This means that, given a specific revision of an article (e.g., the current one), WikiWho can determine, for each word and special character, which user first introduced it and whether and how it was deleted or reintroduced afterwards. This functionality is not offered by Wikipedia itself, and WikiWho has been shown to perform the task with very high accuracy (~95%) and very efficiently.
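The idea behind revision-based token attribution can be illustrated with a minimal sketch. This is not the actual WikiWho algorithm (which is considerably more robust, e.g. at detecting reintroductions and vandalism reverts); it simply diffs each revision against the previous one and credits every surviving token to the revision, and hence the editor, that first introduced it. The `attribute_tokens` helper and the sample revisions are illustrative, not part of WikiWho:

```python
# A minimal sketch of revision-based token attribution, in the spirit of
# WikiWho (NOT the real WikiWho algorithm): diff successive revisions and
# record, for every token in the latest revision, who first introduced it.
from difflib import SequenceMatcher

def attribute_tokens(revisions):
    """revisions: list of (editor, text) tuples, oldest first.
    Returns a list of (token, origin_editor) for the latest revision."""
    attributed = []  # (token, origin_editor) pairs for the current revision
    for editor, text in revisions:
        tokens = text.split()
        if not attributed:
            # First revision: this editor authored everything.
            attributed = [(t, editor) for t in tokens]
            continue
        prev_tokens = [t for t, _ in attributed]
        matcher = SequenceMatcher(a=prev_tokens, b=tokens, autojunk=False)
        new_attr = []
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == "equal":
                # Unchanged tokens keep their original attribution.
                new_attr.extend(attributed[i1:i2])
            else:
                # Inserted or replaced tokens are credited to this editor.
                new_attr.extend((t, editor) for t in tokens[j1:j2])
        attributed = new_attr
    return attributed

revs = [
    ("alice", "the quick fox"),
    ("bob",   "the quick brown fox"),
    ("carol", "the slow brown fox"),
]
print(attribute_tokens(revs))
# → [('the', 'alice'), ('slow', 'carol'), ('brown', 'bob'), ('fox', 'alice')]
```

Note what this simplification loses: because only the previous revision is consulted, a token that is deleted and later reintroduced is credited to the reintroducing editor, whereas WikiWho traces it back to its true original author, which is precisely the hard part the tool solves.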