CLARIN Depositing Services

One of the fundamental services of the CLARIN infrastructure is ensuring that language resources can be archived and made available to the community in a reliable manner. To help researchers store their resources (e.g. corpora, lexica, audio and video recordings, annotations, grammars) in a sustainable way, many of the CLARIN centres offer a depositing service. These centres store the resources in their repositories and assist with the technical and organisational details. Depositing has a wide range of advantages:

  • Long-term archiving: a storage guarantee can be given for a long period (up to 50 years in some cases).
  • Resources can be cited easily with a persistent identifier.
  • The resources and their metadata will be integrated into the infrastructure, making it possible to search them efficiently.
  • Password-protected resources can be made available via an institutional login.
  • Once resources are integrated in the CLARIN infrastructure, they can be analyzed and enriched more easily with various linguistic tools (e.g. automated part-of-speech tagging, phonetic alignment or audio/video analysis).
Contact
CLARIN Office
office@clarin.eu
Collections
Training Toolkit

Items

Title Description Collections
Bavarian Archive for Speech Signals

Depositing service for spoken-language corpora that contain at least one measured signal based on the physical processes of speech production (e.g. acoustic signals, videos, series of measurements, series of pictures).

Training Toolkit
CLARIN Centre Vienna

ARCHE (A Resource Centre for the HumanitiEs) is a service that offers stable and persistent hosting as well as the dissemination of digital research data and resources for the Austrian humanities community. ARCHE welcomes data from all humanities fields.

Training Toolkit
CLARIN-DK-UCPH

The mission of CLARIN-DK is to provide easy and sustainable access for scholars in the humanities and social sciences to digital language data (in written, spoken, video or multimodal form) and to provide advanced tools for discovering, exploring, exploiting, annotating, and analyzing them. CLARIN-DK also shares knowledge on Danish language technology and resources and is the Danish node in the European CLARIN-ERIC. The objective of the CLARIN Centre at the University of Copenhagen is to fulfill the CLARIN-DK mission. The centre provides data management consultation and support in connection with depositing and reuse of research data.

Training Toolkit
CLARIN-PL

Depositing service for Polish language resources and services.

Training Toolkit
CLARIN.SI

Depositing service for any linguistic/NLP data and tools.

Training Toolkit
CLARIN:EL

clarin:el is the Greek national network of language resources, a nation-wide Research Infrastructure devoted to the sustainable storage, sharing, dissemination and preservation of language resources.

The Central Aggregator is the central repository of the clarin:el infrastructure. It harvests metadata from the local repositories, organises and presents the metadata descriptions in a uniform catalogue, and provides access to the language resources for network members and the public.

Training Toolkit
Dutch Language Institute

The Dutch Language Institute (INT) is one of the four CLARIN B Centres in the Netherlands, and it serves as an exclusive CLARIN B Centre for Flanders (Belgium). In this role the INT provides researchers, (assistant) professors and students with data and tools for linguistic research, and with advice about them.

The INT also offers assistance and an infrastructure to researchers or institutions that want to share data or tools that were developed in research projects in the social sciences and humanities.

More data and tools for Dutch can be found at CLAPOP, the portal of the Dutch CLARIN community, and through the CLARIN Virtual Language Observatory, a metadata-based portal for all CLARIN language resources and tools.

Training Toolkit
FIN-CLARIN

Depositing service for language resources related to Finnish, Finland Swedish and the Fenno-Ugric languages, as well as other language resources created in Finland.

Training Toolkit
HZSK

The HZSK is a CLARIN centre that accepts corpora and other linguistic resources from research projects and other contexts in order to make them available, mainly to the academic community, for research and teaching purposes. The focus of the HZSK is on spoken, multilingual and multimodal corpora, and on (spoken) corpora in languages other than German, especially lesser-resourced or endangered languages.

Training Toolkit
LINDAT/CLARIAH-CZ

Depositing service for any linguistic and/or NLP data and tools: corpora, treebanks, lexica, but also trained language models, parsers, taggers, machine translation systems, web services, etc.

Training Toolkit
ORTOLANG

This is a depositing service for spoken and linguistic data.

Training Toolkit
Oxford Text Archive

The Oxford Text Archive (OTA) provides repository services for literary and linguistic datasets. In that role the OTA collects, catalogues, preserves and distributes high-quality digital resources for research and teaching. The OTA currently holds thousands of texts in more than 25 languages and is actively working to extend its catalogue of holdings. The OTA relies upon deposits from the wider community as the primary source of materials. The OTA is part of the CLARIN European Research Infrastructure; it is registered as a CLARIN centre, and OTA services are part of the University of Oxford's contribution to the CLARIN-UK Consortium.

Training Toolkit
University of Tübingen repository

The CLARIN repository at the University of Tübingen offers long-term preservation of digital resources, along with their descriptive metadata.

The mission of the repository is to ensure the availability and long-term preservation of resources, to preserve knowledge gained in research, to aid the transfer of knowledge into new contexts, and to integrate new methods and resources into university curricula.

The repository is part of the eScience infrastructure of the University of Tübingen, a core facility that cooperates closely with the library and computing center of the university.

Integration of the repository into the national CLARIN-D and international CLARIN infrastructures gives it wide exposure, increasing the likelihood that the resources will be used and further developed beyond the lifetime of the projects in which they were developed.

Among the resources currently available in the Tübingen Center Repository, researchers can find widely used treebanks of German (e.g. TüBa-D/Z), the German wordnet (GermaNet), the first manually annotated digital treebank (Index Thomisticus), as well as descriptions of the tools used by the WebLicht ecosystem for natural language processing.

Training Toolkit