The SSH Training Discovery Toolkit provides an inventory of training materials relevant for the Social Sciences and Humanities.

Use the search bar to discover materials or browse through the collections. The filters will help you identify your area of interest.


Survey data

Title Body
UK Data Service: SecureLab

The UK Data Service SecureLab has enabled secure access to the most sensitive and confidential data in the collection since 2011. SecureLab provides controlled access to data that are too detailed, sensitive or confidential to be made available under less restrictive access levels. 

UK Data Service: Teaching with Data

This is a collection of resources for teachers, trainers and their students that may also be useful to researchers and the general public. It includes guides, e-books, slides and webinars covering a wide range of topics: quantitative methods, statistical software, teaching data analysis, data visualisation, qualitative methods and psychosocial approaches.

UK Data Service: Data Skills

This source includes interactive modules designed for users who want to get to grips with key aspects of survey, longitudinal and aggregate data as well as tools that can be used to assess and improve data quality. 

Modules can be completed in your own time, and you can dip in and out as needed. The modules introduce key aspects of the data through short instructional videos, interactive quizzes and activities, using open access software where possible. Tools include guides, documentation and exercises.

Voices of the Parliament: A Corpus Approach to Parliamentary Discourse Research

While corpus methods are widely used in linguistics, including gender analysis, this tutorial shows the potential of richly annotated language corpora for research into the socio-cultural context and changes over time that are reflected through language use. The tutorial encourages students and scholars of modern languages, as well as users from other fields of digital humanities and social sciences who are interested in the study of socio-cultural phenomena through language, to engage with user-friendly digital tools for the analysis of large text collections. The tutorial is designed to take full advantage of both linguistic annotations and the available speaker and text metadata to formulate powerful quantitative queries, which are then extended with manual qualitative analysis to ensure adequate framing and interpretation of the results.

The tutorial demonstrates the potential of parliamentary corpora research via concordancers without the need for programming skills. No prior experience in using language corpora and corpus querying tools is required in order to follow this tutorial. While the same analysis could be carried out on any parliamentary corpus with similar annotations and metadata, in this tutorial we will use the siParl 2.0 corpus which contains parliamentary debates of the National Assembly of the Republic of Slovenia from 1990 to 2018. Knowledge of Slovenian is not required to follow the tutorial. To reproduce the analyses in other languages, we invite you to explore a parliamentary corpus of your choice from those available through CLARIN.


Taken from: Teaching with CLARIN

Oral Archives for Sociolinguistic Research

The goal of the course in sociolinguistics is to show students the possibilities and challenges offered by oral history archives for (socio)linguistic research. The course is intended as a research framework that will guide students during their future research work. The lectures acquaint students with the CLARIN infrastructure and present software tools that will allow them to carry out their own thesis research independently. The course offers guidance for the following steps that must be addressed during a (research) project dealing with oral archives: i) reviewing ethical and legal issues arising from using and reusing legacy data; ii) use of metadata to provide the appropriate level of description for the dataset; iii) automatic and manual transcription of the speech material, using the CLARIN infrastructure; iv) the selection and use of the appropriate CLARIN software and tools depending on the research goals (phonetic, lexical, discourse analysis, etc.).


Taken from: Teaching with CLARIN

Introduction to Speech Analysis

This course offers a general picture of managing speech corpora and of the methods that are available for the acoustic-phonetic study of speech. During the course, students use a speech analysis program called Praat and learn to apply the main features of the program in their own work with speech recordings. In addition, students will learn the basics of another program called ELAN that can be used for transcribing and annotating audio as well as video material.

Taken from: Teaching with CLARIN

Teaching ideas: Guides for teaching data analysis

This resource is a collection of short guides designed to make lesson planning more efficient for those teaching data analysis skills. Drawing on real classroom experiences, each guide includes suggested research questions, a dataset and exercises:

  • Gender differences in sexual attitudes (PDF) (using the National Survey of Sexual Attitudes and Lifestyles).
  • Risk factors associated with increased levels of systolic blood pressure (PDF) (using the Health Survey for England).
  • The gender gap in life satisfaction (PDF) (using the Opinions and Lifestyle Survey).
  • Public confidence in the police (PDF) (using the Crime Survey for England and Wales).

Building skills in quantitative methods and statistical software

A collection of quantitative methods e-books and accompanying quizzes for direct use in teaching students or for self-study. E-books aim to build skills in quantitative methods and statistical software and use the Living Costs and Food Survey.

The e-books were developed through a collaboration between the UK Data Service, the National Centre for Research Methods (NCRM) and the Centre for Multi-Level Modelling at the University of Bristol. They were created using the StatJR software and are based on original outputs from the project ‘Using Statistical E-books to teach undergraduate students quantitative methods and statistical software’, funded by the British Academy.


QAMyData is an easy-to-use, open source tool that provides a health check for numeric data. The tool uses automated methods to detect and report on some of the most common problems in survey or numeric data, such as missingness, duplication, outliers and direct identifiers.

The tool offers a number of configurable tests, categorised into four types (file, metadata, data integrity and identifiers), which can be run on popular file formats including SPSS, Stata, SAS and CSV. A standard config file holds default settings for each test, such as the pass or fail threshold (e.g. for detecting value labels that are truncated, email addresses identified as a string, or undefined missing values), and these can easily be adapted to the user’s own desired thresholds. The configuration feature allows the creation of a unique Data Quality Profile. The software creates a ‘data health check’ that reports errors and issues in both a summary and a detailed report, giving the location of each failed test. New tests can easily be added. Data depositors and publishers can act on the results and resubmit the file until a clean bill of health is produced.
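
To give a feel for the kind of configurable checks described above, here is a minimal Python sketch of threshold-based tests on a CSV file. It is not QAMyData's code, and it does not use QAMyData's configuration format: the `CONFIG` keys, the `health_check` function and the sample data are all hypothetical and exist only for illustration.

```python
import csv
import io
import re

# Hypothetical thresholds, standing in for a configurable quality profile.
CONFIG = {
    "max_missing_pct": 10.0,   # fail a column if more than this % of cells are empty
    "max_duplicate_rows": 0,   # fail the file if more rows than this are exact duplicates
    "max_email_matches": 0,    # fail a column if more cells than this look like an email
}

# Deliberately simple pattern: something@something.something
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def health_check(csv_text, config=CONFIG):
    """Run the checks on CSV text and return (test, location, passed) tuples."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    rows = list(reader)
    results = []

    # Collect each column's cells, treating short rows as having empty cells.
    columns = {
        name: [row[i].strip() if i < len(row) else "" for row in rows]
        for i, name in enumerate(header)
    }

    # Data integrity: percentage of empty cells per column.
    for name, values in columns.items():
        missing = sum(1 for v in values if v == "")
        pct = 100.0 * missing / len(rows) if rows else 0.0
        results.append(("missing", name, pct <= config["max_missing_pct"]))

    # Data integrity: rows that exactly repeat an earlier row.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        seen.add(key)
    results.append(("duplicates", "file", duplicates <= config["max_duplicate_rows"]))

    # Identifiers: cells that look like email addresses.
    for name, values in columns.items():
        hits = sum(1 for v in values if EMAIL_PATTERN.fullmatch(v))
        results.append(("email", name, hits <= config["max_email_matches"]))

    return results


# Invented sample data with some deliberate problems.
SAMPLE = """id,age,contact
1,34,
2,,anna@example.org
2,,anna@example.org
3,51,phone
"""

# Print only the failed tests, with the location of each failure.
for test, location, passed in health_check(SAMPLE):
    if not passed:
        print(f"FAIL {test} at {location}")
```

Changing the values in `CONFIG` changes which tests pass, which is the idea behind adapting thresholds to a user's own quality profile.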

Data Skills Modules

These introductory level interactive modules are designed for users who want to get to grips with key aspects of survey, longitudinal and aggregate data.

Modules can be completed in your own time, and you can dip in and out as needed. The modules introduce key aspects of the data through short instructional videos, interactive quizzes and activities, using open access software where possible.

Each module stands alone but those with little experience of surveys may find it useful to start with the Survey Data Module before moving on to the Longitudinal Data Module.

Modules include: Survey Data, Longitudinal Data, Aggregate Data