The following workshops will take place at the International Institute for Social History (Cruquiusweg 31, Amsterdam) on Wednesday 6 June.

Full day workshops:

Digital Tool Criticism
Introduction to Born-Digital Heritage: from harvesting to analysing Web Archives
Oral History and Speech Technology

Morning half-day workshops:

Network analysis and new approaches to music history
LIWC in the Digital Humanities

Afternoon half-day workshops:

Using Nederlab for humanities research
Mining Delpher with a Jupyter Notebook

Registration:

Registration is free but compulsory: https://goo.gl/forms/erRc1o91uyFkjlVl2

 

Workshop descriptions:

Digital Tool Criticism

Organisers: Marijn Koolen (KNAW Humanities Cluster, marijn.koolen@huygens.knaw.nl), Jasmijn van Gorp (Utrecht University, J.vanGorp@uu.nl), Jacco van Ossenbruggen (CWI, VU University Amsterdam, j.r.van.Ossenbruggen@vu.nl)

The aim of this workshop is to bring together people with an interest in Digital Humanities research to collaboratively establish guidelines for tool criticism in DH research. It is a follow-up to the DH Benelux 2017 workshop on Tool Criticism. By tool criticism we mean the reflection on the role of digital tools in research methodology and the evaluation of the suitability of a given digital tool for a specific research goal. Our aim is to understand the impact of any limitation of the tool on the specific goal, not to improve its performance. While source criticism is common practice in many academic fields, awareness of the biases inherent in digital tools and their influence on research tasks needs to be increased. This requires researchers, data custodians and tool providers to understand issues from different perspectives. Researchers need to be trained to anticipate and recognize tool bias and its impact on their research results.
Data custodians and tool providers, on the other hand, have to make information about the potential biases of the underlying processes more transparent. This includes processes such as collection policies, digitization procedures, OCR, data enrichment and linking, quality assessment, error correction and search technologies.


The workshop will be highly interactive, combining hands-on experimentation with discussion and reflection on the impact of data and tool limitations on the research outcomes. Participants work on their assignments in groups, as previous experience has shown that collaborative experimentation and documentation encourage them to reflect on this process. To ensure that the outcomes of the assignments can be compared across groups, we limit the range of tools and datasets used to a selection that is relevant to a broad range of humanities disciplines, allowing criticism from different perspectives and methodologies. We will use a model for digital tool criticism that we developed in response to the 2017 workshop and that will guide participants in structuring the process of exploratory research, documentation and reflection.


At the end of the workshop, groups will present and compare their findings, and we intend to collaboratively draft a set of guidelines and a checklist for DH tool criticism that is relevant to both creators and users of digital research tools. Participants are invited to co-author a paper based on the workshop that will be submitted to an international digital humanities journal.

 

Introduction to Born-Digital Heritage: from harvesting to analysing Web Archives

Organisers: Sally Chambers, Digital Humanities Research Coordinator, GhentCDH, Ghent University (Sally.Chambers@UGent.be) and Valérie Schafer (Professor of Contemporary European History, C2DH, University of Luxembourg, valerie.schafer@uni.lu)

“For historians, and researchers in many other humanities disciplines, web archives remain largely an unknown, and certainly underused, primary source. Even within digital humanities, web archives as a focus for study have remained on the fringe.” Jane Winters (2017)

The history of the World Wide Web already spans more than a quarter of a century. The Internet Archive and several national libraries and archives have been archiving the web for over 20 years. Yet the use of the archived web as an object of research remains at the fringes of (digital) humanities research. Although most researchers in the humanities have yet to begin exploring the potential of these archives, some projects have already investigated it. The Big UK Domain Data for the Arts and Humanities (BUDDAH) project and the research being undertaken by members of the RESAW network (Research Infrastructure for the Study of Archived Web Materials) are particular examples.

The aim of this one-day workshop at DH Benelux is to introduce digital humanities researchers to the possibilities of using the archived web as a research resource. The workshop will start out with an introduction to web archiving, in which participants will be introduced to the web as a historical resource. This section of the programme will include presentations of some existing web archives, as well as an introduction to the technicalities of web archiving.

In the second part of the workshop, participants will have the opportunity to explore some of the existing research using the archived web. In this session, researchers from a range of disciplines across the humanities and social sciences will present the research that they undertake using archived web resources. The third session of the day will enable participants to delve deeper into the data level of archived web resources. This part of the workshop will include an introduction to the Web ARChive (WARC) file format as well as an introduction to the Archives Unleashed Toolkit. Furthermore, a number of tools to analyse archived web materials will be presented.

The final session of the workshop will be dedicated to a hands-on web archiving challenge, in which participants will have the opportunity to work directly with archived web content. In small groups, participants will be confronted with a Benelux-themed challenge related to “web archives to study Benelux”.

Confirmed speakers: Els Breedstraet (Coordinator Web Preservation, Publications Office of the European Union), Sally Chambers (Ghent Centre for Digital Humanities, Ghent University), Frédéric Clavert (C2DH, University of Luxembourg), Friedel Geeraert (Royal Library / State Archives of Belgium), Arnoud Goos (Netherlands Institute for Sound and Vision), Anne Helmond (University of Amsterdam), Olga Holownia (International Internet Preservation Consortium, IIPC), Kees Teszelszky (National Library of the Netherlands), Valérie Schafer (C2DH, University of Luxembourg), Jesse de Vos (Netherlands Institute for Sound and Vision), Eveline Vlassenroot (imec-mict, Ghent University), Jane Winters (School of Advanced Study and Senate House Library, University of London)

 

Network analysis and new approaches to music history

Organisers: Marnix van Berchum (Utrecht University, m.vanberchum@uu.nl), Stephen Rose (Royal Holloway, University of London, stephen.rose@rhul.ac.uk), Thomas Delpeut (Radboud University, t.delpeut@let.ru.nl)

This workshop explores the opportunities and challenges arising when network analysis is applied to the study of music history. It is aimed at both digital humanities specialists who wish to gain new perspectives from the domain of music, and musicologists who wish to learn more about the concepts and techniques of network analysis. By the end of the workshop, attendees should have gained a clearer understanding of: how network analysis can illuminate musicology (particularly the branches of musicology using large quantities of data), how network conceptualisations can be translated into data-driven methodologies, and how other disciplines might benefit from approaches developed specifically for studying music. Music has a particularly multi-layered, multimodal nature, involving compositions, performances, audiences, critical discourses, social contexts, etc. The workshop will discuss how these elements can be modelled and analysed as networks, and how music’s multimodal nature can offer new perspectives for network analysis in other disciplines (for instance, how literary studies can address the performative networks surrounding texts and authors).

Following initial introductions, the workshop will begin with a conceptual discussion, tackling such questions as:

  • How can the approaches of Social Network Analysis and Actor-Network Theory be reconciled with a data-driven approach to the study of music history?
  • How can the analysis of musical networks promote a more relational approach to music history?
  • What domain-specific challenges arise when analysing networks in music?

The second part of the workshop introduces specific music datasets that are suitable for modelling as networks, including:

  • Music catalogue data from the British Library, covering published music from 1500 to the present day;
  • Data from Dutch 19th-century concert life (programmes, audiences, finances, texts);
  • Data for early music manuscripts and prints 1400-1600.

Participants are encouraged to bring examples of other musicological datasets suitable for network analysis; please contact m.vanberchum@uu.nl if you would like to discuss a dataset or submit a sample of it.

The above-mentioned datasets and those submitted by participants will be discussed in a plenary setting, showing the relevance of these approaches to a broader range of fields, including other humanities disciplines, music information retrieval, and the library/heritage sector.
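
As a rough illustration of how concert programme data of the kind listed above might be modelled as a network, the sketch below links composers who appear on the same programme and counts how many distinct composers each one is connected to. The programmes are invented for illustration and do not come from any of the workshop datasets; the function names are hypothetical, and co-occurrence is only one of many possible ways to define a relation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concert programmes (invented for illustration):
# each inner list names the composers performed at one concert.
PROGRAMMES = [
    ["Beethoven", "Mozart", "Verhulst"],
    ["Beethoven", "Verhulst"],
    ["Mozart", "Haydn"],
]


def co_occurrence_edges(programmes):
    """Count, for every pair of composers, the programmes they share."""
    edges = Counter()
    for programme in programmes:
        # sorted(set(...)) gives each undirected pair one canonical order
        for a, b in combinations(sorted(set(programme)), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Number of distinct neighbours per composer in the network."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


edges = co_occurrence_edges(PROGRAMMES)
degrees = degree(edges)
```

Edge weights here record shared programmes, and degree is a simple first measure of how central a composer is in the repertoire; the same structure could hold performers, venues or audiences instead of composers.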

 

LIWC in the Digital Humanities

Organisers: Peter Boot, peter.boot@huygens.knaw.nl, Hennie Brugman, hennie.brugman@meertens.knaw.nl, Erika Kuijpers, h.m.e.p.kuijpers@vu.nl, Inger Leemans, i.b.leemans@vu.nl, Isa Maks, isa.maks@vu.nl

LIWC (Linguistic Inquiry and Word Count; Pennebaker, Boyd, Jordan, & Blackburn, 2015) is a tool developed by social psychologists but increasingly being used in a DH context (e.g. Boyd, 2017; Leemans, Van der Zwaan, Maks, Kuijpers, & Steenbergh, 2017; Piper, 2016). The tool counts words in texts from a number of psychologically meaningful categories. The outcome is, for each submitted text, the percentage of words in that text in each of the categories.

Categories include, among others, emotions, cognitions, senses, biology, function words, and ‘life concerns’. At the basis of LIWC is a dictionary that assigns words to categories. A Dutch version of the LIWC 2001 dictionary has been available since 2004 (Zijlstra, Van Meerveld, Van Middendorp, Pennebaker, & Geenen, 2004). Recently, Dutch versions of the 2007 and 2015 dictionaries became available (Boot, Zijlstra, & Geenen, 2017; Van Wissen & Boot, 2017). There is also an experimental historicized version of the dictionary, for doing research into historic texts. In French, there exists a translation of the LIWC 2007 dictionary (Piolat, Booth, Chung, Davids, & Pennebaker, 2011).
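
The counting principle described above can be sketched in a few lines of Python. This is a minimal illustration only: the toy dictionary, its category labels and the function name are assumptions made up for the example, not the actual LIWC dictionary or software, and the sketch ignores any stem or wildcard matching the real tool may perform.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (NOT the real LIWC dictionary):
# each word is assigned to one or more categories.
TOY_DICTIONARY = {
    "happy": {"posemo"},
    "sad": {"negemo"},
    "think": {"cogproc"},
    "i": {"function", "pronoun"},
    "the": {"function"},
}


def category_percentages(text, dictionary):
    """Return, per category, the percentage of tokens in `text` assigned to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {}
    counts = Counter()
    for token in tokens:
        # A word may belong to several categories; count it once in each.
        for category in dictionary.get(token, ()):
            counts[category] += 1
    return {category: 100.0 * n / len(tokens) for category, n in counts.items()}


scores = category_percentages("I think the day was happy", TOY_DICTIONARY)
```

Because the percentages are relative to all tokens in the text, words that are missing from the dictionary lower every score, which is one reason dictionary coverage matters for historical or non-English material.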

The aim of the workshop at DH Benelux 2018 is to bring together DH researchers from the Benelux countries who use or might use LIWC in their research. The content of the workshop will be decided based on the interest and knowledge level of the participants, but may include:

  • Instruction in using LIWC (this will be a minor component, as the tool is largely self-explanatory);
  • Mini-presentations of DH projects that have been using LIWC (by workshop participants as well as organizers);
  • A discussion of the differences between the dictionary versions (2007 was translated
    manually, 2015 was translated in an automated procedure);
  • Discussion about how to evaluate the outcome of the tool and be sure that it is a reliable
    method to answer specific research questions (beyond its original field of application);
  • Discussion of shortcomings of the dictionary (versions) and possible ways to address the
    shortcomings;
  • Discussion of possible extensions to the dictionary;
  • Discussion of the copyright status of the dictionary in relation to an open source and open access scientific landscape;
  • Discussion on the limitations of the historicized version of the dictionary and ways to
    address them;
  • Discussion of software that can be used in conjunction with the LIWC dictionaries.

The hoped-for results of the workshop will be increased cooperation between LIWC users, an
increased understanding of LIWC possibilities in a humanities context and a shared agenda for
addressing limitations of the current dictionary translations.

 

Mining Delpher with a Jupyter Notebook

Organisers: Steven Claeyssens (steven.claeyssens@kb.nl) & Willem Jan Faber (willemjan.faber@kb.nl) (cf. http://lab.kb.nl/)

Join us for this half-day workshop on the basics of using Python within a Jupyter Notebook to analyse Delpher data. We assume no prior experience of working with Delpher (meta)data and no other significant technical knowledge or skills, although basic computer skills and some familiarity with Python are recommended.

Most humanities researchers in the Netherlands are familiar with Delpher, the online gateway to millions of pages of historical text (newspapers, books, journals & radio bulletins), mostly in Dutch. Delpher allows you to search and browse all documents in full text, making it a good resource for close reading. For distant reading, on the other hand, Delpher is far from the ideal tool.

If you want to analyse large amounts of textual data, the national library of the Netherlands (KB) already gives researchers access to the digital images, metadata and full text in bulk via our APIs, but we are also experimenting with additional ways of giving access to the Delpher collections. During this workshop, experts from the KB will introduce you to a Jupyter Notebook under development that allows you to (re)search and visualize Delpher data using Python.
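
To give a flavour of the kind of distant reading such a notebook makes possible, here is a minimal sketch that tracks how often a term occurs per year. It is not the KB notebook and it calls no KB API: the sample records, their field names ("year", "text") and the function name are assumptions invented for this example.

```python
import re
from collections import defaultdict

# Hypothetical sample records (invented for illustration, not real Delpher data).
SAMPLE_ARTICLES = [
    {"year": 1900, "text": "De fiets is populair in de stad"},
    {"year": 1900, "text": "Het paard trekt de wagen"},
    {"year": 1910, "text": "De fiets en de auto rijden door de stad"},
]


def relative_frequency_by_year(articles, term):
    """Return {year: occurrences of `term` per 1,000 tokens} over all articles."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for article in articles:
        tokens = re.findall(r"\w+", article["text"].lower())
        totals[article["year"]] += len(tokens)
        hits[article["year"]] += tokens.count(term.lower())
    # Normalise by the number of tokens so that years with more text
    # do not automatically appear to use the term more often.
    return {
        year: 1000.0 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }


freq = relative_frequency_by_year(SAMPLE_ARTICLES, "fiets")
```

In a notebook, the resulting dictionary could be passed to a plotting library to draw the term's frequency over time.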

The workshop will be in English. All data that we will work with, will be in Dutch. Please bring your own laptop.

 

Oral History and Speech Technology

Organiser: Arjan van Hessen, a.j.vanhessen@utwente.nl

The objective of this workshop is to show researchers in oral history and media studies, and social scientists working with spoken narratives, how language and speech technology (HLT) can facilitate their daily research practice. The focus of the workshop will be on automatic transcription of audio files through speech recognition, and on convenient tools for manual post-hoc correction.

We will demonstrate various types of (partly open source) HLT software, and will give an overview of the technologies that will become available in the near future. Part of the workshop will consist of hands-on experience, in which participants can use their own data and “our” tools.

The workshop is an extension of the CLARIN-EU workshop (Arezzo, May 2017) and of the current effort at the Ludwig Maximilian University to build a transcription portal for oral historians.

Please have a look at this website for additional information about the OH-Transcription chain: http://oralhistory.eu/workshops/transcription-chain.

In the workshop, we will address the following issues:

  1. Digitisation:

How to digitise analogue recordings (such as (cassette) tapes). What are the important issues, which software is available (and useful), what are the pitfalls, and what else matters in this often crucial step from analogue to digital?

  2. Speech Recognition:

Which digital tools are available to efficiently make your own transcriptions? We will discuss the advantages and disadvantages of different types of Automatic Speech Recognition (ASR), of re-speaking and of audio-text alignment, and show how to make a basic transcription with ASR that can be manually improved with the help of others.

  3. Forced Alignment:

Forced Alignment (FA, the process of aligning audio with written text) can be done in most languages. We will show the use of WebMAUS for forced alignment. Participants are invited to bring their own digital text and audio.

  4. Transcription (Correction):

In the workshop, we will show you software for making transcriptions and for correcting transcriptions that resulted from ASR.

  5. Metadata:

Which formats are available and how to choose “the right one” for your own OH-projects?

 

Using Nederlab for humanities research

Organisers: Nederlab project team, Hennie Brugman, project coordinator Nederlab, hennie.brugman@meertens.knaw.nl

Nederlab (www.nederlab.nl) is a five-year ‘NWO-groot’ project that started at the beginning of 2013 with the ambition to collect a large part of the production of published Dutch texts from about 800 to the present and to create a linguistically enriched diachronic research corpus from it. To date, we have collected and processed 20+ subcorpora with a total approximating 30 billion words.
   
The main aim of Nederlab is to present this enriched text material as one uniform corpus via a virtual research environment aimed primarily at historians, literary scholars and linguists, but also at a wider audience.
   
Nederlab is currently in its final stage and will deliver its corpus, research environment and text processing pipeline in summer 2018.
   
During 2017 Nederlab tested its research infrastructure and collections with ten scientific pilot projects, six of which were executed as part of a public call for projects. In May 2017 representatives from these pilots gathered for an internal workshop to discuss their respective projects and intermediate results. Since then, all of the pilot projects have finished.
   
We propose a DHBenelux 2018 workshop about ‘Using Nederlab for humanities research’. Presenters at the workshop will report on their experiences with the Nederlab platform, to each other, but especially to the wider scholarly audience.
   
The workshop will consist of two parts:

In part one of the workshop we will present the ten Nederlab pilot projects that finished in 2017 (some examples are listed below). Presentations by our pilot researchers will demonstrate and discuss a wide variety of practical use cases for Nederlab. The wide scope of these use cases will show the potential and also the limitations of Nederlab and may provide new ideas or solutions concerning the research interests of workshop participants.
  
Topics from the pilots that may find their way into the workshop programme:
   
  • Construction of a Dutch corpus for the 15th and 16th centuries. Well-organised, broadly available material for this period was lacking but is becoming available via Nederlab.
  • Geographic patterns in syntactic change. Older documents in particular have metadata about both locations and dates. This metadata can be used to investigate linguistic changes over location and time.
  • Detecting text reuse in Nederlab collections.
  • Intra-author variation in negation in the letters of P.C. Hooft. Double negation almost completely disappeared from the Dutch language over time. Letters of Hooft were used to study this process.
  • Respelling text using VARD2. Often it is desirable to respell older texts before doing further research.
  • LIWC-type search and analysis in Nederlab.
  • Usage of pietistic language in 18th-century Dutch. A lexicon for the ‘Tale Kanaäns’ was extracted from texts contained in Nederlab and can be used to discover candidate documents of a ‘pietistic nature’.
  • Construction of a lexicon of emotion words and its use for semantic expansion of queries on Nederlab texts.
  • Using “small” word embeddings to track semantic shifts. Application of word2vec to the Staten Generaal Digitaal subcollection of Nederlab.
  • The spread of language characteristic of Spinoza between 1640 and 1800.
  • Nederlab now and in the future.

In part two of the workshop we invite attendees to explicitly share and discuss these ten projects and tools in breakaway groups where we will be able to provide hands-on and specific suggestions for researchers interested in using Nederlab. Participants may also send their own questions about and potential use cases for Nederlab via email to hennie.brugman@meertens.knaw.nl before the workshop.

 
