
Big data

Big data refers to extremely large and/or complex datasets, and to the methods used to manage and analyse them.

Resources

  • Data Ethics in Cultural Heritage

    EN
    This resource introduces the main aspects of data ethics in the cultural heritage domain and examines how data management in the sector can be made more ethical, addressing topical discourse about data ethics along the way. It also supports critical reflection on case studies with evident digital data ethics considerations.
  • Digitisation Methods for Material Culture

    EN
    This resource is an introduction to Digitisation Methods for Material Culture. It explores basic topics in the study of material culture, the types of media used to communicate and share information about it, and the digitisation methods used to capture material culture data.
    Authors
    • Karina Rodriguez Echavarria
    • Myrsini Samaroudi
    • Nicola Schiavottiello
  • Creating Stories with 3D Data on the Web

    EN
    This resource provides guidance on how to use digital storytelling, deploying 3D data, annotations and combining media to enable users to access and explore information about digital heritage assets over the web.
    Authors
    • Karina Rodriguez Echavarria
    • Nicola Schiavottiello
  • Text Analysis - Linguistics Meets Data Science

    EN
    What are the differences between a data scientist and a corpus linguist? This course provides an overview of the different perspectives on language and different types of tools that can be used for text analytics. It also introduces topic modelling and sentiment analysis as approaches to textual data.
  • The Learning Curve in Sharing Data with the EHRI Project

    EN
    A partnership between Kazerne Dossin and EHRI was established to share metadata with a broader audience. This partnership led to changes in how archival materials are catalogued at Kazerne Dossin. Using the example of the Lewkowicz family collection, this article focuses on the transformation Kazerne Dossin underwent while standardising its descriptions, and on the tools EHRI provides to optimise the workflow for collection-holding institutions.
    Authors
    • Dorien Styven
    • Marius Caragea
    • Veerle Vanden Daelen
  • Data Journalism and AI: New frontiers in investigation and storytelling

    EN
    Data is now an indispensable part of investigative work and storytelling for journalists and newsrooms. Computational methods and artificial intelligence are making their way into newsrooms more than ever before, promising new opportunities for journalists as well as new challenges. This talk provides an overview of how data and artificial intelligence can be used in the journalism workflow, investigative reporting and storytelling.
  • What Can I Do With This Messy Spreadsheet? Converting from Excel Sheets to Fully Compliant EAD-XML files

    EN
    Many Galleries, Libraries, Archives, and Museums (GLAMs) face difficulties sharing their collections metadata in standardised and sustainable ways, so staff rely on more familiar general-purpose office programs such as spreadsheets. However, while these tools offer a simple approach to data registration and digitisation, they don't allow for more advanced uses. This blogpost from EHRI explains a procedure for producing EAD (Encoded Archival Description) files from an Excel spreadsheet using OpenRefine.
  • Using Named Entity Recognition to Enhance Access to a Museum Catalog

    EN
    This blog discusses the applicability of services such as automatic metadata generation and semantic annotation for automatic extraction of person names and locations from large datasets. This is demonstrated using Oral History Transcripts provided by the United States Holocaust Memorial Museum (USHMM).
  • Spatial Queries and the First Deportations from Slovakia

    EN
    In the late 1930s, just before war broke out in Europe, a series of chaotic deportations expelled thousands of Jews from what is now Slovakia. As part of his research, Michel Frankl investigates the backgrounds of the deported people and the trajectory of the journey they were taken on. This practical blog describes the tools and processes of analysis, and shows how a spatially enabled database can help answer similar questions in the humanities, and in Holocaust Studies in particular.
  • quod: A Tool for Querying and Organising Digitised Historical Documents

    EN
    This blog post from EHRI introduces 'quod' (querying OCRed documents), a prototype Python-based command line tool for OCRing and querying digitised historical documents, which can be used to organise large collections and improve information about provenance. To demonstrate its use in context, this blog takes the reader through a case study of the International Tracing Service, showing workflows and the steps taken from start to finish.
  • EHRI in TEITOK

    EN
    This blog examines TEITOK, a corpus framework used as an alternative to Omeka. TEITOK is centred around texts, and its interface is similar to Omeka's – both allow you to search through documents and display transcriptions. The main difference is that Omeka treats a transcription as an object description, whereas TEITOK not only shows that a word appears in a document, but also where it appears and how it is used.
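The "Text Analysis - Linguistics Meets Data Science" entry above mentions sentiment analysis. A minimal sketch of the simplest, lexicon-based variant (the toy word lists below are hypothetical, not taken from the course):

```python
# Toy lexicon-based sentiment scoring. Illustrative only: the word lists
# are hypothetical, and real sentiment analysis uses far richer resources.
POSITIVE = {"good", "excellent", "insightful", "clear"}
NEGATIVE = {"bad", "confusing", "poor", "unclear"}

def sentiment_score(text: str) -> int:
    """Return the number of positive minus negative lexicon hits."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

score = sentiment_score("an excellent and clear course, never confusing")
```

Real tools replace the hand-made lexicons with curated resources or trained models, but the counting idea is the same.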
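The "What Can I Do With This Messy Spreadsheet?" entry above describes a procedure built on OpenRefine. The underlying idea, mapping each spreadsheet row to an EAD component, can be sketched in pure Python; the sample rows are hypothetical, the element selection is heavily simplified, and valid EAD files require much more than this:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Toy spreadsheet export (hypothetical sample rows): one archival item per row.
rows = csv.DictReader(io.StringIO(
    "id,title,date\n"
    "ITEM_001,Family correspondence,1942\n"
    "ITEM_002,Photograph album,1938\n"
))

# Map each row to a (heavily simplified) EAD <c> component.
ead = ET.Element("ead")
dsc = ET.SubElement(ead, "dsc")
for row in rows:
    c = ET.SubElement(dsc, "c", level="item", id=row["id"])
    did = ET.SubElement(c, "did")
    ET.SubElement(did, "unittitle").text = row["title"]
    ET.SubElement(did, "unitdate").text = row["date"]

xml_out = ET.tostring(ead, encoding="unicode")
```

OpenRefine does the same mapping interactively, with cleaning steps along the way, which is why the blogpost recommends it over hand-written scripts.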
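The "Using Named Entity Recognition" entry above describes services that extract person names and locations automatically. Those services use trained models; the sketch below fakes the same output with a hand-made gazetteer (all names hypothetical) purely to show the shape of the task:

```python
# Toy gazetteer-based entity spotting. The services described in the blog
# use trained NER models; the names and places below are hypothetical.
PERSONS = {"Anna Kovacs", "David Stern"}
PLACES = {"Bratislava", "Vienna"}

def tag_entities(text: str) -> list:
    """Return sorted (entity, label) pairs found in the text."""
    found = [(name, "PERSON") for name in PERSONS if name in text]
    found += [(place, "LOCATION") for place in PLACES if place in text]
    return sorted(found)

hits = tag_entities("David Stern travelled from Vienna to Bratislava.")
```

A trained model generalises to names it has never seen, which is what makes NER useful on large, unfamiliar collections such as oral history transcripts.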
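The "Spatial Queries" entry above turns on asking geographic questions of a database. The core computation is a distance predicate; a minimal sketch with the haversine formula and hypothetical sample coordinates (not data from the blog):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Hypothetical sample coordinates (not data from the blog).
places = {
    "Bratislava": (48.148, 17.107),
    "Vienna": (48.208, 16.373),
    "Prague": (50.075, 14.437),
}

# "Which places lie within 100 km of Bratislava?" -- the kind of question
# a spatially enabled database answers with a single query.
origin = places["Bratislava"]
nearby = sorted(
    name for name, (lat, lon) in places.items()
    if name != "Bratislava" and haversine_km(*origin, lat, lon) < 100
)
```

A spatial database (e.g. one with geographic extensions) builds this predicate in, plus indexing, so such queries stay fast over thousands of points.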
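The "quod" entry above describes querying OCRed historical documents. quod's actual interface and implementation differ, but the core idea, querying a collection of plain-text OCR output by term, can be sketched as:

```python
# Core idea behind a tool like quod, sketched: query a collection of
# plain-text OCR output by term. (quod's real interface and code differ;
# the documents below are hypothetical.)
docs = {
    "doc_001.txt": "Transport list, Bratislava, March 1942",
    "doc_002.txt": "Correspondence regarding tracing requests",
    "doc_003.txt": "Transport list, Vienna",
}

def query(term: str) -> list:
    """Return the filenames of documents containing the term."""
    return sorted(name for name, text in docs.items()
                  if term.lower() in text.lower())

matches = query("transport")
```

In practice the texts would come from an OCR step over scanned images, and the matches would feed back into organising the collection and documenting provenance.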
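The TEITOK entry above contrasts document-level search with token-level search. The difference can be sketched as two indexes over the same toy corpus: one records only that a word occurs in a document, the other records where:

```python
from collections import defaultdict

# Two indexes over the same toy corpus: doc_index records only THAT a word
# occurs in a document (Omeka-style search); token_index records WHERE it
# occurs (TEITOK-style). The texts are hypothetical.
docs = {
    "letter_01": "the family wrote the letter",
    "letter_02": "the letter was never sent",
}

doc_index = defaultdict(set)     # word -> documents containing it
token_index = defaultdict(list)  # word -> (document, token position) pairs
for doc_id, text in docs.items():
    for pos, word in enumerate(text.split()):
        doc_index[word].add(doc_id)
        token_index[word].append((doc_id, pos))

where = token_index["letter"]
```

The token-level index is what makes concordances and usage-in-context views possible, which is the advantage the blog attributes to TEITOK.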