Novel Digital History Project developed by MIT and Czech historians

18. 10. 2023

In cooperation with specialists from MIT, the Masaryk Institute and Archives of the CAS is developing unique software that will enable researchers to search through and contextualise a tremendous number of digitised historical sources and evaluate them using artificial intelligence. The Novel Digital History Project web app should be made available online to the public within the next year. The researchers’ goal is to integrate the new software with archives, libraries, and other institutions around the world.

Picture this: a researcher in Argentina gets hold of the original of a German-language letter from the Protectorate of Bohemia and Moravia, say from the Kladno District. Once she has carefully scanned the letter and uploaded it to the specialised software, the software reads and translates it for her. The letter reveals, among other things, that a certain official was sanctioned for not complying with a decree banning the domestic slaughter of pigs. The historian reads the decree, clicks on a map of the Kladno District, and looks up how common penalties for domestic slaughter were in Kladno; who was sanctioned, where, and how; the biography of the official; and whether the sender of the letter was also involved.

This is a likely scenario of what the work of historians – or history aficionados – will look like in the coming years. Weeks and months of visiting archives and dozens of hours of sifting and sorting through primary sources and creating databases will be replaced by sophisticated computer work.

The first such complex software is being developed by researchers from the Masaryk Institute and Archives of the CAS, who presented it at the prestigious three-day annual conference of the German Studies Association in Montréal, attended by over 1,000 researchers from all over the world. “The software is able to interconnect digitised documents like decrees, laws, official protocols, and letters from one region, specifically the Kladno District, and its tools will include a machine translation engine as well as artificial intelligence services that can help clarify a topic and evaluate specific queries,” explains Jan Vondráček, who led the team of Czech historians and American developers from MIT involved in the project. The project that gave rise to the unique software was funded by the MISTI–Czech Seed Fund programme.

The project team during a workshop at MIT: Jean Billa, MIT; Jan Vondráček, Czech Academy of Sciences; Max Frischknecht, University of Basel & Bern; Lucie Kalabisova and Mirandá Patrica Martinez Elton, Charles University. Photo by Kurt Fendt (MIT).

An archive on your desktop

The software, which is expected to be released in 2024, combines several hot-topic technological innovations: machine learning, which allows a computer to read a handwritten manuscript and convert it into text, and the ChatGPT artificial intelligence, which the historians can “disconnect” from the internet and whose database they can “feed” with historical documents. The next step is to sort the data and visualise it, for instance in map form.

“In the future, we would like to add a machine translation engine to make the data accessible in all languages, and the big dream is to integrate the software into a network of other archives and libraries around the world,” Vondráček explains.

An additional goal, which the researcher intends to focus on in a future project, is to improve the software across the board and eventually digitise sources originating from other regions as well. “For me, the Information Age that we currently find ourselves in, and especially the advancement of ChatGPT, is really the equivalent of the Industrial Revolution. It will allow us historians to work in a completely different style and format – to ask different questions and learn the answers radically faster,” the researcher emphasises.

Opportunities for students and researchers

The MISTI–Czech Seed Fund programme is a collaboration between MIT, the Czech Academy of Sciences, the Institute of Organic Chemistry and Biochemistry of the CAS, and its subsidiary, IOCB Tech. It is currently funded by the IOCB Tech Endowment Fund.

Jan Vondráček’s project Institutional Data Shaping Digital History was the fund’s first collaborative project in the social sciences. The Czech side digitised the archive while MIT developed the code. The team collaborated through regular online meetings and a one-week research fellowship held at MIT.

The main research interests of Dr. phil. Jan Vondráček are the history of World War II and the history of everyday life and political administration in the Protectorate of Bohemia and Moravia. He is a graduate of the Technical University of Darmstadt, the Ludwig Maximilian University of Munich, and the Chemnitz University of Technology. He has been working at the Masaryk Institute and Archives of the CAS since 2019, focusing on new approaches in digital humanities by means of digitisation, database creation, and digital analysis.

More information:

Dr. phil. Jan Vondráček
Masaryk Institute and Archives of the CAS

Prepared by: Press Department, Division of External Relations, CAO of the CAS
Translated by: Tereza Novická, Division of External Relations, CAO of the CAS
Photo: Kurt Fendt, MIT
