Historical sources sorted and linked by unique software created by Czechs & MIT

18. 10. 2023

In cooperation with specialists from the Massachusetts Institute of Technology (MIT), the Masaryk Institute and Archives of the CAS is developing unique software that will enable researchers to search through and contextualise a tremendous number of digitised historical sources and evaluate them using artificial intelligence. The software will be made available online to the public within the next year. The researchers’ goal is to integrate the new software with archives, libraries, and other institutions around the world.

Picture this: a researcher in Argentina gets hold of the original of a German-language letter from the Protectorate of Bohemia and Moravia, say from the Kladno District. She carefully scans the letter and loads it into the specialised software, which reads and translates it for her. The letter reveals, among other things, that a certain official was sanctioned for not complying with a decree banning the domestic slaughter of pigs. The historian reads the decree, clicks on a map of the Kladno District, and looks up how common penalties for domestic slaughter were in Kladno; who was sanctioned, where, and how; the biography of the official; whether the sender of the letter was also involved; and so on.

This is a likely scenario of what the work of historians – or history aficionados – will look like in the coming years. Weeks and months of visiting archives, and dozens of hours of sifting and sorting through primary sources and creating databases, will be replaced by sophisticated computer work.

The first such complex software is being developed by researchers from the Masaryk Institute and Archives of the CAS, who presented it at the prestigious three-day annual conference of the German Studies Association in Montréal, attended by over 1,000 researchers from all over the world. “The software is able to interconnect digitized documents like decrees, laws, official protocols, and letters from one region, specifically the Kladno District, and its tools will include a machine translation engine as well as artificial intelligence services that can help clarify a topic and evaluate specific queries,” explains Jan Vondráček, who led the team of Czech historians and American developers from MIT involved in the project. The project, which gave rise to the unique software, was funded by the MISTI–Czech Seed Fund programme.

An archive on your desktop

The software, which is expected to be released in 2024, combines several much-discussed technological innovations: machine learning, which allows a computer to read a handwritten manuscript and convert it into text, and ChatGPT artificial intelligence, which the historians can “disconnect” from the internet and “feed” with historical documents. The next step is to sort the data and create a visualisation of it – for instance, in map form.

“In the future, we would like to add a machine translation engine to make the data accessible in all languages, and the big dream is to integrate the software into a network of other archives and libraries around the world,” Vondráček explains.

An additional goal, which the researcher intends to focus on in a future project, is to improve the software across the board and eventually digitise sources originating from other regions as well. “For me, the Information Age, and especially the advancement of ChatGPT, is really the equivalent of the Industrial Revolution that we currently find ourselves in. It will allow us historians to work in a completely different style and format – to ask different questions and learn the answers radically faster,” the researcher emphasises.

Opportunities for students and researchers

The MISTI–Czech Seed Fund programme is a collaboration between MIT, the Czech Academy of Sciences, the Institute of Organic Chemistry and Biochemistry of the CAS, and its subsidiary, IOCB Tech. It is currently funded by the IOCB Tech Endowment Fund.

Jan Vondráček’s project Institutional Data Shaping Digital History was the fund’s first collaborative project in the social sciences. The Czech side digitised the archive while MIT developed the code. The team collaborated through regular online meetings and a one-week research fellowship held at MIT.

The main research interests of Dr. phil. Jan Vondráček are the history of World War II and the history of everyday life and political administration of the Protectorate of Bohemia and Moravia. He is a graduate of the Technical University of Darmstadt, the Ludwig Maximilian University of Munich, and the Chemnitz University of Technology. He has been working at the Masaryk Institute and Archives of the CAS since 2019, focusing on new approaches in digital humanities by means of digitization, database creation, and digital analysis.

More information:

Dr. phil. Jan Vondráček
Masaryk Institute and Archives of the CAS

Contacts for Media

Markéta Růžičková
Public Relations Manager
 +420 777 970 812

Eliška Zvolánková
 +420 739 535 007

Martina Spěváčková
+420 733 697 112
