In 2018 the Dutch Research Council (NWO) awarded the Huygens Institute funding for the REPUBLIC project, which will provide access to over half a million pages of handwritten and printed resolutions of the Dutch States General (1576-1796). These decisions not only reflect the ‘invention’ of the Dutch Republic as a new state and its development into a great power; they also provide insight into the day-to-day work of early modern politicians. Here we introduce the first version of our intended web edition, which provides access to the printed resolutions of the eighteenth century.
Please bear in mind that this is a prototype: an intermediate step towards the full edition of all the resolutions of the Dutch States General. Precisely because we want to find out what works and what could be improved in a web application, the REPUBLIC team decided to build a prototype at an early stage of the project. The corpus of printed resolutions from the eighteenth century was already available and so served as the first exercise in producing a searchable online edition.
Together with the National Archives, which provided the scans of the resolutions, the REPUBLIC team has worked on the prototype since March 2019. Along the way we tackled several of the challenges that come with providing semi-automatic access to vast source corpora. Now it is your turn to test this first result.
- Includes the printed resolutions from the period 1703-1796. Before 1703 the resolutions are available only in handwritten form; in 1796 the States General ceased to exist. At this stage of the project we limited ourselves to the printed resolutions because Optical Character Recognition (OCR) results were already available for them. Handwritten Text Recognition (HTR), which we use for the older resolutions, requires more time and effort.
- Includes the ordinary (‘ordinaris’) resolutions. Secret (‘geheime’) resolutions also exist for this period, but they were not meant for distribution and were therefore not printed. They are not yet included in the prototype.
- The volumes are chronologically ordered and each covers a calendar year. The States General met almost every day, with the exception of Sundays and official holidays, unless there were urgent reasons to meet anyway.
- Offers limited search options. The corpus can be searched by session day, date, resolution and attendance list; the attendance lists include the participating deputies and the current president.
- Provides access to the scans of the pages of the volumes that contain resolutions along with the recognized text. The recognized text formats are indicated on both the scans and the recognized text.
- Contains text recognized with OCR which, because of the size of the corpus, was not manually corrected. Although the quality of the OCR is acceptable (especially compared to similar corpora), the text contains more errors than born-digital or manually corrected texts. Automatically recognizing Old Dutch text is more complicated than recognizing modern texts because of the variation in letter forms (especially the ‘long s’, which closely resembles the ‘f’) and in spelling. Personal names are affected by OCR errors relatively often. Users should take these problems into account when searching the prototype. We continue to improve the quality of the OCR, also after publication of the prototype, and we expect the quality of the text to improve further in the future.
- Has limited search possibilities. The printed resolutions have been published according to the (logical) structure of the text:
  - attendance lists, including delegates and a president
  - session days, including the date
- Contains search facets in the interface, which enable users to search the resolutions at the metadata level.
The printed volumes consist of resolutions and an index meant to make the text more accessible.
- The text of the resolutions can be searched by content.
- The indices have been digitized and processed with OCR, but are not otherwise accessible yet.
Please note that the prototype is a snapshot of the REPUBLIC project. Within the project we continue to work on the publication of both the printed and the handwritten resolutions. For the time being, the prototype will not be updated with this improved data.
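To illustrate what taking OCR errors into account can mean when searching for names, here is a minimal Python sketch of an error-tolerant lookup. It is not how the prototype's search works. The function names `normalise` and `fuzzy_find`, the threshold of 0.75 and the sample tokens are all assumptions made for this example. The sketch treats ‘ſ’, ‘f’ and ‘s’ as the same letter and then accepts near matches using the standard-library `difflib` module.

```python
import difflib


def normalise(text: str) -> str:
    """Lowercase and collapse the long s / f / s confusion into a plain 's'."""
    return text.lower().replace("ſ", "s").replace("f", "s")


def fuzzy_find(query: str, tokens: list[str], threshold: float = 0.75) -> list[str]:
    """Return the tokens whose normalised form is close to the normalised query.

    The threshold is an arbitrary assumption; lower values find more
    OCR variants but also more false hits.
    """
    q = normalise(query)
    hits = []
    for token in tokens:
        score = difflib.SequenceMatcher(None, q, normalise(token)).ratio()
        if score >= threshold:
            hits.append(token)
    return hits


# Hypothetical OCR output: 'Heinfius' has an f for a long s, 'Fagcl' a c for an e.
ocr_tokens = ["Fagel", "Fagcl", "Slingelandt", "Heinfius", "Heinsius", "Holland"]
print(fuzzy_find("Heinsius", ocr_tokens))
print(fuzzy_find("Fagel", ocr_tokens))
```

Collapsing ‘f’ into ‘s’ deliberately trades precision for recall: words that genuinely differ only in those letters become indistinguishable, which may be acceptable for name searches but not for every query.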
Further development of REPUBLIC
Publication of the full web edition of REPUBLIC is foreseen at the end of the project, which continues until March 2024. Before we reach that point, many tasks remain. We are still working on the following:
- Improving the publication of the printed text by making the indices created by the griffier (clerk) of the time accessible and connecting them to the text.
- Publication of personal names, locations, organizations and possibly other entities in the resolutions.
- Publication of the manuscripts of the resolutions, that is, all the resolutions between 1576 and 1703 and the secret resolutions from the period after that. For this, volunteers of the crowdsourcing project Goetgevonden! work on the improvement of text recognition.
- Expanding the publication that was developed for the prototype to the recognized handwritten material. The (ambitious) goal is a publication that is as uniform as possible across the entire corpus.
- Analysing the formulaic language of the resolutions to get a grip on the logical structure of the text. We are already able to distinguish session days, attendance lists and resolutions, and we are trying to refine this approach down to the structure of the individual resolution: what is the opening, what is the decision and what is discussed?
- Developing a suitable user application for the data. For visualization and presentation of the texts and scans this application will probably be based on the prototype. However, the data that still needs to be published is so extensive that separate extensions for search and navigation are needed alongside and on top of the visualization of the texts and scans.
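The idea of using formulaic language to recognize the structure of the text can be sketched in a few lines of Python. This is an illustration only, not the project's method. The opening phrases in `FORMULAS` are examples chosen for this sketch rather than the phrase list used in REPUBLIC, and the function name `classify` and the labels are hypothetical.

```python
import re

# Illustrative opening formulas, matched against lowercased text.
# These are assumptions for the example, not the project's actual phrase list.
FORMULAS = {
    "attendance_list": re.compile(r"^\s*pr(æ|ae)s(ide|entibus)\b"),
    "resolution": re.compile(r"^\s*(is ter vergaderinge gelesen|ontfangen een missive)\b"),
}


def classify(paragraph: str) -> str:
    """Label a paragraph by its opening formula, or 'other' if none matches."""
    text = paragraph.lower()
    for label, pattern in FORMULAS.items():
        if pattern.search(text):
            return label
    return "other"


# Hypothetical paragraphs; the names and wording are made up.
samples = [
    "PRÆSIDE, Den Heere van Voorbeeld.",
    "Is ter Vergaderinge gelesen de Requeste van een koopman.",
    "Waar op gedelibereert zynde, is goedgevonden.",
]
for sample in samples:
    print(classify(sample), "<-", sample)
```

Exact patterns like these would miss many openings in uncorrected OCR text, so in practice the matching would need to tolerate recognition errors and spelling variation.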