After Easter, I attended three sessions of training on Transkribus, the Handwriting Recognition Software that works a bit like Optical Character Recognition to create automatic transcriptions of manuscripts. This is the last in a series of three posts about the training.
In our third session, we started by looking at our own case studies. I began by describing the mass of documents that I need transcribed in order to pursue my Pilgrimage of Grace project. I’d tried the English Eagle model (all the handwriting recognition models have names – some are more interesting than others!) on a few of my documents and while it’s by no means perfect, it produces transcriptions that I can get the general sense of. What this means in practice is that I could, if I had enough credits, run my documents through the software and get a sense of which ones might actually be useful and worth having a human being correct the transcription. That in itself will save me weeks of work. The only fly in the ointment is that it would take quite a bit of money to deal with the number of documents that I have. The discussion we had was also really helpful, as the trainer was able to give me the name of someone who has a private Handwriting Recognition Model trained on secretary hand that they might be able to share with me to use on my documents.
After we had all shared our projects, we moved back on to the training itself. The next section looked at how to plan a project using the software. The trainers posed a series of things to consider as we begin our projects, starting with identifying your goals and research questions. Once you have these, you can think about how the software can help and whether you are going to need to train language and field models in order to complete the project. You should then consider whether the automatic transcription will be enough and you can accept an error rate of around 8% because it doesn’t need to be perfect, or whether you will need to check and correct everything, for example, for an edition of the sources. Other questions include whether the layout is important for the meaning, how many pages you will need to process and whether you need to train a model. Whether you would like to publish the transcriptions will also have an effect on how you decide to proceed. Considering those issues helps to plan the project and come up with responses to the following questions and points:
Planning
- Are the images already digitised?
- What will be the best:
  - Transcription model?
  - Field/table model?
  - Baseline model?
- Be familiar with the models and conduct tests across the volumes of your images.
Personnel
- How will you train your collaborators?
- Any volunteers on the project need to be guided and managed. You should consider how to keep them interested, perhaps by visiting the archives etc.
- Think about external collaborators and what is in it for them.
Resources
- How many pages do you need to transcribe?
- Subscription plans
- You need a scholar plan for field models, unless your organisation can get one of the memberships, in which case it will just need to buy the credits.
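To make the error rate of around 8% mentioned earlier a little more concrete, here is a minimal Python sketch of one common way such a figure can be measured: the character error rate, which is the number of single-character edits needed to turn the automatic transcription into a corrected one, divided by the length of the corrected text. This assumes the 8% refers to a character-level rate, which the training did not spell out. The function names and the sample strings are my own illustrations, not anything taken from Transkribus.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn
    `hypothesis` into `reference`."""
    # previous[j] holds the distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # character missing from hypothesis
                current[j - 1] + 1,                   # extra character in hypothesis
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the corrected (reference) text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Invented sample line, purely for illustration.
    corrected = "the pilgrimage of grace"
    automatic = "the pilgrimmage of grece"
    rate = character_error_rate(corrected, automatic)
    print(f"Character error rate: {rate:.1%}")
```

Running something like this over a handful of pages that you have corrected by hand would give a rough idea of whether a model's output is good enough for skimming, or whether everything will need checking.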
Finally, the training closed with some discussion of how we might use the software in our teaching. I think I am some way off that at the moment, although it is something I would like to think about for the future…