My role for the Anti-Slavery Manuscripts project has been building the data aggregation code. In other words, I write the code that combines each of the volunteers’ transcriptions into one consensus transcription. This is also the code that draws the underline markings on the pages to indicate what lines of text are finished (grey lines), or what lines other volunteers have worked on (red lines in the collaborative workflow). In this blog post, I want to walk through how this aggregation process works and (just as importantly) point out the cases where it does not.

Before jumping in too deeply into the process, I just want to give a brief introduction on how data clustering works. The goal of this process is to identify areas of “high density” in the data. For the Anti-Slavery Manuscripts project, these “high density” areas refer to places where volunteers have marked lines on an image. When multiple volunteers underline the same line of text on an image, the code groups those lines together to make one single line. To find the “high density” areas you need to define two different parameters: the minimum density for a cluster, and the minimum number of points that should be considered a cluster.

The first step of the process is to cluster the slope of the drawn lines of text. This separates out text written horizontally from text written at odd angles (e.g. cross writing, vertical writing in the margins, etc.). Once grouped by angle, the code looks for any columns in the text.
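To make the slope-clustering step a little more concrete, here is a minimal sketch of what density-based clustering of line angles could look like. This is an illustration, not the project's actual aggregation code. The post does not name the clustering algorithm, so the sketch assumes DBSCAN, a common density-based method that takes the same kind of two parameters: a neighbourhood radius (`eps`, standing in for the density threshold) and a minimum number of points (`min_samples`). The function names, the sample coordinates, and the parameter values are all hypothetical.

```python
import math


def line_angle(x1, y1, x2, y2):
    """Angle of a drawn line in degrees, folded into [0, 180).

    Folding means a line drawn left-to-right and the same line drawn
    right-to-left get the same angle.
    """
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def angle_distance(a, b):
    """Distance between two angles on a 180-degree circle (so 179 is close to 1)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def dbscan(points, eps, min_samples, distance):
    """Simple DBSCAN: return a cluster id per point, or -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Every point within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if distance(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # not dense enough (may still become a border point)
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_samples:
                queue.extend(nearby)  # core point: keep growing the cluster
        cluster += 1
    return labels


# Hypothetical volunteer underlines as (x1, y1, x2, y2) pixel coordinates.
drawn_lines = [
    (0, 0, 100, 2),     # roughly horizontal
    (0, 50, 100, 49),   # roughly horizontal
    (0, 100, 100, 101), # roughly horizontal
    (10, 0, 11, 100),   # roughly vertical (e.g. margin writing)
    (20, 0, 19, 100),   # roughly vertical
    (30, 100, 30, 0),   # vertical, drawn bottom-to-top
    (0, 0, 50, 50),     # a lone diagonal line
]

angles = [line_angle(*line) for line in drawn_lines]
labels = dbscan(angles, eps=5.0, min_samples=3, distance=angle_distance)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In this toy data the three near-horizontal lines land in one cluster, the three near-vertical lines in another, and the lone diagonal is left as noise because fewer than `min_samples` lines share its angle. Changing either parameter changes what counts as a cluster.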