Generating a Semantically-Rich Data Set from the Ivan Doig Archive

Through a new joint initiative known as DISC, the Library and ITC propose to generate a semantically rich data set and related tools to enable digital humanities research on the Ivan Doig Archive. DISC (Digital Infrastructure and Scholarly Communication) was launched by the Library and ITC in fall 2015; its mission is to provide researcher services at Montana State University, including research computing and storage, data science, scholarly communication, digitization and discovery.

Montana State University recently acquired the Ivan Doig Archive through a partnership proposal submitted to Carol Doig by the Library and the College of Letters and Science. A Montana writer of renown, Ivan Doig completed 16 books before his life was cut short by multiple myeloma in 2015, and his archive richly documents the work of a prodigious writer and chronicler of his own life. The proposal succeeded largely because of its promise that the entire archive would be digitized and made publicly accessible, in keeping with the wishes of Ivan and Carol Doig. Previously that promise would have entailed scanning documents, photographs and other materials and posting them on a website for viewing in a relatively static environment, but developments in digital scholarship and Semantic Web technologies now make much more possible.

With the appropriate tools and data sets, humanities scholars can now query, analyze and visualize enormous corpora of information that have been properly structured as machine-readable data sets. Semantic Web metadata schemas now allow markup that establishes relationships between entities and their actions. Commonly known as “triples,” these relationships of subject, predicate and object are expressed in ways that facilitate machine learning and allow researchers to run sophisticated SPARQL queries to synthesize new theories or reveal previously undiscovered elements of the works generated over a writer’s lifetime. Combined with geo-tagging, the type of data set we are proposing could provide a rich source of high-profile content for generations of humanities researchers and could help DISC develop a path for similar data sets in other disciplines.
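
To make the idea of triples concrete, the following is a minimal sketch in plain Python of how subject, predicate and object statements can be stored and matched against a pattern, which is the basic operation a SPARQL query builds on. It is an illustration only and not the proposed DISC implementation: the predicate names (`wrote`, `setIn`), the sample statements, and the helper functions `match` and `works_set_in` are all assumptions made for this example, and a real data set would use an RDF store and a published vocabulary instead.

```python
from typing import Iterable, Iterator, Optional, Tuple

# A triple is (subject, predicate, object).
Triple = Tuple[str, str, str]

# Illustrative statements only; the predicate names are hypothetical
# and do not come from any vocabulary chosen for the archive.
TRIPLES = [
    ("Ivan Doig", "wrote", "This House of Sky"),
    ("Ivan Doig", "wrote", "The Whistling Season"),
    ("This House of Sky", "setIn", "Montana"),
    ("The Whistling Season", "setIn", "Montana"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in triples:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


def works_set_in(triples: Iterable[Triple], author: str, place: str) -> list:
    """Join two patterns: works the author wrote that are set in the place.

    Roughly what a SPARQL query such as
        SELECT ?work WHERE { :author :wrote ?work . ?work :setIn :place }
    would express (the prefixes and terms here are hypothetical).
    """
    triples = list(triples)
    written = {obj for _, _, obj in match(triples, s=author, p="wrote")}
    return sorted(
        work for work in written
        if any(match(triples, s=work, p="setIn", o=place))
    )


if __name__ == "__main__":
    for work in works_set_in(TRIPLES, "Ivan Doig", "Montana"):
        print(work)
```

The second function shows why the structure matters: once statements share a common shape, separate facts about authorship and setting can be joined mechanically, and geo-tagged places could be added as further triples in the same way.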