Advanced Digital Textual Analysis Tutorial

Learning Goals

  • Understand how to clean and parse the textual dataset from the Archive Scraping Tutorial
  • Understand what stop words, tokenization, and lemmatization are
  • Understand how to extract and visualize the archive’s top ten keywords using a digital tool
  • Understand how to use a simple machine learning model for text generation
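As a rough illustration of two of these terms, the sketch below splits a sentence into word tokens (tokenization) and then drops stop words, the very common words such as "the" or "in" that carry little topical meaning. The stop-word list and the sample sentence are made up for illustration; a real project would typically use a fuller list, such as the one distributed with NLTK.

```python
import re

# Tiny, hypothetical stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "was", "were", "for"}

def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Drop common function words that carry little topical meaning."""
    return [t for t in tokens if t not in STOP_WORDS]

sentence = "The students were reading the letters in the archive."
tokens = tokenize(sentence)
content_words = remove_stop_words(tokens)
print(tokens)
print(content_words)
```

Lemmatization would go one step further and reduce each remaining word to its dictionary form, for example "letters" to "letter" (NLTK's `WordNetLemmatizer` is one common tool for this), so that different forms of the same word are counted together.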

Digital Textual Analysis

Just as the advanced image analysis tutorial used a computer to find patterns in the visual data housed in the archive, so too the Textual Analysis tutorial relies on computers to help pick up patterns in the words stored in the archive. In this tutorial you will learn how to perform textual analysis on the data scraped in the Archive Scraping Tutorial. In the activity, you will learn how to extract the top ten subject keywords from the archive's metadata.
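A minimal sketch of the keyword-counting idea, assuming each scraped metadata record is a dictionary with a list of subject keywords. The field name `subjects` and the sample records are hypothetical; the data produced by the Archive Scraping Tutorial may be structured differently.

```python
from collections import Counter

# Hypothetical metadata records; real field names may differ.
records = [
    {"title": "Item 1", "subjects": ["Letters", "Education"]},
    {"title": "Item 2", "subjects": ["letters", "Photographs"]},
    {"title": "Item 3", "subjects": ["Education", "Letters", "Maps"]},
]

def top_keywords(records, n=10):
    """Count subject keywords across all records and return the n most common."""
    counts = Counter()
    for record in records:
        # Normalize case and whitespace so "Letters" and "letters" count together.
        counts.update(s.strip().lower() for s in record.get("subjects", []))
    return counts.most_common(n)

top_ten = top_keywords(records, n=10)
print(top_ten)
```

`Counter.most_common(n)` returns `(keyword, count)` pairs sorted from most to least frequent, which is a convenient shape to pass on to a bar chart when visualizing the results.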

Run the code yourself in Google Colaboratory! Run all cells in order, or use Run all in the Runtime menu at the top left. This activity includes training a machine learning model, which can take around 40 minutes.