Tutorial in Text & Data Mining with Python
Date:
DESCRIPTION
The recent explosion in digitized and digital text media is rapidly changing the evidential basis for the humanities. While the humanities used to be the principal scientific consumers of text-based data, the majority of text analysis is now performed by ‘machines’ outside traditional humanistic domains. Text analytics applies automated and data-intensive techniques to extract useful knowledge from large collections of linguistic data. In this PhD course, participants will gain experience with two major machine learning paradigms, supervised and unsupervised learning, in order to answer research questions fundamental to the humanities: can we classify texts by genre, period and status, and how do surface structures reveal latent semantic properties? The workshop consists of a series of hands-on tutorials in Python, combined with explanations and illustrations through use cases. Programming experience is not a requirement, but participants are expected to prepare by installing Python and completing three introductory tutorials available online.
KEYWORDS
TEXT ANALYTICS, TEXT DATA MINING, DIGITAL HUMANITIES, HUMANITIES COMPUTING, CULTURE ANALYTICS
PROGRAM
DAY 1: Text Classification and Supervised Learning
Time | Content | Instructor |
---|---|---|
09:00-10:00 | Text Analytics #1 | KLN |
10:00-11:00 | Text Classification | KLN |
11:00-12:00 | Representation | KLN |
12:00-13:00 | Lunch | |
13:00-14:00 | Validation | KLN |
14:00-15:00 | Optimization | KLN |
15:00-16:00 | Free Play | KLN |
DAY 2: Thematic Analysis and Unsupervised Learning
Time | Content | Instructor |
---|---|---|
09:00-10:00 | Text Analytics #2 | KLN |
10:00-11:00 | Topic Modeling | KLN |
11:00-12:00 | Preparation | KLN |
12:00-13:00 | Lunch | |
13:00-14:00 | Training | KLN |
14:00-15:00 | Application | KLN |
15:00-16:00 | Free Play | KLN |