Text-Mining the Great Unread

Undergraduate and graduate course, Aarhus University, School of Culture and Society, 2017

Three-week intensive course on text mining, natural language processing, and information retrieval with Python and Unix.

Text Mining the Great Unread – An introduction to data-intensive methods and digital tools for analysis of texts in the humanities and social sciences

Description of qualifications: key learning outcomes
\begin{enumerate}
\item Demonstrate an ability to delineate and critically evaluate research problems related to text analysis in terms of text mining solutions. This involves accessing previous solutions to similar problems.
\item Demonstrate competence in designing and implementing knowledge discovery pipelines that solve research problems related to text analysis. This includes a basic understanding of the various pipeline elements and their dependencies (i.e., data selection, preprocessing, and transformation, followed by pattern discovery, data mining, and evaluation) as well as implementation in open source software.
\item Understand how to communicate projects and findings in accordance with academic and industrial standards.
\end{enumerate} \bigskip
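
The pipeline elements named in the second learning outcome can be illustrated with a minimal Python sketch. This is not course material: the toy corpus, the stopword list, and all function names (`select`, `preprocess`, `transform`, `cosine`, `most_similar`, `evaluate`) are hypothetical and chosen only to show how each stage depends on the one before it. Cosine similarity over word counts stands in for "pattern discovery"; a real project would likely use a richer representation.

```python
import math
import re
from collections import Counter

# Toy corpus and stopword list: hypothetical data for illustration only.
CORPUS = {
    "doc_a": "The whale swam through the cold sea.",
    "doc_b": "A whale and a ship crossed the sea.",
    "doc_c": "The court heard the case on Monday.",
}
STOPWORDS = {"the", "a", "and", "on", "through"}


def select(corpus, min_chars=10):
    """Data selection: keep documents with at least min_chars characters."""
    return {name: text for name, text in corpus.items() if len(text) >= min_chars}


def preprocess(text):
    """Preprocessing: lowercase, keep alphabetic tokens, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def transform(tokens):
    """Transformation: represent a document as bag-of-words counts."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(vectors, name):
    """Pattern discovery: find the other document closest to `name`."""
    others = [n for n in vectors if n != name]
    return max(others, key=lambda n: cosine(vectors[name], vectors[n]))


def evaluate(vectors, expected):
    """Evaluation: share of hand-labelled nearest neighbours recovered."""
    hits = sum(most_similar(vectors, doc) == target for doc, target in expected.items())
    return hits / len(expected)


# Run the stages in order: selection, preprocessing, transformation.
vectors = {name: transform(preprocess(text)) for name, text in select(CORPUS).items()}

# Hand-labelled expectation (an assumption made for this toy example).
accuracy = evaluate(vectors, {"doc_a": "doc_b"})
print(most_similar(vectors, "doc_a"), accuracy)
```

Each stage consumes only the output of the previous one, so a stage can be swapped (for example, a different tokenizer in `preprocess`) without touching the rest of the pipeline.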

\noindent\textit{Contents}
\smallskip

\noindent Texts have always been essential to research and education in the humanities and social sciences. Close reading and detailed interpretation have traditionally constituted the standard approach to texts; that is, we apply qualitative methods and theoretically motivated arguments to a small textual corpus in order to understand its meaning. However, the rapid expansion of digital full-text databases, increasingly fast computers, and advances in language technology are starting to change the standard approach by offering a new digital and data-intensive paradigm for the study of text. Humanities and social science researchers are beginning to ask new types of questions and propose novel solutions to old problems by using faster and more efficient methods to collect, analyze, and visualize texts.
Many students (as well as researchers) experience a lack of digital competences when faced with text mining, that is, the application of tools and methods to analyze large sets of digitized texts. This is unfortunate because text mining 1) enables students to extract high-quality information and acquire new knowledge quickly and efficiently; and 2) enhances students' qualifications for a data-driven job market that relies on the very same tools and methods. Finally, many tools and methods in text mining need thorough revision by academics who understand the importance of textual meaning and context. Academia and industry alike are therefore in great need of students with text mining skills.
``Text Mining the Great Unread'' is an introductory-level course on text mining tools and methods in the humanities and social sciences, which will give participants sufficient knowledge and experience to develop and implement their own text mining projects. The core of the course is a series of hands-on workshops supplemented by lectures and tutorials by international researchers and industry experts. Through the course, participants will become familiar with text mining methods and software for analyzing and visualizing texts. Participants will learn how to write their own text mining applications in R and Python. Through the workshops, participants will also be presented with a range of paradigmatic studies and go through research design, best practice, and reporting standards. It is possible to work with one's own corpus, but historical and contemporary corpora (works of fiction, historical documents, and websites) are also available in class. Participants are not expected to have prior experience with text mining (i.e., programming, statistics, or visualization).