In the fall 2019 semester, my big data analytics and text mining course will be available on campus. I’ve taught this course online at SIS for the past several years, and starting this fall it will be offered by the Kogod School of Business (KSB) on Wednesday evenings from 5:30 – 8:00 p.m.
In this course, you will learn how to use the CRISP-DM approach to manage a text mining project, and how to use R, RStudio, and a wide variety of free R packages to collect, analyze, and visualize large-scale textual data. In addition to learning the most popular R package for text mining (tm), you will also learn a powerful overall approach to data analytics in R using a collection of packages known as the tidyverse.
The course starts by developing a theoretical background in text mining and an understanding of how it fits into the broader data science community. You will then learn how to manage text mining projects using RStudio and how to write readable and repeatable code. We will review using R for statistical analysis, and then move to using R for text mining. Using various R packages, such as rvest, you will learn how to collect unstructured text-based data from websites and how to pre-process that text for analysis. In addition, you will learn the basics of social media mining and social network analysis in R (using twitteR, rtweet, and sna), as well as dictionary development and sentiment analysis.
After your mid-term project progress report, you will learn a variety of more advanced data analytics techniques, including unsupervised machine learning techniques such as clustering and topic modeling, as well as classification and predictive modeling. You will learn some advanced data visualization and publishing techniques, including how to use Shiny, R Markdown, and RPubs. You will end the semester by moving from our statistical “bag of words” approach to text mining to Natural Language Processing (using openNLP and cleanNLP), with a focus on named entity recognition (NER). Finally, you will learn how to run R in a cluster computing environment by running R projects on Zorro, American University’s High-Performance Computer (HPC). For more information, please write to me for the draft syllabus.