Natural Language Processing with R

 

Overview

R is a powerful language for statistical computing. A prolific user community backs R with an extensive library of packages. If you can think of it, somebody has probably already written a package for it. R also has a superb IDE, RStudio, which facilitates reproducible research.

This course is for people with some R programming experience. No prior NLP experience is required. We will introduce base R’s text manipulation capabilities, use some frameworks for analyzing text data in R, and implement some commonly used NLP algorithms. The course culminates in a full NLP project performed in R.


What You Will Learn

This course introduces R's capabilities for text manipulation and natural language processing. NLP is an emerging field, and we will focus on performing some core NLP tasks with the help of R libraries.

  • Text manipulation with base R 
  • Document clustering 
  • Part-of-speech tagging 
  • Sentence parsing 
  • Named-entity recognition 
  • Topic modeling
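
To give a flavor of the first item, text manipulation with base R, here is a minimal sketch. The three sample sentences and the variable names are made up for illustration and are not course material.

```r
# Toy documents, invented for this example.
docs <- c("R is a language for statistical computing.",
          "NLP with R: tagging, parsing, and topic modeling!",
          "Linguistic data can be LARGE.")

# grepl() returns TRUE/FALSE per document; \\b restricts the match to "R" as a whole word.
mentions_r <- grepl("\\bR\\b", docs, perl = TRUE)

# Normalize: lower-case, turn punctuation into spaces, collapse repeated whitespace.
clean <- tolower(docs)
clean <- gsub("[[:punct:]]+", " ", clean)
clean <- trimws(gsub("\\s+", " ", clean))

# Tokenize each document on single spaces.
tokens <- strsplit(clean, " ", fixed = TRUE)

print(mentions_r)
print(lengths(tokens))
```

Everything here ships with base R, so no extra packages are needed to try it.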

Linguistic data can be large, so we will learn how to track use of system resources. We will also learn some strategies to optimize memory use and computation time.
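
As a rough illustration of tracking resources, base R already offers object.size(), system.time(), and gc(). The toy corpus below is an assumption made for this sketch, not data used in the course.

```r
# A toy corpus: 10,000 copies of one short sentence (invented for illustration).
corpus <- rep("the quick brown fox jumps over the lazy dog", 1e4)

# object.size() gives an estimate of the memory an object occupies.
print(object.size(corpus), units = "Kb")

# system.time() measures how long an expression takes to evaluate.
timing <- system.time(
  tokens <- strsplit(corpus, " ", fixed = TRUE)
)
print(timing[["elapsed"]])

# Drop large intermediate objects when finished, then let R reclaim the memory.
rm(corpus)
invisible(gc())
```

Timings and sizes will differ from machine to machine, so treat the printed numbers as relative measures rather than benchmarks.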


Course Outline

  • Reproducible research: Setting up an RStudio project and file structure.
  • Review of R, RStudio, and R Markdown.
  • CRAN task view: Natural Language Processing.
  • Importing text documents using the scan function and enc2utf8.
  • Basic search and replace functionality with grep, grepl, gsub, and more.
  • Monitoring system resources.
  • Basic counting: word counts, term frequency (tf), tf-idf, and word clouds.
  • Document clustering.
  • Sentence parsing, POS tagging, and entity extraction with Apache OpenNLP.
  • Build a quick and dirty document summarizer.
  • Introduction to topic modeling.
  • Document classification.
  • Final project: Construct a reproducible data analysis with R Markdown and the techniques covered.
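
The counting step in the outline can be sketched in base R as follows. This is only an illustration under stated assumptions: the three documents are invented, the weighting shown (raw term count times log(N / document frequency)) is one common tf-idf variant, and the course itself may rely on packages such as tm instead.

```r
# Toy documents, invented for this example.
docs <- c(d1 = "the cat sat on the mat",
          d2 = "the dog sat on the log",
          d3 = "cats and dogs")

tokens <- strsplit(docs, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# Document-term matrix of raw counts: one row per document, one column per term.
dtm <- t(vapply(tokens,
                function(tk) tabulate(match(tk, vocab), nbins = length(vocab)),
                integer(length(vocab))))
colnames(dtm) <- vocab

# Corpus-wide word counts, most frequent first.
word_counts <- sort(colSums(dtm), decreasing = TRUE)

# Inverse document frequency: log(N / number of documents containing the term).
doc_freq <- colSums(dtm > 0)
idf <- log(nrow(dtm) / doc_freq)

# tf-idf: scale each term's column of counts by that term's idf.
tfidf <- sweep(dtm, 2, idf, `*`)

print(head(word_counts))
print(round(tfidf, 2))
```

Terms that occur in most documents, such as "the", receive a low idf, which is why tf-idf is often preferred to raw counts when comparing documents.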

After this course you will have used several methods and libraries for NLP. You will have completed a final project using several of these NLP techniques. You will have performed your work using reproducible research methods. This will allow you to revisit your work (and publish it on the web if you’d like).


Prerequisites

  • Bring a computer with wifi connection capability and a power cord.
  • Install the latest versions of 64-bit GNU R and RStudio.
  • Update your 64-bit Java installation.
  • Install the following R libraries using install.packages('x') where x is: 
    • rJava
    • tm
    • e1071
    • lda
    • lsa
    • Matrix
    • igraph
    • slam
    • openNLP
    • wordcloud
    • rmarkdown
    • knitr

Instructor: Tommy Jones


Tommy is a statistician, mathematician, or data scientist, depending on the problem or audience. He holds an MS in mathematics and statistics from Georgetown University and a BA in economics from the College of William and Mary. He is the Director of Data Science at Impact Research, LLC.

Tommy has previously performed economic and statistical modeling and analysis at the Science and Technology Policy Institute, the Federal Reserve Board, and the Institute for the Theory and Practice of International Relations. He has expertise in regression analyses, time series modeling and forecasting, natural language processing, data mining, and other quantitative techniques.


DATE & TIME: 
SATURDAY, November 12th 2016
9AM-5PM 

LOCATION: 
4601 FAIRFAX DRIVE
ARLINGTON, VA 22203

REGULAR PRICE: $300
EARLY BIRD PRICE: $250
(EXPIRES 10/29/2016)


Buy a course bundle and save!

Two Workshop Bundle - Save 25%

Price

Bundle Price: $450
($225 per workshop)

Description

Attend any two workshops and save 25% off the regular price!
Perfect for those looking to skill up in a couple of data science topics.

Purchase

To purchase this bundle, go to our course bundle registration page.


Three Workshop Bundle - Save 33%

Price

Bundle Price: $600
($200 per workshop)

Description

Attend any three workshops and save 33% off the regular price!
Perfect for those who need a little more exposure to data science.

Purchase

To purchase this bundle, go to our course bundle registration page.


Four Workshop Bundle - Save 42%

Price

Bundle Price: $700
($175 per workshop)

Description

Attend any four workshops and save 42% off the regular price!
Perfect for those looking to gain exposure to several topics.

Purchase

To purchase this bundle, go to our course bundle registration page.