The Data Incubator

"Businesses are drowning in data but starving for insights." (Forrester)

Advanced Machine Learning

Summary

Machine learning on structured data lays an important foundation, but advanced techniques and the ability to handle unstructured data open up a much larger range of analytical opportunities. We explore support vector machines, decision trees, random forests, neural networks, clustering (including k-means and expectation maximization), time series, and signal processing. Students come away with intuition about which techniques suit which problems. In addition to handling structured data, students apply these techniques directly to large volumes of real-world unstructured data, solving natural language processing problems with Word2Vec, bag of words, feature hashing, and topic modeling.


Associated project work

Students will use NLP techniques to extract sentiment from English text. Working with 300 MB of venue reviews, they will build a series of models to predict the star rating associated with a given review. They will also examine statistically improbable phrases that appear in the text corpus.
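
To give a flavor of the bag-of-words and feature-hashing ideas mentioned in the summary, here is a minimal sketch using only the Python standard library. It is an illustration, not the course's actual project code: the tokenizer, the `hash_features` helper, the bucket count, and the sample review are all assumptions made for this example.

```python
import hashlib
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def hash_features(text, n_buckets=16):
    """Map a review to a fixed-length count vector using the hashing trick.

    Each token is hashed to one of n_buckets positions, so no vocabulary
    needs to be stored. Distinct tokens may collide in the same bucket.
    """
    vec = [0] * n_buckets
    for tok in tokenize(text):
        # md5 is used only because it is stable across runs, not for security.
        digest = hashlib.md5(tok.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % n_buckets] += 1
    return vec


# Hypothetical review text, invented for illustration.
review = "Great food, great service, but the wait was long."
features = hash_features(review)
print(features)
```

A vector like this could then be fed to any classifier or regressor that predicts the star rating. In practice a much larger bucket count would be used to limit collisions, and libraries such as scikit-learn offer a `HashingVectorizer` that serves the same purpose.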


Students will examine methods for handling seasonality as they build models to predict temperatures in several cities. The training data come from National Weather Service observations and must be cleaned before use.
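
One common way to capture seasonality, sketched here under stated assumptions, is to regress temperature on sine and cosine terms with a one-year period. The example below uses NumPy and synthetic data generated inside the script (not National Weather Service observations); the function names, the 365.25-day period, and the coefficients of the synthetic signal are all choices made for this illustration.

```python
import numpy as np


def seasonal_design(day, period=365.25):
    """Build a design matrix with an intercept and one annual harmonic."""
    angle = 2 * np.pi * np.asarray(day, dtype=float) / period
    return np.column_stack([np.ones_like(angle), np.sin(angle), np.cos(angle)])


def fit_seasonal(day, temp, period=365.25):
    """Least-squares fit of mean, sine, and cosine coefficients."""
    X = seasonal_design(day, period)
    coef, *_ = np.linalg.lstsq(X, np.asarray(temp, dtype=float), rcond=None)
    return coef


def predict_seasonal(coef, day, period=365.25):
    """Evaluate the fitted seasonal curve on the given days."""
    return seasonal_design(day, period) @ coef


# Synthetic daily temperatures for three years (invented for illustration).
rng = np.random.default_rng(0)
days = np.arange(3 * 365)
true_temp = (12.0
             + 8.0 * np.sin(2 * np.pi * days / 365.25)
             - 3.0 * np.cos(2 * np.pi * days / 365.25))
observed = true_temp + rng.normal(scale=1.5, size=days.size)

coef = fit_seasonal(days, observed)
residuals = observed - predict_seasonal(coef, days)
print(coef)
```

The residuals left after removing the seasonal curve are what a second model (for example, an autoregressive one) would then try to explain. Adding higher harmonics is a natural extension if a single annual cycle fits poorly.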


This module is currently part of our Data Science Fellowship.

Prerequisites
Intermediate to advanced statistics
Intermediate linear algebra
Basic programming