"Businesses are drowning in data but starving for insights." (Forrester)
The first step in data science is mastering the computational foundations on which it is built. We cover the programming topics most relevant to data science (pandas, NumPy, SciPy, matplotlib, regular expressions, SQL, JSON, XML, checkpointing, and web scraping), which together form the core toolkit for handling structured and unstructured data in Python. Students gain practical experience manipulating messy, real-world data using these tools. They also come away with a firm understanding of pip, git, IPython, Jupyter notebooks, pdb, and unit testing, tools that let them leverage existing open source packages to accelerate data exploration, development, debugging, and collaboration.
Students will scrape picture captions from a website that tracks the goings-on of New York’s socially well-to-do. By extracting names from these captions, they will assemble a graph of friendships among this crowd. Analysis of this graph will reveal who the most connected New Yorkers are.
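As a rough illustration of the caption-to-graph step, here is a minimal sketch using only the Python standard library. The captions, the name pattern, and the helper names (`extract_names`, `build_graph`, `degree`) are all hypothetical and chosen for illustration. Real captions are far messier and would need more careful parsing than this.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical captions; real ones would be scraped from the site.
CAPTIONS = [
    "Alice Smith, Bob Jones, and Carol White",
    "Bob Jones and Carol White",
    "Alice Smith with Dave Brown",
]

# Assumption: a name is two or more capitalized words.
NAME_PATTERN = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+")


def extract_names(caption):
    """Split a caption on commas, 'and', and 'with', keeping name-like pieces."""
    parts = re.split(r",|\band\b|\bwith\b", caption)
    return [p.strip() for p in parts if NAME_PATTERN.fullmatch(p.strip())]


def build_graph(captions):
    """Count how often each pair of names appears in the same caption."""
    edges = Counter()
    for caption in captions:
        names = sorted(set(extract_names(caption)))
        for a, b in combinations(names, 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Count the distinct people each person is connected to."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


edges = build_graph(CAPTIONS)
most_connected = degree(edges).most_common(1)
```

Here each edge weight counts shared captions, and degree is one simple notion of "most connected". Other centrality measures would be equally reasonable choices.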
Students will gain experience with Python-based data wrangling technologies by extracting insights from a structured dataset served through a web API. They will learn the fundamental building blocks of data extraction, manipulation, and aggregation with pandas DataFrames, along with good Python programming practice.
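To give a flavor of that workflow, the following is a small sketch assuming the API returns a list of JSON objects. The records, column names, and values are invented for illustration and do not come from the actual course dataset.

```python
import pandas as pd

# Hypothetical records, shaped like the parsed JSON body of an API response.
records = [
    {"city": "New York", "category": "noise", "count": 12},
    {"city": "New York", "category": "parking", "count": 5},
    {"city": "Boston", "category": "noise", "count": 7},
    {"city": "Boston", "category": "noise", "count": 3},
]

# Extraction: load the records into a DataFrame.
df = pd.DataFrame(records)

# Manipulation: keep only the rows for one category.
noise = df[df["category"] == "noise"]

# Aggregation: total count per city, largest first.
totals = noise.groupby("city")["count"].sum().sort_values(ascending=False)
```

In practice the records would come from an HTTP response rather than a literal list, but the filter, group, and aggregate steps would look much the same.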
This module is currently part of our Data Science Fellowship.