Python is a great language for data analytics. It offers many (though not all) of the tools available in languages like Matlab and R. Unlike Matlab, it's free, and unlike both of those languages, it encourages good coding practices. More importantly for when you start working, it's a full-fledged engineering language that makes it easy to:
- Collect data from existing databases using the tools your company already has in place.
- Integrate your code into the rest of your company's codebase.
You already know how to program, so you mainly need to learn Python's syntax (if you don't know it yet) and its numerical and scientific tools. We are using Python 2.
Action Item:
To learn the syntax, we suggest starting with either the Google tutorial or this Codecademy tutorial. Once you have the basics, this page covers a wide range of Python features. Features particularly important to data science include:
- Built-in Data Structures
    - Lists, dictionaries, tuples, sets
    - Adding and accessing data
    - Unpacking tuples
- Strings
    - How to write strings in Python
    - Raw strings, byte strings, and unicode strings
    - String formatting
- Functions
    - Creating and calling functions
    - Accessing variables in local and global scopes
    - Lambda functions
    - Keyword arguments and variable numbers of arguments
- Conditional Statements
- Loops
    - While and for loops
    - List and dictionary comprehensions
- Classes
    - Object-oriented programming
    - Difference between objects and classes
    - Attributes and methods
    - Class inheritance
- Reading and writing files
- Familiarity with the following packages:
Once you've worked through the short tutorials above, go to Project Euler and use Python to solve at least 10 problems. Choose problems that let you practice dictionaries, list comprehensions, and other Pythonic features.
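As a quick reference for the built-in data structures and comprehensions in the list above, here is a minimal sketch. The variable names and sample data are made up for illustration. It only uses single-argument `print()` calls, so it should behave the same under Python 2.7 and Python 3.

```python
# Hypothetical sample data, for illustration only.

# Lists: ordered and mutable
scores = [88, 92, 75]
scores.append(60)              # add an item to the end
first = scores[0]              # access by index (0-based) -> 88

# Tuples: immutable; unpacking assigns each element to its own name
point = (3, 4)
x, y = point

# Dictionaries: map keys to values
ages = {"alice": 31, "bob": 27}
ages["carol"] = 45                  # add (or overwrite) a key
bob_age = ages["bob"]               # raises KeyError if the key is missing
dave_age = ages.get("dave", 0)      # .get() returns a default instead

# Sets: unordered collections of unique items
unique_letters = set("banana")      # contains 'b', 'a', 'n'

# Comprehensions build a new list or dict in a single expression
squares = [n * n for n in range(5)]                        # [0, 1, 4, 9, 16]
even_squares = [n * n for n in range(10) if n % 2 == 0]    # [0, 4, 16, 36, 64]
word_lengths = {w: len(w) for w in ["spam", "eggs", "ham"]}

# A common pattern: counting occurrences with a dictionary
words = "the cat and the hat".split()
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1

print(sorted(counts.items()))   # [('and', 1), ('cat', 1), ('hat', 1), ('the', 2)]
```

The counting loop is a good pattern to reach for in the Project Euler problems.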
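For the string topics listed above, a short sketch may help. One difference worth knowing because we use Python 2: a plain `"..."` literal is a byte string in Python 2 but a unicode string in Python 3. The `u"..."` and `b"..."` prefixes make the intent explicit in both, and `str.format` works the same way in each. The sample values are hypothetical.

```python
# Hypothetical sample values, for illustration only.

# Single, double, and triple quotes all create strings
single = 'data'
double = "science"
multi = """spans
two lines"""

# Raw strings keep backslashes literally (handy for regex patterns and Windows paths)
normal = "a\tb"     # \t is a tab character: 3 characters in total
raw = r"a\tb"       # backslash and 't' are kept: 4 characters in total

# Unicode text vs. bytes: encode() turns text into bytes, decode() reverses it
text = u"caf\u00e9"                 # the word 'cafe' with an accented e: 4 characters
encoded = text.encode("utf-8")      # the accented e takes 2 bytes: 5 bytes in total
decoded = encoded.decode("utf-8")   # back to the original text

# String formatting with str.format (positional, format specs, and names)
report = "{0} scored {1:.1f}%".format("alice", 91.256)
named = "{name} is {age} years old".format(name="bob", age=27)
padded = "[{0:>6}]".format("ab")    # right-align in a field 6 characters wide

print(report)   # alice scored 91.3%
print(padded)   # [    ab]
```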
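Conditional statements and both loop types from the list above appear in almost every Project Euler solution. Here is a small sketch of all three. `classify` and `digit_sum` are hypothetical helpers written for illustration, and the code should run under both Python 2.7 and Python 3.

```python
# Hypothetical helper functions, for illustration only.


def classify(n):
    """Conditional statements: if / elif / else."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    elif n % 2 == 0:
        return "even"
    else:
        return "odd"


def digit_sum(n):
    """A while loop runs until its condition becomes false."""
    total = 0
    while n > 0:
        total += n % 10   # add the last digit
        n //= 10          # drop the last digit (integer division)
    return total


# A for loop iterates over any sequence; enumerate() also yields the index
labels = []
for i, value in enumerate([-3, 0, 7, 10]):
    labels.append("{0}:{1}".format(i, classify(value)))

print(labels)   # ['0:negative', '1:zero', '2:odd', '3:even']
```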
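For the class topics above, here is a minimal sketch of the difference between a class and its objects, plus attributes, methods, and inheritance. `Account` and `SavingsAccount` are hypothetical classes invented for this example. Inheriting from `object` and calling `super(SavingsAccount, self)` keeps it working in Python 2 as well as Python 3.

```python
# Hypothetical classes, for illustration only.


class Account(object):
    """A class is a blueprint; each Account(...) call creates a separate object."""

    interest_rate = 0.01  # class attribute, shared by all instances

    def __init__(self, owner, balance=0.0):
        # instance attributes, stored separately on each object
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        """A method: a function on the class that receives the instance as `self`."""
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount
        return self.balance

    def add_interest(self):
        # self.interest_rate looks on the instance first, then on its class
        self.balance += self.balance * self.interest_rate
        return self.balance


class SavingsAccount(Account):
    """Inherits every attribute and method of Account and adds or overrides some."""

    interest_rate = 0.05  # overrides the parent's class attribute

    def __init__(self, owner, balance=0.0, min_balance=100.0):
        # call the parent constructor (this super() form works in Python 2 and 3)
        super(SavingsAccount, self).__init__(owner, balance)
        self.min_balance = min_balance

    def withdraw(self, amount):
        if self.balance - amount < self.min_balance:
            raise ValueError("would drop below the minimum balance")
        self.balance -= amount
        return self.balance


checking = Account("alice")
checking.deposit(200)

savings = SavingsAccount("bob", balance=1000.0)
savings.add_interest()   # inherited method, but it uses SavingsAccount.interest_rate
```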
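Reading and writing files, the last topic with content in the list above, usually follows the `with` pattern below. This is a sketch under stated assumptions. The file name `example_scores.csv` and its comma-separated layout are hypothetical. It uses `io.open` with an explicit encoding, which is available in both Python 2.7 and Python 3.

```python
import io
import os
import tempfile

# Hypothetical file name in the system's temp directory, for illustration only
path = os.path.join(tempfile.gettempdir(), "example_scores.csv")
rows = [("alice", 88), ("bob", 92)]

# Writing: the `with` block closes the file automatically, even if an error occurs.
# io.open() in text mode expects unicode text, hence the u"..." format string.
with io.open(path, "w", encoding="utf-8") as f:
    for name, score in rows:
        f.write(u"{0},{1}\n".format(name, score))

# Reading: iterating over a file object yields one line at a time
scores = {}
with io.open(path, "r", encoding="utf-8") as f:
    for line in f:
        name, score = line.strip().split(",")
        scores[name] = int(score)

os.remove(path)  # clean up the example file
```

For real CSV data with quoting or embedded commas, the standard-library `csv` module is a more robust choice than splitting on commas by hand.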