Data Analysis

26 Resources
  220+ Hours
  4493 Learners

Introduction

The Data Analysis learning path provides a short but intensive introduction to the field of data analysis. The path is divided into three parts. In part 1, we learn general programming practices (software design, version control) and tools (Python, SQL, Unix, and Git). In part 2, we learn R and focus more narrowly on data analysis, studying statistical techniques, machine learning, and presentation of findings. Part 3 offers a choice of elective topics: visualization, social network analysis, and big data (Hadoop and MapReduce). Choose any or all of them to enrich your understanding and skills.

The course consists of free online lectures, homework assignments, quizzes and projects, and will take around 350-400 hours. There will also be a capstone project at the end that you can use to demonstrate your skills to potential employers or for a school application. This is an intensive path with a lot of material to learn, but at the end, you will know all the tools and techniques you need to start analyzing data: how to manipulate data, apply statistical and machine learning techniques, and analyze and visualize results. You should also be prepared to begin a career in data analysis.

Why Learn This

Data analysis is both a fascinating topic in itself and a tool that lets you make powerful inferences and understand the world around you. The techniques you will learn will help you accurately characterize data using models and use those models to make inferences and decisions. If you enjoy applying math and analytical thinking to practical problems, this course is for you.

There has also been a large spike in demand for data analysts, so learning analytics can be extremely advantageous from a career perspective as well. Being able to find trends in large datasets will help you make sound decisions, whether for an organization or business or in your own life.

What will I learn

This path teaches some of the most important techniques and tools necessary to manipulate and analyze large datasets and summarize conclusions. This includes:

  • exploratory and predictive statistics
  • basic computer programming in Python
  • more advanced computer program design
  • an introduction to algorithms
  • R for statistical analysis
  • practical machine learning techniques
  • using Unix and Git
  • data visualization best practices

Finally, there are three optional elective tracks: Visualizing Data, Analyzing Social Networks, and Big Data: Hadoop and MapReduce.

What won’t we learn?

Analytics and data science are enormous and burgeoning fields with many areas of study, and we will not have time to cover them all. In the interest of getting you analyzing real datasets as quickly as possible, the emphasis in this path is on practical applications as opposed to theory. Furthermore, while significant math is required in this path, we will not be covering the theoretical basis for statistics or machine learning. Also, the focus will be on analysis and manipulation of data rather than setup and storage. Some advanced statistics topics such as time series and Bayesian methods will not be studied in this path. Finally, some specific topics such as natural language processing and computer vision will not be covered. However, students who finish this path should be well-prepared to study those areas.

Who is this for

This course is intended for people with little to no background in data analysis and computer programming. An introductory statistics class and an introductory programming class will both come in handy, but are not necessary. A basic familiarity with calculus and general computer competency are assumed.

Author
Claudia Gold
Data Scientist -- Airbnb, ClassDojo & MoveOn.org.

Claudia graduated from MIT in 2007 and has worked on data-related problems ever since, ranging from automatically tracking owls in the forest at the MIT Media Lab to being the second analyst at Airbnb. In her free time, she enjoys traveling to far-away places and has been to about 30 countries.