Python for Data Analysis

A 2-Day Seminar Taught by Jason Anastasopoulos, Ph.D.


Python is a premier language for modern data science and data analysis. It is a free, open-source language that has a simple, easy-to-understand syntax and an incredible range of data analysis and visualization libraries.

Python is rapidly becoming the preferred language of data scientists in both industry and academia. It’s used by Google, Facebook and other tech giants to perform data analysis and run machine learning algorithms that can handle hundreds of thousands of terabytes of data per day.

Python can be used for:

  • Storing and analyzing large and small datasets.
  • Web scraping and data collection using APIs.
  • Beautiful data visualization.
  • Natural language processing and text analysis.
  • General machine learning.
  • Deep learning.
  • Image analysis and much, much more…

HOW YOU WILL BENEFIT FROM THIS SEMINAR

This two-day seminar combines an introductory and an intermediate course in Python. The goal is for participants to understand many of the basic elements of Python and immediately apply them to practical data analysis and data collection problems.

By the end of this seminar you will be able to:

  • Program using Python (Jupyter) notebooks and IDEs.
  • Understand and use basic data analysis and visualization libraries such as NumPy, Pandas, Matplotlib, SciPy and statsmodels, among others.
  • Use the basic building blocks needed for data analysis: variables, lists, loops, dictionaries, Boolean operators, and functions.
  • Perform data analysis and basic statistical inference: GLMs, ANOVA, hypothesis testing.
  • Produce beautiful data visualizations.
  • Scrape and parse semi-structured data, including HTML, XML, and JSON.
  • Create and extract information from databases with Python.
  • Grasp the basics of unstructured data and natural language processing.

COMPUTING

This is a hands-on class that will involve at least two hours of structured and supervised assignments. To ensure that you are prepared, you must bring your own laptop with Anaconda Python installed.

Please download and install Anaconda Python for your operating system prior to attending the seminar here: https://www.anaconda.com/distribution/.

You should also know how to access the command prompt (Windows users) or the terminal (Mac users). We will briefly review how to access these in class, but it will save you time and effort if you come already knowing these basics. You can get resources on the internet that will help you get started with the Windows Command Prompt or the Mac Terminal.
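One way to confirm that your installation works before class is to run a short check from the command prompt or terminal. The sketch below is not part of the official seminar materials. It assumes that the libraries named in this announcement are installed under their usual module names (numpy, pandas, matplotlib, scipy, statsmodels), and it only reports which of them Python can locate.

```python
import importlib.util
import sys

# Usual import names of the libraries listed in this announcement
# (an assumption; the seminar may use other libraries as well).
LIBRARIES = ["numpy", "pandas", "matplotlib", "scipy", "statsmodels"]


def check_libraries(names):
    """Return a dict mapping each module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    for name, found in check_libraries(LIBRARIES).items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```

Saved under a file name of your choice (for example, the hypothetical check_setup.py), the script can be run with "python check_setup.py". Any library reported as MISSING suggests that the Anaconda installation did not complete or that a different Python is being used.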


WHO SHOULD ATTEND? 

This seminar is designed for anyone who wants to quickly and efficiently obtain a solid foundation in the Python language that will allow them to begin using the language for their research, data analysis or visualization needs.

No prior experience in Python is assumed, although basic knowledge of programming would be helpful. Those at an intermediate or advanced level in other packages or languages can also benefit greatly from this course.


LOCATION, FORMAT, AND MATERIALS

The class will meet at Temple University Center City, 1515 Market Street, Philadelphia, PA 19103, from 9 am to 5 pm each day, with a 1-hour lunch break.

Participants receive a bound manual containing detailed lecture notes (with equations and graphics), examples of computer output, and many other useful features. This book frees participants from the distracting task of note taking.


REGISTRATION AND LODGING

The fee of $995 includes all course materials. The early registration fee of $895 is available until April 28.

Refund Policy

If you cancel your registration at least two weeks before the course is scheduled to begin, you are entitled to a full refund (minus a processing fee of $50). 

Lodging Reservation Instructions

A block of guest rooms has been reserved at the Club Quarters Hotel, 1628 Chestnut Street, Philadelphia, PA at a special rate of $141 per night. The hotel is about a 5-minute walk from the seminar location. To make reservations, call 203-905-2100 during business hours and identify yourself by using group code STH527. For guaranteed rate and availability, you must reserve your room no later than Monday, April 27, 2020.

If you need to make reservations after the cut-off date, you may call Club Quarters directly and ask for the “Statistical Horizons” rate (do not use the code or mention a room block) and they will try to accommodate your request.


SEMINAR OUTLINE

Day 1: Python Basics

I. Getting started with Python:
     ○ Why Python?
     ○ Introduction to Anaconda Python.
     ○ Introduction to Python (Jupyter) notebooks.
     ○ Overview of basic libraries: NumPy, Pandas, Matplotlib, SciPy, statsmodels.
II. Python basics and data structures:
     ○ Variables: numbers, string values, using variables.
     ○ Lists and loops: list basics, simple loops, Pythonic loops.
     ○ Logical statements in Python.
     ○ Using and creating dictionaries.
     ○ Creating functions.
III. Data analysis and statistical inference:
     ○ Handling arrays with Pandas and NumPy.
     ○ Basic data analysis:
         1. Summary statistics: mean, median, mode, variance and standard deviation.
         2. Hypothesis testing: t-tests, confidence intervals.
         3. Basic statistical models: linear regression, logistic regression, ANOVA.
     ○ Advanced data analysis: statistical inference and models with very large datasets.
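
To give a flavor of the Day 1 topics, here is a minimal sketch that combines a dictionary, lists, a function, a Pythonic loop, a logical statement, and the summary statistics named above. It is an illustration only, not taken from the seminar materials: the scores are made-up numbers, the names scores and summarize are hypothetical, and it uses Python's built-in statistics module rather than the NumPy and Pandas tools covered in class so that it runs without any extra libraries.

```python
import statistics

# Hypothetical exam scores for two small groups (made-up numbers for illustration).
scores = {
    "group_a": [72, 85, 90, 85, 78],
    "group_b": [65, 70, 88, 70, 92],
}


def summarize(values):
    """Return the summary statistics named in the outline for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }


# A dictionary comprehension: one Pythonic way to loop over key/value pairs.
summaries = {name: summarize(values) for name, values in scores.items()}

for name, stats in summaries.items():
    # A logical statement: label groups whose mean is at least 80.
    label = "high" if stats["mean"] >= 80 else "lower"
    print(f"{name}: mean={stats['mean']:.1f}, median={stats['median']}, "
          f"mode={stats['mode']}, sd={stats['std_dev']:.2f} ({label})")
```

With Pandas, the same summaries would typically come from methods on a DataFrame; the standard-library version is used here only to keep the sketch self-contained.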

Day 2: Advanced Python Data Analysis and Applications

I. Data visualization
     ○ Distributions: densities, box plots, histograms.
     ○ Correlations: scatterplots, line plots, heat maps.
     ○ Special topics: plotting maps.
II. Semi-structured data:
     ○ HTML and XML parsing.
     ○ JSON parsing.
III. Database creation and extraction:
     ○ Introduction to MongoDB.
     ○ Using MongoDB to store and retrieve data.
IV. Unstructured data and natural language processing:
     ○ Introduction to text processing in Python: tokenization and text cleaning.
     ○ Preparing text data for analysis with the document-term matrix.
     ○ Sentiment analysis.
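
As a rough illustration of how some of the Day 2 topics fit together, the sketch below parses a small JSON string, tokenizes and cleans the text, and builds a document-term matrix. It is a simplified sketch under stated assumptions, not the seminar's own code: the JSON payload is made up, the tokenize function is hypothetical, and only the standard library is used. MongoDB storage and sentiment analysis are not shown.

```python
import json
import re
from collections import Counter

# Hypothetical JSON payload, standing in for data returned by a web API.
raw = '''
[
  {"id": 1, "text": "Great seminar, great examples!"},
  {"id": 2, "text": "The examples were clear."}
]
'''


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# JSON parsing: turn the string into a list of Python dictionaries.
documents = json.loads(raw)

# Tokenization and cleaning: one Counter of word frequencies per document.
counts = [Counter(tokenize(doc["text"])) for doc in documents]

# Document-term matrix: one row per document, one column per vocabulary term.
vocabulary = sorted(set().union(*counts))
dtm = [[c[term] for term in vocabulary] for c in counts]

print(vocabulary)
for doc, row in zip(documents, dtm):
    print(doc["id"], row)
```

Each row of dtm counts how often each vocabulary term appears in one document. A matrix of this shape is a common starting point for text analysis methods such as sentiment analysis.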