Workflow of Data Analysis - Online Course
A 4-Day Livestream Seminar Taught by
Bianca Manago
10:30am-12:30pm (convert to your local time)
1:30pm-3:00pm
Data management and analysis are difficult, and they are even harder without a structured, systematic approach. Scientific progress also depends on replication and reproducibility. Data management involves dozens of decisions; if those decisions are not made in a way that facilitates sharing, they preclude replication.
This seminar is designed to teach researchers how to prepare and analyze data in a way that is both accurate and replicable. By following these principles, your data analytic projects will be both well planned and well executed. The scope of the seminar ranges from broad topics, such as developing research plans, to detailed minutiae, such as planning variable names.
Starting July 9, this seminar will be presented as a 4-day synchronous, livestream workshop via Zoom. Each day will feature two lecture sessions with hands-on exercises, separated by a 1-hour break. Live attendance is recommended for the best experience, but if you can’t join in real time, recordings will be available within 24 hours and can be accessed for four weeks after the seminar.
Closed captioning is available for all live and recorded sessions. Captions can be translated to a variety of languages including Spanish, Korean, and Italian. For more information, click here.
ECTS Equivalent Points: 1
More details about the course content
This seminar is for researchers who are trying to establish or improve their workflow. You do not need to be an expert programmer to benefit; the seminar is designed to be accessible to novice Stata and R users while remaining useful to more advanced users. The lessons balance ease of use with proper functioning and introduce researchers to useful tools such as dual-pane browsers, macro programs, and plain text editors. For those who are already familiar with these tools, the seminar will teach you how to optimize them. The lessons should make conducting research less painful, more efficient, more accurate, and more reproducible.
This is a hands-on seminar with ample opportunities to plan and practice your workflow.
Some highlights include:
- Planning (analyses, sensitivity analyses, variable construction, etc.)
- Organizing files using a standardized directory structure
- Preserving data and findings
- Effectively documenting findings, data sources, and cleaning methods
- Separating data management and analyses using dual workflow
- Writing robust script files
- Naming variables
- Labeling variables and values
- Creating research that is both reproducible and replicable
- Examining data quality
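As a rough illustration of what "organizing files using a standardized directory structure" can look like in practice, here is a minimal sketch. It is written in Python purely for illustration (the seminar itself uses Stata and R), and the folder names and the `create_project` helper are assumptions made for this example, not the layout taught in the course.

```python
from pathlib import Path
import tempfile

# Hypothetical folder names chosen for this example;
# the seminar may recommend a different layout.
PROJECT_FOLDERS = [
    "data/raw",         # original source files, never edited by hand
    "data/derived",     # datasets produced by data-management scripts
    "scripts/manage",   # data-management (cleaning) scripts
    "scripts/analyze",  # analysis scripts
    "output",           # tables and figures
    "docs",             # research log, codebooks, plans
]


def create_project(root):
    """Create the folder skeleton under `root` and return the created paths."""
    root = Path(root)
    created = []
    for folder in PROJECT_FOLDERS:
        path = root / folder
        # exist_ok=True makes the function safe to re-run on an existing project
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Build the skeleton in a temporary directory so the demo leaves no files behind
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_project(Path(tmp) / "my_project"):
            print(path.relative_to(tmp))
```

Keeping data-management and analysis scripts in separate folders mirrors the "dual workflow" item in the list above: one set of scripts builds the data, and another set only reads it.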
Computing
The empirical examples and exercises in this course will emphasize Stata and R. To fully benefit from the course, you should use your own computer with a recent version of Stata or R installed.
Stata users are encouraged to use Stata version 18, but earlier versions should also work for most exercises.
R users should also download and install RStudio, a front-end for R that makes it easier to work with. This software is free and available for Windows, Mac, and Linux platforms.
If you’d like to use Stata for this course but don’t yet have much experience with that package, we recommend following along with a “getting started” video like the one here before the seminar begins.
Seminar participants who are not yet ready to purchase Stata could take advantage of StataCorp’s 30-day software return policy.
If you’d like to use R for this course but don’t yet have much experience with that package, here are some excellent online resources for building your R skills.
Who should register?
This course is for anyone who wants to improve the efficiency and accuracy of their data management, analysis, and presentation. You should have experience with data analysis, as well as familiarity with Stata or R.
Seminar outline
Part 1: Introduction to workflow
- What is “workflow” (WF)?
- Why care about WF?
- WF and replication
- Steps in and principles of WF
Part 2: Plan, organize, document, and preserve
- Planning research projects in the:
  - Large (overall questions, project checklist, and timeline)
  - Middle (data cleaning, analyses, tables, and figures)
  - Small (naming variables, naming files, value labels, and order of analyses/cleaning)
- Organizing files and folders
- Documentation
- Preserving data and preventing loss
- Replication
Part 3: Script files in R
- Strengths and weaknesses of R for workflow
- Dual workflow
- Robust script files
- Legible script files
- Automation in script files
Part 4: Cleaning, labeling, & missing data
- Naming and labeling variables
- Missing data
- Merging data
- Verifying data
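To make the "merging data" and "verifying data" topics concrete, the sketch below checks merge keys before two datasets are combined. It is written in Python for illustration only (the seminar works in Stata and R), and the function name `verify_merge_keys` and the sample records are hypothetical, not course materials.

```python
def verify_merge_keys(left_rows, right_rows, key):
    """Report key problems that would affect a one-to-one merge.

    Each dataset is a list of dicts. The report lists keys duplicated
    within each dataset and keys that appear in only one of them.
    """
    left_keys = [row[key] for row in left_rows]
    right_keys = [row[key] for row in right_rows]

    def duplicates(keys):
        # Collect every key value that occurs more than once
        seen, dupes = set(), set()
        for k in keys:
            if k in seen:
                dupes.add(k)
            seen.add(k)
        return sorted(dupes)

    return {
        "duplicate_left": duplicates(left_keys),
        "duplicate_right": duplicates(right_keys),
        "only_left": sorted(set(left_keys) - set(right_keys)),
        "only_right": sorted(set(right_keys) - set(left_keys)),
    }


if __name__ == "__main__":
    # Made-up records for demonstration
    survey = [{"id": 1, "age": 34}, {"id": 2, "age": 51}, {"id": 3, "age": 27}]
    admin = [
        {"id": 2, "region": "north"},
        {"id": 3, "region": "south"},
        {"id": 3, "region": "east"},   # duplicated key
        {"id": 4, "region": "west"},   # no matching survey record
    ]
    print(verify_merge_keys(survey, admin, "id"))
```

Running a check like this before the merge, and recording its result in the project documentation, turns silent record loss or duplication into something a reader of the script can see.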
Part 5: Analyzing & presenting findings
- Principles of data analysis
- Documenting provenance
- The posting principle
- Presenting findings
Part 6: Collaboration
- Key factors in collaboration
- Introducing workflow with co-authors
- Coordinating workflow with multiple authors
Payment information
The fee of $995 USD includes all course materials.
PayPal and all major credit cards are accepted.
Our Tax ID number is 26-4576270.