Who am I?
Who are you?
What is your background?
Do you have experience with data analysis?
What’s your attitude towards statistics?
Attendance is mandatory! (I don’t make the rules :) )
You may miss one session without giving reasons (recommendation: don’t waste it early!)
If you miss additional sessions, please write an email with an explanation (further proof may be required, e.g. doctor’s certificate)
The course will take place in person
We will use these textbooks (Open Educational Resources - freely available online, also linked in WueCampus):
For the R part: Fundamentals of Quantitative Analysis
(Additional Resource: R for Data Science)
Find a dataset that can answer a question you are interested in
preprocess/wrangle it,
analyze the data,
and write a short (min. 2-page) report!
The up-to-date version can be found at https://spressi.github.io/biostats/
Date | Topic | Reading | Lecturer | Project Deadlines |
---|---|---|---|---|
16.10. | General & R Intro | ST21: 1-3, QF: 1-3 | Reutter | R & RStudio installed |
23.10. | Probability | ST21: 4, QF: 4-6 | Hildebrandt | |
30.10. | Data Wrangling | ST21: 4, QF: 4-6 | Reutter | |
06.11. | Data Visualization | QF: 7 | Reutter | |
13.11. | Sampling | ST21: 7-8, QF: 8 | Hildebrandt | Dataset |
20.11. | Penance Day | |||
27.11. | Probability & Sampling in R | ST21: 7-8, QF: 8 | Hildebrandt | |
04.12. | Hypothesis Testing | ST21: 9-10 | Reutter | |
11.12. | Comparing Means & Categories | ST21: 12, 15 | Reutter | Research Question & Hypotheses |
18.12. | Exercises (t-Tests, Chi²) | ST21: 12, 15 | Reutter | |
25.12. & 01.01. | Christmas & New Year’s | |||
08.01. | (General) Linear Models | ST21: 12-13 | Hildebrandt | First Analysis Ideas |
15.01. | Exercises (GLM) | ST21: 12-13 | Hildebrandt | |
22.01. | Linear Mixed Models | ST21: 14 | Hildebrandt | Analysis (with R scripts) |
29.01. | R Markdown for Reports | Reutter | ||
05.02. | Troubleshooting Your Report | both | Send Questions via Email | |
12.02. | Report (with R Markdown) |
You’re doing a research master!
We live in an increasingly data-centric world
Facts & data literacy matter more than ever!
Fake News, “Lying with stats”, Reproducibility Crisis
Being able to call bullshit (https://www.callingbullshit.org/)
“I only believe in statistics that I doctored myself” ― Winston S. Churchill
“a systematic way of thinking about how we describe the world and use data [to] make decisions and predictions, all in the context of the inherent uncertainty that exists in the real world.” (Poldrack, Preface of ST21)
“Statistical thinking is a way of understanding a complex world by describing it in relatively simple terms that nonetheless capture essential aspects of its structure or function, and that also provide us some idea of how uncertain we are about that knowledge.” (Poldrack, Chapter 1)
Describe patterns by summarizing/breaking down data (“descriptive statistics”)
Decide whether one thing is better than another, given the uncertainty (“inferential statistics”)
Predict how other people would “behave” (generalize to new observations)
Learning from data: Update our beliefs
Aggregation: How to summarize the data to draw meaningful conclusions?
Uncertainty: Probabilistic evidence
Sampling from the population: Which people etc. do we select?
Correlation does not imply causation… but is a hint!
Example: Smoking = less risk for Parkinson’s disease? (Godwin-Austen et al., 1982; Chen et al., 2010)
→ confounding factors?
e.g., individual dopaminergic activity ⇒ addiction & motor function
Randomized Controlled Trials (RCT) as the solution?
qualitative vs. quantitative
qualitative?
quantitative?
Data types
character/string: text (qualitative)
factors/categories
types of numbers (quantitative)
binary: 0 or 1, TRUE or FALSE (logical)
integers: whole numbers
real numbers: decimals/fractions
discrete vs. continuous
discrete: finite set of particular values (0 or 1, scale from 1 to 10)
continuous: real numbers that fall into particular range (e.g., brain activity, visual analoge scale)
What data type is eye color?
a collection of data
usually organized into rows and columns (like an excel spreadsheet)
rows: participants/animals/cells…
columns: variables!
table cells = unique observations of variables per participant etc.
NHANES dataset
What is being measured?
constructs vs. proxies: need to be well-defined! (Difficult)
measurement error
random: e.g., variation in reaction times of same participant across trials
systematic: e.g., miscalibrated eye-tracking device
Do we have a “gold standard” to compare the measurement to?
Correlation of a measurement with “itself”
Internal reliability (consistency)
Test-retest reliability (stability)
Inter-rater reliability (agreement)
Correlation with other variables can’t be higher than reliability (cf., Wilmer et al., 2012, Table 1)!
Are we measuring the construct we’re interested in?
Face validity: Does it intuitively make sense? First reality check!
Construct validity
convergent validity: Related to similar measures that should measure the same construct
divergent validity: Is it unrelated to other measures?
Predictive validity: Is it predictive of other outcomes? (e.g., intelligence & job success)
Reliability & Validity
Throwing away (some of the) information!
extract the quintessence of the data (important for forming models)
make predictions
Counts, frequencies, percentages, averages
Learning objective today:
Why is statistics important?
What are data?
What is reliability and validity?
Next:
R session: “Intro to R”
Installation tutorial: https://learningstatisticswithr-bookdown.netlify.app/intror
You can also already install Rtools, which will be needed later: https://cran.r-project.org/bin/windows/Rtools/
Theory: Models