What | When |
---|---|
Introduction | Week 1 |
Data visualisation | Week 2 |
Model fit and cross validation | Week 3 |
Linear regression for data science | Week 4 |
Classification | Week 5 |
Interactive visualisations | Week 6 |
Tree-based methods | Week 7 |
Text mining | Week 8 |
Network Analysis | Week 9 |
Exam | Week 10 |
Part of the course material for aDAV is based on courses in the Applied Data Science master's profile:
GSLS ADS profile: https://studyguidelifesciences.nl/profiles/applied-data-science
GSNS ADS profile: https://students.uu.nl/en/science/applied-data-science/profile-gsns
Machine Learning and Recommendations at Netflix
Netflix Movies and TV Shows dataset from Kaggle
Our emphasis is going to be on creating results that help you (the analyst) understand the data and make predictions.
Oversimplification:
Exploratory | Confirmatory |
---|---|
EDA | Hypothesis Testing |
Unsupervised learning | Supervised learning |
Correlation analysis | Causal modeling |
Exploratory data analysis (EDA): describing interesting patterns, using graphs and summaries to understand subgroups, detect anomalies (“outliers”), and understand the data.
Examples: boxplots, bar plots, histograms, scatterplots…
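A boxplot draws the five-number summary of a variable (minimum, first quartile, median, third quartile, maximum) and marks points far outside the box as potential outliers. The sketch below computes both with Python's standard library, assuming the common convention attributed to Tukey of flagging values more than 1.5 × IQR beyond the quartiles. Quartile definitions differ between tools, so the numbers may differ slightly from R's output. The helper names and the data are hypothetical.

```python
from statistics import median, quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max): the numbers a boxplot draws."""
    # method="inclusive" is one of several quartile conventions
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)

def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Invented example values with one suspicious observation
data = [2, 3, 3, 4, 5, 5, 6, 40]
print(five_number_summary(data))
print(iqr_outliers(data))
```

With these invented values only `40` is flagged; whether it is an error or a genuine extreme case is something EDA raises but cannot decide on its own.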
Confirmatory analysis
You are a scientist and you want to determine if a new drug treatment is effective in reducing blood pressure compared to a placebo.
Theory testing; hypothesis: the new treatment is better than the placebo.
The analysis can be defined in advance: which outcome variables to measure, how to sample from the population, and which method to use.
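The slide leaves the method open. A classical choice would be a two-sample t-test; the sketch below swaps in a one-sided permutation test because it needs only the standard library and makes the logic explicit. If the treatment had no effect, the group labels would be exchangeable, so we shuffle them many times and count how often a difference at least as large as the observed one appears. The blood-pressure reductions are invented for illustration, and `permutation_test` is a hypothetical helper name.

```python
import random
from statistics import mean

def permutation_test(treatment, placebo, n_permutations=10_000, seed=0):
    """One-sided test of H0 (no effect) against H1 (treatment mean > placebo mean)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(treatment) - mean(placebo)
    pooled = list(treatment) + list(placebo)
    n_t = len(treatment)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels, as H0 allows
        diff = mean(pooled[:n_t]) - mean(pooled[n_t:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction counts the observed labelling itself, so p is never exactly 0
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical reductions in blood pressure (mmHg), invented for illustration
treatment = [12, 15, 9, 14, 11, 13, 16, 10]
placebo = [5, 7, 4, 8, 6, 3, 9, 5]
effect, p = permutation_test(treatment, placebo)
print(f"mean difference = {effect:.3f} mmHg, one-sided p = {p:.4f}")
```

Note that the hypothesis, the outcome variable and the test are all fixed before looking at the data, which is what makes the analysis confirmatory.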
Unsupervised learning, for example clustering: grouping similar observations without using a known outcome.
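The slide does not name a clustering algorithm; k-means is one common choice and is sketched below with the standard library only. It alternates two steps (assign each point to its nearest centroid, then move each centroid to the mean of its assigned points) until the centroids stop changing. The deterministic farthest-point start, the function names and the toy points are assumptions made for illustration.

```python
import math

Point = tuple[float, float]

def farthest_point_init(points: list[Point], k: int) -> list[Point]:
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points: list[Point], k: int, max_iter: int = 100):
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy groups, invented for illustration
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

The number of clusters `k` is chosen by the analyst, and k-means can settle in a poor local solution depending on its starting centroids, which is why it is usually run from several starts in practice.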
Supervised learning: building a statistical model to predict or estimate an output based on one or more inputs.
Most widely used machine learning methods are supervised.
Classification: predict to which category an observation belongs (qualitative outcomes)
Regression: predict a quantitative outcome
Methods such as:
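The two supervised tasks described above can be illustrated with a short standard-library sketch. Least-squares simple linear regression predicts a quantitative outcome, and a k-nearest-neighbours classifier predicts a category by majority vote among the closest training points. These two methods are chosen here only as examples; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from statistics import mean

def fit_line(xs, ys):
    """Regression: least-squares estimates (b0, b1) for y ≈ b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def knn_classify(train_x, train_labels, query, k=3):
    """Classification: majority label among the k training points nearest to query."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data, invented for illustration
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {b0} + {b1} * x")

train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["low", "low", "low", "high", "high", "high"]
print(knn_classify(train_x, train_labels, (0.5, 0.5)))
```

In both cases the model learns from observations whose outcome is known and is then used to predict the outcome for new inputs.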
“The simple graph has brought more information to the data analyst’s mind than any other device.” — John Tukey
When thinking about important decisions, such as whether to stay in school, it helps to know that more highly educated people tend to earn more, but also that this difference disappears among the top earners.
Florence Nightingale’s Rose Diagram, created in 1858, is an example of data visualisation that played a significant role in improving healthcare; it showed the causes of soldiers’ deaths during the Crimean War.
(Eder, M. et al. (2016). Stylometry with R: A Package for Computational Text Analysis. The R Journal 8:1)
Scholars fight over who wrote various songs (Wilhelmus), treatises (Caesar), plays (Shakespeare), etc., with shifting arguments. By counting words, we can sometimes identify the most likely author of a text, and we can explain exactly why we think that is the right answer.
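The idea of attributing authorship by counting words can be sketched in a few lines. Tools such as the stylo R package cited above use more refined measures (for example Burrows's Delta on standardised frequencies of the most frequent words); the sketch below is a simplified stand-in that compares raw relative frequencies of a hand-picked list of function words using Manhattan distance. The helper names, the vocabulary and the toy texts are hypothetical.

```python
import re
from collections import Counter

def word_profile(text, vocabulary):
    """Relative frequency of each vocabulary word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero for an empty text
    return [counts[w] / total for w in vocabulary]

def attribute(disputed, candidates, vocabulary):
    """Return the closest candidate author and the distance to every candidate."""
    target = word_profile(disputed, vocabulary)
    distances = {
        author: sum(abs(a - b) for a, b in zip(target, word_profile(text, vocabulary)))
        for author, text in candidates.items()
    }
    return min(distances, key=distances.get), distances

# Toy texts, invented for illustration
candidates = {
    "A": "the cat and the dog and the bird",
    "B": "a cat of a dog of a bird",
}
vocabulary = ["the", "and", "a", "of"]
best, distances = attribute("the fish and the frog and the toad", candidates, vocabulary)
print(best, distances)
```

Because the distances are reported per candidate and built from explicit word counts, the answer comes with its own explanation: the disputed text uses "the" and "and" at the same rates as author A.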
Data analysis…
In short, it makes sense to learn more about the background and ideas behind aDAV techniques!
Lab 1 is on Thursday
Let your teacher know if you can’t attend
Submit the homework before 11:00 am; it counts towards the pass/fail grade for week 1.
Next lecture: next Tuesday about Data visualisation
Read the Course information and Preparation on the course website.
Have a nice day!