Mathematics and Computer Science
Data Science
Academic year 2025/2026

9798801 - Statistical Learning
Module: Supervised Learning

Instructor: SALVATORE DANIELE TOMARCHIO

Expected learning outcomes

This course provides a beginner-level introduction to supervised statistical learning (SSL). Building on the statistical methodologies presented in the first module (Unsupervised Statistical Learning), the course focuses on supervised learning techniques, mainly in the framework of regression and classification. The course requires a basic understanding of calculus and linear algebra. All computing tasks will be performed using R.

Expected Learning Results

  1. Knowledge and understanding. The “Supervised Learning” module primarily focuses on the fundamentals of supervised learning techniques within the framework of predictive approaches, using regression and classification methods.
  2. Applying knowledge and understanding. On completion, the student will be able to: i) implement the main methods used in supervised learning; ii) summarize the main features of a dataset and fit appropriate models to the data.
  3. Making judgements. On completion, the student will be able to choose a suitable statistical model, apply it, and perform the analysis using statistical software.
  4. Communication skills. On completion, the student will be able to present the results of the statistical analysis and the conclusions that can be drawn from it.
  5. Learning skills. On completion, the student will be able to understand the structure of supervised learning.

Brief General Description

The course covers supervised learning techniques, mainly in the framework of regression and classification, for both parametric (linear regression, logistic regression, discriminant analysis) and non-parametric (tree-based methods) approaches.

Teaching methods

Classroom lectures.

Prerequisites

Basic notions of probability and statistics, calculus and linear algebra, computing, principal component analysis, and cluster analysis.

Class attendance

Highly recommended.

Course content

Basics of statistical learning (0.5 CFU). Estimation of Dependencies Based on Empirical Data. General Model of Learning from Examples. The problem of risk minimization. Assessing Model Accuracy. The Bias-Variance Trade-Off. Regression vs Classification. Bayes Classifier. KNN. Learning Paradigms in Statistics. Exploratory Data Analysis. Lab with R.

Linear regression (1.5 CFU). Introduction to Linear Regression Models. Estimating Model Parameters. Model Adequacy Checking. Assessing the Accuracy of the Coefficient Estimates. Properties of least-squares estimators. Confidence intervals and hypothesis testing. Multiple regression. Parameter estimation. Model assessment. Subset selection. Qualitative predictors. Extension of the linear model. Diagnostics for Multiple regression models. Lab with R. 

Resampling methods (0.5 CFU). Basics of Sampling. Validation set approach. Leave-One-Out Cross-Validation and k-Fold Cross-Validation. Bias-Variance Trade-Off for k-Fold Cross-Validation. Cross-Validation on Classification Problems. Lab with R.

Logistic regression (1.5 CFU). Logistic regression. Simple logistic regression. Parameter estimation and interpretation. Diagnostic checking in Logistic Regression. Multiple logistic regression. Linear model selection criteria. Generalized Linear Models. Lab with R.

Generative Models for Classification (0.5 CFU). Linear discriminant analysis. Quadratic discriminant analysis. Naïve Bayes. Comparison among methods. Lab with R.

Tree-based methods (1.5 CFU). Basics of tree-based methods. Regression trees. Growing regression trees. Tree pruning. Classification trees. Bagging, Random Forests, Boosting. Lab with R.

Reference texts

1. James G., Witten D., Hastie T., Tibshirani R. An Introduction to Statistical Learning with Applications in R, Springer 2021.

2. Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning, Springer 2008.

3. Teaching material that will be made available during the lessons on Studium.

Course schedule

Topics and text references

1. Basics of statistical learning: Estimation of Dependencies Based on Empirical Data, General Model of Learning from Examples, The problem of risk minimization, Assessing Model Accuracy, The Bias-Variance Trade-Off, Regression vs Classification, Bayes Classifier, KNN, Learning Paradigms in Statistics, Exploratory Data Analysis, Lab with R. References: Slides; Textbook 1, chaps. 1-2.
2. Linear regression: Introduction to Linear Regression Models, Estimating Model Parameters, Model Adequacy Checking, Assessing the Accuracy of the Coefficient Estimates, Properties of least-squares estimators, Confidence intervals and hypothesis testing, Multiple regression, Parameter estimation, Model assessment, Subset selection, Qualitative predictors, Extension of the linear model, Diagnostics for Multiple regression models, Lab with R. References: Slides; Textbook 1, chap. 3.
3. Resampling methods: Basics of Sampling, Validation set approach, Leave-One-Out Cross-Validation and k-Fold Cross-Validation, Bias-Variance Trade-Off for k-Fold Cross-Validation, Cross-Validation on Classification Problems, Lab with R. References: Slides; Textbook 1, chap. 5.
4. Logistic regression: Logistic regression, Simple logistic regression, Parameter estimation and interpretation, Diagnostic checking in Logistic Regression, Multiple logistic regression, Linear model selection criteria, Generalized Linear Models, Lab with R. References: Slides; Textbook 1, chaps. 4 and 6.
5. Generative Models for Classification: Linear discriminant analysis, Quadratic discriminant analysis, Naïve Bayes, Comparison among methods, Lab with R. References: Slides; Textbook 1, chap. 4.
6. Tree-based methods: Basics of tree-based methods, Regression trees, Growing regression trees, Tree pruning, Classification trees, Bagging, Random Forests, Boosting, Lab with R. References: Slides; Textbook 1, chap. 8.

Learning assessment

Assessment methods

The purpose of the exam is to assess the attainment of the learning objectives. It involves an oral assessment featuring questions regarding the course content, as well as a discussion on a report detailing a practical data analysis conducted using the methodologies covered in the class and the R statistical software.

The final mark is based on the following criteria:

Examples of frequent questions and/or exercises

Any topic covered in the course may be the subject of an exam question.
