This module covers the fundamental concepts of database management systems at scale, as well as the analysis of existing benchmarks in different application scenarios. Topics include data models (relational); query languages (SQL); implementation techniques for database management systems, including at large scale; and NoSQL, temporal, spatial, multimedia, and deductive databases. The module also discusses available large-scale multimedia datasets and how to query them, as well as state-of-the-art techniques for creating benchmarks to test data analytics techniques. Principles for detecting mistakes, biases, systematic errors, and other unexpected problems are also analyzed.
The learning objectives are:
Knowledge and understanding
Applying knowledge and understanding
This module covers the fundamental concepts of the management and design of a business intelligence (BI) system. Topics include data models for building a data warehouse; ETL (extract, transform, and load) functionalities; OLAP analysis; basic data mining; reporting and interactive dashboards; and the evolution of BI architectures on large datasets. The module also covers techniques and algorithms for data visualization and exploratory analysis, based on principles and techniques from graphic design, perceptual psychology, and cognitive science. It is aimed at students who will use visualization in their data analytics work.
The learning objectives are:
Knowledge and understanding
Applying knowledge and understanding
The main teaching methods are as follows:
The main teaching methods are as follows:
Basic programming skills.
Strongly recommended. Attending and actively participating in classroom activities will contribute to the overall assessment of the final exam (see the evaluation procedure section).
Strongly recommended. Attending and actively participating in classroom activities will contribute positively to the overall assessment of the oral exam.
1) Models and Languages for Database Management (15 hours)
2) Querying and processing big data (10 hours)
3) Analyzing existing benchmarks (15 hours)
1. Introduction to Business Intelligence and Big Data Analytics (6 hours)
2. Data models for data warehouse (12 hours)
3. BI Architecture (12 hours)
4. Data Visualization (10 hours)
1. R. Elmasri and S. Navathe, Fundamentals of Database Systems, 7th Edition, Pearson, 2016.
2. Denny Lee, Tomasz Drabas, Learning Spark SQL, Packt Publishing, 2017.
3. Instructor’s notes
4. Research papers (a list will be published on the course page)
Instructor's notes will be made available on the Studium website.
DATABASE | ||
Topics | Text references | |
1 | Introduction to databases: Concepts and Architecture | Book 1 - Chapter 1 and 2 |
2 | Relational Data Model | Book 1 - Chapter 5 |
3 | Basic SQL: data definition, SQL query, update instruction set. | Book 1 - Chapter 6 + Notes |
4 | Advanced SQL: Complex Queries, Triggers, Views | Book 1 - Chapter 7 + Notes |
5 | Query processing and optimization | Book 1 - Chapter 18 and 19 |
6 | NOSQL Databases and Big Data Storage Systems | Book 1 - Chapter 24 + Notes |
7 | Active, Temporal, Spatial, Multimedia, and Deductive Databases | Book 1 - Chapter 26 |
8 | Getting started with Spark SQL for Data Processing | Book 2 - Chapter 1 and 2 + Notes |
9 | Spark SQL for Data Exploration | Book 2 - Chapter 3 + Notes |
10 | Spark SQL for Learning Applications | Book 2 - Chapter 6 and 10 + Notes |
11 | Multimedia benchmarks for bias identification and analysis | Research paper list on the course page |
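As a rough illustration of how the SQL topics and the bias-analysis topic in the table above fit together, the following minimal sketch uses an aggregate query to check how evenly a dataset covers different groups. The table name `samples`, its columns, and the toy rows are all hypothetical and are not taken from the course material; SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical toy dataset: (sample id, demographic group, label).
# These rows are invented for illustration only.
ROWS = [
    ("img01", "male", "smiling"),
    ("img02", "male", "neutral"),
    ("img03", "male", "smiling"),
    ("img04", "male", "neutral"),
    ("img05", "male", "smiling"),
    ("img06", "male", "neutral"),
    ("img07", "female", "smiling"),
    ("img08", "female", "neutral"),
]


def group_distribution(rows):
    """Return (group, count, share) tuples, largest group first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (id TEXT PRIMARY KEY, grp TEXT, label TEXT)"
    )
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    # GROUP BY with COUNT(*) gives the number of samples per group;
    # dividing by the total row count gives each group's share.
    query = """
        SELECT grp,
               COUNT(*) AS n,
               ROUND(1.0 * COUNT(*) / (SELECT COUNT(*) FROM samples), 2) AS share
        FROM samples
        GROUP BY grp
        ORDER BY n DESC, grp
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for grp, n, share in group_distribution(ROWS):
        print(f"{grp}: {n} samples ({share:.0%})")
```

A strongly skewed share for one group, as in this toy data, is the kind of signal that a dataset report would then need to investigate and explain.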
The final exam consists of a) a lab test assessing the ability to write SQL and NoSQL queries, including with Spark SQL, and b) a final report critically analyzing an existing dataset in terms of possible biases. The exam is evaluated on the ability to write SQL queries, to derive aggregated information from data, and to correctly identify biases in data and justify solutions for addressing them.
The grade for the database module will account for 50% of the total grade for the entire course.
The module also includes intermediate tests for students attending the course. These tests include a lab test on SQL writing, a presentation, and two written reports analyzing existing benchmarks (a list of which will be given at the beginning of the course). The datasets to analyze will be chosen during classes in order to avoid overlap.
The grading policy for intermediate tests is:
SQL test: 30%
Paper presentation: 30%
Reports: 30%
Attendance and discussion during classes: 10%
The final exam consists of a) a project assessing the ability to develop a BI system, including the analysis and visualization of relevant information, and b) an oral exam consisting of a discussion of the project.
Assessment criteria include: depth of analysis; adequacy, quality, and correctness of the proposed solutions to the project; ability to justify and critically evaluate the adopted solutions; and clarity.
The grade for the Big Data Analytics module will account for 50% of the total grade for the entire course.
Examples of questions and exercises will be available on the course webpage and on the Studium platform.
Examples of questions and exercises are available on the Studium platform.