Lectures will mainly consist of live sessions on using the Cloud for data analysis and machine learning. The lecturer will carry out each session, and students will replicate it, with suggested variations, on the available equipment. Laboratory practice aims to help students refine their understanding of the technologies presented and acquire the skills to operate autonomously. As a framework and guide, lecture notes will be displayed during lectures and shared with students. The notes will provide a precise record of the material presented, as well as pointers to the required technical reference documentation.
Should teaching be carried out in mixed mode or remotely, it may be necessary to introduce changes with respect to the statements above, in line with the programme planned and outlined in the syllabus. Learning assessment may also be carried out online, should the conditions require it.
Fundamentals of data analysis and machine learning. Basic skills in using a desktop computing environment and the Web.
Attending classes is not mandatory but strongly recommended.
This course aims to enable data scientists to put into practice, on the public Cloud, the principles and methodologies learnt in courses on data storage, processing, analysis, and machine learning. In these areas, present-day industrial and enterprise applications typically require storage volumes, computing power, and bandwidth at a scale that is impossible or (even for large organizations) impractical to attain with proprietary equipment. In realistic Data Science scenarios, the data scientist can therefore hardly avoid resorting to the Cloud, i.e. storage and computing services offered by third-party providers over the public Internet with a pay-per-use cost model.
In a nutshell, quoting reference [2], we may say that: “The Cloud turbocharges Data Science”.
Google Cloud is the platform of choice, for its ease of use and free availability to students.
SQL on Google Cloud and BigQuery: performing structured queries on BigQuery and Cloud SQL. Importing data from CSV files.
Data acquisition into Google Cloud: downloading selected data from a large public data set over the Internet, and processing it with Google App Engine.
Google Cloud Dataflow: processing a real-time, real-world data set, and storing the results on the cloud. Case study: real-time geospatial data.
Visualization with Google Data Studio: visualizing data stored in Google Cloud SQL; visualizing real-time geospatial data.
Google Datalab for Data Analysis: loading text data into Google BigQuery; rapid exploratory data analysis with Google Cloud Datalab notebooks.
Google Cloud AI Platform: using Google AI Platform to perform queries and present the data.
Evaluating a Data Model: partitioning a data set into a training set and a test set; evaluating various predictive models.
Machine Learning with Spark on Google Cloud Dataproc: implementing logistic regression with Apache Spark running on a Google Cloud Dataproc cluster; developing a model from a multivariable dataset.
Machine Learning with TensorFlow: developing and evaluating prediction models.
MapReduce and Hadoop on Google Cloud: exploiting parallelism and machine clusters.
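As an illustration of the "Evaluating a Data Model" topic listed above, the following is a minimal sketch of partitioning a data set into a training set and a test set and comparing two predictive models on the held-out part. It uses only the Python standard library and runs locally. The function names, the toy data, and the two simple models (a majority-class baseline and a single threshold) are assumptions made for illustration; they are not taken from the lecture notes and do not use any Google Cloud service.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model, rows):
    """Fraction of (x, label) rows for which model(x) equals label."""
    return sum(1 for x, label in rows if model(x) == label) / len(rows)


def fit_majority(train):
    """Baseline model: always predict the most frequent training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def fit_threshold(train):
    """Pick the threshold t that maximises training accuracy of 'x > t'."""
    candidates = sorted({x for x, _ in train})
    best = max(candidates, key=lambda t: accuracy(lambda x: x > t, train))
    return lambda x: x > best


# Toy data, made up for illustration: x is a number, label is True when x > 15.
data = [(x, x > 15) for x in range(60)]
train, test = train_test_split(data)

# Fit each model on the training set only, then score it on the unseen test set.
for name, fit in [("majority", fit_majority), ("threshold", fit_threshold)]:
    model = fit(train)
    print(name, "test accuracy:", accuracy(model, test))
```

The point of the sketch is the discipline rather than the models: parameters are chosen using the training rows alone, and the test rows are used only once, for the final comparison.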
Lecture notes will be made available through the Studium portal.
No. | Topics | Text references |
1 | Google Cloud (GC): Performing structured queries on BigQuery | Lecture notes |
2 | GC: Performing structured queries on Cloud SQL | Lecture notes |
3 | GC: Importing big data from CSV files | Lecture notes |
4 | Downloading large public data sets to GC | Lecture notes |
5 | Processing data with the Google App Engine | Lecture notes |
6 | GC Dataflow: processing a real-time, real-world data set | Lecture notes |
7 | Case study: real-time geospatial data on GC | Lecture notes |
8 | GC Data Studio: Visualizing data from Google Cloud SQL | Lecture notes |
9 | GC Datalab: Data Analysis and Google BigQuery | Lecture notes |
10 | GC Datalab notebooks for rapid exploratory data analysis | Lecture notes |
11 | GC AI Platform: queries and data presentation | Lecture notes |
12 | Partitioning a data set into a training set and a test set | Lecture notes |
13 | Predictive models and their evaluation | Lecture notes |
14 | Machine Learning (ML) with Spark on GC | Lecture notes |
15 | ML with Spark on GC: implementing logistic regression | Lecture notes |
16 | ML with Spark on GC: Developing a model from a multivariable dataset | Lecture notes |
17 | ML with TensorFlow on GC: developing and evaluating predictive models | Lecture notes |
18 | ML with TensorFlow on GC: developing and evaluating predictive models | Lecture notes |
19 | MapReduce and Hadoop on Google Cloud: exploiting parallelism and machine clusters | Lecture notes |
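To give a flavour of the last topic in the table, the sketch below shows the three MapReduce phases (map, shuffle, reduce) on a word-count example. It is a single-process illustration in plain Python, not Hadoop code: the function names and the sample documents are assumptions made for illustration. On a real cluster the map and reduce calls would be distributed across machines, which is where the parallelism comes from.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run the three phases in sequence; each map/reduce call is independent."""
    emitted = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input documents.
docs = ["the cloud turbocharges data science", "data science on the cloud"]
print(map_reduce(docs))
```

Because each call to `map_phase` depends only on its own document, and each call to `reduce_phase` only on its own key, both phases can be spread over many workers without coordination beyond the shuffle.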
The assessment consists of a laboratory session performed individually by each student in the presence of the lecturer. The student will be required to carry out the Cloud-based procedures demonstrated during the lectures, to discuss their significance, and to critically assess their outcomes. Learning assessment may also be carried out online, should the conditions require it.
See material available on the Studium portal.