The main goal of the course is to understand the principles of structured programming and to learn techniques for text processing, using the Python language.
Teaching takes place through lectures in which the course contents are presented, including live programming demonstrations, and concepts are applied directly in Python. In addition to the lecture hours, the Laboratory module gives students the opportunity to consolidate their skills in structured programming in Python. A learning platform is also available for practice during study hours and for self-assessment of what has been learned in class. The same platform is a useful tool for exam preparation.
The course is divided into two main modules.
In the first module, students will learn the basics of programming and of the Python language: the basic constructs, functions, recursion, and the main data structures available in Python. The second module aims to provide essential tools for processing text and natural language, such as tokenization, stemming and the use of the WordNet dictionary. Students will also acquire skills in part-of-speech tagging, chunking and named-entity recognition. They will learn how to perform automatically each task required for the analysis and interpretation of text, using the NLTK library and the Python programming language. The same processing pipeline will then be implemented with the spaCy library, comparing how the two libraries work and what each is designed for. With spaCy, two simple systems for displaying results will also be tested.
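As a purely illustrative sketch of the kind of exercise the two modules lead up to, the following example combines a recursive function, a built-in data structure and a very simple tokenizer. It uses only the Python standard library; the function names (tokenize, flatten) and the sample sentences are hypothetical and are not taken from the course material. The regular-expression tokenizer is an assumption made for brevity, and the tokenizers provided by NLTK and spaCy are considerably more robust.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens with a simple regular expression.

    Illustrative only: it keeps letters, digits and apostrophes and drops
    all other punctuation.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def flatten(nested):
    """Recursively flatten arbitrarily nested lists into one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            # Recursive case: flatten the inner list and append its items.
            flat.extend(flatten(item))
        else:
            # Base case: a plain item is appended as-is.
            flat.append(item)
    return flat


# Hypothetical sample data: sentences stored in nested lists.
sentences = [["The cat sat."], [["The dog barked."], "The cat slept."]]

# Flatten the nesting, tokenize every sentence, then count the tokens.
tokens = [tok for sentence in flatten(sentences) for tok in tokenize(sentence)]
freq = Counter(tokens)

print(freq.most_common(2))
```

Here the recursion handles any depth of nesting, while Counter (a dictionary subclass) stores the token frequencies.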
The reference text for the first module is "Pensare in Python: come pensare da informatico" by Allen B. Downey, published by O'Reilly Media (2019; the first 213 of 259 pages will be covered). The book is an ideal tool for learning the basics of programming with the Python language. Specifically, the text introduces the language gradually, starting with the basic concepts of programming and then moving on to functions, recursion, data structures and object-oriented design.
The reference text for the second module is "Python 3 Text Processing with NLTK 3 Cookbook" by Jacob Perkins, published by Packt Publishing (2014; in English, 228 of 279 pages will be covered). The text introduces the student to the essential techniques of text and natural language processing. The second part of the second module, on Natural Language Processing with spaCy, will be covered using lecture notes provided by the teacher (28 pages) and the official spaCy documentation available online (https://spacy.io/usage).
Please remember that, under Art. 171 of Law No. 633 of 22 April 1941 and its amendments, it is illegal to copy entire books or journals; only 15% of their content may be copied.
For further information on sanctions and regulations concerning photocopying, please refer to the copyright guidelines (Linee Guida sulla Gestione dei Diritti d’Autore) provided by AIDRO - Associazione Italiana per i Diritti di Riproduzione delle opere dell’ingegno (the Italian Association on Copyright).
All the books listed in the programs can be consulted in the Library.