Mathematical Tools for Big Data
Information
Teacher: Eugénio Rocha
Duration: One semester
Work hours: 162
Contact hours: 45
ECTS: 6
Scientific area: Mathematics
Objectives
Technological advances in recent decades have provided an unprecedented capacity to store information and make it available. The main objective of this curricular unit is to introduce students to some of the mathematical techniques used to handle these large volumes of data, whose study is vital to human activity.
Learning Outcomes
After finishing this course, students should be able to use numerical optimization methods in large-scale problems, apply dimensionality reduction techniques and aggregation methodologies, work with concepts of information and entropy in inference, analyze large-dimensional graphs, and use computational learning (machine learning) techniques suitable for large-scale problems. Students are also expected to be able to interpret and communicate technical results in any intercultural environment.
Grading
Grading consists of an in-class presentation and discussion of an assignment (50%) and a written exam (50%).
Methodology
Classes take place in rooms equipped with computers. Special emphasis will be given to the presentation of techniques, algorithms and software (MATLAB, R and Python). Autonomy in solving the proposed problems will be strongly encouraged.
Syllabus
- Numerical optimization methods in large problems
- Dimensionality reduction
- Aggregation procedures on homogeneous and non-homogeneous data
- Info-Metrics (information, maximum entropy and inference)
- Regression, classification and clustering algorithms for large problems
- Large-dimensional graph analysis (connectivity, centrality, paths)
Recommended reading
- Aggarwal, C. C. and Reddy, C. K. (2013). Data Clustering: Algorithms and Applications. CRC Press, Chapman and Hall
- Golan, A. (2017). Foundations of Info-Metrics: Modeling and Inference with Imperfect Information. Oxford University Press
- Hastie, T., Tibshirani, R. and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press, Taylor & Francis Group
- Newman, M. (2010). Networks: An Introduction. Oxford University Press
- Nocedal, J. and Wright, S. J. (2006). Numerical Optimization. Springer, 2nd Edition
- Suthaharan, S. (2016). Machine Learning Models and Algorithms for Big Data Classification. Springer, 1st Edition