Mathematical Tools for Big Data

Information

Teacher: Eugénio Rocha

Duration: One semester

Work hours: 162

Contact hours: 45

ECTS: 6

Scientific area: Mathematics

Objectives

Technological advances in recent decades have made it possible to store and share information on an unprecedented scale. The main objective of this curricular unit is to introduce students to mathematical techniques for handling these large volumes of data, whose study is vital to human activity.

Learning Outcomes

After completing this course, students should be able to use numerical optimization methods in large-scale problems, apply dimensionality reduction techniques and aggregation methodologies, work with the concepts of information and entropy in inference, analyze large-dimensional graphs, and use computational learning (machine learning) techniques suitable for large-scale problems. Students are also expected to be able to interpret and communicate technical results in any intercultural environment.

Grading

Grading consists of an in-class presentation and discussion of a project (50%) and a written exam (50%).

Methodology

Classes take place in rooms equipped with computers. Special emphasis is given to the presentation of techniques, algorithms and software (MATLAB, R and Python). Students are strongly encouraged to solve the proposed problems autonomously.

Syllabus

  • Numerical optimization methods in large problems
  • Dimensionality reduction
  • Aggregation procedures on homogeneous and non-homogeneous data
  • Info-Metrics (information, maximum entropy and inference)
  • Regression, classification and clustering algorithms for large problems
  • Large-dimensional graph analysis (connectivity, centrality, paths)

Recommended reading

  • Aggarwal, C. C. and Reddy, C. K. (2013). Data Clustering: Algorithms and Applications. CRC Press, Chapman and Hall
  • Golan, A. (2017). Foundations of Info-Metrics: Modeling and Inference with Imperfect Information. Oxford University Press
  • Hastie, T., Tibshirani, R. and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press, Taylor & Francis Group
  • Newman, M. (2010). Networks: An Introduction. Oxford University Press
  • Nocedal, J. and Wright, S. J. (2006). Numerical Optimization. Springer, 2nd Edition
  • Suthaharan, S. (2016). Machine Learning Models and Algorithms for Big Data Classification. Springer, 1st Edition