18-899-K1   Data and Inference

Location: Africa

Units: 6

Semester Offered: Spring

Course description

This course will provide the expertise and skills for undertaking a range of practical applications such as descriptive statistics, exploratory data analysis and business intelligence. These applications involve collecting, cleaning, interpreting, transforming, exploring, analysing and modeling data with the goal of discovering useful information, communicating insights and supporting decision-making. Statistical hypothesis testing will be presented as a means of quantifying the confidence that can be assigned to the outcome of empirical investigations. The advantages of using visualization techniques to facilitate a better understanding of the data and to enable communication of the outcomes will be emphasised throughout. Participants will obtain hands-on experience during project assignments that utilize publicly available datasets.

Learning objectives

The objective of this course is to give students an overview of the use and potential of data analysis in research, business and government. For example, a project might seek to answer a practical high-level question by applying data analysis techniques to real-world datasets. Participants will learn how to plan, design and implement an empirical research project using statistical and computational techniques. They will learn how to test for statistically significant relationships and to build decision support tools. Practical skills will be strengthened by discussing project design, data collection, data quality and techniques to account for measurement errors, missing values and outliers. The statistical models employed will be primarily linear with normally distributed errors. The course will combine theoretical aspects of data analysis with visual examples and demonstrations of how to construct and utilize statistical models in practice. There will be a strong emphasis on highlighting the challenges of working with real-world data and the risks of relying on traditional assumptions.


After completing this course, students should be able to:
  • Design an empirical project in response to a research question
  • Identify and collect relevant data for undertaking the project
  • Load data into Matlab and organize it into a structured format
  • Visualize data, identify key characteristics and present a summary
  • Decide which models are likely to work best for a given application
  • Estimate model parameters and avoid common mistakes
  • Produce diagnostic information for investigating model properties
  • Select an optimal model using statistical approaches
  • Understand model weaknesses and where assumptions could fail

Content details

  1. Measurement, data types, data collection, data cleaning
  2. Data manipulation, data exploration, visualization techniques
  3. Probability, statistical distributions, descriptive statistics
  4. Statistical hypothesis testing, quantifying confidence
  5. Time series analysis, autoregression, moving averages
  6. Linear regression, parameter estimation, model selection, evaluation


Prerequisites

Background in a quantitative discipline (Engineering, Computer Science, Physics, Mathematics, Statistics); programming experience.

Faculty: Patrick McSharry