Introduction to Machine Learning

This course is part of the UBC Key Capabilities in Data Science Certificate Program.

This introductory course on machine learning for prediction focuses on regression and classification models. Understand how to map data to the correct model type, evaluate and select models, and communicate and interpret model results to help organizations reduce operating costs, optimize market strategies and identify trends.

By the end of the course, you’ll be able to:

  • describe supervised learning and identify what kind of tasks it is suitable for
  • explain common machine learning concepts such as classification and regression, training and testing, overfitting, parameters and hyperparameters, and the golden rule
  • choose a correct predictive modelling technique (e.g., regression or classification) given the available data
  • identify when and why to apply data pre-processing techniques such as scaling and one-hot encoding
  • describe at a high level how common machine learning algorithms work, including decision trees and k-nearest neighbours
  • use Python and the scikit-learn package to develop an end-to-end supervised machine learning pipeline.
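To give a sense of what an end-to-end pipeline can look like, here is a minimal sketch using scikit-learn and pandas. It is not taken from the course materials: the toy data, the column names (age, income, city, bought) and the choice of k = 3 are all invented for illustration. It sets aside a test split, scales the numeric features, one-hot encodes the categorical feature, and fits a k-nearest neighbours classifier.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: every column name and value here is made up.
data = pd.DataFrame({
    "age":    [22, 25, 47, 52, 46, 56, 23, 30, 41, 60, 28, 35],
    "income": [25, 32, 80, 95, 70, 110, 27, 40, 65, 120, 30, 50],
    "city":   ["A", "B", "A", "C", "B", "C", "A", "B", "C", "C", "A", "B"],
    "bought": [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0],
})
X = data.drop(columns=["bought"])
y = data["bought"]

# Set the test split aside before fitting anything, so the test data
# cannot influence training (the idea usually meant by "the golden rule").
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale numeric columns and one-hot encode the categorical column.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Chaining preprocessing and model means each cross-validation fold
# fits the scaler and encoder on its own training portion only.
pipe = make_pipeline(preprocessor, KNeighborsClassifier(n_neighbors=3))

# Estimate performance with 3-fold cross-validation on the training split.
cv_scores = cross_val_score(pipe, X_train, y_train, cv=3)

# Fit on the full training split, then score once on the held-out test split.
pipe.fit(X_train, y_train)
test_score = pipe.score(X_test, y_test)

print("Cross-validation accuracy per fold:", cv_scores)
print("Test accuracy:", test_score)
```

With a dataset this small the scores themselves mean little; the point is the order of the steps, which stays the same on real data.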

Course Outline

Week 1 and Week 2

  • Module 1: Machine Learning Technology
  • Module 2: Decision Trees
  • Module 3: Splitting, Cross-Validation and the Fundamental Tradeoff

Week 3 and Week 4

  • Module 4: Similarity-Based Approaches to Supervised Learning
  • Module 5: Preprocessing Numerical Features, Pipelines and Hyperparameter Optimization

Week 5 and Week 6

  • Module 6: Preprocessing Categorical Variables and Sklearn’s ColumnTransformer
  • Module 7: Assessment and Measurements
  • Module 8: Linear Models

Week 7

  • Final Project

How am I Assessed?

Each course module includes an auto-graded assignment. In Weeks 4 and 7, you take an online 45-minute open-book quiz covering material from Modules 1–4 and 5–8, respectively. At the end of Week 7, you complete a final project using the skills you learned in the course. To pass the course, you must obtain an overall grade of 70% or higher and complete the final project.

Expected Effort

Expect to spend 8–12 hours per week completing the weekly modules, auto-graded assignments, open-book quizzes and the final project.

Technology Requirements

For the best experience in this course, we recommend you have access to:

  • an email account
  • a computer, laptop or tablet
  • the latest version of a web browser (or previous major version release)
  • a reliable internet connection.

For virtual office hours, you’ll also need: 

  • a video camera and microphone. 

One day before the start of your course, we’ll email you step-by-step instructions for accessing your course.

Textbooks

There are no textbooks for this course.

Requisites

Programming in Python for Data Science (FS011)

Note: You must complete the prerequisite course before starting any electives. When registering, please ensure the start date of the elective(s) you choose is after the end date of Programming in Python for Data Science.

Course Format

This course is 100% online and facilitator-supported, with weekly facilitator office hours. You complete course work independently and at your own pace, within deadlines set by your facilitator. Log in to your course anytime to access the modules.

Course Virtual Office Hours (subject to change)
Mondays, 6:30–7:30 pm Pacific Time
Wednesdays, 6:30–7:30 pm Pacific Time

Join your facilitator and classmates by video conferencing to discuss course materials and assignments, receive feedback and ask questions.

Available Sessions

Course currently not available for registration.