Coursera - Machine Learning

C1 Supervised Machine Learning: Regression and Classification

C1_W1 Introduction to Machine Learning

C1_W1_M1 Supervised vs Unsupervised Machine Learning

C1_W1_M1_1 What is machine learning?

Field of study that gives computers the ability to learn without being explicitly programmed. -- Arthur Samuel

Supervised vs Unsupervised

C1_W1_M1_2 Supervised Learning: Regression Algorithms

learn x to y or input to output mappings

| Input (x) | Output (y) | Application |
| --- | --- | --- |
| email | spam? (0/1) | spam filtering |
| audio | text transcripts | speech recognition |
| English | Spanish | machine translation |
| ad, user | click? (0/1) | online advertising |
| image, radar | position of other cars | self-driving car |
| image of phone | defect? (0/1) | visual inspection |

img/supervised.learning.regression.housing.price.prediction.png

C1_W1_M1_3 Supervised Learning: Classification

Regression attempts to predict from infinitely many possible results; classification predicts categories, i.e., from a limited set of possible results.

img/supervised.classification.breast.cancer.png

img/supervised.learning.classification.malignant.png

img/classification.multiple.inputs.png

Supervised Learning

|  | Regression | Classification |
| --- | --- | --- |
| Predicts | numbers | categories |
| Outputs | infinite | limited |

C1_W1_M1_4 Unsupervised Learning

img/unsupervised.clusturing.png

img/clustering.dna.microarray.png

img/clustering.grouping.customers.png

Unsupervised Learning: Data comes only with inputs x, but not output labels y. The algorithm has to find structure in the data.

  • Clustering: Group similar data points together
  • Anomaly Detection: Find unusual data points
  • Dimensionality Reduction: Compress data using fewer numbers
Question: Of the following examples, which would you address using an unsupervised learning algorithm?

(Check all that apply.)

Lab 01: Python and Jupyter Notebooks
Quiz: Supervised vs Unsupervised Learning

Which are the two common types of supervised learning? (Choose two)

Which of these is a type of unsupervised learning?

C1_W1_M2 Regression Model

C1_W1_M2_1 Linear regression model part 1

Linear Regression Model => a Supervised Learning model that simply fits a straight line to the dataset

img/01.01.house.size.and.price.png

Terminology

| Notation | Meaning |
| --- | --- |
| Training Set | data used to train the model |
| $ x $ | input variable or feature |
| $ y $ | output variable or target |
| $ m $ | number of training examples |
| $ (x, y) $ | single training example |
| $ (x^{(i)}, y^{(i)}) $ | i-th training example |

C1_W1_M2_2 Linear regression model part 2

f is a linear function with one variable: $ f_{w,b}(x) = wx + b $

img/01.01.linear.regression.png

Lab 02: Model representation

Here is a summary of some of the notation you will encounter.

| General Notation | Python (if applicable) | Description |
| --- | --- | --- |
| $ a $ |  | scalar, non bold |
| $ \mathbf{a} $ |  | vector, bold |
| Regression |  |  |
| $ \mathbf{x} $ | x_train | Training Example feature values (in this lab: Size (1000 sqft)) |
| $ \mathbf{y} $ | y_train | Training Example targets (in this lab: Price (1000s of dollars)) |
| $ x^{(i)} $, $ y^{(i)} $ | x_i, y_i | $ i_{th} $ Training Example |
| $ m $ | m | Number of training examples |
| $ w $ | w | parameter: weight |
| $ b $ | b | parameter: bias |
| $ f_{w,b}(x^{(i)}) $ | f_wb | The result of the model evaluation at $ x^{(i)} $ parameterized by $ w,b $: $ f_{w,b}(x^{(i)}) = wx^{(i)}+b $ |

Code
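
A minimal sketch of what the model-representation code can look like (my own illustration, not the lab's exact code); the data and parameter values below are just examples:

```python
import numpy as np

# Tiny illustrative training set: size (1000 sqft) -> price (1000s of dollars)
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])
m = x_train.shape[0]                 # m = number of training examples

def compute_model_output(x, w, b):
    """Evaluate f_wb(x^(i)) = w * x^(i) + b for every training example."""
    return w * x + b

# Example parameter values (not learned yet)
w, b = 200.0, 100.0
print(compute_model_output(x_train, w, b))   # [300. 500.]
```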

C1_W1_M2_4 Cost function formula

img/01.01.parameters.png

img/01.01.cost.function.png

Question: Which of these are parameters of the model that can be adjusted?

C1_W1_M2_5 Cost Function Intuition

To get a sense of how to minimize $ J $, we can use a simplified model:

|  | original | simplified |
| --- | --- | --- |
| model | $ f_{w,b}(x) = wx + b $ | $ f_{w}(x) = wx $ (by setting $ b=0 $) |
| parameters | $ w $, $ b $ | $ w $ |
| cost function | $ J_{(w,b)} = \frac{1}{2m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})^{2} $ | $ J_{(w)} = \frac{1}{2m} \sum\limits_{i=1}^{m} (f_{w}(x^{(i)}) - y^{(i)})^{2} $ |
| goal | we want to minimize $ J_{(w,b)} $ | we want to minimize $ J_{(w)} $ |

img/01.01.04.simplified.png

img/01.01.04.w.is.1.png

img/01.01.04.w.is.0.5.png

img/01.01.04.w.is.0.png

img/01.01.04.negative.w.png

img/01.01.04.J.png

:bulb: The goal of linear regression is to find the values of $ w,b $ that minimize $ J_{(w,b)} $
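
The cost function above translates directly into code. A minimal sketch (my own illustration), reusing the tiny example dataset from the earlier lab notes:

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared-error cost: J(w,b) = (1/2m) * sum_i (w*x^(i) + b - y^(i))^2."""
    m = x.shape[0]
    f_wb = w * x + b                      # predictions for all m examples
    return np.sum((f_wb - y) ** 2) / (2 * m)

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])
print(compute_cost(x_train, y_train, w=200.0, b=100.0))   # 0.0 -- this line fits both points exactly
```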

C1_W1_M2_6 Visualizing the cost function

model: $ f_{w,b}(x) = wx + b $
parameters: $ w $, $ b $
cost function: $ J_{(w,b)} = \frac{1}{2m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})^{2} $
goal: minimize $ J_{(w,b)} $

img

C1_W1_M2_7 Visualization examples

Here are some examples of J

img

In the next lab, you can click on different points on the contour to view the cost function on the graph

Gradient Descent is an algorithm to train linear regression and other complex models

Lab 03: Cost function
Quiz: Regression Model
  1. Which of the following are the inputs, or features, that are fed into the model and with which the model is expected to make a prediction?
  1. For linear regression, if you find parameters $ w $ and $ b $ so that $ J_{(w,b)} $ is very close to zero, what can you conclude?
Ans4, 1

C1_W1_M3 Train the model with gradient descent

C1_W1_M3_1 Gradient descent

Want a systematic way to find values of $ w,b $ that allow us to easily find the smallest $ J $

Gradient Descent is an algorithm that can be used to minimize any function, not just the linear regression cost; it is also used to train advanced neural network models

local minima may not be the true lowest point

C1_W1_M3_2 Implementing gradient descent

C1_W1_M3_3 Gradient descent intuition

$$
\begin{aligned}
&\text{repeat until convergence \{} \\
&\quad w = w - \alpha \frac{\partial}{\partial w} J_{(w,b)} \\
&\quad b = b - \alpha \frac{\partial}{\partial b} J_{(w,b)} \\
&\text{\}}
\end{aligned}
$$
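
As an illustration only (not course code), here is the update rule applied to a made-up one-parameter cost $ J(w) = (w - 3)^2 $, whose minimum is at $ w = 3 $:

```python
# Toy example: J(w) = (w - 3)^2, so dJ/dw = 2 * (w - 3).
# Each update w := w - alpha * dJ/dw moves w toward the minimum at w = 3.
alpha = 0.1          # learning rate
w = 0.0              # arbitrary starting value

for _ in range(100):             # "repeat until convergence" (fixed iteration count here)
    dj_dw = 2 * (w - 3)          # derivative of the toy cost at the current w
    w = w - alpha * dj_dw

print(w)   # ~3.0
```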

C1_W1_M3_4 Learning rate

C1_W1_M3_5 Gradient descent for linear regression

$$
\begin{aligned}
\frac{\partial}{\partial w} J_{(w,b)}
&= \frac{\partial}{\partial w} \frac{1}{2m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})^2 \\
&= \frac{\partial}{\partial w} \frac{1}{2m} \sum\limits_{i=1}^{m} (wx^{(i)} + b - y^{(i)})^2 \\
&= \frac{1}{2m} \sum\limits_{i=1}^{m} (wx^{(i)} + b - y^{(i)}) \, 2x^{(i)} \\
&= \frac{1}{m} \sum\limits_{i=1}^{m} (wx^{(i)} + b - y^{(i)}) \, x^{(i)} \\
&= \frac{1}{m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)}) \, x^{(i)}
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial}{\partial b} J_{(w,b)}
&= \frac{\partial}{\partial b} \frac{1}{2m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})^2 \\
&= \frac{\partial}{\partial b} \frac{1}{2m} \sum\limits_{i=1}^{m} (wx^{(i)} + b - y^{(i)})^2 \\
&= \frac{1}{2m} \sum\limits_{i=1}^{m} (wx^{(i)} + b - y^{(i)}) \cdot 2 \\
&= \frac{1}{m} \sum\limits_{i=1}^{m} (wx^{(i)} + b - y^{(i)}) \\
&= \frac{1}{m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})
\end{aligned}
$$
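
Putting the two derivatives together gives the gradient descent loop for single-feature linear regression. A minimal sketch with my own function names; the data, learning rate, and iteration count are illustrative:

```python
import numpy as np

def compute_gradient(x, y, w, b):
    """Return dJ/dw and dJ/db of the squared-error cost for a 1-feature dataset."""
    m = x.shape[0]
    err = (w * x + b) - y                # f_wb(x^(i)) - y^(i) for all i
    dj_dw = np.sum(err * x) / m          # (1/m) * sum(err * x)
    dj_db = np.sum(err) / m              # (1/m) * sum(err)
    return dj_dw, dj_db

def gradient_descent(x, y, w, b, alpha, num_iters):
    """Run num_iters simultaneous updates of w and b."""
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b

# Illustrative data: the fit should approach w ~ 200, b ~ 100
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])
w, b = gradient_descent(x_train, y_train, w=0.0, b=0.0, alpha=0.01, num_iters=10000)
print(w, b)
```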

C1_W1_M3_6 Running gradient descent

Lab 04: Gradient descent
Quiz: Train the Model with Gradient Descent
  1. Gradient descent is an algorithm for finding values of parameters w and b that minimize the cost function J.

$$
\begin{aligned}
&\text{repeat until convergence \{} \\
&\quad w = w - \alpha \frac{\partial}{\partial w} J_{(w,b)} \\
&\quad b = b - \alpha \frac{\partial}{\partial b} J_{(w,b)} \\
&\text{\}}
\end{aligned}
$$

When $ \frac{\partial}{\partial w} J_{(w,b)} $ is a negative number, what happens to w after one update step?

  1. For linear regression, what is the update step for parameter b?
Ans2, 2

C1_W2: Regression with Multiple Input Variables

This week, you'll extend linear regression to handle multiple input features. You'll also learn some methods for improving your model's training and performance, such as vectorization, feature scaling, feature engineering and polynomial regression. At the end of the week, you'll get to practice implementing linear regression in code.

C1_W2_M1 Multiple Linear Regression

C1_W2_M1_1 Multiple features

Quiz

In the training set below (see slide: C1_W2_M1_1 Multiple features), what is $ x_{1}^{(4)} $?

Ans852
C1_W2_M1_2 Vectorization part 1

Learning to write vectorized code allows you to take advantage of modern numerical linear algebra libraries, and possibly GPU hardware.
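
A small NumPy sketch (illustrative values) contrasting a feature-by-feature loop with the vectorized dot product for computing the prediction $ w \cdot x + b $ over multiple features:

```python
import numpy as np

w = np.array([1.0, 2.5, -3.3])
x = np.array([10.0, 20.0, 30.0])
b = 4.0

# Unvectorized: loop over the n features one at a time
f = 0.0
for j in range(w.shape[0]):
    f = f + w[j] * x[j]
f = f + b

# Vectorized: one call to an optimized dot product
f_vec = np.dot(w, x) + b

print(f, f_vec)   # both print -35.0
```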

C1_W2_M1_3 Vectorization part 2

How does a vectorized algorithm work...
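
For intuition, a sketch of the same idea applied to a gradient descent update over many parameters at once (16 parameters is just an assumed example count, and the values are random placeholders):

```python
import numpy as np

# Hypothetical values: 16 parameters w and their 16 partial derivatives d
w = np.random.randn(16)
d = np.random.randn(16)
alpha = 0.1

# Unvectorized update: one parameter at a time
w_loop = w.copy()
for j in range(16):
    w_loop[j] = w_loop[j] - alpha * d[j]

# Vectorized update: all 16 parameters updated in one operation
w_vec = w - alpha * d

print(np.allclose(w_loop, w_vec))   # True
```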

C1_W2_Lab01: Python Numpy Vectorization
C1_W2_M1_4 Gradient descent for multiple linear regression

C1_W2_Lab02: Multiple linear regression

Quiz: Multiple linear regression

  1. In the training set below, what is $ x_4^{(3)} $?

| Size | Rooms | Floors | Age | Price |
| --- | --- | --- | --- | --- |
| 2104 | 5 | 1 | 45 | 460 |
| 1416 | 3 | 2 | 40 | 232 |
| 1534 | 3 | 2 | 30 | 315 |
| 852 | 2 | 1 | 36 | 178 |
  1. Which of the following are potential benefits of vectorization?
  1. To make gradient descent converge about twice as fast, a technique that almost always works is to double the learning rate $ \alpha $
Ans30, 4, F

C1_W2_M2 Gradient Descent in Practice

C1_W2_M2_01 Feature scaling part 1

:bulb: We can speed up gradient descent by scaling our features
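
A minimal sketch (my own illustration) of one common scaling method, z-score normalization, which rescales each feature to zero mean and unit standard deviation; the feature matrix below is made up:

```python
import numpy as np

def zscore_normalize_features(X):
    """Rescale each column (feature) of X to mean 0 and standard deviation 1."""
    mu = np.mean(X, axis=0)        # per-feature mean
    sigma = np.std(X, axis=0)      # per-feature standard deviation
    return (X - mu) / sigma, mu, sigma

# Made-up features with very different ranges: size (sqft) and number of bedrooms
X_train = np.array([[2104.0, 5.0],
                    [1416.0, 3.0],
                    [852.0, 2.0]])

X_norm, mu, sigma = zscore_normalize_features(X_train)
print(X_norm)   # each column now has mean 0 and standard deviation 1
```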

C1_W2_M2_02 Feature scaling part 2

Quiz:

Which of the following is a valid step used during feature scaling? (see bedrooms vs size scatterplot)

Ans2

C1_W2_M2_03 Checking gradient descent for convergence

C1_W2_M2_04 Choosing the learning rate

C1_W2_M2_05 Optional Lab: Feature scaling and learning rate

C1_W2_M2_06 Feature engineering

C1_W2_M2_07 Polynomial regression

C1_W2_M2_08 Optional lab: Feature engineering and Polynomial regression

C1_W2_M2_09 Optional lab: Linear regression with scikit-learn

C1_W2_M2_10 Practice quiz: Gradient descent in practice

C1_W2_M2_11 Week 2 practice lab: Linear regression