Supervised vs Unsupervised Machine Learning

1 What is machine learning?

Field of study that gives computers the ability to learn without being explicitly programmed – Arthur Samuel

2 Supervised Learning: Regression Algorithms

#Supervised_Learning Given inputs x, predict output y

| Input (x) | Output (y) | Application |
| :--- | :--- | :--- |
| email | spam? (0/1) | spam filtering |
| audio | text transcripts | speech recognition |
| English | Spanish | machine translation |
| ad, user | click? (0/1) | online advertising |
| image, radar | position of other cars | self-driving car |
| image of phone | defect? (0/1) | visual inspection |

img/supervised.learning.regression.housing.price.prediction.png

3 Supervised Learning: Classification

img/supervised.classification.breast.cancer.png

img/supervised.learning.classification.malignant.png

img/classification.multiple.inputs.png

| | Regression | Classification |
| :--- | :--- | :--- |
| Predicts | numbers | categories |
| Possible outputs | infinitely many | a limited set |

4 Unsupervised Learning: Clustering, Anomaly Detection

img/unsupervised.clusturing.png

img/clustering.dna.microarray.png

img/clustering.grouping.customers.png

#Unsupervised_Learning Data comes only with inputs x, not output labels y. The algorithm has to find structure in the data

  • #Clustering Group similar data points together (see the sketch after this list)
  • #Anomaly_Detection Find unusual data points
  • #Dimensionality_Reduction Compress data using fewer numbers
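
As a rough illustration of what #Clustering looks like in code, here is a minimal sketch (it assumes scikit-learn is installed and uses a tiny made-up dataset; it is not one of the course labs):

import numpy as np
from sklearn.cluster import KMeans    # k-means clustering from scikit-learn

X = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],     # unlabeled data: only inputs x,
              [5.0, 5.2], [5.1, 4.9], [4.9, 5.1]])    # no output labels y

labels = KMeans(n_clusters=2, random_state=0).fit_predict(X)   # ask for 2 groups of similar points
print(labels)                         # e.g. [0 0 0 1 1 1]: the algorithm found the structure itself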

Question:

Of the following examples, which would you address using an unsupervised learning algorithm?
(Check all that apply.)

Ans: 1, 3

Lab 01: Python and Jupyter Notebooks

Learn the basics of Jupyter. Local: Jupyter Notebook || Coursera Jupyter

Quiz: Supervised vs Unsupervised Learning

Which are the two common types of supervised learning? (Choose two.)

Which of these is a type of unsupervised learning?

Ans: Q1: 1, 2; Q2: 1

Regression Model

1 Linear Regression Model

#Linear_Regression_Model => a Supervised Learning model that fits a straight line through a dataset

  • most commonly used model

img/01.01.house.size.and.price.png

| Terminology | |
| ---: | :--- |
| Training Set | data used to train the model |
| x | input or #feature |
| y | output variable or #target |
| m | number of training examples |
| (x, y) | single training example |
| $(x^{(i)}, y^{(i)})$ | i-th training example |

$f$ is a linear function with one variable: $f_{w,b}(x) = wx + b$

img/01.01.linear.regression.png

Lab 02: Model representation

Coursera Jupyter: Model representation || Local Jupyter

In this lab you will learn:
  • Linear regression builds a model which establishes a relationship between features and targets

| | Python | Description |
| :--- | :--- | :--- |
| $\mathbf{x}$ | x_train | Training example feature values (in this lab: Size (1000 sqft)) |
| $\mathbf{y}$ | y_train | Training example targets (in this lab: Price (1000s of dollars)) |
| $x^{(i)}$, $y^{(i)}$ | x_i, y_i | $i$-th training example |
| m | m | Number of training examples |
| $w$ | w | parameter: #weight (slope) |
| $b$ | b | parameter: #bias (y-intercept) |
| $f_{w,b}(x^{(i)})$ | f_wb | #Model_Function The result of the model evaluation at $x^{(i)}$, parameterized by $w, b$: $f_{w,b}(x^{(i)}) = wx^{(i)}+b$ |

Code

The goal is to find the values of w and b that give the best-fit line through your data

import numpy as np                       # https://numpy.org/ for mathematical calculations
import matplotlib.pyplot as plt          # https://matplotlib.org package for 2D graphs
plt.style.use('./deeplearning.mplstyle') # matplotlib style sheet

x_train = np.array([1.0, 2.0])      # x_train is the input (size in 1000 square feet)
y_train = np.array([300.0, 500.0])  # y_train is target (price in 1000s of dollars)

def compute_model_output(x, w, b):      # Computes the prediction of a linear model
  m = x.shape[0]                        # number of training examples (`.shape[0]` is the number of rows)
  f_wb = np.zeros(m)                    # 1-D array of zeros to hold the predictions
  for i in range(m):
    f_wb[i] = w * x[i] + b              # compute the model output for example i
  return f_wb

w = 200; b = 100;       # w = weight, b = bias; These are the params you can modify
tmp_f_wb = compute_model_output(x_train, w, b)

plt.plot(x_train, tmp_f_wb, c='blue',label='Our Prediction') # Plot our prediction
plt.scatter(x_train, y_train, marker='x', c='red',label='Actual Values') 
plt.title("Housing Prices")                                  # Set the title
plt.ylabel('Price (in 1000s of dollars)')                    # Set the y-axis label
plt.xlabel('Size (1000 sqft)')                               # Set the x-axis label
plt.legend(); plt.show()

4 Cost function formula

img/01.01.parameters.png

img/01.01.cost.function.png

Question:

Which of these are parameters of the model that can be adjusted?
  • $w$ and $b$
  • $f_{w,b}$
  • $w$ only, because we should choose $b = 0$
  • $\hat{y}$

Ans: 1

5 Cost Function Intuition

To get a sense of how to minimize $J$, we can use a simplified model

| | original | simplified |
| :--- | :--- | :--- |
| model | $f_{w,b}(x) = wx + b$ | $f_{w}(x) = wx$ (by setting $b=0$) |
| parameters | $w$, $b$ | $w$ |
| cost function | $J_{(w,b)} = \frac{1}{2m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})^{2}$ | $J_{(w)} = \frac{1}{2m} \sum\limits_{i=1}^{m} (f_{w}(x^{(i)}) - y^{(i)})^{2}$ |
| goal | we want to minimize $J_{(w,b)}$ | we want to minimize $J_{(w)}$ |
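
As a rough sketch of how the cost function $J_{(w,b)}$ in the table above could be computed in NumPy (variable names are my own; the implementation in Lab 03 may differ in details):

import numpy as np

def compute_cost(x, y, w, b):         # J(w,b) = (1/2m) * sum_i (f_wb(x^(i)) - y^(i))^2
  m = x.shape[0]                      # number of training examples
  cost = 0.0
  for i in range(m):
    f_wb = w * x[i] + b               # model prediction for example i
    cost += (f_wb - y[i]) ** 2        # squared error for example i
  return cost / (2 * m)               # average over m examples, with the extra 1/2 factor

x_train = np.array([1.0, 2.0])        # training set from Lab 02
y_train = np.array([300.0, 500.0])
print(compute_cost(x_train, y_train, w=200, b=100))   # 0.0, since w=200, b=100 fit these points exactly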

img/01.01.04.simplified.png

img/01.01.04.w.is.1.png

img/01.01.04.w.is.0.5.png

img/01.01.04.w.is.0.png

img/01.01.04.negative.w.png

img/01.01.04.J.png

💡 The goal of linear regression is to find the values of $w, b$ that minimize $J_{(w,b)}$

6 Visualizing the cost function

   
| | |
| :--- | :--- |
| model | $f_{w,b}(x) = wx + b$ |
| parameters | $w$, $b$ |
| cost function | $J_{(w,b)} = \frac{1}{2m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})^{2}$ |
| goal | minimize $J_{(w,b)}$ |
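
One way to reproduce this kind of visualization is to evaluate $J_{(w,b)}$ over a grid of $(w, b)$ values and draw a contour plot. A minimal sketch with matplotlib, using the Lab 02 training data (the grid ranges are arbitrary choices):

import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([1.0, 2.0])        # training set from Lab 02
y_train = np.array([300.0, 500.0])

ws = np.linspace(0, 400, 100)         # candidate values for w
bs = np.linspace(-200, 400, 100)      # candidate values for b
J = np.array([[np.mean((w * x_train + b - y_train) ** 2) / 2   # J(w,b) evaluated directly
               for w in ws] for b in bs])

plt.contour(ws, bs, J, levels=30)     # each contour line joins (w, b) pairs with equal cost
plt.xlabel('w'); plt.ylabel('b')
plt.title('Contours of the cost function J(w,b)')
plt.show()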

7 Visualization examples

In the next lab, you can click on different points on the contour to view the cost function on the graph

#Gradient_Descent is an algorithm to train linear regression and other complex models

Lab 03: Cost function

Coursera Jupyter: Cost Function || Local: Cost Function

Quiz: Regression Model

  1. Which of the following are the inputs, or features, that are fed into the model and with which the model is expected to make a prediction?
    • $m$
    • $w$ and $b$
    • $(x,y)$
    • $x$
  2. For linear regression, if you find parameters $w$ and $b$ so that $J_{(w,b)}$ is very close to zero, what can you conclude?
    • The selected values of the parameters $w, b$ cause the algorithm to fit the training set really well
    • This is never possible. There must be a bug in the code
    • The selected values of the parameters $w, b$ cause the algorithm to fit the training set really poorly
Ans: Q1: 4; Q2: 1

Train the model with gradient descent

1 Gradient descent

We want a systematic way to find the values of $w, b$ that give the smallest $J$

#Gradient_Descent is an algorithm that can be used to minimize any function, not just the linear regression cost; it is also used to train advanced neural network models

#local_minima may not be the true lowest point (the global minimum)

2 Implementing gradient descent

3 Gradient descent intuition

\[\begin{aligned} \text{repeat until convergence \{} \\ &w = w - \alpha \frac{\partial}{\partial w} J_{(w,b)}\\ &b = b - \alpha \frac{\partial}{\partial b} J_{(w,b)}\\ \} \end{aligned}\]
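
To make the update rule concrete, here is a minimal sketch of gradient descent on the simplified one-parameter cost $J_{(w)}$ from the cost-intuition section ($b$ fixed at 0, Lab 02 data; the learning rate and step count are arbitrary choices for illustration):

import numpy as np

x_train = np.array([1.0, 2.0])        # training set from Lab 02
y_train = np.array([300.0, 500.0])

def dJ_dw(w):                         # derivative of J(w) for the simplified model f_w(x) = w*x
  return np.mean((w * x_train - y_train) * x_train)

w = 0.0                               # start where dJ/dw is negative, so each step increases w
alpha = 0.1                           # learning rate
for _ in range(200):                  # "repeat until convergence", approximated by a fixed step count
  w = w - alpha * dJ_dw(w)            # the update rule above, keeping only the w term
print(w)                              # converges to w = 260, the minimizer of J(w) for this data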

4 Learning rate

5 Gradient descent for linear regression

\[\begin{aligned} \frac{\partial}{\partial w} J_{(w,b)} &= \frac{\partial}{\partial w} \frac{1}{2m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})^2 \\ &= \frac{\partial}{\partial w} \frac{1}{2m} \sum\limits_{i=1}^{m} (wx^{(i)} + b - y^{(i)})^2 \\ &= \frac{1}{2m} \sum\limits_{i=1}^{m} (wx^{(i)} + b - y^{(i)})\, 2x^{(i)} \\ &= \frac{1}{m} \sum\limits_{i=1}^{m} (wx^{(i)} + b - y^{(i)})\, x^{(i)} \\ &= \frac{1}{m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})\, x^{(i)} \end{aligned}\]

\[\begin{aligned} \frac{\partial}{\partial b} J_{(w,b)} &= \frac{\partial}{\partial b} \frac{1}{2m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})^2 \\ &= \frac{\partial}{\partial b} \frac{1}{2m} \sum\limits_{i=1}^{m} (wx^{(i)} + b - y^{(i)})^2 \\ &= \frac{1}{2m} \sum\limits_{i=1}^{m} (wx^{(i)} + b - y^{(i)})\, 2 \\ &= \frac{1}{m} \sum\limits_{i=1}^{m} (wx^{(i)} + b - y^{(i)}) \\ &= \frac{1}{m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)}) \end{aligned}\]
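
In code, the two partial derivatives above could be computed like this (a rough sketch; the corresponding helper in Lab 04 may use different names):

def compute_gradient(x, y, w, b):     # returns (dJ/dw, dJ/db) for linear regression
  m = x.shape[0]                      # number of training examples
  dj_dw = 0.0
  dj_db = 0.0
  for i in range(m):
    err = (w * x[i] + b) - y[i]       # f_wb(x^(i)) - y^(i)
    dj_dw += err * x[i]               # accumulate (f_wb(x^(i)) - y^(i)) * x^(i)
    dj_db += err                      # accumulate (f_wb(x^(i)) - y^(i))
  return dj_dw / m, dj_db / m         # divide by m, matching the final lines of the derivations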

6 Running gradient descent
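
A condensed sketch of the training loop this section demonstrates, reusing the compute_gradient sketch above (the initial values, learning rate, and iteration count are illustrative, not necessarily what the lab uses):

import numpy as np

x_train = np.array([1.0, 2.0])        # size in 1000 sqft (Lab 02 data)
y_train = np.array([300.0, 500.0])    # price in 1000s of dollars

w, b = 0.0, 0.0                       # initial parameters
alpha = 1.0e-2                        # learning rate
num_iters = 10000                     # "repeat until convergence" approximated by a fixed count

for _ in range(num_iters):
  dj_dw, dj_db = compute_gradient(x_train, y_train, w, b)  # gradients at the current (w, b)
  w = w - alpha * dj_dw               # simultaneous update: both gradients were computed
  b = b - alpha * dj_db               # before either parameter was changed

print(f"w = {w:.2f}, b = {b:.2f}")    # approaches w = 200, b = 100, which fit these two points exactly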

Lab 04: Gradient descent

Coursera Jupyter: Gradient descent || Local: Gradient Descent

Quiz: Train the Model with Gradient Descent

  1. Gradient descent is an algorithm for finding values of parameters w and b that minimize the cost function $J$. \(\begin{aligned} \text{repeat until convergence \{} \\ &w = w - \alpha \frac{\partial}{\partial w} J_{(w,b)}\\ &b = b - \alpha \frac{\partial}{\partial b} J_{(w,b)}\\ \} \end{aligned}\) When $\frac{\partial}{\partial w} J_{(w,b)}$ is a negative number, what happens to w after one update step?
    • It is not possible to tell if w will increase or decrease
    • w increases
    • w stays the same
    • w decreases
  2. For linear regression, what is the update step for parameter b?
    • $b = b - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})$
    • $b = b - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)}) x^{(i)}$
Ans: Q1: 2; Q2: 1 (the update for b has no $x^{(i)}$ factor)