Field of study that gives computers the ability to learn without being explicitly programmed – Arthur Samuel
#Supervised_Learning Given inputs x, predict output y
| Input (x) | Output (y) | Application |
| --- | --- | --- |
| email | spam? (0/1) | spam filtering |
| audio | text transcripts | speech recognition |
| English | Spanish | machine translation |
| ad, user | click? (0/1) | online advertising |
| image, radar | position of other cars | self-driving car |
| image of phone | defect? (0/1) | visual inspection |
| | Regression | Classification |
| --- | --- | --- |
| Predicts | numbers | categories |
| Outputs | infinite | limited |
#Unsupervised_Learning Data only comes with inputs x, but not output labels y. Algorithm has to find structure in the data
- #Clustering Group similar data points together
- #Anomaly_Detection Find unusual data points
- #Dimensionality_Reduction Compress data using fewer numbers
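As an illustration of the clustering idea, here is a minimal sketch (my own toy example, not from the course): a small k-means loop in NumPy that groups similar 2-D points together, with hand-picked initial centroids.

```python
import numpy as np

# Toy data: two visible groups of 2-D points (values chosen for illustration)
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
k = 2
centroids = X[[0, 3]].copy()  # one starting centroid per region, picked by hand

for _ in range(10):
    # assign each point to its nearest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    # move each centroid to the mean of the points assigned to it
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(labels)     # [0 0 0 1 1 1] -- the two groups, found without labels y
print(centroids)  # one centroid per discovered cluster
```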
Of the following examples, which would you address using an unsupervised learning algorithm?
(Check all that apply.)
Learn basics of Jupyter Local: Jupyter Notebook || Coursera Jupyter
Which are the two common types of supervised learning? (Choose two.)
Which of these is a type of unsupervised learning?
#Linear_Regression_Model => a Supervised Learning Model that simply puts a line through a dataset
- the most commonly used model
| Terminology | |
| ---: | :--- |
| Training Set | data used to train the model |
| x | input or #feature |
| y | output variable or #target |
| m | number of training examples |
| (x, y) | single training example |
| $(x^{(i)}, y^{(i)})$ | $i$-th training example |
The model $f$ is a linear function with one variable: $f_{w,b}(x) = wx + b$
Coursera Jupyter: Model representation || Local Jupyter
In this lab you will learn how linear regression builds a model that establishes a relationship between features and targets
| Notation | Python | Description |
| :--- | :--- | :--- |
| $\mathbf{x}$ | `x_train` | Training example feature values (in this lab: Size, 1000 sqft) |
| $\mathbf{y}$ | `y_train` | Training example targets (in this lab: Price, 1000s of dollars) |
| $x^{(i)}$, $y^{(i)}$ | `x_i`, `y_i` | $i$-th training example |
| $m$ | `m` | Number of training examples |
| $w$ | `w` | parameter: #weight (slope) |
| $b$ | `b` | parameter: #bias (y-intercept) |
| $f_{w,b}(x^{(i)})$ | `f_wb` | #Model_Function The result of evaluating the model at $x^{(i)}$, parameterized by $w,b$: $f_{w,b}(x^{(i)}) = wx^{(i)} + b$ |
Code
- `NumPy`, a popular library for scientific computing
- `Matplotlib`, a popular library for plotting data
  - `scatter()` to plot points on a graph
    - `marker` for the symbol to use
    - `c` for color
- Goal is to find the `w, b` that give you the best fit line through your data
```python
import numpy as np               # https://numpy.org/ for mathematical calculations
import matplotlib.pyplot as plt  # https://matplotlib.org package for 2D graphs

plt.style.use('./deeplearning.mplstyle')  # matplotlib style sheet

x_train = np.array([1.0, 2.0])      # x_train is the input (size in 1000 square feet)
y_train = np.array([300.0, 500.0])  # y_train is the target (price in 1000s of dollars)

def compute_model_output(x, w, b):
    """Computes the prediction of a linear model."""
    m = x.shape[0]      # `.shape[0]` returns the number of examples (rows)
    f_wb = np.zeros(m)  # 1-D array of zeros, one entry per example
    for i in range(m):
        f_wb[i] = w * x[i] + b  # compute the model output
    return f_wb

w = 200  # w = weight
b = 100  # b = bias; these are the params you can modify
tmp_f_wb = compute_model_output(x_train, w, b)

plt.plot(x_train, tmp_f_wb, c='blue', label='Our Prediction')  # plot our prediction
plt.scatter(x_train, y_train, marker='x', c='red', label='Actual Values')
plt.title("Housing Prices")               # set the title
plt.ylabel('Price (in 1000s of dollars)') # set the y-axis label
plt.xlabel('Size (1000 sqft)')            # set the x-axis label
plt.legend()
plt.show()
```
- Adjust `w` and `b` to find the best fit line
  - `w` is the #slope and `b` is the #y-intercept
- The cost function takes the predicted $\hat{y}$ and compares it to $y$
  - error $= \hat{y} - y$
  - $m$ is the number of training examples
  - dividing by $2m$ makes the calculation neater: $J_{(w,b)} = \frac{1}{2m} \sum\limits_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})^{2}$
- Find `w,b` where $\hat{y}^{(i)}$ is close to $y^{(i)}$ for all $(x^{(i)}, y^{(i)})$

Quiz: Which of these are parameters of the model that can be adjusted?
- $w$ and $b$
- $f_{w,b}$
- $w$ only, because we should choose $b = 0$
- $\hat{y}$
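Worked check of the cost formula using the two-point dataset from the code above: with $w=200$, $b=100$, the predictions are $\hat{y}^{(1)} = 200 \cdot 1.0 + 100 = 300$ and $\hat{y}^{(2)} = 200 \cdot 2.0 + 100 = 500$, which match the targets exactly, so $J_{(w,b)} = \frac{1}{2 \cdot 2}(0^2 + 0^2) = 0$, the lowest possible cost.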
To get a sense of how to minimize $J$ we can use a simplified model
| | original | simplified |
| --- | --- | --- |
| model | $f_{w,b}(x) = wx + b$ | $f_{w}(x) = wx$, by setting $b=0$ |
| parameters | $w, b$ | $w$ |
| cost function | $J_{(w,b)} = \frac{1}{2m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})^{2}$ | $J_{(w)} = \frac{1}{2m} \sum\limits_{i=1}^{m} (f_{w}(x^{(i)}) - y^{(i)})^{2}$ |
| goal | we want to minimize $J_{(w,b)}$ | we want to minimize $J_{(w)}$ |
We can plot $J$ against `w` and get a graph of the cost curve (shown on the right in the lecture). 💡 The goal of linear regression is to find the values of $w,b$ that allow us to minimize $J_{(w,b)}$
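As a sketch of that plot (my own code in the lab's style, reusing the dataset above), $J(w)$ can be evaluated over a range of `w` values with `b = 0` and drawn with matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([1.0, 2.0])      # size in 1000 sqft
y_train = np.array([300.0, 500.0])  # price in 1000s of dollars

def cost(w, b):
    """J for the given parameters (same formula as the cost function above)."""
    return np.sum((w * x_train + b - y_train) ** 2) / (2 * x_train.shape[0])

w_range = np.linspace(0, 500, 100)       # candidate values of w
costs = [cost(w, b=0) for w in w_range]  # simplified model: b fixed at 0

plt.plot(w_range, costs)
plt.xlabel('w')
plt.ylabel('J(w)')
plt.title('Cost vs w for the simplified model (b = 0)')
plt.show()
```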
| | |
| --- | --- |
| model | $f_{w,b}(x) = wx + b$ |
| parameters | $w$, $b$ |
| cost function | $J_{(w,b)} = \frac{1}{2m} \sum\limits_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})^{2}$ |
| goal | minimize $J_{(w,b)}$ |
- Plotting `J` vs `w` is 2-dimensional; adding `b` makes it 3-dimensional, where `J` is the height of the surface
- A contour plot slices that surface at constant values of `J` for given `w,b`, showing `J` in 2-D
- In the next lab, you can click on different points on the contour to view the cost function on the graph
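A sketch of how such a contour plot can be drawn (my own code, not the lab's exact implementation): evaluate $J$ on a grid of `w,b` values and pass it to `plt.contour`:

```python
import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

# Evaluate J on a grid of (w, b) values
w_vals = np.linspace(0, 400, 100)
b_vals = np.linspace(-100, 300, 100)
W, B = np.meshgrid(w_vals, b_vals)
J = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        f_wb = W[i, j] * x_train + B[i, j]
        J[i, j] = np.sum((f_wb - y_train) ** 2) / (2 * x_train.shape[0])

plt.contour(W, B, J, levels=20)  # each contour line is a slice of constant J
plt.xlabel('w')
plt.ylabel('b')
plt.title('Contour plot of J(w, b)')
plt.show()
```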
#Gradient_Descent is an algorithm to train linear regression and other complex models
Coursera Jupyter: Cost Function || Local: Cost Function
In this lab you will implement the `cost` function for linear regression with one variable.
```python
# Cost Function
def compute_cost(x, y, w, b):
    m = x.shape[0]  # number of training examples
    cost_sum = 0
    for i in range(m):
        f_wb = w * x[i] + b        # model prediction for example i
        cost = (f_wb - y[i]) ** 2  # squared error for example i
        cost_sum = cost_sum + cost
    total_cost = (1 / (2 * m)) * cost_sum
    return total_cost
```
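The same cost can also be computed in vectorized NumPy form without an explicit loop (an equivalent alternative, not the lab's version):

```python
import numpy as np

def compute_cost_vectorized(x, y, w, b):
    """Same J as above, using NumPy array operations instead of a loop."""
    m = x.shape[0]
    return np.sum((w * x + b - y) ** 2) / (2 * m)
```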
We want a systematic way to find the values of $w,b$ that allow us to easily find the smallest $J$
#Gradient_Descent is an algorithm that can minimize any function; it is used not just in linear regression but also in advanced neural network models
#local_minima may not be the true lowest point
Update rule (repeat until `w,b` converge, updating simultaneously):
- `tmp_w` $= w - \alpha \frac{\partial}{\partial w} J_{(w,b)}$
- `tmp_b` $= b - \alpha \frac{\partial}{\partial b} J_{(w,b)}$
- then `w = tmp_w` and `b = tmp_b`

For intuition, set `b = 0` so the cost simplifies to $J(w)$ and the goal becomes $\min\limits_{w} J(w)$:
- start `w` at a random location
- at a minimum, `slope = 0` and therefore $\frac{\partial}{\partial w} J(w) = 0$
- the update then becomes $w = w - \alpha \cdot 0$, so `w` stays the same
- run gradient descent on both `w` and `b`
- Example: starting from `w = -0.1`, `b = 900` and running gradient descent until the line fits, for a house of `1250 sq ft` we can predict it should sell for $250k per the model
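Putting the update rule into code, here is a minimal sketch of gradient descent for this model (my own implementation of the formulas above, not the lab's exact version):

```python
import numpy as np

def compute_gradient(x, y, w, b):
    """Partial derivatives of J with respect to w and b."""
    m = x.shape[0]
    f_wb = w * x + b                    # model predictions
    dj_dw = np.sum((f_wb - y) * x) / m  # dJ/dw
    dj_db = np.sum(f_wb - y) / m        # dJ/db
    return dj_dw, dj_db

def gradient_descent(x, y, w, b, alpha, num_iters):
    """Repeatedly apply the simultaneous update w,b := w,b - alpha * gradient."""
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        tmp_w = w - alpha * dj_dw  # compute both updates first...
        tmp_b = b - alpha * dj_db
        w, b = tmp_w, tmp_b        # ...then assign simultaneously
    return w, b

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])
w, b = gradient_descent(x_train, y_train, w=0.0, b=0.0, alpha=0.1, num_iters=10000)
print(w, b)  # approaches w = 200, b = 100, the exact fit for this data
```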
Coursera Jupyter: Gradient descent || Local: Gradient Descent
In this lab you will implement gradient descent and test the code.
Quiz: When $\frac{\partial}{\partial w} J_{(w,b)}$ is a negative number, what happens to `w` after one update step? Will `w` increase or decrease? It will increase: since $w = w - \alpha \frac{\partial}{\partial w} J_{(w,b)}$ with $\alpha > 0$, subtracting a negative number makes `w` larger.