Basics


Optimization methods

  • Stochastic gradient descent (standard)
  • Gradient descent on the cross-entropy loss
  • Gradient descent on the quadratic cost

Cost functions

  • Cross-entropy loss (used by scikit-learn's MLPClassifier)
  • Quadratic cost (minimized via stochastic gradient descent; see the sketch below)
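
As a quick illustration, here is a minimal numpy sketch of both cost functions on a toy set of labels y and predicted probabilities p (the values and variable names are my own, not from any particular library):

import numpy as np

y = np.array([1, 0, 1, 1])           # true labels
p = np.array([0.9, 0.2, 0.7, 0.6])   # predicted probabilities

# Quadratic cost: mean squared error between predictions and targets
quadratic = np.mean((p - y) ** 2)

# Cross-entropy loss: the cost minimized by sklearn's MLPClassifier
cross_entropy = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(quadratic, cross_entropy)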

Unsupervised Learning

Clustering

  • K-means clustering (see the sketch after this list)
    • Assign a centroid to each group
    • K is the desired number of clusters
  • Principal Component Analysis (PCA)
    • Find the principal axes of the data
  • Singular Value Decomposition (SVD)
    • Break down data into composite parts
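
A minimal K-means sketch with scikit-learn; the toy 2-D data and the choice of n_clusters=2 are illustrative assumptions:

import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data: two visibly separated groups
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])

# K (n_clusters) is the desired number of clusters
kmeans = KMeans(n_clusters=2, n_init=10).fit(X)

print(kmeans.cluster_centers_)  # one centroid per group
print(kmeans.labels_[:10])      # cluster assignment for each point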

Anomaly Detection

  • K-means clustering (see the sketch below)
    • Flag points that lie far from every cluster centroid as outliers
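
One common way to use K-means for anomaly detection is to score each point by its distance to the nearest centroid; a sketch, where the 99th-percentile threshold is an arbitrary choice:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.randn(200, 2)
X[0] = [8, 8]  # plant an obvious outlier

kmeans = KMeans(n_clusters=3, n_init=10).fit(X)

# Distance from each point to its nearest centroid
dist = kmeans.transform(X).min(axis=1)

# Flag the points farthest from any centroid as anomalies
threshold = np.percentile(dist, 99)
print(np.where(dist > threshold)[0])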

Supervised Learning

Regression

Make predictions on a continuous range of real numbers from a set of input features (a minimal sketch follows the list below)

  • Linear Regression
    • Ex) Support Vector Regression (SVR)
    • Ex) Ridge/Lasso regressors
  • Tree-based regressors
    • Ex) Random Forest Regressor
  • Gradient-boosting algorithms
    • Ex) AdaBoost
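
A hedged sketch comparing two of the regressor families above on toy data (the dataset and all hyperparameters are placeholders):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor

# Toy regression problem: y is a noisy linear function of x
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

for model in (Ridge(alpha=1.0), RandomForestRegressor(n_estimators=100)):
    model.fit(X, y)
    print(type(model).__name__, model.score(X, y))  # R^2 on the training data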

Classification

Classify a group of events or objects into two or more groups (see the sketch after this list)

  • Linear Discriminators
    • Ex) Support Vector Machine (SVM)
    • Ex) Linear Discriminant Analysis (LDA)
  • Tree-based classifiers
    • Ex) Random Forest Classifier
  • Neural Networks
    • Ex) Multi-layer Perceptron
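
A sketch running one classifier from each family above on scikit-learn's built-in iris data; the hyperparameters are illustrative, not tuned:

from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

models = [
    SVC(kernel="linear"),          # linear discriminator
    RandomForestClassifier(),      # tree-based classifier
    MLPClassifier(max_iter=2000),  # neural network
]
for model in models:
    model.fit(X, y)
    print(type(model).__name__, model.score(X, y))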

Model Selection

Across Different Types of Models

  • Supervised vs. Unsupervised Learning
    • Regression vs. Classification (Supervised)
    • Clustering vs. Anomaly Detection (Unsupervised)
  • Methods: SVM, KNN, Linear Regressors, Neural Networks, Random Forests

Within a Specific Model Type

  • Hyperparameter tuning: Train/Test/Validation Split (see the sketch below)
  • Loss Function Selection and Regularization Techniques
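
A minimal sketch of a train/validation/test split with scikit-learn, done as two successive splits; the 60/20/20 ratio and the grid of C values are arbitrary examples:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# First split off the test set, then carve a validation set out of the rest
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

# Tune a hyperparameter on the validation set, report once on the test set
best = max((SVC(C=C).fit(X_train, y_train) for C in (0.1, 1.0, 10.0)),
           key=lambda m: m.score(X_val, y_val))
print(best.C, best.score(X_test, y_test))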

Considerations:

  • Accuracy
  • Reliability
  • Speed
  • Explainability (Simplicity)

Learning

Learning is finding the weights and biases that minimize the cost, a process carried out via backpropagation:

  1. Feed forward: give an input, get the output, calculate the cost
  2. Backpropagation: work backwards, calculating the gradient for each neuron, one layer at a time
  3. Update weights: move each weight in the opposite direction of increasing "blame"; this decreases the cost
  4. Repeat 1-3 until satisfied with the cost (a minimal sketch follows)
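
A minimal numpy sketch of this loop for a one-hidden-layer network with sigmoid activations and the quadratic cost; the architecture, toy data, and learning rate are all illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)  # toy target

# One hidden layer of 4 neurons, one output neuron
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.5

for step in range(2000):
    # 1. Feed forward: input -> hidden -> output, then the quadratic cost
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    cost = np.mean((out - y) ** 2)
    # 2. Backpropagation: gradients layer by layer, output layer first
    d_out = 2 * (out - y) * out * (1 - out) / len(y)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # 3. Update weights opposite the direction of increasing "blame"
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)
# 4. Repeated for a fixed number of steps; the cost should have decreased
print(cost)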

Jupyter Notebook (Simple gradient descent): implements a simple application of gradient descent and investigates how different learning rates and numbers of steps affect the learning results.

Loss Functions

A loss function measures how bad the model's guesses are.

MSE & SSE

Sum of Squared Errors: $SSE=\sum_i(y^*_i-y_i)^2$

Mean Squared Error: $MSE=SSE/n$, where $n$ is the number of samples (y.size)
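
In numpy, with y_pred playing the role of $y^*$ (toy values):

import numpy as np

y = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.3])

sse = np.sum((y_pred - y) ** 2)  # Sum of Squared Errors
mse = sse / y.size               # Mean Squared Error
print(sse, mse)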

Train/Gradient Descent

Training uses the gradient: the vector of partial derivatives of the loss function with respect to the weights.

e.g. $SSE=\sum_i(w_0+w_1x_i-y_i)^2$

then the gradient is $G=\left[\frac{\partial SSE}{\partial w_0}, \frac{\partial SSE}{\partial w_1}\right]=\sum_i 2(w_0+w_1x_i-y_i)\,[1,x_i]$
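
That gradient translates directly into numpy; a sketch assuming 1-D inputs x, targets y, and weights w = [w0, w1]:

import numpy as np

def sse_gradient(w, x, y):
    # residuals of the linear model w0 + w1*x
    r = w[0] + w[1] * x - y
    # G = sum over i of 2*(w0 + w1*x_i - y_i) * [1, x_i]
    return np.array([np.sum(2 * r), np.sum(2 * r * x)])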

Update Weights

The weights are usually initialized randomly and then updated according to the gradient:

import numpy as np

# First, normalize the gradient to unit length
gradient = gradient / np.linalg.norm(gradient)
# Then scale by the learning rate (typically 0.0001-0.01)
gradient *= lr
# Finally, apply the update to the weights
weights -= gradient
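
Putting the gradient and the update rule together into a complete loop; the learning rate and step count are the two knobs the notebook above experiments with (values here are illustrative):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x                  # data generated by w0=1, w1=2
weights = np.random.randn(2)       # random initialization
lr = 0.01                          # learning rate

for _ in range(5000):
    r = weights[0] + weights[1] * x - y                 # residuals
    gradient = np.array([np.sum(2 * r), np.sum(2 * r * x)])
    gradient = gradient / np.sqrt(np.sum(gradient**2))  # normalize
    weights -= lr * gradient                            # update
print(weights)  # should end up close to [1, 2]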
