Optimization method
- Stochastic gradient descent (standard)
- Gradient descent on the cross-entropy cost
- Gradient descent on the quadratic cost
Cost functions
- Cross-entropy loss (used by sklearn's MLPClassifier)
- Quadratic cost (mean squared error)
- Both are typically minimized with (stochastic) gradient descent; a small numerical sketch follows this list
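A small numerical sketch of the two costs on made-up predictions; the arrays `y_true` and `y_pred` are illustrative, not taken from any dataset in these notes.

```python
import numpy as np

# Illustrative binary labels and predicted probabilities
y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.6])

# Quadratic (mean squared error) cost
quadratic_cost = np.mean((y_pred - y_true) ** 2)

# Binary cross-entropy cost
cross_entropy = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(quadratic_cost, cross_entropy)
```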
Unsupervised Learning
Clustering
- K-means clustering
- Assign a centroid to each group
- K is the desired number of clusters
- Principal Component Analysis (PCA)
- Find the principal axes of the data (a minimal K-means/PCA sketch follows this list)
- Singular Value Decomposition (SVD)
- Break down data into composite parts
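A minimal clustering/PCA sketch with scikit-learn; the synthetic blob dataset and the choice of 3 clusters and 2 components are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic "blob" data standing in for real observations
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K-means: K is the desired number of clusters; every point is
# assigned to its nearest centroid.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels, centroids = kmeans.labels_, kmeans.cluster_centers_

# PCA: find the principal axes of the data (internally based on an SVD)
pca = PCA(n_components=2).fit(X)
X_reduced = pca.transform(X)
print(pca.explained_variance_ratio_)
```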
Anomaly Detection
- K-means clustering
- Flag points that lie unusually far from their cluster centroid as outliers (see the sketch below)
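One simple way (assumed here, not prescribed by these notes) to use K-means for anomaly detection: measure each point's distance to its assigned centroid and flag the farthest points. The 95th-percentile cutoff is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative synthetic data
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Distance from each point to the centroid of its own cluster
dists = np.linalg.norm(X - kmeans.cluster_centers_[kmeans.labels_], axis=1)

# Flag the points farthest from their centroid as candidate anomalies
threshold = np.percentile(dists, 95)   # illustrative cutoff
outliers = X[dists > threshold]
print(len(outliers), "candidate anomalies")
```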
Supervised Learning
Regression
Predict a continuous real-valued output from a set of input features (a minimal scikit-learn sketch follows the list below)
- Linear regressors
- Ex) Support Vector Machine (SVM)
- Ex) Ridge/Lasso regressors
- Tree-based regressors
- Ex) Random Forest Regressor
- Gradient-boosting algorithms
- Ex) AdaBoost
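A minimal regression sketch with scikit-learn comparing linear and tree-based regressors; the synthetic dataset and default hyperparameters are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic data standing in for real features/targets
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (LinearRegression(), Ridge(alpha=1.0),
              RandomForestRegressor(random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))  # R^2 on held-out data
```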
Classification
Classify events or objects into two or more discrete groups (a minimal scikit-learn sketch follows the list below)
- Linear Discriminators
- Ex) Support Vector Machine (SVM)
- Ex) Linear Discriminant Analysis (LDA)
- Tree-based classifiers
- Ex) Random Forest Classifier
- Neural Networks
- Ex) Multi-layer Perceptron
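A minimal classification sketch with scikit-learn covering a linear discriminator (SVM), a tree-based classifier, and a neural network; the Iris dataset and default settings are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (SVC(kernel="linear"),                         # linear discriminator
              RandomForestClassifier(random_state=0),       # tree-based classifier
              MLPClassifier(max_iter=2000, random_state=0)):  # neural network
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))  # accuracy on held-out data
```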
Model Selection
Across Different Types of Models
- Supervised vs. Unsupervised Learning
- Regression vs. Classification (Supervised)
- Clustering vs. Anomaly Detection (Unsupervised)
- Methods: SVM, KNN, Linear Regressors, Neural Networks, Random Forests
Within a Specific Model Type
- Hyperparameter tuning via a train/validation/test split (a minimal tuning sketch follows this section)
- Loss Function Selection and Regularization Techniques
Considerations:
- Accuracy
- Reliability
- Speed
- Explainability (Simplicity)
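A minimal hyperparameter-tuning sketch: hold out a test set and let cross-validation play the role of the validation split. The SVM parameter grid below is an illustrative assumption.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hold out a test set; the cross-validation inside GridSearchCV plays
# the role of the validation split for hyperparameter tuning.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}  # illustrative grid
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_, search.score(X_test, y_test))
```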
Learning
Learning means finding the weights and biases that minimize the cost, which in a neural network is carried out via backpropagation (a minimal sketch follows the list below):
- Feed Forward: Give input, give output, calculate cost
- Backpropagation: Work backwards, calculate gradients for each neuron, one layer at a time
- Update Weights: adjust each weight in the direction opposite to its gradient ("blame"), which decreases the cost
- Repeat 1-3 until satisfied with cost
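A minimal sketch of the four steps above for a one-hidden-layer network on a toy XOR problem, assuming sigmoid activations and the quadratic cost; the layer sizes, learning rate, and step count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy XOR data; the 2-8-1 architecture is an illustrative assumption
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # input  -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 1.0

for step in range(10000):
    # 1. Feed forward: give input, get output, calculate the cost
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    cost = np.mean((out - y) ** 2)                 # quadratic cost

    # 2. Backpropagation: gradients layer by layer via the chain rule
    d_out = 2 * (out - y) / len(y) * out * (1 - out)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * h * (1 - h)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # 3. Update weights opposite to the gradient ("blame") to reduce the cost
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    # 4. Repeat until the cost is low enough

print(cost, out.ravel())
```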
Jupyter Notebook: Simple gradient descent — implements a simple application of gradient descent and examines how different learning rates and numbers of steps affect the learning result
Loss Functions
A loss function measures how bad the model's guesses are.
- Sum of Squared Errors: $SSE=\sum_i(y^*_i-y_i)^2$
- Mean Squared Error: $MSE=SSE/n$, where $n$ is the number of samples
Train/Gradient Descent
The gradient is the vector of partial derivatives of the loss function with respect to the weights,
e.g. $SSE=\sum_i(w_0+w_1x_i-y_i)^2$
then the gradient is $G=\left[\frac{\partial SSE}{\partial w_0}, \frac{\partial SSE}{\partial w_1}\right]=\sum_i 2(w_0+w_1x_i-y_i)\,[1,\ x_i]$
Update Weights
The weights are usually initialized randomly and then updated using the gradient (a complete worked loop follows the snippet):
import numpy as np

# First, normalize the gradient
gradient = gradient/np.sqrt(sum([g**2 for g in gradient]))
# Then scale by the learning rate (typically 0.0001-0.01)
gradient *= lr
# Finally, apply the update to the weights
weights -= gradient
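Putting the pieces together, a minimal sketch of the full loop for the line fit $y \approx w_0 + w_1 x$, using the SSE gradient above and the normalize-then-update rule from the snippet; the data points, learning rate, and step count are made up for illustration.

```python
import numpy as np

# Made-up data for a line fit y ≈ w0 + w1*x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

weights = np.random.randn(2)   # random initialization of [w0, w1]
lr = 0.01                      # learning rate

for step in range(2000):
    residual = weights[0] + weights[1] * x - y
    sse = np.sum(residual ** 2)
    # Gradient of SSE from the formula above: sum of 2*(w0 + w1*x_i - y_i) * [1, x_i]
    gradient = np.array([np.sum(2 * residual), np.sum(2 * residual * x)])
    # Normalize, scale by the learning rate, and update the weights
    gradient = gradient / np.sqrt(sum(g ** 2 for g in gradient))
    weights -= lr * gradient

print(weights, sse)
```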
Document Information
- Author: Zeka Lee
- Link: https://zhekaili.github.io/0004/01/01/Basis/
- Copyright: Free to republish, non-commercial, no derivatives, attribution required (Creative Commons 3.0 license)