CS50.3 Lecture 4


Machine Learning

Machine learning provides a computer with data, rather than explicit instructions. Using these data, the computer learns to recognize patterns and becomes able to execute tasks on its own.


Supervised Learning

Supervised learning is a task where a computer learns a function that maps inputs to outputs based on a dataset of input-output pairs.


Given the air humidity (input) and the atmospheric pressure (input), we want to predict whether it will rain (output).

From many existing data points we can train a model and use it to predict the outcome. This model can be written as $f(humidity, pressure) \rightarrow (rain, no\ rain)$.
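As a concrete picture of such a dataset, here is a minimal sketch in Python; all the humidity and pressure numbers below are invented for illustration.

```python
# Training data as input-output pairs; every number here is made up.
# Inputs: (humidity, pressure); output label: "rain" or "no rain".
dataset = [
    ((0.90, 1002), "rain"),
    ((0.85, 1005), "rain"),
    ((0.30, 1021), "no rain"),
    ((0.40, 1018), "no rain"),
]
# Supervised learning: approximate f(humidity, pressure) -> label from these pairs.
```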


Nearest-Neighbor Classification

For this problem, the idea is to find the single data point closest to the point we want to classify, and to assign the new point the same class as that nearest neighbor.

In the figure below, the white data point is the one to be classified.


K-Nearest-Neighbors Classification

We select the k closest data points and assign the point to be classified to the class held by the majority of those neighbors.
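As a sketch of how this might look in practice, here is k-NN with scikit-learn's `KNeighborsClassifier`, reusing the invented rain data from above.

```python
# A minimal k-NN sketch; the training numbers are invented for illustration.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0.90, 1002], [0.85, 1005], [0.30, 1021], [0.40, 1018]]  # (humidity, pressure)
y_train = [1, 1, 0, 0]  # 1 = rain, 0 = no rain

model = KNeighborsClassifier(n_neighbors=3)  # k = 3 nearest neighbors
model.fit(X_train, y_train)
print(model.predict([[0.80, 1008]]))  # majority vote of the 3 nearest points
```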


Perceptron Learning

Another way of going about a classification problem, as opposed to the nearest-neighbor strategy, is looking at the data as a whole and trying to create a decision boundary.


The goal is to find a boundary that partitions the data space into regions, so that a result can be read off from the region a point falls in. In a two-dimensional input space, a straight line can serve as the boundary.

$$h_{w}(x)= \begin{cases}
1 &\text{if } w_0 + w_1 x_1 + w_2 x_2 > 0 \\
0 &\text{otherwise}
\end{cases}$$


The key idea of perceptron learning is that, whenever we encounter new data, we adjust the model according to the difference between its prediction and the actual value.

$w_i = w_i + \alpha \, (\text{actual} - \text{estimate}) \times x_i$

Here $\alpha$ is called the learning rate. We are free to set it ourselves, depending on how much weight we want to give to new data.
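A minimal sketch of this update rule in plain Python, assuming two inputs $x_1, x_2$ plus a bias weight $w_0$ (treated as having a constant input of 1); the learning rate value is illustrative.

```python
# Perceptron hypothesis and update rule; data and alpha are illustrative.
def predict(weights, x):
    """h_w(x): 1 if w0 + w1*x1 + w2*x2 > 0, else 0."""
    return 1 if weights[0] + weights[1] * x[0] + weights[2] * x[1] > 0 else 0

def update(weights, x, actual, alpha=0.1):
    """w_i <- w_i + alpha * (actual - estimate) * x_i."""
    error = actual - predict(weights, x)
    weights[0] += alpha * error          # bias weight: its input is constantly 1
    weights[1] += alpha * error * x[0]
    weights[2] += alpha * error * x[1]
    return weights
```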


The model above produces a discrete result, but any model can be wrong, and we would also like to know how confident each prediction is.


In fact, even when two different inputs both yield the answer rain, the two answers are not necessarily correct with the same probability. So rather than a hard rain / no-rain verdict, outputting a probability of rain is more reasonable and more useful.
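One common way to do this (an assumption here; the specific function is not named above) is to pass the weighted sum through a logistic function, which squashes it into a probability between 0 and 1.

```python
# A soft threshold via the logistic (sigmoid) function; this particular
# choice of function is an illustrative assumption.
import math

def rain_probability(weights, x):
    z = weights[0] + weights[1] * x[0] + weights[2] * x[1]
    return 1 / (1 + math.exp(-z))  # in (0, 1): a probability of rain
```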


Support Vector Machines

A support vector machine looks for a boundary that separates the different classes of data.


In fact, there are many such boundaries that separate the data perfectly.

Even though all of these boundaries separate the training data equally well, they clearly make different assumptions about where unseen data points will fall.

Following the maximum margin separator principle, we always leave a generous buffer between the boundary and each group of data, so that unseen points landing near their own class still have room on the correct side. In the figure, this means we ultimately choose the rightmost separator.


Not every dataset admits a linear separation.


In higher-dimensional input spaces, such separators become more abstract.
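A minimal sketch with scikit-learn's `SVC`, again on the invented rain data; switching the kernel is how an SVM can handle data that no line separates.

```python
# A minimal SVM sketch; the data are the same invented numbers as before.
from sklearn.svm import SVC

X_train = [[0.90, 1002], [0.85, 1005], [0.30, 1021], [0.40, 1018]]
y_train = [1, 1, 0, 0]

# kernel="linear" looks for a maximum margin line; a non-linear kernel
# (e.g. kernel="rbf") can separate data that has no linear boundary.
model = SVC(kernel="linear")
model.fit(X_train, y_train)
print(model.predict([[0.80, 1008]]))
```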


Regression

Regression is a supervised learning task of learning a function that maps an input point to a continuous value, some real number. This differs from classification in that classification problems map an input to discrete values (Rain or No Rain).

e.g., linear regression
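A minimal linear-regression sketch with scikit-learn; the input-output pairs below are invented (think of something like advertising spend mapped to sales).

```python
# Linear regression maps an input to a continuous value; data are invented.
from sklearn.linear_model import LinearRegression

X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.9, 4.1, 6.0, 8.1]

model = LinearRegression()
model.fit(X, y)
print(model.predict([[5.0]]))  # a real number, roughly 10
```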


Loss Functions

Loss functions are a way to quantify the utility lost by any of the decision rules above. The less accurate the prediction, the larger the loss.

0-1 Loss Function

  • L(actual, predicted):
    • 0 if actual = predicted
    • 1 otherwise

L₁: L(actual, predicted) = |actual - predicted|

L₂: L(actual, predicted) = (actual - predicted)²
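These three loss functions translate directly into code; a minimal sketch:

```python
# The 0-1, L1, and L2 loss functions from above.
def zero_one_loss(actual, predicted):
    return 0 if actual == predicted else 1

def l1_loss(actual, predicted):
    return abs(actual - predicted)

def l2_loss(actual, predicted):
    return (actual - predicted) ** 2
```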


Overfitting

Overfitting is when a model fits the training data so well that it fails to generalize to other data sets.


Regularization

Regularization is the process of penalizing hypotheses that are more complex to favor simpler, more general hypotheses. We use regularization to avoid overfitting.

In regularization, we estimate the cost of the hypothesis function h by adding up its loss and a measure of its complexity.

$cost(h) = loss(h) + \lambda \cdot complexity(h)$

$\lambda$ is a tunable parameter whose value reflects how much complexity we are willing to tolerate in the hypothesis function: the larger $\lambda$, the more heavily complexity is penalized.
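A minimal sketch of the regularized cost; measuring complexity as the sum of squared weights (an L2 penalty) is one common assumption, not something fixed above.

```python
# cost(h) = loss(h) + lambda * complexity(h); the complexity measure here
# (sum of squared weights) is an illustrative assumption.
def cost(loss, weights, lam=0.5):
    complexity = sum(w ** 2 for w in weights)
    return loss + lam * complexity
```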


Holdout Cross Validation

In this technique, we split all the data in two: a training set and a test set. We run the learning algorithm on the training set, and then see how well it predicts the data in the test set.
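A minimal holdout sketch using scikit-learn's `train_test_split`; the iris dataset stands in here for the rain data.

```python
# Holdout cross-validation: train on one split, evaluate on the other.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out test set
```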


k-Fold Cross-Validation

In this process, we divide the data into k sets. We run the training k times, each time leaving out one dataset and using it as a test set. We end up with k different evaluations of our model, which we can average and get an estimate of how our model generalizes without losing any data.
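A minimal k-fold sketch with scikit-learn's `cross_val_score` (k = 5 here); again the iris data are a stand-in.

```python
# k-fold cross-validation: k trainings, each holding out a different fold.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print(scores.mean())  # average of the k evaluations
```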


Reinforcement Learning

Reinforcement learning is another approach to machine learning, where after each action, the agent gets feedback in the form of reward or punishment (a positive or a negative numerical value).


Markov Decision Processes

Reinforcement learning can be viewed as a Markov decision process, having the following properties:

  • Set of states S
  • Set of actions Actions(s)
  • Transition model P(s' | s, a)
  • Reward function R(s, a, s')
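A minimal sketch of these four pieces as plain Python data; all states, actions, probabilities, and rewards below are invented for illustration.

```python
# An invented two-state MDP, just to show the shape of the four components.
states = ["sunny", "rainy"]                      # set of states S
actions = {"sunny": ["walk", "drive"],           # Actions(s)
           "rainy": ["drive"]}
P = {                                            # transition model P(s' | s, a)
    ("sunny", "walk"):  {"sunny": 0.8, "rainy": 0.2},
    ("sunny", "drive"): {"sunny": 0.7, "rainy": 0.3},
    ("rainy", "drive"): {"sunny": 0.4, "rainy": 0.6},
}
R = {                                            # reward function R(s, a, s')
    ("sunny", "walk", "sunny"): 2, ("sunny", "walk", "rainy"): -1,
    ("sunny", "drive", "sunny"): 1, ("sunny", "drive", "rainy"): 0,
    ("rainy", "drive", "sunny"): 1, ("rainy", "drive", "rainy"): -1,
}
```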

Q-Learning

Q-Learning is one model of reinforcement learning, where a function Q(s, a) outputs an estimate of the value of taking action a in state s.

  • Start with Q(s, a) = 0 for all states s and actions a
  • Repeat:
    • Take an action and observe the reward
    • Estimate the new value of Q(s, a)
    • Update: $Q(s,a) \leftarrow Q(s,a) + \alpha \, (\text{new value estimate} - Q(s,a))$


We also take the effect of the current choice on future rewards into account: $r$ (reward) is the immediate reward, $\gamma$ is a discount parameter, and $a'$ ranges over the set of actions available in the next state $s'$:

$Q(s,a) \leftarrow Q(s,a) + \alpha \left( \left( r + \gamma \max_{a'} Q(s', a') \right) - Q(s,a) \right)$
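A minimal Q-learning sketch in plain Python, assuming states and actions are hashable and leaving the environment itself abstract.

```python
# Tabular Q-learning with the discounted update rule above.
from collections import defaultdict

class QLearner:
    def __init__(self, alpha=0.5, gamma=0.9):
        self.q = defaultdict(float)  # Q(s, a) = 0 for all s, a
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount for future rewards

    def update(self, s, a, reward, s_next, next_actions):
        # New value estimate: immediate reward r plus discounted best future Q.
        best_future = max((self.q[(s_next, a2)] for a2 in next_actions), default=0)
        estimate = reward + self.gamma * best_future
        self.q[(s, a)] += self.alpha * (estimate - self.q[(s, a)])
```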


Explore vs. Exploit Tradeoff

A greedy algorithm always exploits, taking the actions that are already established to bring good outcomes. However, it will always follow the same path to the solution, never finding a better one. Exploration, on the other hand, means that the algorithm may use a previously unexplored route on its way to the target, allowing it to discover more efficient solutions along the way.


ε (epsilon) greedy algorithm

In this type of algorithm, we set ε equal to how often we want to move randomly. With probability 1-ε, the algorithm chooses the best move (exploitation). With probability ε, the algorithm chooses a random move (exploration).
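A minimal ε-greedy sketch; `q` here could be the `q` table of the `QLearner` above, and the state and action list are whatever the environment supplies.

```python
# Epsilon-greedy action selection: explore with probability epsilon.
import random

def choose_action(q, s, actions, epsilon=0.1):
    """q maps (state, action) pairs to estimated values."""
    if random.random() < epsilon:
        return random.choice(actions)             # explore: random move
    return max(actions, key=lambda a: q[(s, a)])  # exploit: best known move
```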


Function Approximation

The approach becomes more computationally demanding when a game has multiple states and possible actions, such as chess. It is infeasible to generate an estimated value for every possible move in every possible state. In this case, we can use a function approximation, which allows us to approximate Q(s, a) using various other features, rather than storing one value for each state-action pair. Thus, the algorithm becomes able to recognize which moves are similar enough so that their estimated value should be similar as well, and use this heuristic in its decision making.
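One simple form of function approximation (an illustrative assumption; many others exist) estimates Q(s, a) as a weighted sum of hand-chosen features of the state-action pair, so similar moves automatically get similar values.

```python
# Linear function approximation for Q(s, a): no table, just weights over
# features. The feature values are a hypothetical stand-in for a real
# encoding of a game position.
def q_value(weights, features):
    """Estimate Q(s, a) as a dot product of weights and feature values."""
    return sum(w * f for w, f in zip(weights, features))

print(q_value([0.5, -0.2], [1.0, 3.0]))  # -0.1 for this made-up state-action
```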


Unsupervised Learning

In unsupervised learning, only input data, without labels, are present, and the AI learns patterns in these data.


Clustering

Clustering is an unsupervised learning task that takes the input data and organizes it into groups such that similar objects end up in the same group.


K-Means Clustering

K-means clustering is an algorithm for performing a clustering task.


The algorithm proceeds as follows (a sketch in code follows these steps):

  • Map the data points into a space and place k cluster centers at arbitrary positions.
  • Assign each data point to the cluster whose center is nearest to it, partitioning all points into k clusters.
  • Move each cluster center to the center of the points in its cluster.
  • Assign each data point again to its nearest cluster center.
  • Move the cluster centers again.
  • Repeat these steps until, after the centers have moved, no data point switches to a different cluster.
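A minimal k-means sketch with scikit-learn; the 2-D points below are invented.

```python
# k-means clustering into k = 2 groups; the points are made up.
from sklearn.cluster import KMeans

points = [[1.0, 1.0], [1.5, 2.0], [0.5, 1.5], [8.0, 8.0], [8.0, 8.5], [9.0, 9.0]]
kmeans = KMeans(n_clusters=2, n_init=10).fit(points)

print(kmeans.labels_)           # which cluster each point joined
print(kmeans.cluster_centers_)  # final positions of the cluster centers
```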


Conclusion

Original course page:
https://cs50.harvard.edu/ai/2020/weeks/4/

