Gaussian Mixture Model and EM method.


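alpha, July 18, 2019, 14:09:32

Notes from the final session of Teacher Xu's lectures.

The session opened with a demonstration of installing and using libsvm in WEKA.

On to the main topic: generative models.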

A model of the data-generating process gives rise to the data.
Model estimation from data is most commonly done through likelihood estimation.

Find the "best" model that generated the data. In a likelihood function the data is considered fixed, and one searches for the best model among the choices available.

The choice of the model space is plentiful but not unlimited.
There is a bit of "art" in selecting the appropriate model space.
Typically the model space is assumed to be a linear combination of known probability distribution functions.
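
To make "a linear combination of known probability distribution functions" concrete, a mixture model is usually written as below. The symbols \pi_k (mixing weights), p_k (component densities) and \theta_k (component parameters) are standard textbook notation assumed here, not notation taken from the lecture:

p(x) = \sum_{k=1}^K \pi_k p_k(x | \theta_k), \quad \pi_k \ge 0, \quad \sum_{k=1}^K \pi_k = 1

Choosing every component p_k to be a Gaussian density with mean \mu_k and covariance \Sigma_k gives the Gaussian mixture model discussed later in these notes.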

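Take a photo yourself, feed it into a generative model trained on Van Gogh's technique, and it comes out as a painting in Van Gogh's style. Last year a Frenchman used this idea to produce a series of different results, and one of the paintings was actually bought for 500,000 US dollars.

This reminded me of a post by VincentWei:

Gaussian mixture model, abbreviated GMM. A Gaussian mixture model uses Gaussian probability density functions (whose graph in the plane is the normal distribution curve) to quantify an object precisely, decomposing it into several component models that are each built on a Gaussian probability density function. That sounds abstract, so read it this way: the mathematical form of an object is a curve (here, a density), and such a curve, however complex, can be approximated as closely as we like by combining enough Gaussian curves. This is the basic idea of the Gaussian mixture model, and the figure below (Figure 1.1) illustrates it.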

Likelihood Function

P(Model | Data) = \frac{P(Data | Model)P(Model)}{P(Data)}

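Here the Model is a parameter, not a random variable. The left-hand side asks what distribution the model parameters should follow given the data, and that is hard to obtain. Instead we work with the likelihood P(Data | Model) on the right-hand side: given the data, we check which parameter value makes the model match the data most closely, so we only need to find the optimal value.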

Example: Suppose we have the following data,

0-1-1-0-0-1-1-0
In this case it is sensible to choose the Bernoulli distribution (B(p)) as the model space. The data are assumed to come from a Bernoulli distribution over the two outcomes 0 and 1,

P (X=x) = p^x (1-p) ^ {1-x}

Which value of p fits the data best?

Now we want to choose the best p, i.e.,

argmax_p P(Data|B(p))

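where the likelihood of the eight observations (four 1s and four 0s) is

L(p) = P(Data|B(p)) = p^4 (1-p)^4

To maximize it, take the logarithm, l(p) = \log L(p):
l(p) = 4\log(p) + 4\log(1-p)
The lecture framed the next step as the Lagrange method; with a single unconstrained parameter it reduces to setting the derivative to zero:
4/p - 4/(1-p) = 0
so p = 1/2 is the value under which the Bernoulli model best fits the data.
One caveat is that a maximum likelihood fit can overfit the data.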
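
The derivation above can be checked numerically. The following is a minimal sketch, not code from the lecture: the function name `log_likelihood` and the 99-point grid are assumptions made for illustration. It compares the closed-form maximiser (the fraction of ones) with a brute-force search over candidate values of p.

```python
import math

# The eight observations from the example above.
data = [0, 1, 1, 0, 0, 1, 1, 0]

def log_likelihood(p, xs):
    """Bernoulli log-likelihood: sum of x*log(p) + (1-x)*log(1-p)."""
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in xs)

# Closed-form maximiser: the sample mean, i.e. the fraction of ones.
p_closed_form = sum(data) / len(data)

# Numerical check on a grid of candidates strictly inside (0, 1),
# so that log(p) and log(1 - p) are always defined.
grid = [i / 100 for i in range(1, 100)]
p_grid = max(grid, key=lambda p: log_likelihood(p, data))

print(p_closed_form, p_grid)
```

Both estimates agree with the p = 1/2 obtained by hand. With only eight observations the estimate is driven entirely by this sample, which is the overfitting concern noted above.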

The EM algorithm is one of the more important algorithms in unsupervised learning.

Vector Clustering

Data points.
Image segmentation:

K-means VS GMM

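K-means is a hard clustering method. The goal is to split the data into k clusters and compute k center coordinates so that the distance from each point to its cluster center is as small as possible.

J = \sum_{n=1}^N \sum_{k=1}^K r_{nk} {|| X_n - \mu_k ||}^2

where r_{nk} indicates which cluster center the data point X_n is assigned to:

r_{nk} = 1 if k = argmin_j {|| X_n - \mu_j ||}^2, else 0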
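
The objective J and the assignment rule for r_{nk} can be sketched in a few lines. This is an illustrative sketch under assumptions: the toy points, the starting centers, and the helper names are made up, and the center update (mean of the assigned points) is the standard Lloyd iteration, which the notes do not spell out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign(points, centers):
    """For each point, the index k with r_nk = 1 (its nearest center)."""
    return [min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
            for x in points]

def objective(points, centers, labels):
    """J = sum over points of the squared distance to the assigned center."""
    return sum(sq_dist(x, centers[k]) for x, k in zip(points, labels))

def update_centers(points, labels, centers):
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [x for x, lab in zip(points, labels) if lab == k]
        if not members:
            new_centers.append(old)  # leave an empty cluster's center unchanged
            continue
        new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(len(old))))
    return new_centers

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
centers = [(0.0, 0.0), (6.0, 5.0)]

for _ in range(10):
    labels = assign(points, centers)
    centers = update_centers(points, labels, centers)

J = objective(points, centers, labels)
print(labels, centers, J)
```

Each point gets exactly one label, which is what "hard" clustering means here: r_{nk} is either 0 or 1, never a probability.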
GMM is a fuzzy clustering method, that is, soft clustering.

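The EM algorithm starts by pretending we already know the answer: it takes a random initial guess of which point belongs to which Gaussian. In the first step the Gaussian density functions are held fixed and we work out which class each point belongs to. Then, using the current parameters, we recompute the posterior distribution: evaluating each data point under the k different Gaussians gives k density values, from which the probability that the point belongs to each class is known. This revises the probabilities from the first step. The new partition differs from the previous one, because the previous one was random and the new one is not. The parameters are then re-estimated from these probabilities, and the two steps repeat until the assignments stop changing.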
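
The loop described above can be sketched for a one-dimensional mixture of two Gaussians. This is a minimal sketch under assumptions, not the lecture's implementation: the toy data, the starting parameters, the variance floor `min_var`, and all function names are made up for illustration. The update formulas are the standard ones for a Gaussian mixture.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def e_step(xs, weights, mus, variances):
    """Posterior probability of each component for each point (soft assignment)."""
    resp = []
    for x in xs:
        joint = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, mus, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    return resp

def m_step(xs, resp, min_var=1e-6):
    """Re-estimate weights, means and variances from the soft assignments."""
    n, k = len(xs), len(resp[0])
    weights, mus, variances = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)  # effective number of points in component j
        mu = sum(r[j] * x for r, x in zip(resp, xs)) / nj
        var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nj
        weights.append(nj / n)
        mus.append(mu)
        variances.append(max(var, min_var))  # floor guards against collapse
    return weights, mus, variances

def log_likelihood(xs, weights, mus, variances):
    """Log-likelihood of the data under the current mixture."""
    return sum(math.log(sum(w * normal_pdf(x, m, v)
                            for w, m, v in zip(weights, mus, variances)))
               for x in xs)

# Hypothetical toy data: one group near -2 and one near 3.
xs = [-2.1, -1.9, -2.0, -1.8, -2.2, 3.0, 3.2, 2.8, 3.1, 2.9]
weights, mus, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]

history = []
for _ in range(20):
    resp = e_step(xs, weights, mus, variances)
    weights, mus, variances = m_step(xs, resp)
    history.append(log_likelihood(xs, weights, mus, variances))

print(mus, weights)
```

Unlike the K-means labels, each row of `resp` is a probability vector over the components, which is the "soft" part of soft clustering. The values in `history` should not decrease from one iteration to the next, which is a convenient sanity check when experimenting with other data.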

References:

Gatys L. A., Ecker A. S., Bethge M. A Neural Algorithm of Artistic Style. arXiv preprint arXiv:1508.06576, 2015.
https://www.jianshu.com/p/9f03b61fdeac
Lecture slides download: https://www.neusncp.com/user/file?id=158
An example of a Gaussian mixture model: https://lukapopijac.github.io/gaussian-mixture-model/

Last updated: July 18, 2019, 17:11:24
