TIL 2021-12-08 Linear Discrimination

Likelihood-based

Apply Bayes' rule to the class likelihoods to derive the discriminant function

Discriminant-based

Skips density estimation and estimates the discriminant function directly

Only the boundary matters; there is no need to estimate the densities on each side of it correctly

Linear Discriminant

Simple: O(d) space and computation

It is a weighted sum of the features, so knowledge extraction is possible

Optimal when p(x|C) is Gaussian with a shared covariance matrix (useful when the classes are linearly separable)
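
A minimal sketch of the linear discriminant g(x) = w·x + w0 described above. The weights and the sample point are made-up values for illustration, not from the lecture.

```python
def linear_discriminant(x, w, w0):
    """g(x) = w . x + w0: one multiply-add per feature, so O(d)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

# Hypothetical weights: feature 0 pushes toward C1, feature 1 toward C2.
w, w0 = [2.0, -1.0], 0.5
g = linear_discriminant([1.0, 3.0], w, w0)   # 2 - 3 + 0.5 = -0.5
label = "C1" if g > 0 else "C2"
```

Reading off the sign and size of each weight is the "knowledge extraction" part: here feature 0 counts twice as much as feature 1, in the opposite direction.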

Generalized Linear Model

Quadratic discriminant

O(d^2)

Higher-order term

A linear discriminant found in the higher-order space is nonlinear in the original space

Nonlinear basis functions can be used for the mapping x -> z
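
A small sketch of the x -> z idea, assuming a hypothetical basis that squares each coordinate. The discriminant is linear in z but traces the unit circle in the original x-space.

```python
def phi(x):
    """Hypothetical nonlinear basis: square each coordinate."""
    return [x[0] ** 2, x[1] ** 2]

def g(x, w=(-1.0, -1.0), w0=1.0):
    """Linear in z = phi(x); in x-space the boundary is x1^2 + x2^2 = 1."""
    z = phi(x)
    return w[0] * z[0] + w[1] * z[1] + w0

inside = g([0.5, 0.5])    # 1 - 0.25 - 0.25 = 0.5 > 0
outside = g([1.0, 1.0])   # 1 - 1 - 1 = -1 < 0
```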

Two class

One discriminant is enough

Multiple class

one vs all

When regions overlap, pick the class whose discriminant boundary the point is farthest from (the largest g_i(x))
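
A sketch of the one-vs-all rule, assuming each class has its own linear discriminant (w, w0). The three discriminants below are hypothetical.

```python
def one_vs_all(x, discriminants):
    """Score x with every class discriminant and pick the largest score."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + w0
              for w, w0 in discriminants]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores

# Hypothetical (w, w0) pairs for three classes.
discriminants = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([-1.0, -1.0], 0.5)]
label, scores = one_vs_all([1.0, 0.2], discriminants)
```

For this point both class 0 and class 1 have positive scores (an overlap), and the larger score decides.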

Pairwise

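Compute a discriminant value that separates each pair of classes i, j

If, compared against every other class j, x falls on class i's side

label it as i

The problem: for some x, no class satisfies this against every j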
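
A sketch of the pairwise rule and of how it can fail. The convention that a positive g_ij means "class i side" and the toy discriminants are assumptions for illustration.

```python
def pairwise_label(x, g, k):
    """Return class i if x is on i's side of g_ij for every j != i, else None.

    g[(i, j)] with i < j is a function; a positive value means "class i side".
    """
    for i in range(k):
        on_i_side = all(
            (g[(i, j)](x) if i < j else -g[(j, i)](x)) > 0
            for j in range(k) if j != i
        )
        if on_i_side:
            return i
    return None  # no class wins against all the others

# Hypothetical 1-D discriminants for classes centred near 0, 1 and 2.
consistent = {
    (0, 1): lambda x: 0.5 - x,
    (0, 2): lambda x: 1.0 - x,
    (1, 2): lambda x: 1.5 - x,
}
label_a = pairwise_label(0.2, consistent, 3)
label_b = pairwise_label(1.2, consistent, 3)

# Hypothetical cyclic case: 0 beats 1, 1 beats 2, 2 beats 0.
cyclic = {
    (0, 1): lambda x: 1.0,
    (1, 2): lambda x: 1.0,
    (0, 2): lambda x: -1.0,
}
label_c = pairwise_label(0.0, cyclic, 3)
```

The cyclic case returns None, which is the "not satisfied in every case" problem noted above.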

Discriminant -> Posterior

p(x|Ci) ~ N(mu_i, Sigma), where Sigma is the covariance matrix shared by all classes
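
A sketch of why this assumption turns a linear discriminant into a posterior, for two classes. The symbols w, w_0 and the priors P(C_i) are notation I am introducing here; this is the standard result, so check it against the lecture's own notation.

```latex
\begin{align*}
\log\frac{P(C_1\mid x)}{P(C_2\mid x)}
  &= \log\frac{p(x\mid C_1)}{p(x\mid C_2)} + \log\frac{P(C_1)}{P(C_2)}
   = w^{T}x + w_0, \\
w   &= \Sigma^{-1}(\mu_1 - \mu_2), \\
w_0 &= -\tfrac{1}{2}(\mu_1 + \mu_2)^{T}\Sigma^{-1}(\mu_1 - \mu_2)
       + \log\frac{P(C_1)}{P(C_2)}, \\
P(C_1\mid x) &= \frac{1}{1 + \exp\!\left(-(w^{T}x + w_0)\right)}.
\end{align*}
```

The quadratic terms in x cancel because both classes share Sigma, which leaves a linear log-odds and a sigmoid posterior.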

Sigmoid

Approaches 0 as the input goes negative

Approaches 1 as the input goes positive

Equals 0.5 at 0
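
A minimal sigmoid sketch that checks the three properties above. The two-branch form is my own choice to avoid overflow for very negative inputs.

```python
import math

def sigmoid(a):
    """Logistic function 1 / (1 + exp(-a)), written to avoid overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

low = sigmoid(-30.0)    # very close to 0
mid = sigmoid(0.0)      # exactly 0.5
high = sigmoid(30.0)    # very close to 1
```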

Gradient Descent

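E(w|x) is the error on sample x at parameters w

Follow the gradient across the contours of the error function until reaching the minimum point

Start from a random w and repeatedly update w in the direction of the negative gradient

May end up somewhere other than the global minimum

May not converge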
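
A sketch of the update loop on a toy error function of my own choosing, E(w) = (w0 - 3)^2 + 2(w1 + 1)^2. The step sizes are hypothetical; the second run uses a step that is too large, to show the non-convergence case.

```python
import random

def grad_E(w):
    """Gradient of the toy error E(w) = (w0 - 3)^2 + 2 * (w1 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]

def gradient_descent(grad, w, eta, steps):
    """Repeatedly move w against the gradient with step size eta."""
    w = list(w)
    for _ in range(steps):
        g = grad(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

random.seed(0)  # fixed seed so the "random" start is reproducible
w_init = [random.uniform(-1, 1), random.uniform(-1, 1)]

w_min = gradient_descent(grad_E, w_init, eta=0.1, steps=200)  # approaches (3, -1)
w_bad = gradient_descent(grad_E, w_init, eta=1.1, steps=50)   # overshoots and diverges
```

This toy error is convex, so it only shows the convergence issue; the local-minimum issue needs a non-convex error surface.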

Logistic Discrimination

Two class: assume the log likelihood ratio is linear

cross-entropy error

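As the iterations repeat

the weights start small; those of important features grow, while unimportant ones stay small

A feature with a positive influence has its weight grow in the positive direction; one with a negative influence moves in the negative direction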
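
A sketch of two-class logistic discrimination trained by batch gradient descent on cross-entropy. The toy data, step size, and initial weights are assumptions chosen so that feature 1 decides the class and feature 2 carries no information.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Toy data (hypothetical): feature 1 decides the class, feature 2 is noise.
X = [(-1.0, 1.0), (-1.0, -1.0), (1.0, 1.0), (1.0, -1.0)]
r = [0, 0, 1, 1]

w = [0.0, 0.01, 0.01]   # [w0, w1, w2], small initial values
eta = 0.5
for _ in range(100):
    grad = [0.0, 0.0, 0.0]
    for (x1, x2), rt in zip(X, r):
        y = sigmoid(w[0] + w[1] * x1 + w[2] * x2)
        # Gradient of the cross-entropy w.r.t. w_j is -(r - y) * x_j.
        for j, xj in enumerate((1.0, x1, x2)):
            grad[j] -= (rt - y) * xj
    w = [wj - eta * gj for wj, gj in zip(w, grad)]
```

After training, w[1] has grown well above its starting value while w[2] has shrunk toward 0, matching the note above.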

Generalizing linear model

Quadratic

Sum of basis functions

Hidden units of a neural network

Kernels of an SVM

Multiple label

Each of the K labels can be 0 or 1

Each output is a two-class sigmoid

The error function is the sum of the cross-entropy losses
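
A sketch of the multi-label error: one sigmoid per label, with the per-label cross-entropies added up. The function name and the example values are made up for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def multilabel_cross_entropy(activations, r):
    """Sum of two-class cross-entropies, one per label.

    activations: pre-sigmoid outputs, one per label.
    r: desired labels, each 0 or 1.
    """
    total = 0.0
    for a, ri in zip(activations, r):
        y = sigmoid(a)
        total -= ri * math.log(y) + (1 - ri) * math.log(1.0 - y)
    return total

# Two labels, both outputs undecided at 0.5: loss is 2 * ln 2.
loss = multilabel_cross_entropy([0.0, 0.0], [1, 0])
```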

Learning to Rank

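If u is preferred over v, then g(x^u) > g(x^v)

Find the direction w such that the ranks are well preserved when the instances are projected onto it

Rank Error

It is an error when g(x^v) - g(x^u) > 0, i.e. v scores higher even though u is preferred

The error is defined only when the order is flipped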
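
A sketch of the rank error with a linear scorer g(x) = w·x. Each pair is taken to be (x^u, x^v) with u preferred over v; the items and w are hypothetical.

```python
def g(x, w):
    """Linear score: projection of x onto the direction w."""
    return sum(wi * xi for wi, xi in zip(w, x))

def rank_error(pairs, w):
    """pairs holds (x_u, x_v) with u preferred over v.

    Only flipped pairs, where g(x_v) > g(x_u), contribute.
    """
    return sum(max(0.0, g(xv, w) - g(xu, w)) for xu, xv in pairs)

w = [1.0, 0.0]
a, b, c = [3.0, 0.0], [1.0, 0.0], [2.0, 0.0]
pairs = [(a, b), (b, c)]     # a over b is respected; b over c is flipped
err = rank_error(pairs, w)   # 0 + (2 - 1) = 1.0
```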

Multilayer perceptron

An algorithm that imitates the brain

Signals are transmitted along the axon

Receptors take in neurotransmitters; above a certain level the neuron fires a signal

Training

Online learning vs Batch learning

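Online learning looks at one example at a time: compute its error, compute the gradient, and apply the update

No need to store the whole dataset

The problem may change over time

Batch learning looks at all the examples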
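
A sketch contrasting one pass of each update scheme, assuming a single linear unit y = w * x with squared error. The data and step size are made up.

```python
data = [(1.0, 2.0), (2.0, 4.0)]   # hypothetical (x, r) pairs
eta = 0.1

# Batch: accumulate the gradient over all examples, then update once.
w_batch = 0.0
total = sum((r - w_batch * x) * x for x, r in data)
w_batch += eta * total            # 0.1 * (2 + 8) = 1.0

# Online: update right after each example, so later examples see the new w.
w_online = 0.0
for x, r in data:
    w_online += eta * (r - w_online * x) * x   # 0.2, then 0.92
```

The online loop never needs the whole dataset at once; each example can be discarded after its update.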

Multilayer Perceptron

Map the input in the original space x to z through a nonlinear transformation,

then apply a perceptron (a linear combination) there, which lets the model learn nonlinear terms

Training this is done with backpropagation
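
A sketch of the forward pass only, with one hidden layer of sigmoid units. The weights are hand-picked (not learned by backpropagation) so that the network computes XOR, a function no single linear discriminant can represent.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def mlp_forward(x, W_hidden, w_out):
    """Each row of W_hidden, and w_out itself, is [bias, weights...]."""
    # Nonlinear mapping x -> z.
    z = [sigmoid(row[0] + sum(wi * xi for wi, xi in zip(row[1:], x)))
         for row in W_hidden]
    # Linear combination of z, squashed by a sigmoid output.
    y = sigmoid(w_out[0] + sum(vi * zi for vi, zi in zip(w_out[1:], z)))
    return y, z

# Hand-picked, hypothetical weights.
W_hidden = [[-10.0, 20.0, 20.0],    # z1 behaves like OR
            [30.0, -20.0, -20.0]]   # z2 behaves like NAND
w_out = [-30.0, 20.0, 20.0]         # y behaves like AND of z1, z2

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_outputs = [round(mlp_forward(x, W_hidden, w_out)[0]) for x in inputs]
```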

Overfitting

Number of weights: H(d+1) + (H+1)K

More hidden units means higher complexity

Validation error goes up
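
The weight count above as a quick sketch, taking d as the input dimension, H as the number of hidden units, and K as the number of outputs (my reading of the symbols).

```python
def mlp_weight_count(d, H, K):
    """H hidden units with d inputs + bias, K outputs with H inputs + bias."""
    return H * (d + 1) + (H + 1) * K

small = mlp_weight_count(d=2, H=2, K=1)     # 2*3 + 3*1 = 9
large = mlp_weight_count(d=2, H=20, K=1)    # 20*3 + 21*1 = 81
```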

Overtraining

Validation error goes up

Units whose weights were close to 0 move away from 0 as training continues

As more of them move away from 0, the effect is the same as increasing the model's complexity

Hidden Representation

An MLP is a generalized linear model whose hidden units are the nonlinear basis functions

The advantage is that the basis function parameters can be learned from data

The hidden units learn a code / embedding

Transfer learning -> reuse the code for a different task

If the hidden layer has fewer dimensions than the input, it also works as dimensionality reduction

Autoencoders

Word2vec

Skipgram

An autoencoder with a linear encoder and a decoder, where the input is the center word and the output is a context word
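
A sketch of how the (input, output) training pairs for skip-gram could be built: each center word is paired with the words around it. The window size and the function name are assumptions, and this covers only the data preparation, not the encoder/decoder training.

```python
def skipgram_pairs(tokens, window=1):
    """Return (center word, context word) pairs within the given window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["the", "cat", "sat", "down"], window=1)
```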
