TIL 2021-11-27 Deep Learning, Local Model

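Deep Learning

An MLP with a very large number of hidden layers

Minimizes human contribution (the network learns its own features instead of relying on hand-crafted ones)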

complex error surface -> difficult to learn

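Gradient vanishing/exploding

The error computed at the output is backpropagated to compute the gradient of each layer. If factors close to 0 keep appearing along the way, the gradient shrinks toward 0 and the lower layers barely get updated (vanishing gradient). If factors greater than 1 keep appearing, the gradient grows too large (exploding gradient).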
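This is a rough numeric illustration built on a deliberate simplification. It assumes sigmoid units in a single chain of layers, where every layer has the same weight and the same pre-activation a = 0. Backpropagation multiplies the error signal by weight * sigmoid'(a) at each layer. Since sigmoid' is at most 0.25, the product shrinks or grows geometrically with depth.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_grad(a):
    s = sigmoid(a)
    return s * (1.0 - s)   # at most 0.25, reached at a = 0

def backprop_factor(num_layers, weight, a=0.0):
    """Product of the per-layer factors (weight * sigmoid'(a)) that the
    error signal is multiplied by while travelling down num_layers layers."""
    factor = 1.0
    for _ in range(num_layers):
        factor *= weight * sigmoid_grad(a)
    return factor

vanishing = backprop_factor(20, weight=1.0)  # 0.25 per layer -> shrinks
exploding = backprop_factor(20, weight=8.0)  # 2.0 per layer -> grows
print(vanishing, exploding)
```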

Rectified Linear Unit (ReLU)

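Does not saturate for large a (a = the weighted sum into the unit)

Outputs 0 when a < 0, giving a sparse representation (fewer hidden units are active)

When a < 0 the gradient is 0 and nothing is learned, so initialization needs care (e.g., small weights close to 0 plus a positive bias)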

Leaky ReLU: for a < 0, outputs a small slope times a instead of 0, so the gradient never becomes exactly 0
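Here is a minimal Python sketch of ReLU and Leaky ReLU together with their derivatives. The slope of 0.01 for negative inputs is a common default and is an assumption here.

```python
def relu(a):
    return max(0.0, a)

def relu_grad(a):
    # 1 for a > 0 (no saturation, however large a gets); 0 for a < 0 (no learning)
    return 1.0 if a > 0 else 0.0

def leaky_relu(a, slope=0.01):
    return a if a > 0 else slope * a

def leaky_relu_grad(a, slope=0.01):
    # a small nonzero gradient keeps units with a < 0 trainable
    return 1.0 if a > 0 else slope

activations = [-2.0, -0.5, 0.0, 1.5, 100.0]
print([relu(a) for a in activations])
print([relu_grad(a) for a in activations])
print([leaky_relu(a) for a in activations])
```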

Fan-in: the number of incoming edges to a unit, i.e., how many units in the layer below contribute to its weighted sum

Momentum

Adds a fraction of the previous updates (a running average of past gradients) to the current update, which smooths the trajectory
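This is a minimal sketch of gradient descent with momentum on the toy one-dimensional error E(w) = (w - 3)^2. The toy error and the values of eta and alpha are assumptions for illustration. The update is delta_t = -eta * dE/dw + alpha * delta_{t-1}, so each step carries over a fraction alpha of the previous one.

```python
def grad(w):
    # gradient of the toy error E(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def train_momentum(w=0.0, eta=0.1, alpha=0.9, steps=300):
    delta = 0.0  # previous update
    for _ in range(steps):
        # plain gradient step plus a fraction alpha of the previous update
        delta = -eta * grad(w) + alpha * delta
        w += delta
    return w

w_final = train_momentum()
print(w_final)
```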

Adaptive Learning factor

RMSprop

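Adapts the step size per parameter. Where past gradients have been small, the update is relatively larger; where they have been large, it is smaller (the gradient is divided by the square root of a running average of squared gradients).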
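Below is a sketch of a single RMSprop step. It assumes the usual form r <- rho*r + (1 - rho)*g^2 and w <- w - eta*g/(sqrt(r) + eps), and the parameter values are illustrative. Two parameters whose gradients differ by a factor of 10^4 end up taking steps of almost the same size.

```python
import math

def rmsprop_step(w, g, r, eta=0.01, rho=0.9, eps=1e-8):
    """One RMSprop update for a list of parameters.
    r holds the running average of squared gradients per parameter."""
    new_w, new_r = [], []
    for wi, gi, ri in zip(w, g, r):
        ri = rho * ri + (1.0 - rho) * gi * gi
        # dividing by sqrt(ri): large past gradients -> smaller step
        wi = wi - eta * gi / (math.sqrt(ri) + eps)
        new_w.append(wi)
        new_r.append(ri)
    return new_w, new_r

w, r = [0.0, 0.0], [0.0, 0.0]
grads = [100.0, 0.01]  # very different gradient scales
w, r = rmsprop_step(w, grads, r)
print(w)  # both parameters move by roughly the same amount
```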

ADAM

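s and r (the running estimates of the gradient and of the squared gradient) start at 0, and the decay rates α and ρ are close to 1. As a result, early estimates are biased toward 0. Dividing each estimate by 1 minus its decay rate raised to the power t corrects this.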
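This is a sketch of Adam. It assumes that s (the running mean of g) uses decay alpha and r (the running mean of g^2) uses decay rho. The toy error E(w) = (w - 3)^2 and all constants are illustrative. At t = 1, the raw s is only (1 - alpha) times the gradient, while the bias-corrected s_hat equals the gradient.

```python
import math

def adam_step(w, g, s, r, t, eta=0.1, alpha=0.9, rho=0.999, eps=1e-8):
    """One Adam update. s, r: running estimates of E[g] and E[g^2], both start at 0.
    t: 1-based iteration counter used for bias correction."""
    s = alpha * s + (1.0 - alpha) * g
    r = rho * r + (1.0 - rho) * g * g
    s_hat = s / (1.0 - alpha ** t)   # early on s is biased toward 0; this undoes it
    r_hat = r / (1.0 - rho ** t)
    w = w - eta * s_hat / (math.sqrt(r_hat) + eps)
    return w, s, r, s_hat

# Toy problem (assumed): minimize E(w) = (w - 3)^2
w, s, r = 0.0, 0.0, 0.0
for t in range(1, 501):
    g = 2.0 * (w - 3.0)
    w, s, r, s_hat = adam_step(w, g, s, r, t)
    if t == 1:
        first = (s, s_hat, g)  # raw s is 10% of g; s_hat equals g
print(first, w)
```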

Batch Normalization

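Not suitable for online learning or very small mini-batches (the batch statistics become unreliable)

Z-normalizes each hidden unit over the mini-batch

For inference, first compute m_j and s_j (the mean and standard deviation of hidden unit j) over a subset of the training data, then use these fixed values at inference time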
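Here is a minimal sketch of the normalization part of batch normalization for one hidden unit j. It leaves out the learned scale and shift parameters that full batch normalization also applies. The small eps added to the variance is an assumption for numerical safety.

```python
import math

def batch_stats(values):
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var)

def batch_norm(values, m, s, eps=1e-5):
    # z-normalize one hidden unit's values with the given mean/std
    return [(v - m) / math.sqrt(s * s + eps) for v in values]

# Training: statistics come from the current mini-batch itself
minibatch = [2.0, 4.0, 6.0, 8.0]
m_b, s_b = batch_stats(minibatch)
train_out = batch_norm(minibatch, m_b, s_b)

# Inference: m_j, s_j estimated once over a training subset, then kept fixed
training_subset = [1.0, 3.0, 5.0, 7.0, 9.0]
m_j, s_j = batch_stats(training_subset)
single_input_out = batch_norm([5.0], m_j, s_j)  # works even for a single input
print(train_out, single_input_out)
```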

Regularization

Hints

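Make the model invariant to transformations such as changes in size or rotation

Add virtual examples (transformed copies of the training instances)

The prediction for an instance and for its virtual example should be the same; the further apart the two predictions are, the larger the hint penalty E_h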
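This is a small sketch of a hint penalty E_h. It assumes a hypothetical 1-D linear model, and it treats a shift by 0.5 as the transformation that should not change the output. In training, E_h would be added to the usual error with some weight.

```python
def model(x, w):
    # hypothetical 1-D linear model g(x | w) = w0 + w1 * x
    return w[0] + w[1] * x

def hint_penalty(xs, transform, w):
    """E_h: squared difference between the prediction on each instance
    and on its virtual example transform(x); 0 when the model is invariant."""
    return sum((model(x, w) - model(transform(x), w)) ** 2 for x in xs)

shift = lambda x: x + 0.5  # assumed transformation
xs = [0.0, 1.0, 2.0]
invariant = hint_penalty(xs, shift, [1.0, 0.0])  # ignores x -> no penalty
varying = hint_penalty(xs, shift, [1.0, 2.0])    # depends on x -> penalized
print(invariant, varying)
```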

Weight Decay

A weight of 0 removes its connection -> simpler model

All weights start near 0 and gradually move away as training proceeds

So either stop early, or penalize nonzero weights (weight decay)
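Below is a sketch of weight decay as a gradient step on E'(w) = E(w) + (lambda/2) * sum_i w_i^2. It uses a toy error in which only w0 is constrained by the data; the error and constants are assumptions. The weight the data does not need decays to 0. The weight it does need settles below its unpenalized optimum of 2.

```python
def weight_decay_step(w, grad_E, eta=0.1, lam=0.5):
    """Gradient step on E'(w) = E(w) + (lam / 2) * sum(w_i^2).
    The extra -eta * lam * w_i term pulls every weight toward 0."""
    return [wi - eta * gi - eta * lam * wi for wi, gi in zip(w, grad_E)]

# Toy error (assumed): E(w) = (w0 - 2)^2; E does not depend on w1.
w = [0.0, 1.0]
for _ in range(500):
    grad_E = [2.0 * (w[0] - 2.0), 0.0]
    w = weight_decay_step(w, grad_E)
print(w)  # w0 settles at 1.6 (not 2), w1 decays toward 0
```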

Bayesian Perspective

Using Bayes' rule,

find the w that maximizes the log posterior, log p(w | X) = log p(X | w) + log p(w) + C; the prior p(w) plays the role of the regularizer
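To connect this to weight decay, assume a Gaussian prior with variance sigma^2 on each weight and Gaussian output noise with variance sigma_n^2 (both are assumptions). Take E(w) to be the sum of squared errors. Maximizing the log posterior then becomes minimizing the error plus an L2 penalty:

```latex
\hat{w}_{\mathrm{MAP}} = \arg\max_{w}\,\bigl[\log p(\mathcal{X}\mid w) + \log p(w)\bigr]

p(w_i) \propto \exp\!\Bigl(-\frac{w_i^2}{2\sigma^2}\Bigr)
  \;\Rightarrow\; \log p(w) = -\frac{1}{2\sigma^2}\sum_i w_i^2 + \text{const}

-\log p(\mathcal{X}\mid w) = \frac{1}{2\sigma_n^2}\,E(w) + \text{const},
  \qquad E(w) = \sum_t \bigl(r^t - y^t\bigr)^2

\Rightarrow\; \hat{w}_{\mathrm{MAP}} = \arg\min_{w}\,\Bigl[E(w) + \lambda \sum_i w_i^2\Bigr],
  \qquad \lambda = \frac{\sigma_n^2}{\sigma^2}
```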

Dropout

Add noise to input/hidden units, or drop units entirely (set their output to 0 at random during training)
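This is a sketch of dropout applied to a layer of hidden values. It uses the common "inverted" variant, which rescales surviving units by 1/(1 - p) during training so that nothing changes at test time. The notes don't specify the scaling, so treat that choice as an assumption.

```python
import random

def dropout(h, p_drop, rng, training=True):
    """Inverted dropout: during training each unit is zeroed with
    probability p_drop and survivors are scaled by 1/(1 - p_drop), so the
    expected activation is unchanged and test time needs no rescaling."""
    if not training or p_drop == 0.0:
        return list(h)
    keep = 1.0 - p_drop
    return [hi / keep if rng.random() < keep else 0.0 for hi in h]

rng = random.Random(0)
hidden = [1.0] * 10000
dropped = dropout(hidden, 0.5, rng)
print(dropped.count(0.0), sum(dropped) / len(dropped))  # about half dropped, mean near 1
```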

Convolutions

Reduce fan-in

e.g., for images, each hidden unit looks only at a local patch

Weight Sharing

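The same weights are used at different locations

Pooling reduces dimensionality, e.g., replacing each 2x2 block with a single cell

Stride skips the positions in between: the unit is applied only at every s-th position. E.g., after pooling, the layer above uses a stride so that it keeps only the pooled values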
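Here is a 1-D sketch of the ideas above. Images use 2-D patches, but the mechanics are the same. Each output sees only a local patch, one kernel is shared across all positions, stride skips positions, and pooling merges neighboring cells. The signal and the 2-weight kernel are made up for illustration.

```python
def conv1d(x, kernel, stride=1):
    """Each output looks only at a local patch of len(kernel) inputs
    (fan-in = len(kernel)), and the SAME kernel weights are reused at
    every location (weight sharing). stride > 1 skips positions."""
    k = len(kernel)
    return [sum(w * x[i + j] for j, w in enumerate(kernel))
            for i in range(0, len(x) - k + 1, stride)]

def max_pool1d(x, size=2):
    # replace each non-overlapping block of `size` values with its max
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

signal = [0, 1, 2, 3, 2, 1, 0, 1]
edge = [-1, 1]                          # one shared 2-weight filter
features = conv1d(signal, edge)         # 7 outputs, stride 1
strided = conv1d(signal, edge, stride=2)
pooled = max_pool1d(features)
print(features, strided, pooled)
```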

Tuning Network size

Remove unnecessary edges (pruning)

Network growth: start small and add units as needed

Skip Connection

Skip intermediate layers -> simpler model

Gate unit

Adjusts the relative weight of the skipped path and the path through the layer
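One way to sketch a gated skip connection is to model it on highway-style networks. This is an assumption; the notes only say that the gate adjusts the mix. A sigmoid gate g decides how much of the layer output f(x), versus the untouched input x, passes through. The layer and gate weights are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def layer(x, w=2.0, b=-1.0):
    # hypothetical intermediate layer f(x)
    return math.tanh(w * x + b)

def gated_skip(x, gate_w, gate_b):
    """Output = g * f(x) + (1 - g) * x with gate g = sigmoid(gate_w * x + gate_b).
    g near 0 skips the layer; g near 1 uses it fully."""
    g = sigmoid(gate_w * x + gate_b)
    return g * layer(x) + (1.0 - g) * x

x = 0.7
print(gated_skip(x, 0.0, -20.0))  # gate almost closed -> close to x
print(gated_skip(x, 0.0, 20.0))   # gate almost open -> close to layer(x)
```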

Learning Time

Time-delay network

Waits until the last input of the window arrives, then runs the MLP on the whole window

Drawback: inputs older than the window of size t are never seen
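This sketch shows how a time-delay network sees a sequence. It assumes a window of t = 3 and uses a hypothetical weighted sum as a stand-in for the MLP. An output is produced only once the window is full, and each input vector holds just the t most recent values.

```python
def time_delay_windows(sequence, t):
    # each network input is the t most recent values; older values are dropped
    return [sequence[i - t + 1:i + 1] for i in range(t - 1, len(sequence))]

def toy_mlp(window):
    # hypothetical stand-in for the MLP: a fixed weighted sum of the window
    weights = [0.2, 0.3, 0.5]  # oldest ... newest
    return sum(w * x for w, x in zip(weights, window))

seq = [1, 2, 3, 4, 5, 6]
windows = time_delay_windows(seq, 3)
outputs = [toy_mlp(w) for w in windows]
print(windows, outputs)
```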

Recurrent

Predicts the current output taking the past state into account
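Here is a scalar Elman-style recurrent unit as a minimal sketch; the weights are hypothetical. Two sequences that end in the same input give different outputs, because the state h remembers the earlier input. The effect of that first input also shrinks at each step, which is the fading that the LSTM is meant to fix.

```python
import math

def rnn_run(xs, w_in=0.5, w_rec=0.8, w_out=1.0):
    """h_t = tanh(w_in * x_t + w_rec * h_{t-1}), y_t = w_out * h_t.
    The state h carries information from all earlier inputs."""
    h = 0.0
    ys = []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        ys.append(w_out * h)
    return ys

a = rnn_run([1.0, 0.0, 0.0])
b = rnn_run([0.0, 0.0, 0.0])
print(a, b)  # same last input, different past -> different last output
```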

Long Short-Term Memory (LSTM)

Compensates for the fact that the influence of past values fades as time passes

Gated Recurrent Unit (GRU)

Generative Adversarial Networks

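The discriminator D judges whether an image is real or was generated by the generator G

The two compete with each other (adversarial training)

When updating D, keep G fixed and maximize the objective

When updating G, keep D fixed and minimize the objective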
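For reference, this is the standard minimax objective from the original GAN formulation. The notes don't write out the loss, so this exact form is an assumption. D is trained to maximize V, and G is trained to minimize it.

```latex
\min_{G}\max_{D} V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```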

Local Models

Divide the input space into local regions and learn a simple model in each region

Online k-Means

Move the centroid closest to x^t a little toward x^t
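This is a sketch of the online k-means update m_i <- m_i + eta * (x^t - m_i), applied only to the nearest centroid. The learning rate and the data are illustrative.

```python
def nearest(centers, x):
    # index of the centroid with the smallest squared distance to x
    return min(range(len(centers)),
               key=lambda i: sum((c - xi) ** 2 for c, xi in zip(centers[i], x)))

def online_kmeans_update(centers, x, eta=0.1):
    """Move only the closest centroid a fraction eta toward x^t."""
    i = nearest(centers, x)
    centers[i] = [c + eta * (xi - c) for c, xi in zip(centers[i], x)]
    return i

centers = [[0.0, 0.0], [10.0, 10.0]]
stream = [[1.0, 1.0], [9.0, 11.0], [1.0, 1.0]]
for x in stream:
    online_kmeans_update(centers, x)
print(centers)
```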

Adaptive Resonance Theory

Compare the distance to the nearest center against the vigilance; if the input x is not covered by any cluster, add a new cluster centered at x
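Below is a simplified, distance-based sketch of the ART idea. Two details are assumptions: the vigilance test is a Euclidean distance threshold, and the update rate is 0.5. A point farther than the vigilance from every center starts a new cluster; otherwise the nearest center moves toward it.

```python
import math

def art_update(centers, x, vigilance, eta=0.5):
    """If the nearest center is farther than the vigilance, x is not covered
    and becomes a new center; otherwise the nearest center moves toward x."""
    if centers:
        dists = [math.dist(c, x) for c in centers]
        i = min(range(len(centers)), key=dists.__getitem__)
        if dists[i] <= vigilance:
            centers[i] = [c + eta * (xi - c) for c, xi in zip(centers[i], x)]
            return centers
    centers.append(list(x))  # add a new cluster centered at x
    return centers

centers = []
for x in [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]:
    art_update(centers, x, vigilance=1.0)
print(centers)  # two clusters: the first moved, the far point started a new one
```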

Self-Organizing Maps

Through a neighborhood function, not only the closest centroid but also all its neighbors on the map are moved toward the input
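This is a sketch of one SOM update. For simplicity, the units sit on a 1-D map and the inputs are scalars. It assumes a Gaussian neighborhood function e(l, i) = exp(-(l - i)^2 / (2 sigma^2)) over map positions. The winner moves the most, its map neighbors move less, and distant units barely move.

```python
import math

def som_update(units, x, eta=0.5, sigma=1.0):
    """units: centroids arranged on a 1-D map (index = map position).
    The winner i is the unit closest to x; every unit l moves toward x,
    weighted by e(l, i) = exp(-(l - i)^2 / (2 sigma^2))."""
    i = min(range(len(units)), key=lambda l: abs(units[l] - x))
    for l in range(len(units)):
        e = math.exp(-((l - i) ** 2) / (2.0 * sigma ** 2))
        units[l] += eta * e * (x - units[l])
    return i

units = [0.0, 1.0, 2.0, 3.0, 4.0]
winner = som_update(units, 2.2)
print(winner, units)
```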

Radial Basis Function (RBF)

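From the output's point of view it is no different from an MLP (a weighted sum of hidden units), but the hidden units compute something different: a local response around a center instead of a function of a dot product

For input x^t, only the units whose centers lie within about 3 spreads (3s_h) of x^t are active; the rest are close to 0

No bias input x_0 is needed in the hidden layer, because each hidden unit is defined only by its center (and spread)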
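Here is a sketch of the forward pass of a Gaussian RBF network with one output; the centers, spreads, and weights are made-up values. A unit that is 3 spreads away from its center outputs exp(-4.5), about 0.011. That is why only nearby units matter.

```python
import math

def rbf_forward(x, centers, spreads, w, w0):
    """p_h = exp(-||x - m_h||^2 / (2 s_h^2)),  y = sum_h w_h p_h + w0.
    Hidden units use no bias input x0: each is defined by its center m_h
    and spread s_h. The output is the same weighted sum an MLP would use."""
    p = [math.exp(-math.dist(x, m) ** 2 / (2.0 * s ** 2))
         for m, s in zip(centers, spreads)]
    return sum(wh * ph for wh, ph in zip(w, p)) + w0, p

centers = [(0.0,), (5.0,)]
spreads = [1.0, 1.0]
y, p = rbf_forward((0.5,), centers, spreads, w=[2.0, -1.0], w0=0.1)
print(y, p)  # the far unit (center 5.0) is essentially inactive
```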

Training RBF

Hybrid

1. Run k-means (unsupervised) to find the centers and spreads

2. Treat the hidden unit activations as the inputs of a perceptron and learn its weights (supervised)
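This sketch applies the two-stage hybrid training to a toy 1-D regression problem, fitting sin(x). The data, k = 4, and the learning rate are assumptions. For simplicity, the spread s is fixed at 1.0 instead of being estimated from the clusters.

```python
import math
import random

def kmeans_1d(xs, k, iters=20):
    # step 1 (unsupervised): plain batch k-means on scalar inputs
    centers = sorted(xs)[::max(1, len(xs) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda h: abs(x - centers[h]))
            groups[nearest].append(x)
        # keep the old center if a group ends up empty
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def features(x, centers, s):
    # [x0 = 1 for the output bias] + Gaussian hidden activations p_h
    return [1.0] + [math.exp(-(x - m) ** 2 / (2 * s * s)) for m in centers]

def predict(x, centers, w, s):
    return sum(wi * pi for wi, pi in zip(w, features(x, centers, s)))

def train_hybrid(xs, rs, k=4, s=1.0, eta=0.1, epochs=200, seed=0):
    centers = kmeans_1d(xs, k)  # centers stay fixed after step 1
    w = [0.0] * (k + 1)
    rng = random.Random(seed)
    data = list(zip(xs, rs))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, r in data:
            p = features(x, centers, s)
            y = sum(wi * pi for wi, pi in zip(w, p))
            # step 2 (supervised): delta rule on a linear perceptron
            w = [wi + eta * (r - y) * pi for wi, pi in zip(w, p)]
    return centers, w

xs = [i * 0.25 for i in range(25)]  # 0.0 .. 6.0
rs = [math.sin(x) for x in xs]      # toy regression target
centers, w = train_hybrid(xs, rs)
mse = sum((predict(x, centers, w, 1.0) - r) ** 2 for x, r in zip(xs, rs)) / len(xs)
print(centers, mse)
```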

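Fully supervised: update centers, spreads, and second-layer weights together by gradient descent on the output error

Rules and Exceptions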

For an input x outside every local region, the default rule returns w_0 or a linear rule; the local units only handle the exceptions
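Here is a sketch of the "rules and exceptions" idea. It assumes Gaussian local units added on top of a default linear rule, and all parameter values are hypothetical. Inside a local region the exception changes the answer; far away, only the default rule remains.

```python
import math

def rules_and_exceptions(x, centers, spreads, v, w0, w1):
    """y = w0 + w1 * x  (default linear rule; w1 = 0 gives the constant w0)
         + sum_h v_h * exp(-(x - m_h)^2 / (2 s_h^2))  (local exceptions).
    Far from every center the local terms vanish, so the default rule answers."""
    exceptions = sum(vh * math.exp(-(x - m) ** 2 / (2.0 * s ** 2))
                     for vh, m, s in zip(v, centers, spreads))
    return w0 + w1 * x + exceptions

# hypothetical: default rule y = 1 + 0.5x, one exception near x = 2
params = dict(centers=[2.0], spreads=[0.3], v=[3.0], w0=1.0, w1=0.5)
inside = rules_and_exceptions(2.0, **params)    # inside the local region
outside = rules_and_exceptions(10.0, **params)  # outside -> default rule only
print(inside, outside)
```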

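Rule-Based Knowledge

Initialize from prior knowledge (rules) in advance, then fine-tune with data

Mixture of Experts

The second-layer weights are learned as linear models of the input instead of constants (each local unit has its own linear expert)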
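This is a sketch of a mixture of experts. It assumes a softmax gating model and an explicit x0 = 1 input, both details the notes leave out. Each expert is a linear model w_h = v_h . x, and the gate picks which expert answers for a given x.

```python
import math

def softmax(zs):
    mx = max(zs)
    exps = [math.exp(z - mx) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_of_experts(x, experts, gate):
    """x: input vector including x0 = 1.
    experts[h]: weights v_h of a linear expert, w_h(x) = v_h . x
    gate[h]: weights of the gating model, g_h = softmax_h(m_h . x).
    y = sum_h g_h * w_h(x)."""
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    g = softmax([dot(m, x) for m in gate])
    return sum(gh * dot(v, x) for gh, v in zip(g, experts)), g

experts = [[0.0, 1.0], [10.0, -1.0]]  # expert 0: y = x, expert 1: y = 10 - x
gate = [[5.0, -1.0], [-5.0, 1.0]]     # favors expert 0 for x < 5, expert 1 for x > 5
y_left, g_left = mixture_of_experts([1.0, 2.0], experts, gate)
y_right, g_right = mixture_of_experts([1.0, 8.0], experts, gate)
print(y_left, g_left, y_right, g_right)
```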

Moon on River