The Beautiful Future


Adam

Small Octopus 2017. 5. 24. 11:55

Adam: Adaptive Moment Estimation


** Original Paper

- Kingma, D. P. and Ba, J., "Adam: A Method for Stochastic Optimization", ICLR 2015


** A well-organized overview (blog post)

http://shuuki4.github.io/deep%20learning/2016/05/20/Gradient-Descent-Algorithm-Overview.html


** Explanation

Adam combines the RMSProp and Momentum methods.

Like Momentum, it stores an exponential moving average of the gradient:

$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\nabla_\theta J(\theta)$

Like RMSProp, it stores an exponential moving average of the squared gradient:

$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\left(\nabla_\theta J(\theta)\right)^2$

Since $m_0 = 0$ and $v_0 = 0$, both averages stay close to 0 early in training, so Adam uses bias-corrected estimates:

$\hat{m}_t = \dfrac{m_t}{1 - \beta_1^t}$

$\hat{v}_t = \dfrac{v_t}{1 - \beta_2^t}$
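
The corrected estimates then drive the parameter update $\theta_{t+1} = \theta_t - \dfrac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t$ (this final step comes from the original paper, not from the text above). Below is a minimal NumPy sketch of a single Adam step; the function and variable names are illustrative, and the defaults are the paper's suggested hyperparameters.

import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; theta, grad, m, v are arrays, t is the 1-based step index."""
    m = beta1 * m + (1 - beta1) * grad           # exponential average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2      # exponential average of the squared gradient
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the near-zero early phase
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

For example, to minimize $f(\theta) = \theta^2$, repeatedly call adam_step(theta, 2 * theta, m, v, t) with m and v initialized to zeros and t counting up from 1.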


** Usage examples

* LeNet (Caffe's MNIST example; in Caffe's Adam solver, momentum corresponds to $\beta_1$ and momentum2 to $\beta_2$)
base_lr: 0.001
momentum: 0.9
momentum2: 0.999


* DCGAN
lr = 0.0002
beta1 = 0.5
beta2 = 0.999
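
These are the values popularized by the DCGAN paper, which lowers $\beta_1$ from 0.9 to 0.5 to stabilize GAN training. Below is a minimal sketch of wiring those settings into an optimizer, assuming PyTorch's torch.optim.Adam; the two Sequential models are placeholders, not the actual DCGAN networks.

import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(100, 784), nn.Tanh())        # placeholder G, not the real DCGAN net
discriminator = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())   # placeholder D, not the real DCGAN net

# DCGAN-style Adam settings from above: lr = 0.0002, beta1 = 0.5, beta2 = 0.999
opt_g = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))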


* lenet_solver_adam.prototxt

# The train/test net protocol buffer definition
# this follows "ADAM: A METHOD FOR STOCHASTIC OPTIMIZATION"
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# All parameters are from the cited paper above
base_lr: 0.001
momentum: 0.9
momentum2: 0.999
# since Adam dynamically changes the learning rate, we set the base learning
# rate to a fixed value
lr_policy: "fixed"
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
type: "Adam"
solver_mode: GPU
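
One way to run this solver, assuming a standard Caffe build with the Python bindings (pycaffe) and the MNIST example data already prepared:

import caffe

caffe.set_mode_gpu()   # matches "solver_mode: GPU" in the prototxt above
solver = caffe.get_solver('examples/mnist/lenet_solver_adam.prototxt')
solver.solve()         # trains until max_iter (10000), snapshotting every 5000 iterations

Alternatively, the caffe command-line tool can be pointed at the same file: caffe train --solver=examples/mnist/lenet_solver_adam.prototxt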
