
Adam: Adaptive Moment Estimation


** Original Paper

- ADAM: A METHOD FOR STOCHASTIC OPTIMIZATION (Kingma & Ba, ICLR 2015)


** A well-organized overview (blog)

http://shuuki4.github.io/deep%20learning/2016/05/20/Gradient-Descent-Algorithm-Overview.html


** Explanation

Adam is an algorithm that combines the RMSProp and Momentum methods.

Like Momentum, it keeps an exponential moving average of the gradient:

\( m_{t} = \beta_{1}m_{t-1} + (1 - \beta_{1})\nabla_{\theta}J(\theta) \)

Like RMSProp, it keeps an exponential moving average of the squared gradient:

\( v_{t} = \beta_{2}v_{t-1} + (1 - \beta_{2})(\nabla_{\theta}J(\theta))^2 \)

Because \( m_{0} = 0 \) and \( v_{0} = 0 \), both averages stay close to zero early in training, so bias-corrected estimates are used:

\( \hat{m}_{t} = \frac{m_{t}}{1-\beta_{1}^{t}} \)

\( \hat{v}_{t} = \frac{v_{t}}{1-\beta_{2}^{t}} \)
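
Using these bias-corrected estimates, the parameter update in the paper is

\( \theta_{t+1} = \theta_{t} - \frac{\eta}{\sqrt{\hat{v}_{t}} + \epsilon}\hat{m}_{t} \)

Below is a minimal NumPy sketch of a single Adam step; the function and variable names are illustrative, not taken from the paper or any particular framework.

import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving average of the gradient (Momentum-style)
    m = beta1 * m + (1 - beta1) * grad
    # Exponential moving average of the squared gradient (RMSProp-style)
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction; t is the 1-based iteration count
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v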


** Usage examples

* LeNet

base_lr: 0.001
momentum: 0.9
momentum2: 0.999

(In Caffe's Adam solver, momentum and momentum2 correspond to \( \beta_{1} \) and \( \beta_{2} \).)


* DCGAN

lr = 0.0002
beta1 = 0.5
beta2 = 0.999
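
As a quick check, these settings can be dropped into the Adam step sketch above (a hypothetical usage line, not the DCGAN authors' code):

theta, m, v = adam_step(theta, grad, m, v, t, lr=0.0002, beta1=0.5, beta2=0.999)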


* lenet_solver_adam.prototxt

# The train/test net protocol buffer definition
# this follows "ADAM: A METHOD FOR STOCHASTIC OPTIMIZATION"
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# All parameters are from the cited paper above
base_lr: 0.001
momentum: 0.9
momentum2: 0.999
# since Adam dynamically changes the learning rate, we set the base learning
# rate to a fixed value
lr_policy: "fixed"
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
type: "Adam"
solver_mode: GPU
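
Assuming the standard Caffe source tree, this solver can be run with the caffe command-line tool, e.g. ./build/tools/caffe train --solver=examples/mnist/lenet_solver_adam.prototxt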
