PyTorch Quantization

The Beautiful Future

Small Octopus 2020. 8. 16. 20:58

https://pytorch.org/docs/stable/quantization.html

Quantization — PyTorch 1.6.0 documentation

Introduction

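Quantized (INT8) computation is typically 2 to 4 times faster than FP32 computation.

Quantized operators are supported on x86 CPUs with AVX2 or newer SIMD extensions.

They are also supported on ARM CPUs, which are typically found in mobile/embedded devices.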

Backend

PyTorch quantization currently supports two backends, one for x86 and one for ARM:

fbgemm for x86, and qnnpack (QNNPACK) for ARM.

The backend is selected through the qconfig:

post training quantization

qconfig = torch.quantization.get_default_qconfig('qnnpack')

quantization aware training

qconfig = torch.quantization.get_default_qat_qconfig('qnnpack')

In addition, the backend can be set through the torch.backends.quantized.engine parameter:

torch.backends.quantized.engine = 'qnnpack'
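
The backend settings above can be combined into one short sketch. Setting an engine that is not compiled into the local PyTorch build raises an error, so this version first checks torch.backends.quantized.supported_engines. The preference order (fbgemm first, then qnnpack) and the variable names are assumptions made for illustration.

```python
import torch

# Engines compiled into this PyTorch build (the list varies by platform).
supported = torch.backends.quantized.supported_engines

# Assumption: prefer fbgemm (x86), fall back to qnnpack (ARM), else None.
backend = next((b for b in ("fbgemm", "qnnpack") if b in supported), None)

if backend is not None:
    # Select the engine used to run quantized kernels.
    torch.backends.quantized.engine = backend
    # Matching qconfigs for post training quantization and for QAT.
    ptq_qconfig = torch.quantization.get_default_qconfig(backend)
    qat_qconfig = torch.quantization.get_default_qat_qconfig(backend)
    print("backend:", backend)
else:
    print("no fbgemm/qnnpack engine in this build:", supported)
```

The qconfig and the engine should name the same backend, because the qconfig decides how scales and zero points are observed while the engine decides which kernels execute the quantized operators.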

Quantized Tensors

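PyTorch supports both per-tensor and per-channel asymmetric linear quantization.

Per tensor means that all values within a tensor are scaled in the same way, with a single scale and zero point.

Per channel means that the values are scaled separately along one dimension (typically the channel dimension), with a different scale and zero point for each slice. This gives a lower quantization error.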

$$ Q(x, \text{scale}, \text{zero\_point}) = \operatorname{round}\left( \frac{x}{\text{scale}} + \text{zero\_point} \right) $$

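The floating-point value 0 is represented with no error after quantization (it maps exactly to zero_point), so operations such as padding introduce no additional quantization error.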
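
A small worked example may make the mapping concrete. The sketch below implements the formula in plain Python, not the PyTorch kernels. The helper names quantize and dequantize, the clamp to the quint8 range [0, 255], and the values scale = 0.5 and zero_point = 10 are assumptions chosen for illustration.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Q(x, scale, zero_point) = round(x / scale + zero_point).

    The clamp to [qmin, qmax] is an added assumption (quint8 range).
    """
    q = round(x / scale + zero_point)
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Inverse mapping back to a float value."""
    return (q - zero_point) * scale


scale, zero_point = 0.5, 10

q_zero = quantize(0.0, scale, zero_point)      # 0.0 lands exactly on zero_point
q_val = quantize(1.3, scale, zero_point)       # 1.3 / 0.5 + 10 = 12.6, rounds to 13
x_back = dequantize(q_val, scale, zero_point)  # (13 - 10) * 0.5 = 1.5

print(q_zero, q_val, x_back)  # prints: 10 13 1.5
```

The value 1.3 comes back as 1.5, a quantization error of 0.2, while 0.0 round-trips exactly. That is why zero padding adds no extra error.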

Operation coverage

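Quantized tensors support only a limited subset of the operations available for floating-point tensors.

PyTorch's quantized NN operators currently support a restricted set of data types: 8-bit weights (data_type = qint8) and 8-bit activations (data_type = quint8).

Per-channel quantization is supported only for the weights of the conv and linear operators.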

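The minimum and maximum of the input data are mapped linearly to the minimum and maximum of the quantized data type, in such a way that zero is represented with no quantization error.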
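
One common way to derive scale and zero_point from an observed min/max, sketched here under assumptions, is to first widen the range so that it contains 0.0 and then round the zero point to an integer. The function name choose_qparams and the quint8 range [0, 255] are hypothetical choices for this illustration. This is not claimed to be the exact rule used by PyTorch's observers.

```python
def choose_qparams(x_min, x_max, qmin=0, qmax=255):
    """Map [x_min, x_max] linearly onto [qmin, qmax] (illustrative sketch)."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)

    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate case: all inputs are zero

    # Integer that the real value 0.0 maps to, clamped to the valid range.
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


print(choose_qparams(-32.0, 95.5))  # prints: (0.5, 64)
print(choose_qparams(1.0, 127.5))   # prints: (0.5, 0)
```

In the second call the observed minimum is positive, so the range is extended down to 0.0 and the zero point becomes 0.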

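Additional data types and quantization schemes can be implemented through the custom operator mechanism.

Quantized modules are available in torch.nn.quantized. They take the output quantization parameters (scale and zero_point) explicitly.

Fused versions of common fusion patterns are available in torch.nn.intrinsic.quantized.

Modules for quantization aware training are available in torch.nn.qat and torch.nn.intrinsic.qat.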

Quantized torch.Tensor operations

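The following operations are available for quantized tensors, either in the torch namespace or as Tensor methods.

- quantize_per_tensor(): converts a float tensor to a quantized tensor using a single scale and zero point

- quantize_per_channel(): converts a float tensor to a quantized tensor using per-channel scales and zero points

- view(), as_strided(), expand(), flatten(), select(), and Python-style indexing: work as on regular tensors, provided the quantization is not per-channel

- Comparisons: ne(), eq(), ge(), le(), gt(), lt()

- copy_(): copies the source tensor into self in place

- clone(): returns a deep copy

- dequantize(): converts a quantized tensor to a float tensor

- equal()

- int_repr(): returns the underlying integer representation of the quantized tensor

- max(), mean(), min()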

- q_scale(), q_zero_point(), q_per_channel_scales(), q_per_channel_zero_points(), q_per_channel_axis()

- resize_(), sort(), topk()
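
A few of the operations listed above can be tried directly. The sketch below is illustrative: the tensor values, scales, and zero points are assumptions, chosen as powers of two and small integers so that every value quantizes without rounding error.

```python
import torch

# Per-tensor: one scale and one zero point for the whole tensor.
x = torch.tensor([[-1.0, 0.0, 0.5], [1.0, 1.5, 2.0]])
q = torch.quantize_per_tensor(x, scale=0.5, zero_point=2, dtype=torch.quint8)

print(q.int_repr())                   # stored integers: x / 0.5 + 2
print(q.q_scale(), q.q_zero_point())  # quantization parameters
print(q.dequantize())                 # back to float32

# Per-channel: one scale and zero point per slice along `axis`.
w = torch.tensor([[-1.0, 0.0, 1.0], [-8.0, 0.0, 8.0]])
scales = torch.tensor([0.5, 4.0])
zero_points = torch.tensor([2, 2])
qc = torch.quantize_per_channel(w, scales, zero_points, axis=0, dtype=torch.quint8)

print(qc.int_repr())                  # both rows use the same integer codes
print(qc.q_per_channel_scales(), qc.q_per_channel_axis())
```

The two rows of w differ in magnitude by a factor of 8, yet each row gets its own scale and so both are represented exactly. With a single per-tensor scale of 4.0, the first row would lose most of its precision.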

torch.nn.functional

torch.nn.intrinsic

torch.nn.intrinsic.qat

Versions of layers for quantization-aware training:

- ConvBn2d: Conv2d + BatchNorm

- ConvBnReLU2d: Conv2d + BatchNorm + ReLU

- ConvReLU2d: Conv2d + ReLU

- LinearReLU: Linear + ReLU
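
As a rough illustration of what these fused QAT layers do, the sketch below builds a ConvBnReLU2d directly and runs a float input through it. In practice these modules are usually created by the fuse and prepare steps of the quantization workflow rather than by hand. The layer sizes, the input shape, and the choice of the 'fbgemm' QAT qconfig are assumptions, and behaviour may differ between PyTorch versions.

```python
import torch
import torch.nn.intrinsic.qat as nniqat

# QAT modules require a qconfig; it supplies the weight fake-quantizer.
qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")

# Conv2d + BatchNorm + ReLU fused into one module for QAT.
layer = nniqat.ConvBnReLU2d(3, 8, kernel_size=3, qconfig=qconfig)
layer.train()

# The module still takes and returns float tensors; the weights pass
# through fake quantization in the forward pass so that training can
# adapt to the rounding.
x = torch.randn(2, 3, 8, 8)
out = layer(x)

print(type(layer.weight_fake_quant).__name__)
print(out.shape)
```

The output stays in floating point and is non-negative because of the fused ReLU. Conversion to real 8-bit kernels happens only later, when the trained model is converted.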
