YoloV3

Small Octopus 2020. 1. 13. 12:03

https://github.com/eriklindernoren/PyTorch-YOLOv3

Code analysis

Grid Anchor

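Check whether CUDA is available and choose the Float tensor type accordingly.

The grid size is the height/width of the output feature map; the stride is the input image size divided by the grid size.

Grid x and y offsets are built with arange from 0 to g-1 and repeated to form a grid; the grid shape is (1, 1, g, g).

Anchor widths and heights are divided by the stride so that they are expressed in grid cells; the original anchor sizes were relative to the input image.

The anchors are stored as a list of (w, h) tuples and are split into separate w and h tensors of shape (1, n, 1, 1).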

def compute_grid_offsets(self, grid_size, cuda=True):
        self.grid_size = grid_size
        g = self.grid_size
        FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor
        self.stride = self.img_dim / self.grid_size
        # Calculate offsets for each grid
        self.grid_x = torch.arange(g).repeat(g, 1).view([1, 1, g, g]).type(FloatTensor)
        self.grid_y = torch.arange(g).repeat(g, 1).t().view([1, 1, g, g]).type(FloatTensor)
        self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors])
        self.anchor_w = self.scaled_anchors[:, 0:1].view((1, self.num_anchors, 1, 1))
        self.anchor_h = self.scaled_anchors[:, 1:2].view((1, self.num_anchors, 1, 1))
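
To make the shapes concrete, here is a minimal sketch of the same offset computation on a tiny grid. The sizes `img_dim = 96`, `g = 3` and the two anchor sizes are illustrative assumptions, not values from the repository.

```python
import torch

img_dim, g = 96, 3                      # assumed toy sizes
anchors = [(16, 32), (64, 48)]          # assumed anchor sizes in input pixels
stride = img_dim / g                    # 32 input pixels per grid cell

# grid_x[0, 0, j, i] == i (column index), grid_y[0, 0, j, i] == j (row index)
grid_x = torch.arange(g).repeat(g, 1).view(1, 1, g, g).float()
grid_y = torch.arange(g).repeat(g, 1).t().view(1, 1, g, g).float()

# Anchors expressed in grid cells instead of input pixels
scaled_anchors = torch.tensor([(a_w / stride, a_h / stride) for a_w, a_h in anchors])
anchor_w = scaled_anchors[:, 0:1].view(1, len(anchors), 1, 1)
anchor_h = scaled_anchors[:, 1:2].view(1, len(anchors), 1, 1)

print(grid_x[0, 0])      # every row is [0., 1., 2.]
print(grid_y[0, 0])      # every column is [0., 1., 2.]
print(scaled_anchors)    # [[0.5, 1.0], [2.0, 1.5]]
```

Because the grids are (1, 1, g, g) and the anchors are (1, n, 1, 1), both broadcast against a (batch, n, g, g) prediction without any explicit expansion.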

LOSS

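The network outputs x, y, w, h, confidence, and class probabilities. The raw output has shape (batch, anchor x (5 + n_class), grid, grid) and is reshaped to (batch, anchor, grid, grid, 5 + n_class) before the indexing below.

The last dimension holds x, y, w, h, conf, and class probabilities, in that order.

x, y, confidence, and the class probabilities pass through a sigmoid. x and y are offsets inside the current grid cell, so they express position as values between 0 and 1.

Confidence and class probabilities are probability values.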

        # Get outputs
        x = torch.sigmoid(prediction[..., 0])  # Center x
        y = torch.sigmoid(prediction[..., 1])  # Center y
        w = prediction[..., 2]  # Width
        h = prediction[..., 3]  # Height
        pred_conf = torch.sigmoid(prediction[..., 4])  # Conf
        pred_cls = torch.sigmoid(prediction[..., 5:])  # Cls pred.
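
The `prediction[..., k]` indexing only works once the per-box values sit in the last dimension. The following is a minimal sketch of that reshaping with assumed toy sizes; `raw` is a hypothetical stand-in for the convolution output.

```python
import torch

nB, nA, nC, nG = 2, 3, 4, 5                    # assumed toy sizes
raw = torch.randn(nB, nA * (nC + 5), nG, nG)   # stand-in for the conv output

# (B, A*(5+C), G, G) -> (B, A, 5+C, G, G) -> (B, A, G, G, 5+C)
prediction = raw.view(nB, nA, nC + 5, nG, nG).permute(0, 1, 3, 4, 2).contiguous()

x = torch.sigmoid(prediction[..., 0])          # (B, A, G, G), offsets in 0~1
pred_cls = torch.sigmoid(prediction[..., 5:])  # (B, A, G, G, C)

print(prediction.shape)  # torch.Size([2, 3, 5, 5, 9])
print(x.shape, pred_cls.shape)
```

Channel `a * (nC + 5) + k` of the raw output ends up at `prediction[b, a, row, col, k]`, which is why index 0 is x, index 4 is the confidence, and `5:` are the class scores.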

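The predicted x, y, w, h are converted into box coordinates in grid units (center x, center y, width, height).

The grid tensors have shape (1, 1, g, g), so they are broadcast when added.

The anchor tensors have shape (1, n, 1, 1); exp of the predicted w and h scales each anchor.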

        # Add offset and scale with anchors
        pred_boxes = FloatTensor(prediction[..., :4].shape)
        pred_boxes[..., 0] = x.data + self.grid_x
        pred_boxes[..., 1] = y.data + self.grid_y
        pred_boxes[..., 2] = torch.exp(w.data) * self.anchor_w
        pred_boxes[..., 3] = torch.exp(h.data) * self.anchor_h
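
As a worked example of the decoding above, here is the arithmetic for a single anchor in a single cell, written in plain Python so each step is visible. All numeric values are illustrative assumptions, and multiplying by the stride at the end is an assumed way to map grid units back to input pixels.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Assumed raw outputs for one anchor in one cell (illustrative values)
t_x, t_y, t_w, t_h = 0.0, 0.0, 0.0, math.log(2.0)
cell_i, cell_j = 4, 7            # column and row of the grid cell
anchor_w, anchor_h = 2.0, 1.5    # anchor size in grid cells
stride = 32                      # assumed input pixels per grid cell

b_x = sigmoid(t_x) + cell_i      # 0.5 + 4 = 4.5
b_y = sigmoid(t_y) + cell_j      # 0.5 + 7 = 7.5
b_w = math.exp(t_w) * anchor_w   # 1 * 2.0 = 2.0
b_h = math.exp(t_h) * anchor_h   # about 2 * 1.5 = 3.0

box_px = [v * stride for v in (b_x, b_y, b_w, b_h)]
print(box_px)  # approximately [144.0, 240.0, 64.0, 96.0]
```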

build_targets builds the ground truth (GT).

 iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf = build_targets(
                pred_boxes=pred_boxes,
                pred_cls=pred_cls,
                target=targets,
                anchors=self.scaled_anchors,
                ignore_thres=self.ignore_thres,
            )

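x, y, w, h use MSE loss; confidence and class probabilities use BCE loss.

These losses are summed to give the total loss.

Indexing with obj_mask restricts the loss to locations where an object exists (the confidence loss additionally covers no-object cells through noobj_mask).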

            # Loss : Mask outputs to ignore non-existing objects (except with conf. loss)
            loss_x = self.mse_loss(x[obj_mask], tx[obj_mask])
            loss_y = self.mse_loss(y[obj_mask], ty[obj_mask])
            loss_w = self.mse_loss(w[obj_mask], tw[obj_mask])
            loss_h = self.mse_loss(h[obj_mask], th[obj_mask])
            loss_conf_obj = self.bce_loss(pred_conf[obj_mask], tconf[obj_mask])
            loss_conf_noobj = self.bce_loss(pred_conf[noobj_mask], tconf[noobj_mask])
            loss_conf = self.obj_scale * loss_conf_obj + self.noobj_scale * loss_conf_noobj
            loss_cls = self.bce_loss(pred_cls[obj_mask], tcls[obj_mask])
            total_loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls
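
The masking can be tried on a tiny tensor. This is a minimal sketch with assumed confidences for one image, one anchor, and a 2 x 2 grid; the scales 1 and 100 are the ones used in the script further down this post.

```python
import torch
import torch.nn as nn

bce_loss = nn.BCELoss()  # mean reduction by default

# Assumed toy confidences for 1 image, 1 anchor, 2x2 grid
pred_conf = torch.tensor([[[[0.9, 0.2], [0.1, 0.3]]]])
obj_mask = torch.tensor([[[[True, False], [False, False]]]])
noobj_mask = ~obj_mask
tconf = obj_mask.float()

# Boolean indexing flattens the selected cells into a 1-D tensor
loss_conf_obj = bce_loss(pred_conf[obj_mask], tconf[obj_mask])        # 1 cell
loss_conf_noobj = bce_loss(pred_conf[noobj_mask], tconf[noobj_mask])  # 3 cells

obj_scale, noobj_scale = 1, 100
loss_conf = obj_scale * loss_conf_obj + noobj_scale * loss_conf_noobj
print(pred_conf[obj_mask].shape, pred_conf[noobj_mask].shape)
print(loss_conf_obj.item(), loss_conf_noobj.item(), loss_conf.item())
```

Each BCE term is a mean over its own set of cells, so the number of no-object cells does not change the magnitude of `loss_conf_noobj` by itself; the explicit scale is what sets its weight in the sum.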

Ground-Truth

def build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres):

    ByteTensor = torch.cuda.ByteTensor if pred_boxes.is_cuda else torch.ByteTensor
    FloatTensor = torch.cuda.FloatTensor if pred_boxes.is_cuda else torch.FloatTensor

    # pred_boxes : batch, anchor, grid, grid, (x y w h)
    # pred_cls : batch, anchor, grid, grid, (cls)
    nB = pred_boxes.size(0)
    nA = pred_boxes.size(1)
    nC = pred_cls.size(-1)
    nG = pred_boxes.size(2)

    # Output tensors
    obj_mask = ByteTensor(nB, nA, nG, nG).fill_(0)
    noobj_mask = ByteTensor(nB, nA, nG, nG).fill_(1)
    class_mask = FloatTensor(nB, nA, nG, nG).fill_(0)
    iou_scores = FloatTensor(nB, nA, nG, nG).fill_(0)
    tx = FloatTensor(nB, nA, nG, nG).fill_(0)
    ty = FloatTensor(nB, nA, nG, nG).fill_(0)
    tw = FloatTensor(nB, nA, nG, nG).fill_(0)
    th = FloatTensor(nB, nA, nG, nG).fill_(0)
    tcls = FloatTensor(nB, nA, nG, nG, nC).fill_(0)

    # Convert to position relative to box
    # target nbox, (0 label cx cy w h)
    target_boxes = target[:, 2:6] * nG # 0~1 -> 0 ~ grid size
    gxy = target_boxes[:, :2]
    gwh = target_boxes[:, 2:]
    
    # Get anchors with best iou
    ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors])
    best_ious, best_n = ious.max(0)
    
    # Separate target values
    b, target_labels = target[:, :2].long().t()
    gx, gy = gxy.t()
    gw, gh = gwh.t()
    gi, gj = gxy.long().t()
    
    # Set masks
    obj_mask[b, best_n, gj, gi] = 1
    noobj_mask[b, best_n, gj, gi] = 0

    # Set noobj mask to zero where iou exceeds ignore threshold
    for i, anchor_ious in enumerate(ious.t()):
        noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0

    # Coordinates
    tx[b, best_n, gj, gi] = gx - gx.floor()
    ty[b, best_n, gj, gi] = gy - gy.floor()
    
    # Width and height, log
    tw[b, best_n, gj, gi] = torch.log(gw / anchors[best_n][:, 0] + 1e-16)
    th[b, best_n, gj, gi] = torch.log(gh / anchors[best_n][:, 1] + 1e-16)
    
    # One-hot encoding of label
    tcls[b, best_n, gj, gi, target_labels] = 1
    # Compute label correctness and iou at best anchor
    class_mask[b, best_n, gj, gi] = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float()
    iou_scores[b, best_n, gj, gi] = bbox_iou(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False)

    tconf = obj_mask.float()
    return iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf
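
The target encoding for one ground-truth box can be followed by hand. The sketch below uses plain Python instead of tensors; the anchors, `nG = 14`, and the stride of 16 (224 / 14) match the script in this post, while the box values and the helper name `wh_iou` are illustrative assumptions.

```python
import math

def wh_iou(wh1, wh2):
    """IoU of two boxes that share the same center, given only (w, h)."""
    inter = min(wh1[0], wh2[0]) * min(wh1[1], wh2[1])
    union = wh1[0] * wh1[1] + wh2[0] * wh2[1] - inter
    return inter / union

nG, stride = 14, 16
anchors = [(10, 14), (23, 27), (37, 58), (81, 82), (135, 169)]
scaled = [(w / stride, h / stride) for w, h in anchors]

# Assumed normalized ground-truth box (cx, cy, w, h), illustrative values
cx, cy, w, h = 0.52, 0.31, 0.30, 0.40
gx, gy, gw, gh = cx * nG, cy * nG, w * nG, h * nG   # grid units

ious = [wh_iou(a, (gw, gh)) for a in scaled]
best_n = max(range(len(ious)), key=ious.__getitem__)

gi, gj = int(gx), int(gy)                 # responsible cell (column, row)
tx, ty = gx - math.floor(gx), gy - math.floor(gy)
tw = math.log(gw / scaled[best_n][0])     # log of the ratio to the anchor
th = math.log(gh / scaled[best_n][1])

print(best_n, (gi, gj))                   # 3 (7, 4)
print(round(tx, 2), round(ty, 2))         # 0.28 0.34
```

Applying `exp(tw) * anchor_w` recovers `gw`, which is exactly the decoding that the prediction side performs.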

import torch
import numpy as np
import torch.nn as nn

anchor_w = 224
anchor_h = 224
img_w = 224
img_h = 224

nB = 1
nA = 5
nC = 80
nG = 14

def bbox_wh_iou(wh1, wh2):
    wh2 = wh2.t()
    w1, h1 = wh1[0], wh1[1]
    w2, h2 = wh2[0], wh2[1]
    # print('w1 size: ', w1.size()) 1
    # print('h1 size: ', h1.size()) 1
    # print('w2 size: ', w2.size()) n
    # print('h2 size: ', h2.size()) n
    inter_area = torch.min(w1, w2) * torch.min(h1, h2)
    union_area = (w1 * h1 + 1e-16) + w2 * h2 - inter_area
    return inter_area / union_area

def bbox_iou(box1, box2, x1y1x2y2=True):
    """
    Returns the IoU of two bounding boxes
    """
    if not x1y1x2y2:
        # Transform from center and width to exact coordinates
        b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
        b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
        b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
        b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
    else:
        # Get the coordinates of bounding boxes
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]

    # get the coordinates of the intersection rectangle
    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)
    # Intersection area
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(
        inter_rect_y2 - inter_rect_y1 + 1, min=0
    )
    # Union Area
    b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)

    iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)

    return iou

def get_target(cuda=True):
    label_path = 'E:/DB/COCO/detection/labels/train2014/COCO_train2014_000000000009.txt'
    boxes = torch.from_numpy(np.loadtxt(label_path).reshape(-1, 5))

    normalized_labels = True
    h_factor, w_factor = (img_h, img_w) if normalized_labels else (1, 1)
    # Pad to square resolution

    # Extract coordinates for unpadded + unscaled image
    x1 = w_factor * (boxes[:, 1] - boxes[:, 3] / 2)
    y1 = h_factor * (boxes[:, 2] - boxes[:, 4] / 2)
    x2 = w_factor * (boxes[:, 1] + boxes[:, 3] / 2)
    y2 = h_factor * (boxes[:, 2] + boxes[:, 4] / 2)

    boxes[:, 1] = ((x1 + x2) / 2) / img_w
    boxes[:, 2] = ((y1 + y2) / 2) / img_h
    boxes[:, 3] *= w_factor / img_w
    boxes[:, 4] *= h_factor / img_h

    targets = torch.zeros((len(boxes), 6))
    targets[:, 1:] = boxes

    return targets.to('cuda' if cuda else 'cpu')



is_cuda = torch.cuda.is_available()

# Select the tensor types depending on whether a GPU is available
ByteTensor = torch.cuda.ByteTensor if is_cuda else torch.ByteTensor
FloatTensor = torch.cuda.FloatTensor if is_cuda else torch.FloatTensor

# anchors
# Dividing by the stride rescales the anchors from input-image units to output (grid) units.
anchors = [(10,14),  (23,27),  (37,58),  (81,82),  (135,169)]
stride = img_w / nG
# n x 2
scaled_anchors = FloatTensor([(a_w / stride, a_h / stride) for a_w, a_h in anchors])
# 1 x n x 1 x 1
anchor_w = scaled_anchors[:, 0:1].view((1, nA, 1, 1))
anchor_h = scaled_anchors[:, 1:2].view((1, nA, 1, 1))
print('scaled_anchors size: ', scaled_anchors.size())

# Input
# Prediction: batch, anchors x (box xywh + confidence + class probabilities) channels, grid height, grid width
prediction = FloatTensor(nB, nA*(nC + 5), nG, nG).uniform_(-10.0, 10.0)
print('prediction size bchw: ', prediction.type(), prediction.size())
# Split the channel dimension into anchors and per-anchor detection values for easier indexing
prediction = prediction.view(nB, nA, nC+5, nG, nG)
print('prediction size bachw: ', prediction.size())
# Move the detection values to the last dimension for easier indexing
prediction = prediction.permute(0,1,3,4,2).contiguous()
print('prediction size bahwc: ', prediction.size())

# Predict grid-relative coordinates as values between 0 and 1 via a sigmoid
x = torch.sigmoid(prediction[..., 0])  # Center x
y = torch.sigmoid(prediction[..., 1])  # Center y
# Predict the box width/height as a ratio to the anchor; the network learns the ln-transformed GT ratio
# At use time exp() is applied, so exp(ln(r)) = r recovers the width/height ratio relative to the anchor
# The log-space transform keeps the values small, which keeps this loss stable and balanced against the other losses
w = prediction[..., 2]  # Width
h = prediction[..., 3]  # Height
# Objectness confidence, output as a probability via a sigmoid
pred_conf = torch.sigmoid(prediction[..., 4])  # Conf
# Object class scores, output as probabilities via a sigmoid
pred_cls = torch.sigmoid(prediction[..., 5:])  # Cls pred.


# Add offset and scale with anchors
pred_boxes = FloatTensor(prediction[..., :4].shape)
print('pred_boxes size: ', pred_boxes.size())
grid_x = torch.arange(nG).repeat(nG, 1).view([1, 1, nG, nG]).type(FloatTensor)
grid_y = torch.arange(nG).repeat(nG, 1).t().view([1, 1, nG, nG]).type(FloatTensor)
#print('grid_x: ', grid_x)
#print('grid_y: ', grid_y)
print('grid_x size: ', grid_x.size())
print('x size: ', x.size())
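# x holds the 0~1 offset inside each cell of the g x g grid; adding the cell index gives grid-unit coordinates
pred_boxes[..., 0] = x.data + grid_x
pred_boxes[..., 1] = y.data + grid_y
# w (b x a x g x g) holds the log-space value ln(w / anchor_w);
# exp(w) multiplied by anchor_w (1 x a x 1 x 1) gives the width in grid units (likewise for the height)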
pred_boxes[..., 2] = torch.exp(w.data) * anchor_w
pred_boxes[..., 3] = torch.exp(h.data) * anchor_h

# target
# Returns a tensor describing the objects in one image
# Its shape is (number of objects) x 6, with columns
# batch index, class id, x, y, w, h
target = get_target(cuda=is_cuda)
print('target size: ', target.size())

# Output tensors
obj_mask = ByteTensor(nB, nA, nG, nG).fill_(0)
noobj_mask = ByteTensor(nB, nA, nG, nG).fill_(1)
class_mask = FloatTensor(nB, nA, nG, nG).fill_(0)
iou_scores = FloatTensor(nB, nA, nG, nG).fill_(0)
tx = FloatTensor(nB, nA, nG, nG).fill_(0)
ty = FloatTensor(nB, nA, nG, nG).fill_(0)
tw = FloatTensor(nB, nA, nG, nG).fill_(0)
th = FloatTensor(nB, nA, nG, nG).fill_(0)
tcls = FloatTensor(nB, nA, nG, nG, nC).fill_(0)

# Convert to position relative to box
# Box values in the target are in 0~1, so convert them to grid units
target_boxes = target[:, 2:6] * nG # cx cy w h

# ground-truth xywh (grid units)
# (batch x num obj) x 2
gxy = target_boxes[:, :2]
gwh = target_boxes[:, 2:]

# IoU between the GT wh and each anchor's width/height
# Computed in grid units; the output has shape nA x num obj,
# i.e. the IoU of every object against every anchor
ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in scaled_anchors])
print('ious: ', ious)
print('ious size: ', ious.size())

# best_ious: the best IoU for each of the (batch x num obj) targets
# best_n: the index of the best-matching anchor for each of the (batch x num obj) targets
best_ious, best_n = ious.max(0)
# print('best_ious: ', best_ious)
print('best_n: ', best_n)

# Separate target values
# batch index and target label of each object
b, target_labels = target[:, :2].long().t()
print('target label size: ', target_labels.size())
#print('b target_labels: ', b, target_labels)

# ground truth x, y, w, h (grid units)
gx, gy = gxy.t()
gw, gh = gwh.t()
# ground truth x, y converted to long (cell indices)
gi, gj = gxy.long().t()
print('gi: ', gi)
print('gj: ', gj)

# Set masks
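# For each object, pick its best-matching anchor in its batch item and set row gj, column gi to 1:
# an object exists there.
obj_mask[b, best_n, gj, gi] = 1

# The object mask doubles as the target confidence
tconf = obj_mask.float()
# The opposite for the no-object mask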
noobj_mask[b, best_n, gj, gi] = 0

# Set noobj mask to zero where iou exceeds ignore threshold
# Besides the best anchor, also exclude anchors whose IoU is above the threshold from the no-object set
ignore_thres = 0.6
for i, anchor_ious in enumerate(ious.t()):
    noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0

# Coordinates
# Store the 0~1 offset at the tensor locations that contain an object
tx[b, best_n, gj, gi] = gx - gx.floor()
ty[b, best_n, gj, gi] = gy - gy.floor()

# Width and height, as a ratio to the anchor
# Store the log-space-transformed ratio of GT wh to the anchor at the tensor locations that contain an object
tw[b, best_n, gj, gi] = torch.log(gw / scaled_anchors[best_n][:, 0] + 1e-16)
th[b, best_n, gj, gi] = torch.log(gh / scaled_anchors[best_n][:, 1] + 1e-16)

# One-hot encoding of label
# Set only the class index to 1 at the tensor locations that contain an object
tcls[b, best_n, gj, gi, target_labels] = 1

# Loss
obj_mask = obj_mask.to(torch.bool)
noobj_mask = noobj_mask.to(torch.bool)
mse_loss = nn.MSELoss()
bce_loss = nn.BCELoss()

# MSE loss for x, y, w, h
loss_x = mse_loss(x[obj_mask], tx[obj_mask]) # 0~1
loss_y = mse_loss(y[obj_mask], ty[obj_mask])
loss_w = mse_loss(w[obj_mask], tw[obj_mask]) # not in the 0~1 range: log of a ratio
loss_h = mse_loss(h[obj_mask], th[obj_mask])

# Objectness confidence: BCE loss is applied separately to the obj and noobj regions (probabilities)
loss_conf_obj = bce_loss(pred_conf[obj_mask], tconf[obj_mask])
loss_conf_noobj = bce_loss(pred_conf[noobj_mask], tconf[noobj_mask])

# Scale the obj loss and the noobj loss.
# The region without objects is much larger, but BCE loss averages over it,
# so it is given the larger weight.
obj_scale = 1
noobj_scale = 100
loss_conf = obj_scale * loss_conf_obj + noobj_scale * loss_conf_noobj

# class
# BCE loss is computed over a (num obj x class) tensor. If two objects fall on the same location, only one is kept.
loss_cls = bce_loss(pred_cls[obj_mask], tcls[obj_mask])
print('pred_cls[obj_mask] size: ', pred_cls[obj_mask].size())

total_loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls

# Metric
# Compute label correctness and iou at best anchor
# Check whether the predicted class matches the target class for each object (num obj)
arr_cls_mask = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float()

# Mask that is active only where the predicted class equals the target class (correct predictions)
class_mask[b, best_n, gj, gi] = arr_cls_mask
# IoU between the predicted box and the target box
iou_scores[b, best_n, gj, gi] = bbox_iou(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False)

# class_mask holds 1 at object locations where the prediction matched the target.
cls_acc = 100 * class_mask[obj_mask].mean()
# Mean predicted confidence at object locations
conf_obj = pred_conf[obj_mask].mean()
# Mean predicted confidence at no-object locations
conf_noobj = pred_conf[noobj_mask].mean()
# 1 where confidence is above 0.5
conf50 = (pred_conf > 0.5).float()
# 1 where IoU is above 0.5
iou50 = (iou_scores > 0.5).float()
# 1 where IoU is above 0.75
iou75 = (iou_scores > 0.75).float()

# Detections: confidence above 0.5, class correct, and (via tconf) only where an object exists
# class_mask: class predicted correctly
# Shape is nB x nA x nG x nG.
detected_mask = conf50 * class_mask * tconf

# precision
# (IoU > 0.5 and conf > 0.5 and class correct and GT object) / (conf > 0.5)
precision = torch.sum(iou50 * detected_mask) / (conf50.sum() + 1e-16)

# (IoU > 0.5 and conf > 0.5 and class correct and GT object) / (all targets)
recall50 = torch.sum(iou50 * detected_mask) / (obj_mask.sum() + 1e-16)

# (IoU > 0.75 and conf > 0.5 and class correct and GT object) / (all targets)
recall75 = torch.sum(iou75 * detected_mask) / (obj_mask.sum() + 1e-16)
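
The metric formulas can be checked on a handful of values. This is a minimal sketch over six flattened anchor slots; every number in it is an illustrative assumption.

```python
import torch

# Assumed flattened toy values for 6 anchor slots (illustrative only)
pred_conf  = torch.tensor([0.9, 0.8, 0.7, 0.2, 0.6, 0.1])
iou_scores = torch.tensor([0.8, 0.6, 0.3, 0.9, 0.0, 0.0])
class_mask = torch.tensor([1.0, 1.0, 1.0, 1.0, 0.0, 0.0])
obj_mask   = torch.tensor([True, True, True, True, False, False])
tconf = obj_mask.float()

conf50 = (pred_conf > 0.5).float()
iou50 = (iou_scores > 0.5).float()
iou75 = (iou_scores > 0.75).float()
detected_mask = conf50 * class_mask * tconf   # confident, correct class, real object

precision = torch.sum(iou50 * detected_mask) / (conf50.sum() + 1e-16)
recall50 = torch.sum(iou50 * detected_mask) / (obj_mask.sum() + 1e-16)
recall75 = torch.sum(iou75 * detected_mask) / (obj_mask.sum() + 1e-16)
print(precision.item(), recall50.item(), recall75.item())  # 0.5 0.5 0.25
```

Four slots are confident, two of them are true detections at IoU 0.5, and four objects exist, hence precision and recall50 of 0.5. The fourth object has a high IoU but low confidence, so it does not count as detected.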
