YOLO 튜토리얼 1
2020/04/26 - [Programmer/Deep Learning] - YOLO 튜토리얼 1.YOLO 란?
YOLO 튜토리얼 2
Defining The Network
darknet.py에 darknet의 class를 정의
class Darknet(nn.Module):
def __init__(self, cfgfile):
super(Darknet, self).__init__()
self.blocks = parse_cfg(cfgfile)
self.net_info, self.module_list = create_modules(self.blocks)
Implementing the forward pass of the network
forward함수는 해당 layer의 입력을 받아서 출력할 결과를 계산한 후 return해줌
CUDA는 엔비디아(NVIDIA)에서 개발한 GPU개발 툴로 한번에 많은 양을 처리하기 위해서 사용한다.
def forward(self, x, CUDA): #해당 layer의 입력을 받아 출력할 결과를 계산한 후 return
modules = self.blocks[1:]
outputs = {} #We cache the outputs for the route layer
write = 0 #This is explained a bit later
for i, module in enumerate(modules):
module_type = (module["type"])
Convolutional and Upsample Layers, Route Layer/Shortcut Layer
if module_type == "convolutional" or module_type == "upsample":
x = self.module_list[i](x)
elif module_type == "route":
layers = module["layers"]
layers = [int(a) for a in layers]
if (layers[0]) > 0:
layers[0] = layers[0] - i
if len(layers) == 1:
x = outputs[i + (layers[0])]
else:
if (layers[1]) > 0:
layers[1] = layers[1] - i
map1 = outputs[i + layers[0]]
map2 = outputs[i + layers[1]]
x = torch.cat((map1, map2), 1)
elif module_type == "shortcut":
from_ = int(module["from"])
x = outputs[i-1] + outputs[i+from_]
elif module_type == 'yolo':
anchors = self.module_list[i][0].anchors
#Get the input dimensions
inp_dim = int (self.net_info["height"])
#Get the number of classes
num_classes = int (module["classes"])
#Transform
x = x.data
x = predict_transform(x, inp_dim, anchors, num_classes, CUDA)
if not write: #if no collector has been intialised.
detections = x
write = 1
else:
detections = torch.cat((detections, x), 1)
outputs[i] = x
return detections
YOLO(Detection Layer)
Yolo의 output은 feature map의 depth에 (혹은 dimension에) bounding box 속성들이 있는 feature map이다. Cell에 의해 predict된 attributes는 쭉 이어붙인 상태로 되어있다. 그러므로 예를 들어 (5,6) Cell 의 2번째 bounding box를 참조하고 싶다면
map[ 5, 6, (2-1) * (5+C) : 2 * (5+C) ] 의 인덱스를 참조해야 한다. 이전 글에서 언급했지만, 여기의 C는 class 개수이고 앞의 5는 xyhw와 confidence이다. 이 형식은 output을 처리하기가 굉장히 불편하다.
이 것들이 나누어진 tensor이 아닌 single tensor에 들어가 있게 하기 위해서, predict_transform 이라는 함수를 만들자.
Transforming Output
predic_transform 함수는 util.py에 작성할 것이며, darknet의 forward 시점에 사용할 것이다.
YOLO의 output을 처리하기가 불편하기 때문에 tensor가 아닌 single tensor에 들어가 있게 하기 위해서 만든다.
from __future__ import division
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import numpy as np
import cv2
def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True):
batch_size = prediction.size(0)
stride = inp_dim // prediction.size(2)
grid_size = inp_dim // stride
bbox_attrs = 5 + num_classes
num_anchors = len(anchors)
prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)
prediction = prediction.transpose(1,2).contiguous()
prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)
anchors = [(a[0]/stride, a[1]/stride) for a in anchors]
#Sigmoid the centre_X, centre_Y. and object confidencce
prediction[:,:,0] = torch.sigmoid(prediction[:,:,0])
prediction[:,:,1] = torch.sigmoid(prediction[:,:,1])
prediction[:,:,4] = torch.sigmoid(prediction[:,:,4])
#Add the center offsets
grid = np.arange(grid_size)
a,b = np.meshgrid(grid, grid)
x_offset = torch.FloatTensor(a).view(-1,1)
y_offset = torch.FloatTensor(b).view(-1,1)
if CUDA:
x_offset = x_offset.cuda()
y_offset = y_offset.cuda()
x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1,num_anchors).view(-1,2).unsqueeze(0)
prediction[:,:,:2] += x_y_offset
#log space transform height and the width
anchors = torch.FloatTensor(anchors)
if CUDA:
anchors = anchors.cuda()
anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)
prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4])*anchors
prediction[:,:,5: 5 + num_classes] = torch.sigmoid((prediction[:,:, 5 : 5 + num_classes]))
prediction[:,:,:4] *= stride
return prediction
Detection Layer Revisited
darknet.py file에와서 import를 추가해준다.
from util import *
Loading Weights :Weights file
def load_weights(self, weightfile):
#Open the weights file
fp = open(weightfile, "rb")
#The first 5 values are header information
# 1. Major version number
# 2. Minor Version Number
# 3. Subversion number
# 4,5. Images seen by the network (during training)
header = np.fromfile(fp, dtype = np.int32, count = 5)
self.header = torch.from_numpy(header)
self.seen = self.header[3]
weights = np.fromfile(fp, dtype = np.float32)
ptr = 0
for i in range(len(self.module_list)):
module_type = self.blocks[i + 1]["type"]
#If module_type is convolutional load weights
#Otherwise ignore.
if module_type == "convolutional":
model = self.module_list[i]
try:
batch_normalize = int(self.blocks[i+1]["batch_normalize"])
except:
batch_normalize = 0
conv = model[0]
if (batch_normalize):
bn = model[1]
#Get the number of weights of Batch Norm Layer
num_bn_biases = bn.bias.numel()
#Load the weights
bn_biases = torch.from_numpy(weights[ptr:ptr + num_bn_biases])
ptr += num_bn_biases
bn_weights = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
ptr += num_bn_biases
bn_running_mean = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
ptr += num_bn_biases
bn_running_var = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
ptr += num_bn_biases
#Cast the loaded weights into dims of model weights.
bn_biases = bn_biases.view_as(bn.bias.data)
bn_weights = bn_weights.view_as(bn.weight.data)
bn_running_mean = bn_running_mean.view_as(bn.running_mean)
bn_running_var = bn_running_var.view_as(bn.running_var)
#Copy the data to model
bn.bias.data.copy_(bn_biases)
bn.weight.data.copy_(bn_weights)
bn.running_mean.copy_(bn_running_mean)
bn.running_var.copy_(bn_running_var)
else:
#Number of biases
num_biases = conv.bias.numel()
#Load the weights
conv_biases = torch.from_numpy(weights[ptr: ptr + num_biases])
ptr = ptr + num_biases
#reshape the loaded weights according to the dims of the model weights
conv_biases = conv_biases.view_as(conv.bias.data)
#Finally copy the data
conv.bias.data.copy_(conv_biases)
#Let us load the weights for the Convolutional layers
num_weights = conv.weight.numel()
#Do the same as above for weights
conv_weights = torch.from_numpy(weights[ptr:ptr+num_weights])
ptr = ptr + num_weights
conv_weights = conv_weights.view_as(conv.weight.data)
conv.weight.data.copy_(conv_weights)
model = Darknet("cfg/yolov3.cfg")
model.load_weights("yolov3.weights")
'Programmer > Deep Learning' 카테고리의 다른 글
YOLO ,object detection network의 이해 (0) | 2020.04.28 |
---|---|
[논문리뷰] Histograms of Oriented Gradients for Human Detection (0) | 2020.04.28 |
YOLO 튜토리얼 2. Creating the layers of the network architecture (0) | 2020.04.26 |
YOLO 튜토리얼 1.YOLO 란? (0) | 2020.04.26 |
[논문리뷰] Rapid Object Detection using a Boosted Cascade of Simple Features (0) | 2020.04.21 |