[CS231n 2016] Lecture 7: Convolutional Neural Network

수만이 2022. 7. 11. 14:21

Convolution Layer

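An image is a volume with a width, height, and depth.

A filter always extends through the full depth of the image.

Convolving a filter with the image means sliding it spatially over the image and computing a dot product at each location.

With a 5x5x3 filter, each location takes a 75-dimensional dot product (5*5*3 = 75 multiplications, plus a bias) and produces a single number.

One filter produces one activation map.

Stacking the activation maps gives a re-representation of the original image.

Number of filters = depth of the next volume; depth of each filter = depth of its input.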
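To make the sliding dot product concrete, here is a minimal NumPy sketch. The function name `conv_forward_naive`, the (H, W, C) layout, and the choice of 6 filters are my own assumptions for illustration; it is a deliberately slow loop version, not how a real framework implements convolution.

```python
import numpy as np

def conv_forward_naive(x, filters, biases, stride=1):
    """Slide each filter over x and take a dot product at every location.

    x:       input volume, shape (H, W, C)
    filters: shape (K, F, F, C); each filter spans the full input depth C
    biases:  shape (K,)
    returns: output volume, shape (H_out, W_out, K)
    """
    H, W, C = x.shape
    K, F, _, C_f = filters.shape
    assert C_f == C, "filter depth must equal input depth"
    H_out = (H - F) // stride + 1
    W_out = (W - F) // stride + 1
    out = np.zeros((H_out, W_out, K))
    for k in range(K):  # one activation map per filter
        for i in range(H_out):
            for j in range(W_out):
                r, c = i * stride, j * stride
                patch = x[r:r + F, c:c + F, :]
                # F*F*C multiplications summed into a single number
                out[i, j, k] = np.sum(patch * filters[k]) + biases[k]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 3))         # hypothetical 32x32x3 image
filters = rng.standard_normal((6, 5, 5, 3))  # 6 filters, each 5x5x3
biases = np.zeros(6)
out = conv_forward_naive(x, filters, biases)
print(out.shape)  # (28, 28, 6)
```

The output depth is 6 because there are 6 filters, and each filter must be 3 deep because the input is 3 deep.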

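The first-layer filters are visualized directly as their raw weights; the filters in later layers are visualized in terms of the filters of the preceding layers.

This hierarchy is what lets the network recognize increasingly complex images.

The layer is called convolutional because two signals, the filter and the image, are convolved with each other.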

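Generalizing the output size: with input size N, filter size F, stride S, and padding P, the output size is (N + 2P - F)/S + 1.

Padding can be used to preserve the spatial size.

If the spatial size keeps shrinking at every layer, the volume itself shrinks away. Shrinking too fast is not good.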

Ex 1)

Input volume: 32x32x3. 10 5x5 filters with stride 1, pad 2. Output volume size: ?

=> Output volume size: (32+2*2-5)/1+1 = 32 spatially, so 32x32x10
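The arithmetic in Ex 1 can be wrapped in a small helper. The name `conv_output_size` and the divisibility check are my own additions; this is just a sketch of the (N + 2P - F)/S + 1 calculation used above.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Spatial output size for input size n and filter size f."""
    span = n + 2 * pad - f
    if span % stride != 0:
        # the filter does not fit an integer number of times
        raise ValueError("filter does not tile the input evenly with this stride")
    return span // stride + 1

size = conv_output_size(32, 5, stride=1, pad=2)
print(size)  # 32
```

With 10 filters, the output volume is therefore 32x32x10.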

Ex 2)

Input volume: 32x32x3. 10 5x5 filters with stride 1, pad 2. Number of parameters in this layer?

each filter has 5*5*3 + 1 = 76 params (+1 for bias) => 76*10 = 760
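The same count as a small function, in case you want to try other layer shapes. `conv_param_count` is a hypothetical helper name, and it assumes square filters with one bias per filter.

```python
def conv_param_count(filter_size, in_depth, num_filters, bias=True):
    """Number of learnable parameters in a CONV layer with square filters."""
    per_filter = filter_size * filter_size * in_depth + (1 if bias else 0)
    return per_filter * num_filters

params = conv_param_count(5, 3, 10)
print(params)  # 760
```

Note that stride and padding do not appear: they change the output size, not the number of parameters.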

1x1 filter? -> It is meaningful: a 1x1 filter still takes a dot product across the full depth of the input.
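One way to see why a 1x1 filter is meaningful: it amounts to applying the same small matrix multiplication at every pixel, mixing the channels. The sketch below assumes an (H, W, C) layout, and the 56x56x64 input with 32 filters is only an illustrative size.

```python
import numpy as np

def conv1x1(x, weights, biases):
    """1x1 convolution as a per-pixel matrix multiply.

    x:       (H, W, C_in)
    weights: (C_in, C_out), one column per 1x1 filter
    biases:  (C_out,)
    """
    # matmul treats x as a stack of (W, C_in) matrices, so every pixel's
    # depth vector is multiplied by the same weight matrix
    return x @ weights + biases

rng = np.random.default_rng(0)
x = rng.standard_normal((56, 56, 64))
weights = rng.standard_normal((64, 32))
biases = np.zeros(32)
out = conv1x1(x, weights, biases)
print(out.shape)  # (56, 56, 32)
```

The spatial size is unchanged and only the depth goes from 64 to 32.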

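Each neuron is connected to only a small region of the input. -> local connectivity

All neurons in the same activation map share the same parameters.

The neurons in the figure above have the same local connectivity (they look at the same input region), but they do not share parameters.

In other words, neurons within the same depth slice share parameters, while neurons at the same position in different activation maps look at the same local region of the input without sharing parameters.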
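To get a feel for how much parameter sharing saves, the sketch below compares a CONV layer with a hypothetical layer that keeps the same local connectivity but gives every neuron its own weights. The 28x28x5 output and 5x5x3 filters are assumed sizes for illustration, not numbers from these notes.

```python
def shared_params(filter_size, in_depth, num_filters):
    """CONV layer: one set of weights (+ bias) per activation map."""
    return (filter_size * filter_size * in_depth + 1) * num_filters

def unshared_params(filter_size, in_depth, out_h, out_w, num_filters):
    """Same local connectivity, but every neuron has its own weights (+ bias)."""
    per_neuron = filter_size * filter_size * in_depth + 1
    return per_neuron * out_h * out_w * num_filters

with_sharing = shared_params(5, 3, 5)
without_sharing = unshared_params(5, 3, 28, 28, 5)
print(with_sharing)     # 380
print(without_sharing)  # 297920
```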

Pooling Layer

- Makes the representation smaller and more manageable.

- Operates on each activation map independently.

MAX pooling
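A minimal NumPy sketch of max pooling follows. The 2x2 window with stride 2 is an assumed (though common) setting, and `max_pool` is my own helper name. Each depth slice is pooled independently, as noted above.

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling over an (H, W, C) volume, one depth slice at a time."""
    H, W, C = x.shape
    H_out = (H - size) // stride + 1
    W_out = (W - size) // stride + 1
    out = np.zeros((H_out, W_out, C), dtype=x.dtype)
    for i in range(H_out):
        for j in range(W_out):
            r, c = i * stride, j * stride
            window = x[r:r + size, c:c + size, :]
            # max over the spatial window, separately for every channel
            out[i, j, :] = window.max(axis=(0, 1))
    return out

# a single 4x4 depth slice, stored as shape (4, 4, 1)
x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])[:, :, None]
pooled = max_pool(x)
print(pooled[:, :, 0])
# [[6 8]
#  [3 4]]
```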

Fully Connected Layer (FC layer)

- The volume is flattened into a column vector and fully connected to the outputs through a matrix multiplication, so that the final softmax can perform the classification.
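The flatten, matrix multiply, softmax chain can be sketched as below. The 4x4x8 input volume, the 10 classes, and the name `fc_softmax` are assumptions chosen only to keep the example small.

```python
import numpy as np

def fc_softmax(volume, W, b):
    """Flatten a volume, apply one FC layer, and return class probabilities.

    W: (num_classes, D) with D = volume.size
    b: (num_classes,)
    """
    x = volume.reshape(-1)          # flatten the volume into a vector
    scores = W @ x + b              # fully connected layer = matrix multiply
    scores = scores - scores.max()  # shift for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()          # softmax

rng = np.random.default_rng(0)
volume = rng.standard_normal((4, 4, 8))
W = rng.standard_normal((10, volume.size)) * 0.01
b = np.zeros(10)
probs = fc_softmax(volume, W, b)
print(probs.shape, probs.argmax())
```

The probabilities sum to 1, and the predicted class is the index with the largest probability.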

Case Study

LeNet-5

AlexNet

Input: 227x227x3 images

First layer (CONV1): 96 11x11 filters applied at stride 4

=> Q: what is the output volume size?

(227-11)/4+1 = 55, [55x55x96]

=> Q: What is the total number of parameters in this layer?

Parameters: (11*11*3)*96 = 34,848, roughly 35K (biases not counted)

Input: 227x227x3 images

After CONV1: 55x55x96

Second layer (POOL1): 3x3 filters applied at stride 2

=> Q: what is the output volume size?

(55-3)/2+1 = 27, [27x27x96]

=> Q: what is the number of parameters in this layer?

0. A pooling layer has no parameters.
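The AlexNet numbers above can be checked with a few lines. `output_size` is a hypothetical helper implementing the same (N + 2P - F)/S + 1 formula, and the weight count leaves out biases to match the calculation in the notes.

```python
def output_size(n, f, stride, pad=0):
    """Spatial output size of a CONV or POOL layer."""
    return (n + 2 * pad - f) // stride + 1

conv1 = output_size(227, 11, stride=4)   # CONV1: 11x11 filters, stride 4
pool1 = output_size(conv1, 3, stride=2)  # POOL1: 3x3 windows, stride 2
conv1_weights = 11 * 11 * 3 * 96         # weights only, no biases

print(conv1, pool1, conv1_weights)  # 55 27 34848
```

Pooling changes the spatial size but not the depth, so the volume goes from 55x55x96 to 27x27x96.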

ZFNet

VGGNet

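FC layers have been criticized as inefficient (they have too many parameters), so average pooling is now often used instead.

=> Performs well but is complex. The FC layers were replaced with average pooling to reduce the number of parameters.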

ResNet

It points out that in earlier networks, error tends to increase as more layers are added.

With ResNet, error tends to decrease as layers are added. It uses many layers (152).

- Batch Normalization after every CONV layer

- Xavier/2 initialization from He et al.

- SGD + Momentum (0.9)

- Learning rate: 0.1, divided by 10 when validation error plateaus

- Mini-batch size 256

- Weight decay of 1e-4 (0.0001 in He et al.; the lecture slide lists 1e-5)

- No dropout used
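A few of the items in this recipe are easy to sketch in NumPy. Everything below is a rough illustration under my own assumptions: the helper names, the `patience` parameter, and the exact plateau rule are hypothetical, since the notes only say the learning rate is divided by 10 when validation error plateaus.

```python
import numpy as np

def he_init(fan_in, fan_out, rng):
    """'Xavier/2' (He et al.): Gaussian weights with variance 2 / fan_in."""
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9):
    """One SGD + momentum update; returns the new weights and velocity."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def step_lr_on_plateau(lr, val_errors, patience=3):
    """Divide lr by 10 if the last `patience` epochs did not beat the earlier best.

    The plateau criterion is an assumption for illustration.
    """
    if len(val_errors) > patience:
        recent_best = min(val_errors[-patience:])
        earlier_best = min(val_errors[:-patience])
        if recent_best >= earlier_best:
            return lr / 10.0
    return lr

rng = np.random.default_rng(0)
W = he_init(512, 256, rng)
print(W.shape, W.std())  # std should be close to sqrt(2/512) = 0.0625

w, v = sgd_momentum_step(w=1.0, grad=2.0, velocity=0.0, lr=0.1)
lr = step_lr_on_plateau(0.1, [0.30, 0.25, 0.26, 0.25, 0.27])
print(w, v, lr)
```

In the last call the three most recent validation errors never improve on the earlier best of 0.25, so the learning rate drops from 0.1 to 0.01.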

Summary

- ConvNets stack CONV, POOL, FC layers

- Trend towards smaller filters and deeper architectures

- Trend towards getting rid of POOL/FC layers (just CONV)

- Typical architectures look like [(CONV-RELU)*N-POOL?]*M-(FC-RELU)*K,SOFTMAX where N is usually up to ~5, M is large, 0 <= K <= 2.

- but recent advances such as ResNet/GoogLeNet challenge this paradigm
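The typical pattern [(CONV-RELU)*N-POOL?]*M-(FC-RELU)*K,SOFTMAX from the summary can be expanded mechanically. The helper `convnet_pattern` below is hypothetical and only produces layer names, not an actual network; the `pool` flag stands in for the optional "POOL?".

```python
def convnet_pattern(n, m, k, pool=True):
    """Expand [(CONV-RELU)*N-POOL?]*M-(FC-RELU)*K,SOFTMAX into layer names."""
    layers = []
    for _ in range(m):
        layers += ["CONV", "RELU"] * n  # N CONV-RELU pairs per block
        if pool:
            layers.append("POOL")       # optional POOL closing each block
    layers += ["FC", "RELU"] * k        # K FC-RELU pairs
    layers.append("SOFTMAX")
    return layers

arch = convnet_pattern(n=2, m=2, k=1)
print("-".join(arch))
# CONV-RELU-CONV-RELU-POOL-CONV-RELU-CONV-RELU-POOL-FC-RELU-SOFTMAX
```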