## Network Compression with Depthwise Separable Convolution

One way to compress a network is to replace standard convolutions with depthwise separable convolutions. Depthwise separable convolution is the main building block of widely used pre-trained networks such as MobileNet and Xception, and ShuffleNet relies on the closely related depthwise convolution.

Depthwise separable convolution consists of two steps. Step 1 performs a spatial convolution independently over each channel of the input. Step 2 performs a pointwise convolution, i.e. a 1×1 convolution, over the outputs of Step 1 to mix information across channels. The number of 1×1 filters equals the number of output channels.
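The two steps can be sketched in plain Python. This is a minimal illustration rather than an efficient implementation: it assumes channel-first nested lists, stride 1 and no padding, and the function names `depthwise_conv` and `pointwise_conv` are made up for this example.

```python
def depthwise_conv(x, kernels):
    """Step 1: filter each channel with its own k x k kernel.

    x is indexed as x[channel][row][col], kernels as kernels[channel][row][col].
    Uses stride 1 and no padding. As in deep learning libraries, the kernel is
    not flipped (cross-correlation).
    """
    k = len(kernels[0])
    out_h = len(x[0]) - k + 1
    out_w = len(x[0][0]) - k + 1
    out = []
    for channel, kernel in zip(x, kernels):
        plane = [
            [
                sum(channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(k) for dj in range(k))
                for j in range(out_w)
            ]
            for i in range(out_h)
        ]
        out.append(plane)
    return out


def pointwise_conv(x, weights):
    """Step 2: 1x1 convolution; weights[o][c] mixes input channel c into output channel o."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w_oc * x[c][i][j] for c, w_oc in enumerate(row))
             for j in range(width)]
            for i in range(height)
        ]
        for row in weights
    ]


# Toy input: 2 channels of size 3 x 3.
features = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
# One 2 x 2 kernel per input channel.
kernels = [
    [[1, 1], [1, 1]],
    [[1, 0], [0, 1]],
]
# Three 1 x 1 filters, so the result has three output channels.
pointwise_weights = [[1, 0], [0, 1], [1, 1]]

depthwise_out = depthwise_conv(features, kernels)        # 2 channels of size 2 x 2
output = pointwise_conv(depthwise_out, pointwise_weights)  # 3 channels of size 2 x 2
```

Note that Step 1 never changes the number of channels; only the pointwise step decides how many output channels there are.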

In a conventional convolution with a k×k kernel, I input channels and O output channels, the number of parameters is k×k×I×O (ignoring biases), while in depthwise separable convolution it is k×k×I + I×O. The ratio of the two is 1/O + 1/k², so for a large number of output channels the parameter count falls to approximately 1/k² of the original.
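A quick numerical check of these formulas is sketched below. The values k = 3, I = 32 and O = 64 are arbitrary choices for illustration, and the helper names are hypothetical; biases are ignored, as in the formulas.

```python
def conv_params(k, in_ch, out_ch):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * in_ch * out_ch


def separable_params(k, in_ch, out_ch):
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = k * k * in_ch   # one k x k kernel per input channel
    pointwise = in_ch * out_ch  # one 1 x 1 x I filter per output channel
    return depthwise + pointwise


k, in_ch, out_ch = 3, 32, 64  # illustrative values only
standard = conv_params(k, in_ch, out_ch)
separable = separable_params(k, in_ch, out_ch)
ratio = separable / standard  # equals 1/out_ch + 1/k**2

print(f"standard: {standard}, separable: {separable}, ratio: {ratio:.4f}")
```

With these values the separable layer needs 2,336 weights instead of 18,432, a ratio of about 0.127, which is close to 1/k² ≈ 0.111.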

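TensorFlow's Keras API provides a DepthwiseConv2D layer. Note that DepthwiseConv2D performs only the depthwise step (Step 1); the combined two-step operation is available as SeparableConv2D. We evaluate DepthwiseConv2D by comparing three models. Model 1, the control, is a conventional CNN built from Conv2D layers. Model 2 replaces every Conv2D layer with a DepthwiseConv2D layer. Model 3 is a hybrid that mixes Conv2D and DepthwiseConv2D layers. The model code is shown below: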

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense, Flatten, MaxPooling2D

# Model 1: Full Conv2D Model
model = Sequential([
    MaxPooling2D(),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(10, activation='softmax')
])

# Model 2: Fully DepthwiseConv2D layers
model = Sequential([
    Input(shape=input_shape),
    MaxPooling2D(),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(10, activation='softmax')
])

# Model 3: Hybrid Conv2D and DepthwiseConv2D layers
model = Sequential([
    Input(shape=input_shape),
    MaxPooling2D(),
])
```
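As a rough illustration of how the choice of layer affects model size, the sketch below counts the parameters of a single layer of each type. The input shape (32, 32, 3), the 16 filters and the 3×3 kernel are assumptions made for this example and are not taken from the three models above; `single_layer_model` is a hypothetical helper. SeparableConv2D is included because it combines the depthwise and pointwise steps in one layer.

```python
from tensorflow import keras
from tensorflow.keras import layers

input_shape = (32, 32, 3)  # assumed input size, for illustration only


def single_layer_model(layer):
    """Wrap one layer in a Sequential model so its parameters can be counted."""
    return keras.Sequential([keras.Input(shape=input_shape), layer])


candidates = {
    "Conv2D": layers.Conv2D(filters=16, kernel_size=3),
    "DepthwiseConv2D": layers.DepthwiseConv2D(kernel_size=3),
    "SeparableConv2D": layers.SeparableConv2D(filters=16, kernel_size=3),
}

# count_params() includes the bias terms that each layer adds by default.
param_counts = {
    name: single_layer_model(layer).count_params()
    for name, layer in candidates.items()
}

for name, count in param_counts.items():
    print(f"{name}: {count} parameters")
```

DepthwiseConv2D takes no `filters` argument: with the default `depth_multiplier` it keeps the number of channels unchanged, so a model built only from it cannot widen its feature maps the way Conv2D or SeparableConv2D can.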