Abstract: Recently, the Transformer has emerged as a new deep learning architecture that relies on self-attention without convolution. The Transformer has also been extended to the Vision Transformer (ViT) for the ...