We propose a novel Swin Transformer block to optimize feature extraction and enable the ... This facilitates efficient information flow between the Transformer encoder and CNN decoder. Finally, a ...
Notably, it exclusively employs the Transformer encoder to process the deepest layer of the feature map. We then introduce the efficient residual mixing block (ERM Block) to apply ...
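Since the ERM Block's internal design is elided above, the following is only a hypothetical sketch of a residual mixing block in PyTorch: a depthwise convolution mixes spatial information, a pointwise convolution mixes channels, and a residual connection preserves the input features. The class name `ERMBlock` and all layer choices here are assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class ERMBlock(nn.Module):
    """Hypothetical residual mixing block (the paper's ERM Block may differ).

    Depthwise conv -> pointwise conv -> norm -> activation, added back to
    the input via a residual connection, so output shape equals input shape.
    """
    def __init__(self, channels: int):
        super().__init__()
        # Depthwise 3x3 conv: mixes spatial neighbors per channel.
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Pointwise 1x1 conv: mixes information across channels.
        self.pw = nn.Conv2d(channels, channels, 1)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the original features intact.
        return x + self.act(self.norm(self.pw(self.dw(x))))
```

A depthwise-separable design like this keeps the parameter count low, which is one plausible reading of "efficient" in the block's name.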
- Encoder-Decoder Structure: the network consists of three encoder blocks, three decoder blocks, and additional upsampling blocks.
- Use of Pyramid Vision Transformer (PVT): the network begins with a PVT as a ...
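To make the three-stage encoder-decoder layout concrete, here is a minimal PyTorch sketch. The channel widths, the plain convolutional stem standing in for the PVT backbone, and the bilinear upsampling blocks are all assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

def conv_block(cin: int, cout: int) -> nn.Sequential:
    """A basic conv-norm-activation unit used by each stage."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class EncoderDecoderSketch(nn.Module):
    """Hypothetical three-encoder / three-decoder network.

    Each encoder halves the spatial resolution; each decoder contains an
    upsampling block that doubles it back, so the output matches the
    input resolution. A real PVT stem would replace the conv encoders.
    """
    def __init__(self, in_ch: int = 3, out_ch: int = 1):
        super().__init__()
        widths = [64, 128, 256]  # assumed stage widths
        self.encoders = nn.ModuleList()
        cin = in_ch
        for w in widths:
            # Encoder stage: feature extraction then 2x downsampling.
            self.encoders.append(nn.Sequential(conv_block(cin, w), nn.MaxPool2d(2)))
            cin = w
        self.decoders = nn.ModuleList()
        for w in reversed(widths):
            # Decoder stage: 2x upsampling then feature refinement.
            self.decoders.append(nn.Sequential(
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                conv_block(cin, w),
            ))
            cin = w
        self.head = nn.Conv2d(cin, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for enc in self.encoders:
            x = enc(x)
        for dec in self.decoders:
            x = dec(x)
        return self.head(x)
```

With a 64x64 input, the deepest feature map is 8x8 with 256 channels, and the three upsampling decoders restore the 64x64 resolution.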