In the proposed method, we first localize the label-relevant regions in each transformer layer. We then mask the image so that only these label-relevant regions are kept, and construct the masked image with ...
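As a rough illustration of the masking step only, the sketch below assumes that a single relevance score per image patch is already available; how the scores are obtained from each transformer layer is not specified here. The function name `mask_image`, the parameters `keep_ratio` and `fill`, and the constant fill value are all hypothetical choices made for this example and are not taken from the method itself.

```python
def mask_image(image, patch_scores, patch_size, keep_ratio=0.5, fill=0.0):
    """Keep the highest-scoring patches of `image`; overwrite the rest with `fill`.

    image:        2-D list (H x W) of pixel values, H and W divisible by patch_size
    patch_scores: 2-D list (H/patch_size x W/patch_size) of relevance scores
    keep_ratio, fill: hypothetical parameters, assumed for illustration only
    """
    rows, cols = len(patch_scores), len(patch_scores[0])

    # Rank all patch coordinates by relevance, highest first.
    ranked = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: patch_scores[rc[0]][rc[1]],
        reverse=True,
    )
    # Keep the top fraction of patches (at least one).
    n_keep = max(1, round(keep_ratio * rows * cols))
    kept = set(ranked[:n_keep])

    # Start from a fully masked image and copy back the kept patches.
    masked = [[fill] * len(image[0]) for _ in image]
    for r, c in kept:
        for y in range(r * patch_size, (r + 1) * patch_size):
            for x in range(c * patch_size, (c + 1) * patch_size):
                masked[y][x] = image[y][x]
    return masked


# Toy 4x4 image split into 2x2 patches, with one assumed score per patch.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
scores = [[0.9, 0.1],
          [0.2, 0.8]]
masked = mask_image(image, scores, patch_size=2, keep_ratio=0.5)
```

In this toy case the two highest-scoring patches (top-left and bottom-right) are kept and the other two are overwritten with the fill value. A per-layer variant would call the same routine once per layer with that layer's score map.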