This code was written in 2019, and I was not very familiar with transformer model in that time. So don't trust this code too much. Currently I am not managing this code well, so please open pull ...