These datasets are preprocessed with the same procedure described in Section 2.1 of the report. Models that operate on BERT-based (or RoBERTa-based) embeddings from Hugging Face's ...