Welcome to PyTorch Tutorials — PyTorch Tutorials 2.6.0+cu124 …
Accelerating PyTorch Transformers by replacing nn.Transformer with Nested Tensors and torch.compile(). This tutorial goes over recommended best practices for implementing Transformers with native PyTorch.
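A rough sketch of the two ingredients the tutorial combines, using a stock nn.TransformerEncoderLayer rather than the tutorial's custom blocks; the sizes are hypothetical:

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 256, 8  # illustrative sizes only

# A plain encoder layer, compiled so PyTorch can fuse kernels where possible.
layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
compiled_layer = torch.compile(layer)

# Variable-length sequences packed into a single nested (jagged) tensor,
# which avoids padding entirely.
seqs = [torch.randn(5, embed_dim), torch.randn(9, embed_dim)]
nt = torch.nested.nested_tensor(seqs, layout=torch.jagged)
print([t.shape for t in nt.unbind()])  # the original, unpadded lengths

# Dense input for the compiled layer (first call triggers compilation).
x = torch.randn(2, 9, embed_dim)
out = compiled_layer(x)
```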
Transformer — PyTorch 2.6 documentation
See this tutorial for an in-depth discussion of the performant building blocks PyTorch offers for building your own transformer layers.
PyTorch-Transformers
Model Description. PyTorch-Transformers (formerly known as pytorch-pretrained-bert) is a library of state-of-the-art pre-trained models for Natural Language Processing (NLP). The library currently contains PyTorch implementations, pre-trained model weights, usage scripts, and conversion utilities for a number of supported models.
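A minimal usage sketch, assuming the huggingface/pytorch-transformers torch.hub entry point and the bert-base-uncased weights are available (the example sentence is made up):

```python
import torch

# Load a pretrained tokenizer and model through torch.hub.
tokenizer = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-uncased')
model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-uncased')

text = "Hello, how are you?"
input_ids = torch.tensor([tokenizer.encode(text, add_special_tokens=True)])
with torch.no_grad():
    outputs = model(input_ids)  # tuple; first element is the last hidden states
```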
Why do we flatten the output in Language Model training?
Dec 21, 2023 · After reshaping the tensor, the model output output_flat has size torch.Size([600, 4817]). My confusion is why we train the model like this: why do we flatten all the targets in the batch here and compute the loss on the entire batch as one flattened-out object?
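The usual reason is that nn.CrossEntropyLoss expects (N, C) logits and (N,) class indices, so every token position is treated as one classification example. A small sketch with illustrative sizes; only the product 600 and the vocabulary size 4817 come from the question, the 20 × 30 batch/sequence split is assumed:

```python
import torch
import torch.nn as nn

batch_size, seq_len, vocab_size = 20, 30, 4817  # 20 * 30 = 600
criterion = nn.CrossEntropyLoss()

# Model output: one score per vocabulary entry for every position in the batch.
output = torch.randn(batch_size, seq_len, vocab_size)
targets = torch.randint(vocab_size, (batch_size, seq_len))

# Flatten so each token position becomes an independent classification problem.
output_flat = output.reshape(-1, vocab_size)     # torch.Size([600, 4817])
loss = criterion(output_flat, targets.reshape(-1))
```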
Masked Language Modeling (MLM) with Hugging Face BERT …
Masked Language Modeling (MLM) with Hugging Face BERT Transformer. Learning objectives: This notebook demonstrates the steps for compiling a TorchScript module with Torch-TensorRT on a pretrained BERT transformer from Hugging Face, and …
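A rough sketch of the compilation step only, assuming Torch-TensorRT is installed with a CUDA-capable GPU; the model name, example sentence, and precision flags are assumptions, and the exact inputs and flags may differ between Torch-TensorRT releases:

```python
import torch
import torch_tensorrt
from transformers import BertForMaskedLM, BertTokenizer

# Load BERT in TorchScript-friendly mode and trace it on a sample input.
model = BertForMaskedLM.from_pretrained("bert-base-uncased", torchscript=True).eval()
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

enc = tokenizer("The capital of France is [MASK].", return_tensors="pt")
traced = torch.jit.trace(model, [enc["input_ids"], enc["attention_mask"]])

# Compile the traced module with Torch-TensorRT.
trt_model = torch_tensorrt.compile(
    traced,
    inputs=[
        torch_tensorrt.Input(shape=enc["input_ids"].shape, dtype=torch.int32),
        torch_tensorrt.Input(shape=enc["attention_mask"].shape, dtype=torch.int32),
    ],
    enabled_precisions={torch.float32},   # or {torch.half} for FP16
    truncate_long_and_double=True,        # int64 inputs get truncated to int32
)
```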
Large Scale Transformer model training with Tensor Parallel (TP)
This tutorial demonstrates how to train a large Transformer-like model across hundreds to thousands of GPUs using Tensor Parallel in combination with Fully Sharded Data Parallel. It explains how to apply Tensor Parallel to different parts of …
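A minimal 2-D parallelism sketch, assuming a launch via torchrun on 8 GPUs and a toy MLP with hypothetical layer names net1/net2 (not the tutorial's model):

```python
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

# 4-way data parallel x 2-way tensor parallel over 8 GPUs.
tp_size, dp_size = 2, 4
mesh_2d = init_device_mesh("cuda", (dp_size, tp_size), mesh_dim_names=("dp", "tp"))

class MLP(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.net1 = nn.Linear(dim, 4 * dim)
        self.net2 = nn.Linear(4 * dim, dim)

    def forward(self, x):
        return self.net2(torch.relu(self.net1(x)))

model = MLP().cuda()
# Shard net1 column-wise and net2 row-wise across the TP mesh dimension.
model = parallelize_module(
    model, mesh_2d["tp"], {"net1": ColwiseParallel(), "net2": RowwiseParallel()}
)
# Then shard parameters across the data-parallel mesh dimension with FSDP.
model = FSDP(model, device_mesh=mesh_2d["dp"], use_orig_params=True)
```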
Building Models with PyTorch — PyTorch Tutorials 2.6.0+cu124 …
A discussion of transformer architecture is beyond the scope of this video, but PyTorch has a Transformer class that allows you to define the overall parameters of a transformer model - the number of attention heads, the number of encoder & decoder layers, dropout and activation functions, etc. (You can even build the BERT model from this ...
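A small sketch of what that looks like, with illustrative hyperparameters only:

```python
import torch
import torch.nn as nn

# The overall shape of the architecture is set through constructor arguments.
model = nn.Transformer(
    d_model=512,            # embedding / model dimension
    nhead=8,                # number of attention heads
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048,
    dropout=0.1,
    activation="gelu",
    batch_first=True,
)

src = torch.randn(2, 10, 512)   # (batch, source length, d_model)
tgt = torch.randn(2, 7, 512)    # (batch, target length, d_model)
out = model(src, tgt)           # (2, 7, 512)
```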
In torch.nn.functional.embedding, why does padding_idx exist?
Mar 21, 2024 · I have a question about using padding indexes with input embedding layers. For context, suppose I am training a causally masked transformer language model, where sequences are always left-aligned in a batch, with padding…
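For the nn.Embedding / F.embedding case, padding_idx pins that embedding row to zeros and keeps its gradient at zero, so padded positions never learn or contribute; a small sketch with hypothetical sizes:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, pad_id = 100, 16, 0   # illustrative values
emb = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_id)

# A left-aligned batch with trailing padding.
tokens = torch.tensor([[5, 7, 9, pad_id, pad_id],
                       [3, 2, pad_id, pad_id, pad_id]])
vectors = emb(tokens)
vectors.sum().backward()

print(emb.weight[pad_id])        # all zeros
print(emb.weight.grad[pad_id])   # all zeros: the padding row gets no gradient
```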
How to use nn.TransformerDecoder() at inference time
Jul 2, 2019 · I am using the nn.TransformerDecoder() module to train a language model. During training, the model uses the target tgt and tgt_mask, so at each step the decoder is fed the last true labels. However, for text …
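A common answer is to run the decoder autoregressively at inference time, feeding back its own predictions since no true labels exist; a minimal greedy-decoding sketch with toy dimensions and token ids, not the poster's model:

```python
import torch
import torch.nn as nn

d_model, nhead, vocab_size = 512, 8, 1000
bos_id, eos_id, max_len = 1, 2, 20

embed = nn.Embedding(vocab_size, d_model)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers=2
)
lm_head = nn.Linear(d_model, vocab_size)

memory = torch.randn(1, 12, d_model)   # encoder output (assumed given)
tokens = torch.tensor([[bos_id]])      # start from BOS only

for _ in range(max_len):
    tgt = embed(tokens)
    # Causal mask over the tokens generated so far.
    tgt_mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
    out = decoder(tgt, memory, tgt_mask=tgt_mask)
    next_id = lm_head(out[:, -1]).argmax(dim=-1, keepdim=True)
    tokens = torch.cat([tokens, next_id], dim=1)   # feed the prediction back in
    if next_id.item() == eos_id:
        break
```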
Language Translation with TorchText — PyTorch Tutorials 1.7.1 …
torchtext has utilities for creating datasets that can be easily iterated through for the purpose of building a language translation model. In this example, we show how to tokenize a raw text sentence, build a vocabulary, and numericalize tokens into a tensor.
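A small sketch of tokenize → build vocabulary → numericalize, written against the newer torchtext API (the 1.7.1-era tutorial used the since-removed Field interface); the example sentences are made up:

```python
import torch
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

tokenizer = get_tokenizer("basic_english")
corpus = [
    "two young people are outside near the water .",
    "a man is riding a bicycle down the street .",
]

# Build a vocabulary over the tokenized corpus, with special symbols up front.
vocab = build_vocab_from_iterator(
    (tokenizer(s) for s in corpus),
    specials=["<unk>", "<pad>", "<bos>", "<eos>"],
)
vocab.set_default_index(vocab["<unk>"])

# Tokenize a raw sentence and numericalize it into a tensor.
sentence = "a man is riding near the water ."
ids = torch.tensor([vocab["<bos>"]] + vocab(tokenizer(sentence)) + [vocab["<eos>"]])
```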