Specifically I want to start by looking into the pytorch data loader that uses bin packing algorithm to load sentences into batches. Each sequence in each batch contains concatenated sentences under the max length (often set as an experiment hyperparameter).