Determining protease transformer steps (iterations):

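To calculate the number of steps (iterations) needed to run for 4 epochs, we divide the total number of tokens to be processed by the number of tokens consumed per step:

    num_steps = (total_tokens_in_dataset * num_epochs) / tokens_per_fwdbwd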

Given:

  • total_tokens_in_dataset = 589222838
  • num_epochs = 4
  • tokens_per_fwdbwd = 262144 (tokens processed per forward/backward pass; the default setting in modded-nanogpt)

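Plugging in the values:

    num_steps = (589222838 * 4) / 262144 = 2356891352 / 262144 ≈ 8990.8

So we need approximately 8991 steps (iterations) to complete 4 epochs. We ultimately chose to run 9k steps, which corresponds to about 4.004 epochs.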
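
The arithmetic can be checked with a short script. This is a minimal sketch, not code from modded-nanogpt: the helper name `steps_for_epochs` is hypothetical, and rounding up to a whole step is an assumption made here so that the final partial batch still counts.

```python
def steps_for_epochs(total_tokens: int, num_epochs: int, tokens_per_step: int) -> int:
    """Smallest whole number of steps that covers num_epochs passes over the data.

    Hypothetical helper for illustration; uses integer ceiling division
    so the result is exact for large token counts.
    """
    tokens_needed = total_tokens * num_epochs
    return -(-tokens_needed // tokens_per_step)


# Values taken from the list above.
total_tokens_in_dataset = 589_222_838
num_epochs = 4
tokens_per_fwdbwd = 262_144

# Fractional step count, before rounding up.
exact_steps = total_tokens_in_dataset * num_epochs / tokens_per_fwdbwd
num_steps = steps_for_epochs(total_tokens_in_dataset, num_epochs, tokens_per_fwdbwd)

# How many epochs a round 9000-step run actually covers.
chosen_steps = 9_000
epochs_covered = chosen_steps * tokens_per_fwdbwd / total_tokens_in_dataset

print(f"exact steps:    {exact_steps:.1f}")     # 8990.8
print(f"rounded up:     {num_steps}")           # 8991
print(f"epochs at 9000: {epochs_covered:.3f}")  # 4.004
```

Running 9000 steps therefore overshoots 4 epochs by only a few steps' worth of tokens.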