In https://arxiv.org/abs/2405.16684, Rohan used gzip compressibility as a proxy for text sample efficiency.

  • people explored similar rationales for images in 2023, as shown by the seminal paper in which data pruning breaks data-naive neural scaling laws.
  • what about audio? video (with time range as a new parameter/axis)? …
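
To make the gzip idea concrete, here is a minimal sketch of one way to measure compressibility for a text sample. It assumes compressibility means compressed size divided by raw size in bytes; the exact definition used in the paper may differ, and the function name `gzip_compressibility` and both sample strings are made up for illustration.

```python
import gzip
import random
import string


def gzip_compressibility(text: str) -> float:
    """Return compressed size / raw size in bytes (assumed definition).

    Lower values mean the text is more compressible, i.e. more redundant.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("cannot measure an empty sample")
    return len(gzip.compress(raw)) / len(raw)


# Hypothetical samples: one highly repetitive, one close to random.
repetitive = "the cat sat on the mat. " * 200
rng = random.Random(0)  # fixed seed so the sample is reproducible
varied = "".join(rng.choices(string.ascii_letters, k=len(repetitive)))

print(f"repetitive: {gzip_compressibility(repetitive):.3f}")
print(f"varied:     {gzip_compressibility(varied):.3f}")
```

The repetitive sample should score much lower than the near-random one. For audio or video the same ratio could be computed on raw bytes, though whether gzip is a meaningful proxy there is exactly the open question.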