@teortaxesTex I think old AI guys who say something like “DL can’t get to true AGI” have a romantic view of artificial intelligence that far exceeds the natural one, which they also overrate. Seems that once this illusion breaks and you realize that curve-fitting is enough, you do a Hinton.
@PicoPaco17 The problem is that these models cannot learn new information except in pre-training; that's what keeps them from being AGI. Intelligent beings can incorporate new pieces of knowledge on the fly. A DL model is a static, mathematical model
@teortaxesTex Not a necessary property of DL artifacts at all
@yar_vol DL is too generic a category to have any faults; it's like saying "maths is enough for AGI". Well, yes, but current models do suffer from catastrophic forgetting and have issues with time.
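(A minimal sketch of the catastrophic-forgetting point, assuming a toy PyTorch MLP trained sequentially on two synthetic tasks; everything here is illustrative, not anyone's actual setup.)

```python
# Hypothetical toy demo of catastrophic forgetting: a small MLP is trained on
# task A, then fine-tuned on task B with the same loss, and its task-A error
# degrades because nothing constrains the new gradients to preserve the old fit.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two synthetic regression tasks on the same input space.
x = torch.linspace(-3, 3, 256).unsqueeze(1)
y_a = torch.sin(x)          # task A: fit sin(x)
y_b = torch.cos(2 * x)      # task B: fit cos(2x)

model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()

def train(target, steps, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), target)
        loss.backward()
        opt.step()

train(y_a, steps=2000, lr=1e-3)                 # "pre-train" on task A
loss_a_before = loss_fn(model(x), y_a).item()

train(y_b, steps=2000, lr=1e-3)                 # fine-tune on task B only
loss_a_after = loss_fn(model(x), y_a).item()

# Task-A error typically jumps by orders of magnitude after training on B.
print(f"task A loss before: {loss_a_before:.4f}, after fine-tuning on B: {loss_a_after:.4f}")
```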
@teortaxesTex ML discussions are often harmed by first-principles thinking. Images from metallurgy or cooking can be more edifying than math (from the wrong abstraction level). E.g. finetuning vs pretraining. «It's literally the same loss!» Uh, yes, I guess. And is annealing just melting?
@selfattentive What is the important difference between fine-tuning and pre-training, as you see it?
@teortaxesTex Batch size and LR schedule, plus starting from a high-rank network, amount to a different optimization regime. Check out https://arxiv.org/abs/2306.07042 https://arxiv.org/abs/2402.04362 (Remember the Sophon paper? I think it's the worst case of what occurs naturally: a local optimum that your data will perturb)
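(A rough sketch to make the "different regime" point concrete: same model, same loss, but the knobs differ a lot. Every number below is an illustrative assumption, not taken from the linked papers.)

```python
# Sketch: pre-training vs fine-tuning as two optimization regimes.
# Same parameters, same loss; what changes is batch size, LR schedule,
# and whether you start from random init or converged weights.
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(512, 512)  # stand-in for a real network

def warmup_cosine(warmup_steps, total_steps):
    """Linear warmup followed by cosine decay, as an LR multiplier."""
    def fn(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return fn

# Pre-training regime (assumed numbers): huge batches, high peak LR,
# long warmup + decay, starting from random (effectively high-rank) init.
pretrain_opt = AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
pretrain_sched = LambdaLR(pretrain_opt, warmup_cosine(warmup_steps=2000, total_steps=100_000))
pretrain_batch_size = 4_000_000  # tokens per step, order-of-magnitude guess

# Fine-tuning regime (assumed numbers): tiny batches, LR 1-2 orders of
# magnitude lower, short schedule, starting from pre-trained weights.
finetune_opt = AdamW(model.parameters(), lr=1e-5, weight_decay=0.0)
finetune_sched = LambdaLR(finetune_opt, warmup_cosine(warmup_steps=50, total_steps=3_000))
finetune_batch_size = 64  # sequences per step

# "It's literally the same loss" is true, but where you start in weight space
# and how aggressively the optimizer moves make it a different process.
```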