PhD student at Oxford
Former Microsoft AI4Science Intern
github.com/jla-gardner/...
Note that these student models are of a different architecture to MACE, and in fact ACE is not even NN-based.
@ask1729.bsky.social and others extract additional Hessian information from the teacher. Again, this works well provided you have a training framework that lets you train student models on this data.
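As a rough illustration of what training on Hessian information can look like, here is a minimal sketch of a loss that matches a student to a teacher on energies, gradients, and Hessians. Everything in it is an assumption made for illustration: the toy `teacher_energy`, the harmonic `make_student`, the finite-difference derivatives, and the weights `w_e`, `w_g`, `w_h` are hypothetical and not taken from any of the methods mentioned in this thread.

```python
def numerical_gradient(energy, x, h=1e-4):
    """Central finite-difference gradient of energy at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((energy(xp) - energy(xm)) / (2 * h))
    return grad


def numerical_hessian(energy, x, h=1e-3):
    """Central finite-difference Hessian, built from numerical gradients."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        gp = numerical_gradient(energy, xp, h)
        gm = numerical_gradient(energy, xm, h)
        for j in range(n):
            hess[i][j] = (gp[j] - gm[j]) / (2 * h)
    return hess


def teacher_energy(x):
    """Hypothetical teacher: an anharmonic bond between two 1D particles."""
    d = x[0] - x[1]
    return 0.5 * 2.0 * d**2 + 0.1 * d**4


def make_student(k):
    """Hypothetical student: a purely harmonic bond with stiffness k."""
    def energy(x):
        d = x[0] - x[1]
        return 0.5 * k * d**2
    return energy


def distillation_loss(student, teacher, x, w_e=1.0, w_g=1.0, w_h=1.0):
    """Weighted squared error on energy, gradient, and Hessian at x."""
    n = len(x)
    e_err = (student(x) - teacher(x)) ** 2
    gs, gt = numerical_gradient(student, x), numerical_gradient(teacher, x)
    g_err = sum((a - b) ** 2 for a, b in zip(gs, gt)) / n
    hs, ht = numerical_hessian(student, x), numerical_hessian(teacher, x)
    h_err = sum(
        (hs[i][j] - ht[i][j]) ** 2 for i in range(n) for j in range(n)
    ) / (n * n)
    return w_e * e_err + w_g * g_err + w_h * h_err
```

The student here is deliberately a different functional form from the teacher: the loss only needs energies and their derivatives, not access to the teacher's internals. A real framework would presumably use automatic differentiation rather than finite differences.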
and others attempt to align not only the predictions, but also the internal representations of the teacher and the student. This approach works well for models with similar architectures, but is incompatible with e.g. fast linear models like ACE.
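To make the idea of aligning internal representations concrete, here is a minimal sketch, assuming the student's features are mapped into the teacher's feature space by a linear projection before being compared. The function names, the projection `weights`, and the `w_align` weight are all hypothetical; the actual methods may align representations quite differently.

```python
def project(weights, features):
    """Apply a linear map (list of rows) to a feature vector."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def alignment_loss(student_feats, teacher_feats, weights):
    """Mean squared error between projected student and teacher features."""
    projected = project(weights, student_feats)
    return sum(
        (p - t) ** 2 for p, t in zip(projected, teacher_feats)
    ) / len(teacher_feats)


def total_loss(pred_student, pred_teacher, student_feats, teacher_feats,
               weights, w_align=0.1):
    """Prediction-matching term plus a weighted representation term."""
    pred_err = (pred_student - pred_teacher) ** 2
    return pred_err + w_align * alignment_loss(
        student_feats, teacher_feats, weights
    )
```

The sketch also shows why this is awkward for a model like ACE: it presupposes that the student exposes intermediate features that can sensibly be compared with the teacher's.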
Various existing methods in the literature do this in different ways.