Tom Sherborne
tomsherborne.bsky.social
MTS @ Cohere on code. Views not my employer’s.
TRAM is part of my internship project with Hao Peng and Pradeep Dasigi at Allen AI, with invaluable contributions from @nsaphra.bsky.social.
October 11, 2023 at 9:33 AM
TRAM also improves out-of-distribution (OOD) ε-sharpness, where SAM has little effect, and shows a stronger correlation between in-distribution (ID) and OOD sharpness. This suggests that SAM is only sharpness-aware within the training distribution.
October 11, 2023 at 9:32 AM
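To make "ID vs OOD ε-sharpness" concrete, here is a rough NumPy sketch that assumes a simple first-order proxy: step a distance `eps` along the normalized loss gradient and report how much the loss rises, computed separately on in-distribution data and on shifted data. The proxy, the toy linear model and the rescaled "OOD" inputs are all illustrative assumptions, not the measurement used for TRAM, and `mse_and_grad` and `eps_sharpness` are hypothetical names.

```python
import numpy as np

def mse_and_grad(w, X, y):
    # Mean squared error of a linear model and its gradient with respect to w.
    r = X @ w - y
    return float(r @ r) / len(y), 2.0 * X.T @ r / len(y)

def eps_sharpness(w, X, y, eps=0.05):
    # Assumed first-order proxy: move eps along the normalized gradient
    # (the locally steepest ascent direction) and return the loss increase.
    base, g = mse_and_grad(w, X, y)
    w_adv = w + eps * g / (np.linalg.norm(g) + 1e-12)
    adv, _ = mse_and_grad(w_adv, X, y)
    return adv - base

rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])  # weights fitted to the ID data (toy setup)

# In-distribution data: standard-normal inputs plus small label noise.
X_id = rng.normal(size=(64, 3))
y_id = X_id @ w + 0.1 * rng.normal(size=64)

# Hypothetical "OOD" data: same labelling rule, inputs at a larger scale.
X_ood = 5.0 * rng.normal(size=(64, 3))
y_ood = X_ood @ w + 0.1 * rng.normal(size=64)

s_id = eps_sharpness(w, X_id, y_id)
s_ood = eps_sharpness(w, X_ood, y_ood)
```

The same weights can look flat on one distribution and sharp on another, which is why sharpness measured on training data alone need not say much about OOD behavior.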
TRAM is a SAM-style optimizer that replaces SAM's fixed ρ (rho) hyperparameter: instead of a hand-set neighborhood radius, TRAM adapts to a trust region in function space. This strengthens the connection between task-specific performance and pre-trained structure, giving better zero-shot domain transfer and cross-lingual transfer.
October 11, 2023 at 9:32 AM
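For context on what "the rho hyperparameter" refers to, here is a minimal NumPy sketch of a generic SAM update on a toy linear-regression loss. It is an illustration under stated assumptions, not TRAM: `loss_and_grad` and `sam_step` are hypothetical helper names, and `rho` is the fixed neighborhood radius that, per the post above, TRAM replaces with a function-space trust region (TRAM's own rule for setting it is not shown).

```python
import numpy as np

def loss_and_grad(w, X, y):
    # Mean squared error of a linear model and its gradient with respect to w.
    r = X @ w - y
    return float(r @ r) / len(y), 2.0 * X.T @ r / len(y)

def sam_step(w, X, y, lr=0.1, rho=0.05):
    # 1) Gradient at the current weights.
    _, g = loss_and_grad(w, X, y)
    # 2) First-order worst-case perturbation inside an L2 ball of fixed radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # 3) The gradient at the perturbed weights drives the actual update of w.
    _, g_adv = loss_and_grad(w + eps, X, y)
    return w - lr * g_adv

# Toy usage: noiseless linear regression from a zero initialization.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
start, _ = loss_and_grad(w, X, y)
for _ in range(100):
    w = sam_step(w, X, y)
end, _ = loss_and_grad(w, X, y)
```

In this plain SAM step, `rho` is the only knob that sets how far the perturbation reaches, and it stays the same for every step and every model state.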