Lightnews — Scholar-powered news

@causalwizard.bsky.social



causalwizard.bsky.social

@causalwizard.bsky.social

In addition, by analyzing the divergence of the latent state we found evidence that the resulting plans (paths) are more consistent over time.

Divergence from initial latent state during recurrent processing. When the latent state from the previous environmental time step is carried forward, the initial latent state is more similar to the final latent state.

October 29, 2025 at 7:35 AM

causalwizard.bsky.social

@causalwizard.bsky.social

We found that the HRM-Agent can learn to navigate dynamic and uncertain maze environments in which doors open and close at random.

Plot showing fraction of validation episodes in which the agent reached the goal, from 5 runs with carry Z condition and 5 runs with reset Z condition.

October 29, 2025 at 7:31 AM

causalwizard.bsky.social

@causalwizard.bsky.social

The Hierarchical Reasoning Model (HRM) has impressive reasoning abilities given its small size, but it has so far been applied only to supervised, static, fully observable problems.

We wanted to see if we could train an HRM to navigate a maze using only reinforcement learning.

Dynamic maze environment from the paper: a screenshot from the NetHack Learning Environment (NLE).

October 29, 2025 at 7:30 AM
