but without caching internal intermediate values (which are only valid to cache if you are strictly _extending_ the context), the compute cost for each token would grow quadratically with the context length
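To make that concrete, here's a rough numpy sketch (not any real library's API; the single head and names like `Wq`/`Wk`/`Wv` are just illustrative) of one attention head decoding with and without a KV cache. The cached version only appends one new key/value row per step, which is exactly why it's only valid when you strictly extend the context; the uncached version redoes the whole context's projections every step, and in a real multi-layer model you'd also be recomputing every earlier position's attention output to feed the next layer, which is where the quadratic per-token cost comes from.

```python
import numpy as np

d = 8  # head dimension (illustrative)

def attend(q, K, V):
    # q: (d,), K/V: (t, d) -> softmax-weighted sum over all cached positions
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def decode_no_cache(xs):
    # Recomputes K and V for the entire context at every step; in a full
    # multi-layer model you'd redo even more work per token.
    outs = []
    for t in range(1, len(xs) + 1):
        ctx = np.stack(xs[:t])
        K, V = ctx @ Wk, ctx @ Wv          # recomputed from scratch each step
        outs.append(attend(xs[t - 1] @ Wq, K, V))
    return outs

def decode_with_cache(xs):
    # Only valid because we strictly *extend* the context: each step appends
    # one new key/value row and reuses everything computed before.
    K = np.empty((0, d)); V = np.empty((0, d))
    outs = []
    for x in xs:
        K = np.vstack([K, x @ Wk])          # O(1) new projection work per step
        V = np.vstack([V, x @ Wv])
        outs.append(attend(x @ Wq, K, V))
    return outs

rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
xs = list(rng.standard_normal((5, d)))
# Both paths produce identical outputs; only the amount of redone work differs.
assert np.allclose(decode_no_cache(xs), decode_with_cache(xs))
```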
for GUI stuff it kinda speaks Wayland now
That prompt seemingly caused the LLM to become Nazi 4Chan, and has now been deleted, but other recent changes remain.