@chatroutes.bsky.social
Prefix caching and speculative decoding will definitely help to keep it fast and cheap.
Tip:
I hope you are using prompt templates to experiment with different variables.
Run 3 to 5 candidates with a temperature ladder (e.g., 0.2 / 0.7 / 1.1) and fixed top_p.
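A minimal sketch of that tip, assuming a template with made-up variables (`doc_type`, `tone`, `text`) and a hypothetical `generate(prompt, temperature, top_p)` callable standing in for whatever model client you use. Only the temperature changes between candidates; the `top_p` value of 0.9 is an arbitrary placeholder.

```python
from string import Template

# Hypothetical prompt template; the variable names are illustrative only.
PROMPT = Template("Summarize the following $doc_type in a $tone tone:\n$text")


def build_requests(variables, temperatures=(0.2, 0.7, 1.1), top_p=0.9):
    """Return one request per temperature, all sharing the prompt and top_p."""
    prompt = PROMPT.substitute(variables)
    return [
        {"prompt": prompt, "temperature": t, "top_p": top_p}
        for t in temperatures
    ]


def run_ladder(generate, variables, **kwargs):
    """Call `generate` once per request and pair each output with its temperature.

    `generate` is an assumed callable taking prompt, temperature, and top_p
    as keyword arguments; swap in your own client call here.
    """
    return [
        (req["temperature"], generate(**req))
        for req in build_requests(variables, **kwargs)
    ]


def fake_generate(prompt, temperature, top_p):
    """Stand-in for a real model call so the sketch runs offline."""
    return f"[T={temperature}, top_p={top_p}] {prompt[:40]}"


if __name__ == "__main__":
    variables = {"doc_type": "email", "tone": "neutral", "text": "Hello team..."}
    for temperature, output in run_ladder(fake_generate, variables):
        print(temperature, output)
```

Every candidate in the ladder shares the same prompt, so a serving stack that supports prefix caching can likely reuse the prompt computation across the runs.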
chatroutes.com
September 9, 2025 at 12:53 PM