It adds a very local sliding window attention, the context is only 3 adjacent fr...

		pitpatagain on Oct 28, 2024 \| parent \| context \| favorite \| on: State-space models can learn in-context by gradien... It adds a very local sliding window attention, the context is only 3 adjacent frames per step. They need the access to adjacent frames to show the implicit model gradient computation but I didn't yet follow the derivation for why this is so.