
It adds a very local sliding-window attention: the context is only 3 adjacent frames per step. They need access to the adjacent frames to show the implicit model gradient computation, but I haven't yet followed the derivation for why this is so.
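For anyone who wants to picture what a 3-frame window means in practice, here is a rough sketch in plain Python. It is not the paper's code. The names `window_indices` and `local_attention` are made up for illustration, and I am assuming a single head and a window centered on the current frame (previous, current, next); the actual model may place the window differently.

```python
import math

def window_indices(t, num_frames, window=3):
    """Frames that frame t may attend to (assumption: window centered on t)."""
    half = window // 2
    return [j for j in range(t - half, t + half + 1) if 0 <= j < num_frames]

def local_attention(queries, keys, values, window=3):
    """Single-head scaled dot-product attention limited to a local frame window.

    queries, keys, values: one vector (list of floats) per frame.
    """
    num_frames = len(queries)
    dim = len(queries[0])
    outputs = []
    for t in range(num_frames):
        idx = window_indices(t, num_frames, window)
        # Scaled dot-product scores against the frames inside the window only.
        scores = [
            sum(q * k for q, k in zip(queries[t], keys[j])) / math.sqrt(dim)
            for j in idx
        ]
        # Numerically stable softmax over the window.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the value vectors inside the window.
        outputs.append([
            sum(w * values[j][d] for w, j in zip(weights, idx))
            for d in range(len(values[0]))
        ])
    return outputs
```

With `window=3`, each frame mixes in information from at most its two neighbours, so the cost grows linearly with the number of frames instead of quadratically. Frames at the ends of the sequence simply see a shorter window.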

