Implement BanditLOLS

Open · timvieira opened this issue 8 years ago · 2 comments

timvieira · Apr 27 '17 20:04

Basic implementation is done in https://github.com/hal3/macarico/blob/master/macarico/lts/lols.py

hal3 · Apr 27 '17 23:04

There's some super-ugliness in BanditLOLS/LinearPolicy that I'd like to get your take on (see lols.py:55,72 and __init__.py:82-85). The issue is that in order to do CS bLOLS, you need to remember the predicted costs at deviation time, so that you can set the cost vector and set up the regression problem at the end, after you observe the reward. The current approach is to split this, but that's ugly. Another option might be for LinearPolicy to provide something that returns a continuation. Any other ideas?
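For concreteness, here is one way the continuation idea could look. This is a minimal sketch, assuming a per-action linear cost regressor trained by squared-loss SGD and a uniformly random deviation. `LinearCostPolicy`, `deviate`, and `finish` are hypothetical names, not macarico's `LinearPolicy` API. The cost-vector construction is also an assumption: it uses the remembered predictions everywhere, with the deviated action's entry replaced by the observed loss. The real bLOLS update may differ, for example by importance-weighting that loss.

```python
import random
from typing import Callable, List, Tuple


class LinearCostPolicy:
    """Hypothetical per-action linear cost regressor (NOT macarico's LinearPolicy)."""

    def __init__(self, n_features: int, n_actions: int, lr: float = 0.1) -> None:
        # One weight vector per action, all starting at zero.
        self.w: List[List[float]] = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def predict_costs(self, x: List[float]) -> List[float]:
        # One dot product per action: the predicted cost of taking that action.
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

    def deviate(
        self, x: List[float], rng: random.Random
    ) -> Tuple[int, Callable[[float], None]]:
        """Deviate to a uniformly random action (assumed exploration scheme)
        and return it together with a continuation that finishes the update."""
        costs = self.predict_costs(x)  # remembered at deviation time
        a = rng.randrange(len(costs))

        def finish(observed_loss: float) -> None:
            # Assumed cost vector: remembered predictions everywhere, with the
            # deviated action's entry replaced by the loss observed at the end.
            target = list(costs)
            target[a] = observed_loss
            # One SGD step on 0.5 * (pred - target)^2 for each action's regressor.
            for k, row in enumerate(self.w):
                pred = sum(wi * xi for wi, xi in zip(row, x))
                err = pred - target[k]
                for i, xi in enumerate(x):
                    row[i] -= self.lr * err * xi

        return a, finish


# Usage: deviate now, finish the update only once the episode's loss is known.
rng = random.Random(0)
policy = LinearCostPolicy(n_features=2, n_actions=3)
x = [1.0, 0.5]
a, finish = policy.deviate(x, rng)
# ... roll out the rest of the episode and observe its loss ...
finish(1.0)
print(a, policy.predict_costs(x))
```

The closure captures `x` and the predicted costs, so neither the policy nor the learner has to stash per-deviation state between the deviation and the end of the episode; whoever holds `finish` holds everything needed to set up the regression problem.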

hal3 · Apr 28 '17 14:04