Implement BanditLOLS

Open timvieira opened this issue 8 years ago • 2 comments

Apr 27 '17 20:04 timvieira

basic implementation is done in https://github.com/hal3/macarico/blob/master/macarico/lts/lols.py

Apr 27 '17 23:04 hal3

there's some super-ugliness in BanditLOLS/LinearPolicy that I'd like to get your take on (see lols.py:55,72 and init.py:82-85). the issue is that in order to do CS bLOLS, you need to remember the predicted costs at deviation time, so that you can set the cost vector and set up the regression problem at the end, after you observe the reward. the current approach is to split this, but that's ugly. another option might be for LinearPolicy to provide something that returns a continuation? any other ideas?
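
for concreteness, a rough sketch of what the continuation option could look like. this is not macarico's API: `ToyLinearPolicy`, `predict_with_continuation`, and the learning rate are all hypothetical, and filling the cost vector with the stored predictions everywhere except the deviated action is just one simple assumed choice. the point is only that the closure remembers the predicted costs from deviation time, so the caller doesn't have to carry them around until the reward shows up.

```python
class ToyLinearPolicy:
    """Hypothetical stand-in for a linear cost-regression policy (not macarico's LinearPolicy)."""

    def __init__(self, n_actions, n_features, lr=0.1):
        # One weight vector per action; predicted cost is a dot product.
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def predict_costs(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

    def predict_with_continuation(self, x, dev_action):
        """Return the predicted costs plus a closure that finishes the update later."""
        pred = self.predict_costs(x)  # captured by the closure below

        def finish(observed_cost):
            # Assumed cost vector: stored predictions everywhere, except the
            # deviated action, which gets the cost observed at the end.
            target = list(pred)
            target[dev_action] = observed_cost
            # One squared-loss gradient step per action, using the predictions
            # made at deviation time (non-deviated actions get zero gradient).
            for a, row in enumerate(self.w):
                err = pred[a] - target[a]
                for j, xj in enumerate(x):
                    row[j] -= self.lr * err * xj

        return pred, finish


policy = ToyLinearPolicy(n_actions=3, n_features=2)
x = [1.0, 2.0]

# At deviation time: predict and keep the continuation around.
pred_costs, finish = policy.predict_with_continuation(x, dev_action=1)

# ... the rest of the rollout happens here ...

# At the end of the episode: feed in the observed cost and update.
finish(observed_cost=1.0)
```

with something like this, BanditLOLS would only need to hold on to `finish` between the deviation and the end of the episode, instead of the policy and the learner each keeping half of the state.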

Apr 28 '17 14:04 hal3