Hal Daumé III
There are two equivalent definitions of a CONVEX function. The first is that its second derivative is always non-negative. The second, more geometric, definition is that any chord of the...
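A quick numerical sanity check of the two characterizations, as a rough sketch rather than anything from the original text. It uses f(x) = x², whose second derivative is 2 everywhere. Since the chord sentence above is cut off, the code assumes the standard statement: the chord between two points on the graph lies on or above the graph. The helper names `second_difference` and `chord_lies_above` are made up for this illustration.

```python
def second_difference(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def chord_lies_above(f, a, b, steps=11, tol=1e-12):
    """Check, at evenly spaced points, that the chord from (a, f(a)) to
    (b, f(b)) is never below the graph of f."""
    for i in range(steps):
        t = i / (steps - 1)
        x = (1 - t) * a + t * b               # point between a and b
        chord = (1 - t) * f(a) + t * f(b)     # height of the chord at x
        if chord < f(x) - tol:
            return False
    return True


def square(x):
    return x * x


def neg_square(x):
    return -x * x


print(second_difference(square, 0.3))       # close to 2, so non-negative
print(chord_lies_above(square, -1.0, 2.0))  # True
print(chord_lies_above(neg_square, -1.0, 2.0))  # False: chord falls below
```

This only samples a handful of points, so it can refute either property but cannot prove it.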
I just pushed a change that implements this. https://github.com/hal3/macarico/blob/reorg/macarico/base.py#L80 for the interface and https://github.com/hal3/macarico/blob/reorg/macarico/policies/linear.py for some examples. any thoughts?
it should only require pytorch unless i forgot something. what errors are you getting?
I’m currently running python 3.5.4 with torch 0.4.1. If you go to the tests directory and run `python test_randomly.py` what happens? - hal (👨🔬 MSR-NYC ↔ 👨🏫 UMD ○ 🌐hal3.name...
for edit distance? i have one too ;). i wonder which is better
shouldn't this be a tail sum?
i'm not sure i agree. why should the env have to know how the RL algo works? and also not all RL algs will want tail sums
ugh gamma. i think this should be an argument of the RL algorithm. there's good reason (e.g. Nan Jiang's work) to think you might want to learn with a different...
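to make the tail-sum point concrete, here's a minimal sketch (not macarico's actual code; `tail_sums` and its signature are hypothetical). the env only hands back per-step rewards, and the learner decides whether to turn them into tail sums and which gamma to use:

```python
def tail_sums(rewards, gamma=1.0):
    """Return out[t] = rewards[t] + gamma * rewards[t+1] + gamma**2 * rewards[t+2] + ...

    `rewards` is the list of per-step rewards the environment produced.
    `gamma` belongs to the learner, not the environment (hypothetical design).
    """
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each tail sum reuses the one after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


episode_rewards = [1.0, 2.0, 3.0]               # what the env reports
print(tail_sums(episode_rewards))                # [6.0, 5.0, 3.0]
print(tail_sums(episode_rewards, gamma=0.5))     # [2.75, 3.5, 3.0]
```

an algorithm that doesn't want tail sums just skips the call, and the env never has to know either way.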
basic implementation is done in https://github.com/hal3/macarico/blob/master/macarico/lts/lols.py
there's some super-ugliness in BanditLOLS/LinearPolicy that I'd like to get your take on (see lols.py:55,72 and __init__.py:82-85). the issue is that in order to do CS bLOLS, you need to...