linearmodels Check performance of PanelOLS and AbsorbingLS in large models

Check performance and defer expensive operations

Jan 04 '21 17:01 bashtage

Hi @bashtage! Are there any improvements on this? I also wanted to raise this issue. On my fairly large dataset, fitting an FE panel model is ~20 times slower than doing OLS manually by subtracting the corresponding group means and calling np.linalg.solve. Is there any particular reason for that?
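For reference, the manual approach described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code actually used: the helper name `within_ols`, the column names, and the simulated data are all hypothetical, and it handles a single (entity) effect only, with no standard errors or input checks.

```python
import numpy as np
import pandas as pd


def within_ols(df, y_col, x_cols, entity_col):
    """Hypothetical helper: one-way fixed-effects estimate via group demeaning."""
    cols = [y_col, *x_cols]
    # Within transformation: subtract each entity's mean from every variable.
    demeaned = df[cols] - df.groupby(entity_col)[cols].transform("mean")
    X = demeaned[x_cols].to_numpy()
    y = demeaned[y_col].to_numpy()
    # Solve the normal equations (X'X) b = X'y for the slope coefficients.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Small simulated panel (assumed structure: 200 entities, 10 periods each).
rng = np.random.default_rng(0)
n_entities, n_periods = 200, 10
entity = np.repeat(np.arange(n_entities), n_periods)
alpha = rng.normal(size=n_entities)[entity]  # entity fixed effects
x1 = rng.normal(size=entity.size) + alpha    # regressor correlated with the effects
x2 = rng.normal(size=entity.size)
y = 1.5 * x1 - 0.5 * x2 + alpha + 0.1 * rng.normal(size=entity.size)

df = pd.DataFrame({"entity": entity, "y": y, "x1": x1, "x2": x2})
beta = within_ols(df, "y", ["x1", "x2"], "entity")
print(beta)
```

This skips everything a full estimator does beyond the point estimate (index validation, rank checks, covariance estimation), which is part of why it can be so much faster.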

I also wanted to ask whether multiprocessing is used here. I can't figure it out, but all of my cores seem to be in use.

Mar 29 '21 15:03 OleksiiRomanko

There is no multiprocessing, but there should be multithreading. Can you post an example with a simulated dataset that resembles the one you are fitting (similar group structure), along with the command?
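A simulated dataset of the kind requested might look something like the sketch below. Everything here is an assumption for illustration: the function name `simulate_panel`, the sizes, the unbalanced group structure, and the column names are hypothetical and should be adjusted to match the real data.

```python
import numpy as np
import pandas as pd


def simulate_panel(n_entities=1000, max_periods=8, n_exog=2, seed=0):
    """Hypothetical generator for an unbalanced panel with entity effects."""
    rng = np.random.default_rng(seed)
    # Unbalanced panel: each entity is observed for 2..max_periods periods.
    periods = rng.integers(2, max_periods + 1, size=n_entities)
    entity = np.repeat(np.arange(n_entities), periods)
    time = np.concatenate([np.arange(p) for p in periods])
    # One fixed effect per entity, broadcast to that entity's rows.
    effects = rng.normal(size=n_entities)[entity]
    X = rng.normal(size=(entity.size, n_exog))
    y = X @ np.ones(n_exog) + effects + rng.normal(size=entity.size)
    # (entity, time) MultiIndex, the layout panel estimators typically expect.
    index = pd.MultiIndex.from_arrays([entity, time], names=["entity", "time"])
    df = pd.DataFrame(X, index=index, columns=[f"x{i}" for i in range(n_exog)])
    df["y"] = y
    return df


panel = simulate_panel()
print(panel.shape)
```

With linearmodels installed, a frame like this could then be fitted and timed with something along the lines of `PanelOLS(panel["y"], panel[["x0", "x1"]], entity_effects=True).fit()`.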

One reason it might be expensive is that it performs more checks than are strictly necessary, and it may also create new data structures. This has a cost, but the benefit is large in terms of long-term maintainability and shallowness of bugs. In complicated models, i.e., those with 2 effects and really large datasets (e.g., 5 million+ rows with many relatively small groups, 1 million+ groups), performance should be nearly identical to the best methods available.

Mar 29 '21 15:03 bashtage