bedshift
bedshift copied to clipboard
Performance improvement
Python Pandas is slow for the large number of single-row operations in bedshift. It may be faster to read bedfiles into a native object like a list or dictionary and conduct operations on it.
This has been mostly addressed with commit d6674fc084b0805aedd3337ef0bd7c1adfeab669. Cut and merge are the two slowest operations now, but can still complete reasonably quickly. I'll keep this issue open to see if we can move off of pandas in the future.
Also see #11