
Shift is too slow

Open aaron-gu opened this issue 4 years ago • 2 comments

Shift is really slow. I think it can be improved if regions are not modified in place; instead, the shifted regions would be added as new regions and the old regions removed.

aaron-gu avatar Feb 27 '21 21:02 aaron-gu

Well, my change to creating new regions and dropping old regions didn't improve shift performance by much. I think the slow part of shift is the pandas DataFrame access, where the code has to look up the chromosome, start, and end position of a given row. With a 50,000-region BED file and a high shift rate of 0.8, the code has to access about 40,000 regions one at a time.
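For illustration, here is a minimal sketch of the row-by-row access pattern I mean. This is not the actual bedshift code: the function name `shift_rowwise`, the column names `chrom`, `start`, `end`, and the `mean`/`stdev`/`seed` parameters are all assumptions made up for the example.

```python
import numpy as np
import pandas as pd


def shift_rowwise(df, shift_rate, mean, stdev, seed=0):
    """Shift a random fraction of regions, one row at a time (the slow pattern).

    Hypothetical helper; assumes columns named 'chrom', 'start', 'end'.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    n_shift = int(len(out) * shift_rate)
    rows = rng.choice(out.index.to_numpy(), size=n_shift, replace=False)
    for row in rows:
        # Every scalar lookup below goes through pandas indexing,
        # so the cost grows with the number of shifted rows.
        chrom = out.loc[row, "chrom"]  # would feed a chromosome bounds check (omitted)
        start = out.loc[row, "start"]
        end = out.loc[row, "end"]
        delta = int(rng.normal(mean, stdev))
        out.loc[row, "start"] = start + delta
        out.loc[row, "end"] = end + delta
    return out
```

With 50,000 regions and a shift rate of 0.8, this loop body runs 40,000 times, with several indexed reads and writes per iteration.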

aaron-gu avatar Feb 28 '21 17:02 aaron-gu

New idea:

  1. Take a subset of the DataFrame, which will be the rows to modify.
  2. Use an apply function on the start and end columns to get the shifted positions.
  3. Drop the old rows and append this new DataFrame.
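A rough sketch of these three steps, under some assumptions: the columns are named `chrom`, `start`, `end`; the function name `shift_subset` and its `mean`/`stdev`/`seed` parameters are hypothetical; and `pd.concat` stands in for "append", since `DataFrame.append` was removed in pandas 2.0. I have not benchmarked this against the current code.

```python
import numpy as np
import pandas as pd


def shift_subset(df, shift_rate, mean, stdev, seed=0):
    """Shift a random fraction of regions in one pass over a subset.

    Hypothetical helper; assumes columns named 'chrom', 'start', 'end'.
    """
    rng = np.random.default_rng(seed)
    n_shift = int(len(df) * shift_rate)

    # 1. Take the subset of rows to modify.
    subset = df.sample(n=n_shift, random_state=seed).copy()

    # 2. Draw one offset per row and apply it to both the start and end columns.
    deltas = rng.normal(mean, stdev, size=n_shift).astype(int)
    subset[["start", "end"]] = subset[["start", "end"]].apply(
        lambda col: col + deltas
    )

    # 3. Drop the old rows and append the shifted ones.
    kept = df.drop(index=subset.index)
    return pd.concat([kept, subset], ignore_index=True)
```

Note that the shifted rows end up at the bottom of the result, so the output would need re-sorting if region order matters.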

aaron-gu avatar Feb 28 '21 17:02 aaron-gu