python-rust-fst icon indicating copy to clipboard operation
python-rust-fst copied to clipboard

Initial attempt at efficient range queries across multiple Set objects.

Open davidblewett opened this issue 8 years ago • 6 comments

This is not compiling yet (using rustc 1.24.0-nightly (0a3761e63 2018-01-03)):

(dblewett)Wed Jan 10, 10:03 | /home/dblewett/src/python-rust-fst/fstwrapper
rusty% cargo build          
 Compiling fst-wrapper v0.3.0 (file:///home/dblewett/src/python-rust-fst/fstwrapper)
error[E0277]: the trait bound `for<'a> &fst::set::Stream<'_>: fst::Streamer<'a>` is not satisfied
 --> src/set.rs:152:36
  |
152 |     let ob = set::OpBuilder::new().add(stream);
  |                                    ^^^ the trait `for<'a> fst::Streamer<'a>` is not implemented for `&fst::set::Stream<'_>`
  |
  = help: the following implementations were found:
            <fst::set::Stream<'s, A> as fst::Streamer<'a>>

error[E0277]: the trait bound `for<'a> &fst::set::Stream<'_>: fst::Streamer<'a>` is not satisfied
 --> src/set.rs:168:8
  |
168 |     ob.push(stream);
  |        ^^^^ the trait `for<'a> fst::Streamer<'a>` is not implemented for `&fst::set::Stream<'_>`
  |
  = help: the following implementations were found:
            <fst::set::Stream<'s, A> as fst::Streamer<'a>>

error: aborting due to 2 previous errors

error: Could not compile `fst-wrapper`.

davidblewett avatar Jan 10 '18 15:01 davidblewett

@BurntSushi fstwrapper is now compiling; thanks!

@jbaiter However, now I'm getting segfaults in trying to test the functionality:

Python 3.6.3 (default, Oct  3 2017, 21:45:48)           
Type 'copyright', 'credits' or 'license' for more information                                                   
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.                                             

In [1]: import rust_fst                                 

In [2]: fst = rust_fst.Set('/home/dblewett/Desktop/bgzf-test/activity.fst')                                     

In [3]: fst.range_union(slice('1000~000003e8-0000-0054-01d3-8bdfe91ec2b7','1000~000003e8-0000-0054-01d3-8bdfe91ec2b7~~'), rust_fst.Set('/home/dblewett/Desktop/bgzf-test/index/nyt/2018/1/2/0/activity.fst'))                   
*** Error in `/home/dblewett/.virtualenvs/fst/bin/python3.6': free(): invalid pointer: 0x00007f88dd4f5cd8 ***   
zsh: abort (core dumped)  ipython

The API signature is a little awkward and should probably get some more thought, but I was just trying to test the plumbing without luck.

davidblewett avatar Jan 19 '18 03:01 davidblewett

Can you add a unit test for the range_union method? Then I can take a shot at debugging it :-) To fix the merge conflict, just move the new function signatures from _build_ffy.py to rust/rust_fst.h.

jbaiter avatar Jan 21 '18 19:01 jbaiter

@jbaiter @BurntSushi Sorry this got delayed so long. I have this working now.

davidblewett avatar Sep 12 '18 18:09 davidblewett

The tests pass for me (cPython 3.7, Ubuntu 18.04.1):

(fst-test) 
(dblewett)Wed Sep 12, 15:46 | /home/dblewett/src
rusty% py.test python-rust-fst/tests
==================================================================================================================== test session starts ====================================================================================================================
platform linux -- Python 3.7.0b3, pytest-3.8.0, py-1.6.0, pluggy-0.7.1
rootdir: /home/dblewett/src/python-rust-fst, inifile:
collected 47 items                                                                                                                                                                                                                                          

python-rust-fst/tests/test_map.py ..................                                                                                                                                                                                                  [ 38%]
python-rust-fst/tests/test_set.py .............................                                                                                                                                                                                       [100%]

================================================================================================================= 47 passed in 0.18 seconds =================================================================================================================```

davidblewett avatar Sep 12 '18 19:09 davidblewett

@jbaiter, @BurntSushi what do you think of this? I wasn't sure about the UnionSet name.

davidblewett avatar Sep 19 '18 14:09 davidblewett

@jbaiter is there interest in including this in this package? If not the entire PR, maybe the supporting C functions? it would be cumbersome to do completely outside this package.

davidblewett avatar Oct 31 '18 19:10 davidblewett