Accurate and Efficient Search

Open systems-rebooter opened this issue 10 years ago • 4 comments

This question came out of #551, and I think it deserves a separate discussion :)

Per @tilboerner, CM has the following great search operators:

You can use !f and !d (as the first or last term) to limit results to files or directories, respectively.
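To illustrate the rule quoted above, here is a minimal sketch of how a query string could be split into search terms plus a file/directory restriction. It is not CherryMusic's actual parser; the function name `parse_query` and the return shape are assumptions made up for this example.

```python
def parse_query(query):
    """Split a query into (terms, kind); kind is 'file', 'dir', or None.

    Hypothetical helper, not part of CherryMusic.
    """
    operators = {"!f": "file", "!d": "dir"}
    terms = query.split()
    kind = None
    # The operator only counts when it is the first or the last term.
    if terms and terms[0] in operators:
        kind = operators[terms.pop(0)]
    elif terms and terms[-1] in operators:
        kind = operators[terms.pop()]
    return terms, kind
```

With this sketch, `parse_query("cherry music !d")` yields the two search terms and the restriction `'dir'`, while an operator in the middle of the query is treated as an ordinary term.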

Still, I am very interested in performing more accurate and efficient searches with CherryMusic.

Right now, searching for two separate keywords, let's say cherry music, returns folders and files containing any of the following [sub]strings:

[1] cherry 
[2] music
[3] cherry music
[4] cherrymusic
[5] cherrymerry
[6] musictusic

These results come back in seemingly random order, and the search page looks cluttered and not very accurate. It would be super-awesome to have "phrase search", which would help with this specific case.

Also, it would be soooo cool to implement searchable ID3 tags (#5). Please, please, please...

Thanks!

systems-rebooter, Apr 25 '15 10:04

Hey @systems-rebooter!

Sorry for the long wait! We want a better search as well, but searching is a tricky problem, especially since we try to keep CM as light as possible (CM can perform searches on hundreds of gigabytes of data on a Raspberry Pi...!).

Anyway, we'd like to improve the search as well, and the best way to do so would probably be to use word tuples, which would allow phrase searches and let results rank higher when the words appear in the correct order.
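A rough sketch of the word-tuple idea, assuming "word tuples" means pairs of adjacent words. Nothing here is taken from CherryMusic's code; the function names are hypothetical. Results that contain all query words are ranked by how many adjacent word pairs they share with the query, so names with the words in the right order come first.

```python
def words(text):
    """Lowercase a string and split it on whitespace."""
    return text.lower().split()


def word_pairs(text):
    """Return the adjacent word pairs of a string, in order."""
    w = words(text)
    return list(zip(w, w[1:]))


def rank_by_word_order(query, names):
    """Keep names containing every query word; best word order first."""
    query_words = set(words(query))
    query_pairs = set(word_pairs(query))
    scored = []
    for name in names:
        if not query_words <= set(words(name)):
            continue  # at least one query word is missing
        # Count the query's adjacent pairs that also occur in the name.
        score = len(query_pairs & set(word_pairs(name)))
        scored.append((-score, name))
    # Highest score first; ties are broken alphabetically.
    return [name for _, name in sorted(scored)]
```

Note that shared pairs only approximate a phrase match for queries longer than two words; a strict phrase search would still need a contiguous comparison.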

Right now we're using a heuristic to order the search results: you can look into resultorder.py if you're interested in how it works. Furthermore, you can tweak the behavior of those results in tweak.py. But beware that the ordering happens after we've hit the database, so in some scenarios CM won't be able to find what you are looking for at all. The database search is optimized to look for less frequent words first. You may have noticed that a search for the word "it" takes a lot more time than a search for "gvbnko9ytff"; now you know why.
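To make the "less frequent words first" idea concrete, here is a small in-memory sketch. It assumes an index that maps each word to the set of item ids containing it, and it assumes every query word must match; CherryMusic's real sqlite query may work differently. The name `search_rarest_first` is made up for this example.

```python
def search_rarest_first(query, index):
    """Intersect posting sets, starting with the rarest query word.

    `index` maps word -> set of item ids (an assumed structure).
    """
    terms = query.lower().split()
    if not terms:
        return set()
    # Fewest matches first, so the candidate set starts out small.
    terms.sort(key=lambda term: len(index.get(term, ())))
    result = set(index.get(terms[0], ()))
    for term in terms[1:]:
        if not result:
            break  # nothing left to narrow down
        result &= index.get(term, set())
    return result
```

Starting from a rare word keeps every later intersection cheap, which matches the observation that a query made only of a very common word is the slow case.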

As always, thanks for your input. We're still thinking about rewriting larger parts of CM, but it's always hard to maintain compatibility with everything that came before.

devsnd, May 17 '15 14:05

Hey @devsnd

Thanks for the info! Yeah, word tuples sound like a wise idea!

For phrase search, searching by substring would be more than enough for now, even if the substring is a dummy one that doesn't contain actual words: "dum my sub string gggg". I don't know how hard it is to implement this kind of search (I'm not a Python guru yet, so to me it's just a matter of choosing a delimiter and implementing a strict substring search).
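A minimal sketch of that suggestion, assuming double quotes as the delimiter and a case-insensitive comparison; both choices, and the function names, are assumptions for illustration rather than anything CherryMusic does.

```python
def split_phrase(query):
    """Return (text, is_phrase); a query wrapped in double quotes is a phrase."""
    q = query.strip()
    if len(q) >= 2 and q[0] == q[-1] == '"':
        return q[1:-1], True
    return q, False


def phrase_filter(phrase, paths):
    """Keep only the paths that contain the phrase as an exact substring."""
    needle = phrase.casefold()
    return [p for p in paths if needle in p.casefold()]
```

For example, filtering on the phrase cherry music would keep a path like "Cherry Music/track.mp3" but drop "cherrymusic.mp3", because the space is part of the substring.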

Maybe it would be faster to implement a system-level search, with an option in cherrymusic.conf to switch the actual search algorithm. find is available pretty much everywhere (even on Windows). In that case the search would be granular and fast, and it would allow CherryMusic to stay super-light.

systems-rebooter, May 18 '15 08:05

Maybe it would be faster to implement a system-level search, with an option in cherrymusic.conf to switch the actual search algorithm. find is available pretty much everywhere [...]

I don't think that's a good idea. Using find on a >1TB HDD completely filled with *.mp3 files would be considerably slower than the sqlite query. Additionally, a file-system-level search would need to spin up an external HDD (most modern HDDs support some sort of power-saving mode with spin-down), which might take another 3-5 seconds. :-1:

6arms1leg, May 19 '15 10:05

Yeah, I agree with you. That was an absolutely bad idea :x: I dunno

systems-rebooter, May 19 '15 10:05