[Feature Request]: Add exclusive And/Or search options
Checklist
- [X] I am using an up-to-date version.
- [X] I have read the documentation.
- [X] I have searched existing issues.
Description
Currently, the search can only be set to either And (Includes all Tags) or Or (Includes any Tag). I just had the problem of trying to find an image that has two specific tags but no others. It seems the only way to do that is manually: search for all the tags I want, then filter out the ones I don't want myself.
Solution
I think adding the options Ex. Or (Exclusively includes any Tags) and Ex. And (Exclusively includes all Tags) would be great additions to the versatility of TagStudio!
EXAMPLE
I have a database of many images from and about the Bocchi the Rock anime. Among others, it includes the tags:
Kita, Bocchi, Nijika, Ryou
Images are tagged in every combination: some with only one character, some with several, and some with all characters tagged.
Searching for "Kita, Bocchi" in Exclusive Or mode would return all images that have only the Kita tag, only the Bocchi tag, or only the Kita and Bocchi tags, and no others.
Searching for "Nijika, Ryou" in Exclusive And mode would result in all images that only have the Nijika and the Ryou tags, and no others.
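One way to pin down the two requested modes is to treat them as set comparisons on each entry's tags. The sketch below is a hypothetical illustration, not TagStudio code. It assumes an entry's tags are a plain Python set of strings and uses made-up entries. "Ex. Or" keeps entries whose tags are a non-empty subset of the searched tags. "Ex. And" keeps entries whose tags are exactly the searched tags.

```python
# Hypothetical entries: file name -> set of tags (illustrative data only).
ENTRIES: dict[str, set[str]] = {
    "a.png": {"Kita"},
    "b.png": {"Bocchi"},
    "c.png": {"Kita", "Bocchi"},
    "d.png": {"Kita", "Nijika"},
    "e.png": {"Nijika", "Ryou"},
    "f.png": {"Nijika", "Ryou", "Bocchi"},
    "g.png": set(),
}


def exclusive_or(entries: dict[str, set[str]], query: set[str]) -> list[str]:
    """'Ex. Or': entries tagged only with tags from the query (at least one)."""
    return sorted(name for name, tags in entries.items() if tags and tags <= query)


def exclusive_and(entries: dict[str, set[str]], query: set[str]) -> list[str]:
    """'Ex. And': entries tagged with exactly the query tags, and no others."""
    return sorted(name for name, tags in entries.items() if tags == query)


print(exclusive_or(ENTRIES, {"Kita", "Bocchi"}))   # ['a.png', 'b.png', 'c.png']
print(exclusive_and(ENTRIES, {"Nijika", "Ryou"}))  # ['e.png']
```

The untagged `g.png` is excluded from "Ex. Or" only because of the explicit non-empty check, since an empty set is a subset of every set.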
Alternatives
The naming "Ex. Or" and "Ex. And" could probably be improved, but I can't think of anything better at the moment (in my defense, I am writing this at 1 in the morning after a horrible night's sleep).
This is related to #112 (as part of a larger discussion about search queries). It is also useful as confirmation of user demand for more Boolean operators.
Exclusive OR is commonly written as XOR: https://en.wikipedia.org/wiki/Exclusive_or There is no such thing as exclusive AND, but there are operators such as XNOR and NAND.
Truth table:
| A | B | AND | OR | XOR | XNOR | NAND | NOR |
|---|---|---|---|---|---|---|---|
| ❌ | ❌ | ❌ | ❌ | ❌ | ✔ | ✔ | ✔ |
| ❌ | ✔ | ❌ | ✔ | ✔ | ❌ | ✔ | ❌ |
| ✔ | ❌ | ❌ | ✔ | ✔ | ❌ | ✔ | ❌ |
| ✔ | ✔ | ✔ | ✔ | ❌ | ✔ | ❌ | ❌ |
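The truth table above can be reproduced mechanically. This Python sketch is purely illustrative (`OPERATORS`, `mark` and `truth_table` are made-up names). It evaluates each operator for every combination of A and B, in the same row order as the table:

```python
from itertools import product

# Each operator column of the table, as a function of two booleans.
OPERATORS = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
    "XNOR": lambda a, b: a == b,
    "NAND": lambda a, b: not (a and b),
    "NOR": lambda a, b: not (a or b),
}


def mark(value: bool) -> str:
    return "✔" if value else "❌"


def truth_table() -> list[list[str]]:
    rows = []
    # product(..., repeat=2) yields (F, F), (F, T), (T, F), (T, T),
    # the same row order as the table above.
    for a, b in product((False, True), repeat=2):
        rows.append([mark(a), mark(b)] + [mark(op(a, b)) for op in OPERATORS.values()])
    return rows


for row in truth_table():
    print(" | ".join(row))
```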
@KillyMXI, I don't believe that @coolesding was referring to the Boolean exclusive OR operation. As I understand it, Coolesding was hoping to exclude entries from their searches without explicitly typing out the tags they want to exclude. The premise is that if the only tags Coolesding explicitly types out are Kita and Bocchi, then Coolesding doesn't want entries tagged Nijika or Ryou appearing in the search.
Personally, I strongly believe that Coolesding should be able to make the library work that way, but I really dislike the idea of adding features that only work if you have a single category of tags. Coolesding ought to be able to use non-character tags without breaking the search. For example, if Coolesding adds a "text" tag to some of the entries, then there will be no way to include both entries with text and entries without text in an "Ex. And" search.
The solution I would suggest is adding tags to each file for the number of characters, e.g. 1_character, 2_characters, 3_or_more_characters... This is actually something I do in my own library. Then Coolesding can perform the example "Ex. And" search with this:
Nijika Ryou 2_characters
Unfortunately, I can't think of a concise way of getting "Ex. Or" to work, even if full Boolean syntax is implemented. With the example given, at least, Boolean syntax would allow the "Ex. Or" search to be done with this:
( Kita OR Bocchi AND 1_character ) OR ( Kita AND Bocchi AND 2_characters )
But with three characters...
( Kita OR Bocchi OR Nijika AND 1_character ) OR ( Kita AND Bocchi AND 2_characters ) OR ( Kita AND Nijika AND 2_characters ) OR ( Bocchi AND Nijika AND 2_characters ) OR ( Kita AND Bocchi AND Nijika AND 3_characters )
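To show how quickly this grows, the sketch below generates the count-tag expansion for any list of characters. It assumes the hypothetical 1_character / 2_characters naming from above, and that those count tags are kept accurate. Unlike the hand-written queries, it emits one AND clause per combination instead of grouping the single-character cases with OR. The two forms are equivalent. With n characters, the query has 2^n - 1 clauses.

```python
from itertools import combinations


def count_tag(k: int) -> str:
    # Follows the hypothetical 1_character / 2_characters naming from the comment.
    return "1_character" if k == 1 else f"{k}_characters"


def ex_or_query(characters: list[str]) -> str:
    """Expand an 'Ex. Or' search into plain Boolean syntax using count tags.

    One clause per non-empty subset of the characters, so the query has
    2**n - 1 clauses for n characters.
    """
    clauses = []
    for k in range(1, len(characters) + 1):
        for subset in combinations(characters, k):
            clauses.append("( " + " AND ".join([*subset, count_tag(k)]) + " )")
    return " OR ".join(clauses)


print(ex_or_query(["Kita", "Bocchi"]))
# ( Kita AND 1_character ) OR ( Bocchi AND 1_character ) OR ( Kita AND Bocchi AND 2_characters )
```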
Depending on the number of character tags, it would probably be easier to just manually exclude every other character, e.g.:
Kita OR Bocchi OR Nijika AND NOT ( Ryou OR Gotou_Futari OR Gotou_Michiyo OR Gotou_Naoki OR [...] OR untagged_characters )
Or, depending on the number of entries in the library, just perform a preliminary search and ignore any entries that aren't relevant:
Kita OR Bocchi OR Nijika AND ( 1_character OR 2_characters OR 3_characters )
Does anyone else have any thoughts on this issue?
Right, I misinterpreted the issue text because I had strong and different interpretation of those terms in my head.
Booru-like systems such as Hydrus offer the following features, which allow a search query like this:
- namespaces, such as `character:`
- wildcards, such as `character:*`
- system meta tags to limit the number of tags, such as `system:number_of_tags=2`
Counting the number of tags within a certain namespace is tricky, though. 2_characters is definitely a working workaround, but it might get annoying to maintain. Boorus have certain problems with numbering tags like these...
If we ignore the possibility of other tags, examples from OP can potentially look like this:
(Kita OR Bocchi) AND system:number_of_tags<=2
Nijika AND Ryou AND system:number_of_tags=2
But that's not a very practical assumption - it's natural to expect more tags besides characters.
Limiting the number of tags within a namespace or wildcard can be an interesting design challenge. My momentary thought is that set theory might be helpful alongside Boolean algebra to describe this. I'll try to explain it later, along with some other relevant suggestions I had previously.
That's really interesting, @KillyMXI. CyanVoxel actually has tag categories as a planned feature: https://github.com/TagStudioDev/TagStudio/blob/main/doc/library/tag_categories.md I don't know how namespaces work in Hydrus, but the concept of tag categories may be similar.
Also, your first example of (Kita OR Bocchi) AND system:number_of_tags<=2 doesn't do exactly what Coolesding asked for, since an entry with Kita and Nijika would match that search as well. Though that's not a unique problem. If you search for black_clothes shirt in a different library then that will match entries with black shirts, but it will also match non-black shirts if there are black clothes elsewhere in the image. There isn't really a solution besides creating tags for every possible combination, doing hardcore boolean reasoning, or just ignoring irrelevant entries with one's own mind.
Dang, I goofed twice in one thread...
So, within the same constraints, the first example can be fixed like this:
((Kita OR Bocchi) AND system:number_of_tags=1) OR ((Kita AND Bocchi) AND system:number_of_tags=2)
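The difference between the two queries can be checked by brute force. This sketch makes the same assumption as the comments above: entries carry only the four character tags. It enumerates every possible tag combination and lists the combinations where a query disagrees with the intended meaning, "only Kita and/or Bocchi". `system:number_of_tags` is modelled as `len(tags)`, and all names are hypothetical.

```python
from collections.abc import Callable
from itertools import chain, combinations

ALL_TAGS = ["Kita", "Bocchi", "Nijika", "Ryou"]


def all_tag_sets(tags: list[str]) -> list[frozenset[str]]:
    """Every combination of tags an entry could have (the power set)."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(tags, k) for k in range(len(tags) + 1))]


def wanted(tags: frozenset[str]) -> bool:
    # The intended meaning: only Kita and/or Bocchi, nothing else.
    return bool(tags) and tags <= {"Kita", "Bocchi"}


def first_attempt(tags: frozenset[str]) -> bool:
    # (Kita OR Bocchi) AND system:number_of_tags<=2
    return ("Kita" in tags or "Bocchi" in tags) and len(tags) <= 2


def fixed(tags: frozenset[str]) -> bool:
    # ((Kita OR Bocchi) AND system:number_of_tags=1)
    #   OR ((Kita AND Bocchi) AND system:number_of_tags=2)
    return (("Kita" in tags or "Bocchi" in tags) and len(tags) == 1) or (
        ("Kita" in tags and "Bocchi" in tags) and len(tags) == 2)


def mismatches(query: Callable[[frozenset[str]], bool]) -> list[set[str]]:
    """Tag combinations where the query's answer differs from `wanted`."""
    return [set(t) for t in all_tag_sets(ALL_TAGS) if query(t) != wanted(t)]


print(mismatches(first_attempt))  # e.g. includes {'Kita', 'Nijika'}
print(mismatches(fixed))          # []
```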
I think this creates a stronger case for set theory. I'm not aware of it being used this way elsewhere, so it might become a strong competitive advantage for TagStudio. But this also means low familiarity and the need to invent a syntax for it.
The OP examples can be formulated as follows:
the set of file tags is a subset of {Kita, Bocchi}
the set of file tags is equal to {Nijika, Ryou}
This can then be improved by limiting it to character tags:
the set of file tags in character namespace is a subset of {character:Kita, character:Bocchi}
the set of file tags in character namespace is equal to {character:Nijika, character:Ryou}
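A minimal sketch of the namespace-restricted versions. It assumes Hydrus-style tags written as `namespace:value` strings, which is an assumption; TagStudio's planned tag categories may work differently. Tags outside the namespace, such as a plain `text` tag, no longer affect the result:

```python
def in_namespace(tags: set[str], namespace: str) -> set[str]:
    """The entry's tags that belong to a namespace, e.g. 'character'."""
    prefix = namespace + ":"
    return {t for t in tags if t.startswith(prefix)}


def chars_subset_of(tags: set[str], allowed: set[str]) -> bool:
    # "the set of file tags in character namespace is a subset of {...}"
    return in_namespace(tags, "character") <= allowed


def chars_equal_to(tags: set[str], wanted: set[str]) -> bool:
    # "the set of file tags in character namespace is equal to {...}"
    return in_namespace(tags, "character") == wanted


entry = {"character:Kita", "character:Bocchi", "text"}
print(chars_subset_of(entry, {"character:Kita", "character:Bocchi"}))  # True
print(chars_equal_to(entry, {"character:Nijika", "character:Ryou"}))   # False
```

An entry with no character tags at all also passes `chars_subset_of`, because the empty set is a subset of every set.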
To make this possible, a few features are needed:
- being able to define sets of literal values (such as `{Kita, Bocchi}` or `{1, 2}`)
- being able to define sets of queried values (such as a set of "all file tags", "tags in a wildcard or a namespace", or tags satisfying any other predicate)
- being able to use set operations (such as "is a subset/superset" (⊆, ⊇), "is a proper subset/superset" (⊂, ⊃), equality, union (⋃), intersection (⋂), difference (\), symmetric difference (△), and set size)
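Python's built-in `set` type implements every operation in the list above, which makes it a convenient way to check the intended semantics. A small illustration (the function name is made up):

```python
def set_ops(a: set[str], b: set[str]) -> dict[str, object]:
    """Each set operation from the list above, via Python's built-in operators."""
    return {
        "subset (⊆)": a <= b,
        "superset (⊇)": a >= b,
        "proper subset (⊂)": a < b,
        "proper superset (⊃)": a > b,
        "equal": a == b,
        "union (⋃)": a | b,
        "intersection (⋂)": a & b,
        "difference (\\)": a - b,
        "symmetric difference (△)": a ^ b,
        "size of a": len(a),
    }


for name, value in set_ops({"Kita", "Bocchi"}, {"Kita", "Bocchi", "Nijika"}).items():
    print(f"{name}: {value}")
```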
What the syntax could look like:
- curly braces are the universal syntax for sets
- literals:
  - `{Kita, Bocchi}` or `{Kita Bocchi}`
    - avoiding punctuation might be desirable, but raises the concern of allowing operator-less syntax elsewhere
    - the AND/OR interpretation is kind of murky
- empty set: `{}`
- symbols require plain English words or equivalent Boolean operators, because there are no alternative symbols on a common keyboard:
  - "{A} is a subset of {B}" = `{A} in {B}`
  - "{A} is a superset of {B}" = `{A} includes {B}` or `{A} contains {B}`
  - "{A} equals {B}" = `{A} = {B}` or `{A} is {B}`
  - proper (strict) subset/superset can probably be ignored as less useful in practice
  - union is equivalent to the OR operation, but applied to sets
  - intersection is equivalent to the AND operation, but applied to sets
  - difference is equivalent to subtraction, except we don't have it; the closest Boolean equivalent is the combination `{A} and not {B}`, which we can probably live with
  - symmetric difference is equivalent to the XOR operation, but applied to sets
  - note: if one of the operands is a literal rather than a set, it can be made into a set (lifted) implicitly
    - need to check for possible implications of this; there might be pros and cons
- syntax for set size might be tricky (here, `num_op` is any of the supported numeric comparators):
  - function-like syntax: `size({A}) num_op 2`
    - would work better for a prefix grammar and is not so natural for an infix grammar, but more function-like syntax may appear later for other features
  - property-like syntax: `{A}.size num_op 2` or `{A}:size num_op 2`
    - not like anything on the table for the grammar, so it creates many questions
  - extending reserved keywords: `size_of:{A} num_op 2`
    - I don't like it; I'd prefer any reserved keywords be gated behind their own namespace like in Hydrus, but this might not look so bad alongside other current proposals
  - implicit: `{A} num_op 2`
    - no new syntax and the cleanest, but can be somewhat obscure, and makes it impossible to do set operations with size
- queried sets might be tricky:
  - `{character:*}`: a wildcard inside curly braces is what comes to mind first
  - `{character:*, Ryou}`: can potentially mix with literals
  - `{all_tags}` or `{tag:*}` or something else: not sure how to go about this; it depends on other considerations that are outside the scope of this issue
Our examples may look like this:
{all_tags} in {Kita, Bocchi}
{all_tags} = {Nijika, Ryou}
{character:*} in {character:Kita, character:Bocchi}
{character:*} is {character:Nijika, character:Ryou}
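As a rough feasibility check, a single `{A} op {B}` clause from the examples above can be evaluated in a few dozen lines. This is a hypothetical sketch, not a TagStudio API. It assumes `namespace:value` tag strings and treats `{all_tags}` and `{namespace:*}` as queried sets resolved against one entry's tags. Everything else inside braces is treated as a literal tag. It does not handle combining clauses with `and`/`not`.

```python
import re

# Hypothetical evaluator for one `{A} op {B}` clause (not a TagStudio API).
CLAUSE = re.compile(r"^\s*\{([^}]*)\}\s*(in|includes|contains|=|is|!=)\s*\{([^}]*)\}\s*$")

OPS = {
    "in": lambda a, b: a <= b,        # subset
    "includes": lambda a, b: a >= b,  # superset
    "contains": lambda a, b: a >= b,  # superset (alias)
    "=": lambda a, b: a == b,
    "is": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
}


def resolve(body: str, tags: set[str]) -> set[str]:
    """Turn the inside of curly braces into a concrete set for one entry."""
    result: set[str] = set()
    for item in (part.strip() for part in body.split(",")):
        if not item:
            continue                      # `{}` is the empty set
        if item == "all_tags":
            result |= tags                # queried set: every tag of the entry
        elif item.endswith(":*"):
            prefix = item[:-1]            # "character:*" -> "character:"
            result |= {t for t in tags if t.startswith(prefix)}
        else:
            result.add(item)              # literal tag
    return result


def matches(query: str, tags: set[str]) -> bool:
    m = CLAUSE.match(query)
    if m is None:
        raise ValueError(f"unsupported query: {query!r}")
    lhs, op, rhs = m.groups()
    return OPS[op](resolve(lhs, tags), resolve(rhs, tags))


print(matches("{all_tags} in {Kita, Bocchi}", {"Kita"}))                   # True
print(matches("{all_tags} = {Nijika, Ryou}", {"Nijika", "Ryou", "text"}))  # False
print(matches("{character:*} is {character:Nijika, character:Ryou}",
              {"character:Nijika", "character:Ryou", "text"}))             # True
```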
And I overlooked one more thing:
The empty set (no tags) is a subset of every other set, but that is often not what is wanted in practice.
Here, the queries above will also match files without tags.
Can be fixed in query like this:
{all_tags} in {Kita, Bocchi} and {all_tags} != {}
{character:*} in {character:Kita, character:Bocchi} and {character:*} != {}
But this will be a common inconvenience. The empty set might be handy in different situations, and prohibiting it would also make the system unsound, so I don't think that is an option. Instead, it might be practical to introduce some kind of shorthand for the non-emptiness of a queried set.
The definitions of proper (strict) subset/superset do not fit this issue exactly: they work at the wrong end of it.
What is needed are variations on subset/superset operator:
- "{A} is a non-empty subset of {B}"
- "{A} is a superset of non-empty {B}"
I asked ChatGPT whether there is a common notation for this. There seems to be none, and ChatGPT suggests introducing a custom notation, so:
{all_tags} in! {Kita, Bocchi}
{character:*} in! {character:Kita, character:Bocchi}
This is probably the most unambiguous way to introduce the non-emptiness clause in the right place.
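The only difference between the plain and the non-empty variants is one extra emptiness check. A minimal sketch (the function names are hypothetical) showing that `in` matches an untagged file while the proposed `in!` does not:

```python
def subset(a: set[str], b: set[str]) -> bool:
    # `{A} in {B}`: true for an empty A, so untagged files match too
    return a <= b


def nonempty_subset(a: set[str], b: set[str]) -> bool:
    # Proposed `{A} in! {B}`: A must be a subset of B *and* not empty
    return bool(a) and a <= b


def nonempty_superset(a: set[str], b: set[str]) -> bool:
    # "{A} is a superset of non-empty {B}"
    return bool(b) and a >= b


allowed = {"Kita", "Bocchi"}
untagged: set[str] = set()
print(subset(untagged, allowed))           # True
print(nonempty_subset(untagged, allowed))  # False
print(nonempty_subset({"Kita"}, allowed))  # True
```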
I've no idea what separate single English words can be used instead and be clean about the distinction.
This assumes there is no conflict with proper (strict) subset/superset. Even if those are not needed, it may be worth thinking about how they might be distinguished, maybe with p_in and p_includes, or by using different suffix symbols for non-emptiness and strictness.
I'm not really considering {A} < {B}, {A} <= {B}, {A} >= {B} and {A} > {B}, since it might be confusing what is being compared. Size comparison is the more expected meaning, so the same symbols can't be repurposed.
Attaching the non-emptiness condition to the queried set rather than to the operator creates different problems; it doesn't behave well there.
I can't comment on Tag Categories. A one-sentence description doesn't give me enough understanding, since I'm not currently an active TagStudio user.
So as to not clog up the issue log with yet another issue related to search (as far as I can tell, #202, #272, #325, and this issue are all talking about things that may overlap), I am going to "hijack" this issue for the new search system.
As it stands, I don't see the need for a significant portion of the syntax discussed here to be implemented. Maybe I'm wrong, but it seems like if someone knew enough about set theory to make queries based on it, they could just query the SQL database directly, but maybe that wasn't an option at the time this was discussed.
My idea for the search engine is to be relatively simple, with the queries that are already implemented as a starting point: mediatype, filetype, path, tag; the basic "set operators", if you want to call them that: AND, OR, and NOT; and some basic grouping to allow more advanced queries, such as
mediatype:photoshop AND (path:books/* OR path:magazines/*).
Due to the above (mostly expressions), I don't see regexes as a sane way of implementing the search engine. We will likely need a tokenizer/parser that will generate SQL queries for the user, but that shouldn't be difficult. Either way, it will make it much easier to have a "formal grammar" of the search queries, which I propose as the following: (grammar syntax reference)
expression: query | "(" expression binary_operator expression ")" | unary_operator expression
query: ("mediatype"|"filetype"|"tag"|"path") ":" LITERAL
binary_operator: ("AND" | "OR" | "NOT")
unary_operator: "-"
Where LITERAL is a number or a string.
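For comparison, a grammar of this size can be handled by a small hand-written tokenizer and recursive-descent parser. The sketch below is a hypothetical illustration, not TagStudio's implementation, and it makes several assumptions:
- it evaluates queries against an in-memory entry dict instead of generating SQL
- it relaxes the grammar so binary operators are left-associative and parentheses are optional, which lets the `mediatype:photoshop AND (...)` example parse
- it reads `A NOT B` as A AND NOT B
- it matches `path:` values with shell-style wildcards via `fnmatch`

It has no operator precedence, so `A OR B AND C` is read as `(A OR B) AND C`.

```python
import re
from collections.abc import Callable
from fnmatch import fnmatchcase

# Hypothetical entry shape: {"mediatype": str, "filetype": str, "path": str, "tags": set[str]}
Predicate = Callable[[dict], bool]

FIELDS = {"mediatype", "filetype", "tag", "path"}

# One token: "(", ")", unary "-", a `field:literal` query, or a binary operator.
TOKEN = re.compile(
    r"\s*(?:(?P<lp>\()|(?P<rp>\))|(?P<neg>-)"
    r"|(?P<query>\w+:[^\s()]+)|(?P<op>AND|OR|NOT)\b)"
)


def tokenize(text: str) -> list[tuple[str, str]]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def query(text: str) -> Predicate:
    field, literal = text.split(":", 1)
    if field not in FIELDS:
        raise SyntaxError(f"unknown field {field!r}")
    if field == "tag":
        return lambda e: literal in e["tags"]
    if field == "path":
        return lambda e: fnmatchcase(e["path"], literal)  # shell-style wildcards
    return lambda e: e[field] == literal


def combine(op: str, left: Predicate, right: Predicate) -> Predicate:
    if op == "AND":
        return lambda e: left(e) and right(e)
    if op == "OR":
        return lambda e: left(e) or right(e)
    return lambda e: left(e) and not right(e)  # assumption: "A NOT B" = A AND NOT B


class Parser:
    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self) -> tuple:
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self) -> tuple:
        tok = self.peek()
        self.i += 1
        return tok

    def parse(self) -> Predicate:
        pred = self.expression()
        if self.i != len(self.tokens):
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return pred

    def expression(self) -> Predicate:
        # Relaxed rule: unary (binary_operator unary)*, evaluated left to right.
        left = self.unary()
        while self.peek()[0] == "op":
            op = self.take()[1]
            left = combine(op, left, self.unary())
        return left

    def unary(self) -> Predicate:
        kind, value = self.take()
        if kind == "neg":
            inner = self.unary()
            return lambda e: not inner(e)
        if kind == "lp":
            inner = self.expression()
            if self.take()[0] != "rp":
                raise SyntaxError("missing ')'")
            return inner
        if kind == "query":
            return query(value)
        raise SyntaxError("unexpected end of query" if kind is None else f"unexpected {value!r}")


entry = {"mediatype": "photoshop", "filetype": "psd", "path": "books/cover.psd", "tags": {"draft"}}
print(Parser("mediatype:photoshop AND (path:books/* OR path:magazines/*)").parse()(entry))  # True
```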
@CyanVoxel based on what we've been talking about I think the query grammar is basically correct, but if you have anything you want to add, let me know.
bruh
While I understand that having a search engine with all the features discussed would be cool, there needs to be a tradeoff. Even if the entire user base would use all of these features, we can't have a search engine so complicated to implement that only the original implementer understands it, and truthfully, I don't see a significant portion of the user base using what was proposed. Of course, maybe I'm wrong. I would love to see an implementation, from you or anybody else, that does everything that was discussed - then I wouldn't have to do it.
I guess I should also clarify, this doesn't mean what I proposed will be all the search engine will ever be. The 9.5 release is coming up, and we really need a search engine that is better than the current one. In the future, though, we could see a lot more added. The metadata search in particular (#272) is something I hadn't thought of, but definitely seems like something that should be added.
Hijacking this issue was a bad idea. No matter what CyanVoxel thinks, you are spamming people who were only interested in this particular feature, which points at the limitation of typical AND/OR queries.
I don't care whether my proposal gets implemented; that's another question. I'm trying my best to formulate a solution for this specific problem. My proposal doesn't describe the entirety of the grammar, only the novel part specific to this issue (plus my assumptions about namespaces, which are optional). Nor does it touch any underlying implementation.
You came here with an irrelevant topic that probably fits #325 better.
In the context of this issue, all that matters is whether the problem it describes is something the actual maintainers are interested in addressing. Anything else about the implementation details of the search engine only dilutes this topic for no one's benefit.
Rereading.
the basic "set operators", if you want to call them that: AND, OR, and NOT;
These are not "set operators"; they are Boolean operations (https://en.wikipedia.org/wiki/Boolean_algebra). Therefore, your proposal has nothing to do with this issue or my proposal.
This issue exists because Boolean operations alone are inconvenient for certain searches. Set operations are conceptually very simple but distinct from Boolean operations. They can offer a nice way to express certain searches, but would require some additional concepts in the grammar and the underlying query engine.
The exact implementation of the search engine doesn't matter, as long as no other feature hijacks the same syntax for sets first. This issue can be considered for implementation at a later date (if the maintainers deem it valuable enough). But if it is of interest, it may make sense to keep the grammar it needs from being accidentally occupied by other features.
I mention my idea of user-defined namespaces in my proposal, but it is nonessential for this issue. If there are only predefined prefix keywords, a few lines can safely be taken out of my proposal without much loss. But something else might be needed in that case: a more powerful wildcard (though not full regexes) or properties that can act in a similar way.
The title of this feature request is "Add exclusive And/Or search options"; a description of an implementation for the requested feature is not hijacking the thread. This was not asking for a discussion that points at the limitation of typical AND/OR queries. If you feel you are being spammed, you can unsubscribe from notifications by clicking the unsubscribe button under the "Notifications" header. FWIW, I like the described implementation, and if you feel it's not up to what you were hoping for, please raise a concern respectfully or open your own feature request. Thank you.
You may notice from the previous discussion that it took me a while to understand the actual problem behind this issue as well.
The title does not describe the author's problem very well. The author is not a computer scientist, and even for me it is not immediately obvious how to name this issue better.
Once I understood the original question, it became obvious that it is best described in the language of set theory. The set of allowed tags is given, and the author wants to find files whose tags are a subset of, or exactly equal to, this set, with no tags outside of it. Describing all the tag combinations that meet the criteria in terms of Boolean algebra becomes inconvenient (a long expression listing all possible combinations).
What the author, for lack of better words, called "exclusive OR" and "exclusive AND" are actually equivalent to set operations (subset and set equality).
I don't want to unsubscribe from an issue with my proposal in it. I agree Python357-1 was trying to address this issue, but it seems the proposal was as misguided as my initial attempts, because the issue is challenging to wrap one's head around.
As I mentioned at the top of #325 back in July, this issue here - #314 - was being used to track proper boolean search for the project. The issue description specifically mentions the AND and OR mode dropdown featured in the program's interface as a holdover for a more comprehensive boolean search: "Currently, the search can only be set to either And (Includes all Tags) and Or (Includes any Tag)"
Hijacking this issue was a bad idea. No matter what CyanVoxel thinks, you are spamming people who were only interested in this particular feature, which points at the limitation of typical AND/OR queries.
If you take a look at the issue description, you can see that it's referencing the limitations of the pre-existing search system active in the program. What is more likely, that this issue was created as a place to chat about an existing limited feature, or as a place to discuss a more comprehensive boolean search system than an AND-only plus OR-only dropdown modes?
Furthermore, this is a GitHub Issue for an open source project - not some Reddit thread from years ago. A project of this size with a single maintainer doing this in his spare time is going to have some gaps in activity across different feature requests. It is your choice to stay subscribed to this Issue, but it's not even remotely fair to accuse someone of "spamming" you personally with genuine helpful discussion on how to move forward with an important feature. The reason these issues exist at all is to provide an open space for discussion and collaboration about relevant issues - not to complain about getting pinged.
I don't care whether my proposal gets implemented; that's another question. I'm trying my best to formulate a solution for this specific problem. My proposal doesn't describe the entirety of the grammar, only the novel part specific to this issue (plus my assumptions about namespaces, which are optional). Nor does it touch any underlying implementation.
You came here with an irrelevant topic that probably fits #325 better.
Proposing that binary operators ("AND" | "OR" | "NOT") are irrelevant to a tracked issue for boolean search is simply absurd.
In the context of this issue, all that matters is whether the problem it describes is something the actual maintainers are interested in addressing. Anything else about the implementation details of the search engine only dilutes this topic for no one's benefit.
What matters to me is implementing an easy-to-approach and exceedingly useful search engine for the program I started, not managing forum drama, and the Boolean-based search system described in #600 does exactly that. I will be closing this issue, as the remainder of the useful discussion has moved to #600 and its respective PR, #606.