Stream search finds messages that don't belong to its index set
Expected Behavior
Stream search should only find messages that correspond to its index set
Current Behavior
Stream search finds all messages matching its rules, even if they are stored in different index sets (this happens only for rotated indexes)
Possible Solution
Remove the stream tag from a stored message if that stream doesn't belong to the index set the message is stored in
Steps to Reproduce
- Create two streams with similar rule sets that belong to different index sets
- Enable the "Remove matches from 'All messages' stream" checkbox for one of the streams
- Load messages that should go to both streams
- Rotate their indexes
- Search in either stream
Context
We have two streams with similar rules: D2Requests and D2UserRequests (one of them is a subset of the other). They belong to different index sets: the first to the default one ("graylog_"), the second to a custom one with a longer lifetime ("lt_graylog_").
When you search in the D2UserRequests stream there are duplicate messages, but only for rotated "default" indexes:
The default index was rotated at ~11:20.
Same message, same stream -- different indexes.
If you search the "All messages" stream for messages that match the D2UserRequests rules, it finds nothing, so the checkbox is working there.
Your Environment
- Graylog Version: 2.4.3+2c41897
- Elasticsearch Version: 5.6.4
- MongoDB Version: 3.4.10
- Operating System: Linux 4.10.0-42-generic #46~16.04.1-Ubuntu SMP Mon Dec 4 15:57:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Hey @2fa, thanks for opening this issue. I am not sure I understood the problem, so a few questions:
- What happened at 11:20? Manual deflector cycle for the default index set or the automatic cycle?
- Did you toggle the "Remove matches from 'All messages'" intermittently?
- Did you check the period which shows increased message volume for the stream for duplicate messages?
@dennisoelkers thanks for your time. Here's some info:
- Automatic cycle for the default index set (it happens every 1-1.5 hours).
- We enabled "Remove matches from 'All messages'" a couple of weeks ago, when we found out there were duplicate messages, and haven't toggled the setting since. It only has an effect when you search in the 'All messages' stream.
- Yes, if I understood you correctly. Every message appears twice for any point in time older than the boundaries of the active default index.
@2fa: One last thing, can you post details for two messages which appear twice?
Sure thing, I hope the API version is better:
http://XXX/api/messages/graylog_2136/b7a4d965-2857-11e8-8999-00259061c4c5
{
"message": {
"fields": {
"ctx_route_url": "/l/{postId}",
"__method": "GET",
"gl2_remote_ip": "10.3.1.149",
"gl2_remote_port": 58523,
"source": "BACKEND-P8:/LM/W3SVC/3/ROOT",
"__version_int": 2,
"gl2_source_input": "5a0c4e9162832a0cb06a3e6a",
"ctx_device_ClientType": "Browser",
"ctx_request_ip": "217.66.159.185",
"thread_id": "91",
"identity": "Dnet.Unauthenticated:fsA5-knCAlI",
"from": "Bru.Web.LayeredHttpApplication",
"gl2_source_node": "8991e66e-0104-4555-8e93-dbc84958fc5b",
"ctx_device_Platform": "Android",
"timestamp": "2018-03-15T13:49:44.373Z",
"authentication": "Dnet.Unauthenticated",
"ctx_device_FormFactor": "Smartphone",
"process": "4512:2",
"level": 5,
"ctx_connection_id_ulong": "1214558930",
"streams": [
"5a0ef85762832a790443693a",
"5a8d857de1b1f3511350e078"
],
"__url": "/l/6375658/",
"message": "?",
"event_id": "8",
"ctx_current_path": "/l/6375658/",
"__agent": "Mozilla/5.0 (Linux; U; Android 6.0; zh-CN; MZ-U10 Build/MRA58K) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/40.0.2214.89 MZBrowser/6.9.410-2017122215 UWS/2.11.0.33 Mobile Safari/537.36",
"correlation_id": "02b45de4457b97ce2d992ca5447b1987",
"ctx_request_path": "/l/6375658/",
"_id": "b7a4d965-2857-11e8-8999-00259061c4c5",
"time": 1521121784372.5862
},
"streams": [],
"index_sets": [],
"source_input_id": null,
"filter_out": false,
"journal_offset": -9223372036854776000,
"source": "BACKEND-P8:/LM/W3SVC/3/ROOT",
"timestamp": "2018-03-15T13:49:44.373Z",
"field_names": [
"ctx_route_url",
"__method",
"gl2_remote_ip",
"gl2_remote_port",
"source",
"__version_int",
"gl2_source_input",
"ctx_device_ClientType",
"ctx_request_ip",
"thread_id",
"identity",
"from",
"gl2_source_node",
"ctx_device_Platform",
"timestamp",
"authentication",
"ctx_device_FormFactor",
"process",
"level",
"ctx_connection_id_ulong",
"streams",
"__url",
"message",
"event_id",
"ctx_current_path",
"__agent",
"correlation_id",
"ctx_request_path",
"_id",
"time"
],
"complete": true,
"field_count": 31,
"validation_errors": "",
"fields_entries": [
{
"ctx_route_url": "/l/{postId}"
},
{
"__method": "GET"
},
{
"gl2_remote_ip": "10.3.1.149"
},
{
"gl2_remote_port": 58523
},
{
"source": "BACKEND-P8:/LM/W3SVC/3/ROOT"
},
{
"__version_int": 2
},
{
"gl2_source_input": "5a0c4e9162832a0cb06a3e6a"
},
{
"ctx_device_ClientType": "Browser"
},
{
"ctx_request_ip": "217.66.159.185"
},
{
"thread_id": "91"
},
{
"identity": "Dnet.Unauthenticated:fsA5-knCAlI"
},
{
"from": "Bru.Web.LayeredHttpApplication"
},
{
"gl2_source_node": "8991e66e-0104-4555-8e93-dbc84958fc5b"
},
{
"ctx_device_Platform": "Android"
},
{
"timestamp": "2018-03-15T13:49:44.373Z"
},
{
"authentication": "Dnet.Unauthenticated"
},
{
"ctx_device_FormFactor": "Smartphone"
},
{
"process": "4512:2"
},
{
"level": 5
},
{
"ctx_connection_id_ulong": "1214558930"
},
{
"streams": [
"5a0ef85762832a790443693a",
"5a8d857de1b1f3511350e078"
]
},
{
"__url": "/l/6375658/"
},
{
"message": "?"
},
{
"event_id": "8"
},
{
"ctx_current_path": "/l/6375658/"
},
{
"__agent": "Mozilla/5.0 (Linux; U; Android 6.0; zh-CN; MZ-U10 Build/MRA58K) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/40.0.2214.89 MZBrowser/6.9.410-2017122215 UWS/2.11.0.33 Mobile Safari/537.36"
},
{
"correlation_id": "02b45de4457b97ce2d992ca5447b1987"
},
{
"ctx_request_path": "/l/6375658/"
},
{
"_id": "b7a4d965-2857-11e8-8999-00259061c4c5"
},
{
"time": 1521121784372.5862
}
],
"stream_ids": [
"5a0ef85762832a790443693a",
"5a8d857de1b1f3511350e078"
],
"is_source_inet_address": true,
"inet_address": "10.3.1.149",
"message": "?",
"id": "b7a4d965-2857-11e8-8999-00259061c4c5",
"size": 1192
},
"index": "graylog_2136",
"highlight_ranges": {}
}
http://XXX/api/messages/lt_graylog_177/b7a4d965-2857-11e8-8999-00259061c4c5
{
"message": {
"fields": {
"ctx_route_url": "/l/{postId}",
"__method": "GET",
"gl2_remote_ip": "10.3.1.149",
"gl2_remote_port": 58523,
"source": "BACKEND-P8:/LM/W3SVC/3/ROOT",
"__version_int": 2,
"gl2_source_input": "5a0c4e9162832a0cb06a3e6a",
"ctx_device_ClientType": "Browser",
"ctx_request_ip": "217.66.159.185",
"thread_id": "91",
"identity": "Dnet.Unauthenticated:fsA5-knCAlI",
"from": "Bru.Web.LayeredHttpApplication",
"gl2_source_node": "8991e66e-0104-4555-8e93-dbc84958fc5b",
"ctx_device_Platform": "Android",
"timestamp": "2018-03-15T13:49:44.373Z",
"authentication": "Dnet.Unauthenticated",
"ctx_device_FormFactor": "Smartphone",
"process": "4512:2",
"level": 5,
"ctx_connection_id_ulong": "1214558930",
"streams": [
"5a0ef85762832a790443693a",
"5a8d857de1b1f3511350e078"
],
"__url": "/l/6375658/",
"message": "?",
"event_id": "8",
"ctx_current_path": "/l/6375658/",
"__agent": "Mozilla/5.0 (Linux; U; Android 6.0; zh-CN; MZ-U10 Build/MRA58K) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/40.0.2214.89 MZBrowser/6.9.410-2017122215 UWS/2.11.0.33 Mobile Safari/537.36",
"correlation_id": "02b45de4457b97ce2d992ca5447b1987",
"ctx_request_path": "/l/6375658/",
"_id": "b7a4d965-2857-11e8-8999-00259061c4c5",
"time": 1521121784372.5862
},
"streams": [],
"index_sets": [],
"source_input_id": null,
"filter_out": false,
"journal_offset": -9223372036854776000,
"source": "BACKEND-P8:/LM/W3SVC/3/ROOT",
"timestamp": "2018-03-15T13:49:44.373Z",
"field_names": [
"ctx_route_url",
"__method",
"gl2_remote_ip",
"gl2_remote_port",
"source",
"__version_int",
"gl2_source_input",
"ctx_device_ClientType",
"ctx_request_ip",
"thread_id",
"identity",
"from",
"gl2_source_node",
"ctx_device_Platform",
"timestamp",
"authentication",
"ctx_device_FormFactor",
"process",
"level",
"ctx_connection_id_ulong",
"streams",
"__url",
"message",
"event_id",
"ctx_current_path",
"__agent",
"correlation_id",
"ctx_request_path",
"_id",
"time"
],
"complete": true,
"field_count": 31,
"validation_errors": "",
"fields_entries": [
{
"ctx_route_url": "/l/{postId}"
},
{
"__method": "GET"
},
{
"gl2_remote_ip": "10.3.1.149"
},
{
"gl2_remote_port": 58523
},
{
"source": "BACKEND-P8:/LM/W3SVC/3/ROOT"
},
{
"__version_int": 2
},
{
"gl2_source_input": "5a0c4e9162832a0cb06a3e6a"
},
{
"ctx_device_ClientType": "Browser"
},
{
"ctx_request_ip": "217.66.159.185"
},
{
"thread_id": "91"
},
{
"identity": "Dnet.Unauthenticated:fsA5-knCAlI"
},
{
"from": "Bru.Web.LayeredHttpApplication"
},
{
"gl2_source_node": "8991e66e-0104-4555-8e93-dbc84958fc5b"
},
{
"ctx_device_Platform": "Android"
},
{
"timestamp": "2018-03-15T13:49:44.373Z"
},
{
"authentication": "Dnet.Unauthenticated"
},
{
"ctx_device_FormFactor": "Smartphone"
},
{
"process": "4512:2"
},
{
"level": 5
},
{
"ctx_connection_id_ulong": "1214558930"
},
{
"streams": [
"5a0ef85762832a790443693a",
"5a8d857de1b1f3511350e078"
]
},
{
"__url": "/l/6375658/"
},
{
"message": "?"
},
{
"event_id": "8"
},
{
"ctx_current_path": "/l/6375658/"
},
{
"__agent": "Mozilla/5.0 (Linux; U; Android 6.0; zh-CN; MZ-U10 Build/MRA58K) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/40.0.2214.89 MZBrowser/6.9.410-2017122215 UWS/2.11.0.33 Mobile Safari/537.36"
},
{
"correlation_id": "02b45de4457b97ce2d992ca5447b1987"
},
{
"ctx_request_path": "/l/6375658/"
},
{
"_id": "b7a4d965-2857-11e8-8999-00259061c4c5"
},
{
"time": 1521121784372.5862
}
],
"stream_ids": [
"5a0ef85762832a790443693a",
"5a8d857de1b1f3511350e078"
],
"is_source_inet_address": true,
"inet_address": "10.3.1.149",
"message": "?",
"id": "b7a4d965-2857-11e8-8999-00259061c4c5",
"size": 1192
},
"index": "lt_graylog_177",
"highlight_ranges": {}
}
If there's any additional info you need -- feel free to ask.
Stream settings could be helpful as well (alert section removed):
{
"id": "5a0ef85762832a790443693a",
"creator_user_id": "admin",
"outputs": [],
"matching_type": "AND",
"description": "DRIVE2 requests",
"created_at": "2017-11-17T14:55:19.641Z",
"disabled": false,
"rules": [
{
"field": "event_id",
"stream_id": "5a0ef85762832a790443693a",
"description": "",
"id": "5a0ef85762832a790443693d",
"type": 1,
"inverted": false,
"value": "8"
},
{
"field": "from",
"stream_id": "5a0ef85762832a790443693a",
"description": "",
"id": "5a0ef85762832a790443693c",
"type": 1,
"inverted": false,
"value": "Bru.Web.LayeredHttpApplication"
},
{
"field": "gl2_source_input",
"stream_id": "5a0ef85762832a790443693a",
"description": "",
"id": "5a0ef85762832a790443693b",
"type": 1,
"inverted": false,
"value": "5a0c4e9162832a0cb06a3e6a"
}
],
"title": "D2Requests",
"content_pack": null,
"remove_matches_from_default_stream": false,
"index_set_id": "5a04825be1b1f3117085b17c",
"is_default": false
}
{
"id": "5a8d857de1b1f3511350e078",
"creator_user_id": "xm",
"outputs": [],
"matching_type": "AND",
"description": "DRIVE2 user requests",
"created_at": "2018-02-21T14:43:09.590Z",
"disabled": false,
"rules": [
{
"field": "event_id",
"stream_id": "5a8d857de1b1f3511350e078",
"description": "",
"id": "5a8d857de1b1f3511350e079",
"type": 1,
"inverted": false,
"value": "8"
},
{
"field": "from",
"stream_id": "5a8d857de1b1f3511350e078",
"description": "",
"id": "5a8d857de1b1f3511350e07a",
"type": 1,
"inverted": false,
"value": "Bru.Web.LayeredHttpApplication"
},
{
"field": "gl2_source_input",
"stream_id": "5a8d857de1b1f3511350e078",
"description": "",
"id": "5a8d857de1b1f3511350e07b",
"type": 1,
"inverted": false,
"value": "5a0c4e9162832a0cb06a3e6a"
},
{
"field": "ctx_crawler",
"stream_id": "5a8d857de1b1f3511350e078",
"description": "",
"id": "5a8d86d6e1b1f3511350e202",
"type": 5,
"inverted": true,
"value": ""
},
{
"field": "identity",
"stream_id": "5a8d857de1b1f3511350e078",
"description": "",
"id": "5a8d8762e1b1f3511350e2a2",
"type": 5,
"inverted": false,
"value": ""
}
],
"title": "D2UserRequests",
"content_pack": null,
"remove_matches_from_default_stream": true,
"index_set_id": "5a0c4d3362832a0cb06a3cfd",
"is_default": false
}
Any updates on this one?
@2fa: We have made fundamental changes to the search in 3.2. Can you retry with the most recent version and update this issue on whether it is still valid?
Good day.
We're currently using version 3.1.4. Today I'll update to the latest version and check the search.
We tested the search in 3.2.1 and the problem still exists; I even forced an index rotation to check the described behavior -- still the same.
P.S. I've downgraded Graylog to 3.1.4, because we cannot use 3.2.1 -- it's a bloody mess due to the "n.find" and "t.find" bugs.
For completeness, and to confirm that this behaviour is still present in more recent versions: I have run into the same behavior on version 4.1.9, as described in this post on community.graylog.org.
The problem is also still present on v4.3.1.
Hey @MarkoSulamagi,
thanks for the heads up. Can you explain your local environment a little bit? We are particularly interested in:
- which streams do you use
- for which streams does this happen/for which not
- which index sets exist
- what are the prefixes for the index sets
- what is the mapping between streams and index sets
We have not been able to reproduce this issue yet, so anything that would help us reproduce it would get this solved much faster (or at all).
Hello, I've got the same issue on 4.3.9. One input, two index sets, two streams connected to the corresponding index sets. Some messages go to both streamA and streamB. While indexA and indexB are not rotated, all seems good. But after the indexes rotate, the search returns the same message twice -- IMHO once per index. It's the same message, the same time, ID and so on. What's more, even if I search only on streamB, it returns the same messages twice.
The issue is present in 4.3.9 for us as well.
Hey @makovkindenis, can you check to which streams both of the duplicate messages are being routed? What stream rules are you using for both streams? Do you have any pipeline rules configured?
Hey @onyxmaster, can you help us by answering the questions from https://github.com/Graylog2/graylog2-server/issues/4638#issuecomment-1160384493 to narrow down this issue?
I will answer, we both have the same cluster =)
can you check to which streams both of the duplicate messages are being routed? What stream rules are you using for both streams?
My answer to both questions can be found here and here: both messages have the same ID and are routed into the same set of streams, but stored in different indexes. There are two streams which use different index sets. When you search in either of them, it will find duplicate messages stored in the index set that doesn't belong to that stream.
I will add that if you have, for example, 4 streams that a message routes to (3 of which belong to index set X and 1 to index set Y), then there will be only one duplicate message, so it depends on the total number of index sets the message belongs to.
Also, as a reminder: duplicates appear only for rotated indexes. There is no problem if you're trying to find recent messages that belong to an active index.
Do you have any pipeline rules configured?
I checked and I don't think we have any particular pipeline rules for these streams.
Hey @2fa,
thank you very much for the input. I have not yet managed to reproduce this issue. Can you specify the exact steps to reproduce it in a minimal scenario? That would help a lot with debugging and fixing this issue. I feel like I am missing a crucial step. Thanks a lot!
Hey @dennisoelkers,
I specified the steps to reproduce in the issue description; I don't think much has changed since then. I can try to recreate it again on a fresh install to confirm that it is not a bug in our old instance. I don't know the ETA though -- I'm on holiday right now.
Well, I tried to reproduce it on a clean installation. I ran Graylog 5.0.7 in Docker with default settings, created one TCP Syslog input, created two index sets IndexA and IndexB, created two streams ForA and ForB, and connected them to the appropriate index sets. Before the indexes are rotated (i.e. when messages are in the active indexes) I see my single message twice -- it's the same message with the same ID (I think one per stream), and maybe that is correct. By the way, I can select only ForA and the search result will contain only one message from ForA. But after I rotate the indexes, filtering by stream does not work anymore. The search always returns 2 results, even if I select only ForA and search by message ID.
Hi there,
I can confirm the same issue exists for me in v5.0.5 and v5.0.7.
Is there any progress? I'd try v5.1 but I did not find anything related in the corresponding release notes.
Hey @diborodin & @makovkindenis,
thanks for helping with this issue! What are the index set prefixes of the index sets the messages are going into?
Hey @2fa, @diborodin, @makovkindenis:
Okay, after years I have now been able to reproduce the issue and found out it is working as designed. The reason for this is rather simple:
- When a message is being routed to two streams, both stream ids will be in the `streams` field of both messages
- When an index is rotated, a job is running over all documents in it, extracting the stream ids present
- This set of stream ids will be stored as metadata for that index
- When a search is performed, we will iterate over all metadata and find all indices which could have data for the streams which the user is searching, plus the currently open index (deflector) of each stream
This means that when a user searches in Stream A, the currently open index for Stream A will be used. In addition, all closed indices which have Stream A in their metadata will be used. If a message previously came in that went to both Stream A & B, it will end up in indices of both their index sets (let's say Index Set A & B). From then on, the index from Index Set B that contains messages which ended up in Stream A will also be picked up during search, even when only Stream A is searched.
This might seem counter-intuitive, but the reasoning is this: we do not know if Stream A was previously attached to Index Set B. If it was, and we only took indices from the stream's current index set into account, those messages would become inaccessible. The trade-off is to include duplicates in cases like this.
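To make this concrete, here is a minimal illustrative sketch (not the actual server code; the deflector name `lt_graylog_178` is made up, the stream and index names are taken from the examples above) of how the set of searched indices grows once the rotated indices' metadata records a stream from another index set:

```python
# Illustrative sketch only -- not Graylog's implementation.
# index_ranges maps a closed index to the stream ids recorded when it was rotated.
def indices_for_stream_search(stream_id, index_ranges, deflector_index):
    affected = {deflector_index}              # currently open index of the stream's index set
    for index_name, stream_ids in index_ranges.items():
        if stream_id in stream_ids:           # closed index holds at least one message of this stream
            affected.add(index_name)
    return affected

# A message routed to both D2Requests and D2UserRequests is stored in graylog_2136
# *and* lt_graylog_177; both record the D2UserRequests stream id in their metadata.
index_ranges = {
    "graylog_2136":   {"5a0ef85762832a790443693a", "5a8d857de1b1f3511350e078"},
    "lt_graylog_177": {"5a0ef85762832a790443693a", "5a8d857de1b1f3511350e078"},
}
print(indices_for_stream_search("5a8d857de1b1f3511350e078", index_ranges, "lt_graylog_178"))
# -> {'lt_graylog_178', 'lt_graylog_177', 'graylog_2136'}, so the same message is hit twice
```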
We are open to improvements to this, though. It would be interesting to hear what problems this behavior is causing you and whether you have ideas on how we could make it better without sacrificing access to messages in scenarios like the one described above.
Hello @dennisoelkers.
Thank you for investigating this. We have a case where our logs go to 2 indices: one dedicated to e.g. nginx logs, and one like "all messages for the last day" for all sources (not only nginx logs). So after the index is rotated, when I search in the nginx stream for messages older than the point of index rotation, I see duplicate messages, which is confusing, especially when you need to gather some analytics. I don't feel it should work this way, as I have to add something like AND index:nginx* to the search query.
We are open to improvements to this, though. It would be interesting to hear what problems this behavior is causing you and whether you have ideas on how we could make it better without sacrificing access to messages in scenarios like the one described above.
In our dashboards we count messages, and since the messages are found multiple times the count value is not correct.
I tried to work around this by grouping by timestamp (so two messages count as one), but it produced too many buckets. Grouping by gl2_message_id was too slow (and I think it might hit the same issue).
I thought about cardinality instead of count (but for the global count it can't work, because cardinality needs a field to operate on). Since I count a sub-pattern, the cardinality yields a much lower result than the actual count/2 (so not the same timestamp/message id, but the same extracted field).
One possible way is a hashmap containing already-seen message ids (sketched below); I don't know if it could fit in memory for a large Graylog cluster.
Another possibility is to sort messages by timestamp and message id; then two consecutive messages might be a duplicate, so we can remove one (if only timestamps are used we can get AbCACb when A, b and C arrived at exactly the same time).
Another possibility is a filter like currently_in_index:"X", so if the user knows they didn't move the message, it works as expected.
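To illustrate the hashmap idea, here is a minimal client-side sketch (not something Graylog does today; the result shape loosely mirrors the two API responses posted earlier in this thread). Note it only deduplicates the rows that are fetched, so it would not fix aggregate counts on dashboards:

```python
# Sketch of client-side de-duplication by message id -- a workaround idea,
# not existing Graylog behavior.
def dedup_by_message_id(hits):
    seen = set()
    unique = []
    for hit in hits:
        msg_id = hit["message"]["id"]
        if msg_id in seen:
            continue                      # drop the copy coming from the other index set
        seen.add(msg_id)
        unique.append(hit)
    return unique

hits = [
    {"index": "graylog_2136",   "message": {"id": "b7a4d965-2857-11e8-8999-00259061c4c5"}},
    {"index": "lt_graylog_177", "message": {"id": "b7a4d965-2857-11e8-8999-00259061c4c5"}},
]
assert len(dedup_by_message_id(hits)) == 1
```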
Just ran into this ourselves on a 4.2.x version.
We have log streams created to be ingested by different teams for analysis. Say a dev team is looking at a specific workload and a platform team looking at all workloads.
Stream rules will take the same message and route it into 2 different index sets.
Message arrives and rules route it into 2 different streams:
Stream A looks at Index-Set A
Stream B looks at Index-Set B
A search limited to just Stream A will show results from Index-Set A and B.
Same message ID. Different indexes. But my search is limited to a single stream that is attached to a single index set, and that index set has a single instance of this message. Yet I see 2.
Very counterintuitive imo.
Open to being told my pattern is counterintuitive as well. :)
When a message is being routed to two streams, both stream ids will be in the streams field of both messages
Shouldn't each copy of the message only have the stream id for the stream that copy was routed to?
If I understand the problem correctly, that would solve the issue (for future messages at least). However, I know little about the internal workings of Graylog, so maybe that is not feasible.
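To make the suggestion concrete, a rough and purely hypothetical sketch (not how Graylog currently stores messages; the ids are taken from the stream settings posted above) of keeping, in each stored copy, only the stream ids whose streams use that copy's index set:

```python
# Hypothetical sketch: split the "streams" field per target index set, so each
# stored copy only lists the streams that actually use that index set.
def streams_per_index_set(message_stream_ids, stream_to_index_set):
    copies = {}
    for stream_id in message_stream_ids:
        index_set_id = stream_to_index_set[stream_id]
        copies.setdefault(index_set_id, []).append(stream_id)
    return copies  # index_set_id -> stream ids to store in that copy's "streams" field

# Stream/index-set ids from the settings posted earlier in this issue.
stream_to_index_set = {
    "5a0ef85762832a790443693a": "5a04825be1b1f3117085b17c",  # D2Requests -> default index set
    "5a8d857de1b1f3511350e078": "5a0c4d3362832a0cb06a3cfd",  # D2UserRequests -> lt_ index set
}
print(streams_per_index_set(list(stream_to_index_set), stream_to_index_set))
# -> each copy carries only the stream id(s) belonging to its own index set
```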
Shouldn't each copy of the message only have the stream id for the stream that copy was routed to?
This would mean that when viewing the message, it would no longer be visible to the user that the message has been routed into both streams, implying some kind of routing issue when the user actually expects this.