graylog2-server icon indicating copy to clipboard operation
graylog2-server copied to clipboard

Stream search finds messages that doesn't belong to its index set

Open 2fa opened this issue 7 years ago • 25 comments

Expected Behavior

Stream search should only find messages that corresponds to its index set

Current Behavior

Stream search finds all messages matching its rules (even if they stored in different index sets, but it's true only for rotated indexes)

Possible Solution

Remove stream tag from the message if that stream doesn't belong to its index set

Steps to Reproduce

  1. Create two streams with similar rule sets that belong to different index sets
  2. Enable checkbox "Remove matches from 'All messages' stream" for the stream
  3. Load messages that should go to both streams
  4. Rotate their indexes
  5. Search

Context

We have two streams with similar rules: D2Requests and D2UserRequests (one of them is a subset of the other one), they belong to different index sets (first one -- default "graylog_", second -- custom with a longer lifetime "lt_graylog_").

stream_d2r stream_d2ur

When you search in D2UserRequests stream there are duplicate messages but only for a rotated "default" indexes:

stream_search

Default index has been rotated at ~11:20.

stream_search_message1

stream_search_message2

Same message, same stream -- different indexes.

If you search "All messages" stream for messages that applied to D2UserRequests rules it finds nothing, so checkbox is working there.

Your Environment

  • Graylog Version: 2.4.3+2c41897
  • Elasticsearch Version: 5.6.4
  • MongoDB Version: 3.4.10
  • Operating System: Linux 4.10.0-42-generic #46~16.04.1-Ubuntu SMP Mon Dec 4 15:57:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

2fa avatar Mar 06 '18 09:03 2fa

Hey @2fa, thanks for opening this issue. I am not sure I understood the problem, so a few questions:

  • What happened at 11:20? Manual deflector cycle for the default index set or the automatic cycle?
  • Did you toggle the "Remove matches from 'All messages'" intermittently?
  • Did you check the period which shows increased message volume for the stream for duplicate messages?

dennisoelkers avatar Mar 09 '18 08:03 dennisoelkers

@dennisoelkers thanks for your time. Here's some info:

  • Automatic cycle for the default index set. (It happens every 1-1.5 hours)
  • We've enabled "Remove matches from 'All messages'" couple of weeks ago, when we found out that there's duplicate message and didn't toggle setting since that time. It's working only when you search in 'All messages' stream.
  • Yes, if i understood you correctly. Every message appear twice from the moment in time older than boundaries of active default index.

2fa avatar Mar 09 '18 09:03 2fa

@2fa: One last thing, can you post details for two messages which appear twice?

dennisoelkers avatar Mar 15 '18 14:03 dennisoelkers

Sure thing, i hope that API version is better:

http://XXX/api/messages/graylog_2136/b7a4d965-2857-11e8-8999-00259061c4c5

{
  "message": {
    "fields": {
      "ctx_route_url": "/l/{postId}",
      "__method": "GET",
      "gl2_remote_ip": "10.3.1.149",
      "gl2_remote_port": 58523,
      "source": "BACKEND-P8:/LM/W3SVC/3/ROOT",
      "__version_int": 2,
      "gl2_source_input": "5a0c4e9162832a0cb06a3e6a",
      "ctx_device_ClientType": "Browser",
      "ctx_request_ip": "217.66.159.185",
      "thread_id": "91",
      "identity": "Dnet.Unauthenticated:fsA5-knCAlI",
      "from": "Bru.Web.LayeredHttpApplication",
      "gl2_source_node": "8991e66e-0104-4555-8e93-dbc84958fc5b",
      "ctx_device_Platform": "Android",
      "timestamp": "2018-03-15T13:49:44.373Z",
      "authentication": "Dnet.Unauthenticated",
      "ctx_device_FormFactor": "Smartphone",
      "process": "4512:2",
      "level": 5,
      "ctx_connection_id_ulong": "1214558930",
      "streams": [
        "5a0ef85762832a790443693a",
        "5a8d857de1b1f3511350e078"
      ],
      "__url": "/l/6375658/",
      "message": "?",
      "event_id": "8",
      "ctx_current_path": "/l/6375658/",
      "__agent": "Mozilla/5.0 (Linux; U; Android 6.0; zh-CN; MZ-U10 Build/MRA58K) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/40.0.2214.89 MZBrowser/6.9.410-2017122215 UWS/2.11.0.33 Mobile Safari/537.36",
      "correlation_id": "02b45de4457b97ce2d992ca5447b1987",
      "ctx_request_path": "/l/6375658/",
      "_id": "b7a4d965-2857-11e8-8999-00259061c4c5",
      "time": 1521121784372.5862
    },
    "streams": [],
    "index_sets": [],
    "source_input_id": null,
    "filter_out": false,
    "journal_offset": -9223372036854776000,
    "source": "BACKEND-P8:/LM/W3SVC/3/ROOT",
    "timestamp": "2018-03-15T13:49:44.373Z",
    "field_names": [
      "ctx_route_url",
      "__method",
      "gl2_remote_ip",
      "gl2_remote_port",
      "source",
      "__version_int",
      "gl2_source_input",
      "ctx_device_ClientType",
      "ctx_request_ip",
      "thread_id",
      "identity",
      "from",
      "gl2_source_node",
      "ctx_device_Platform",
      "timestamp",
      "authentication",
      "ctx_device_FormFactor",
      "process",
      "level",
      "ctx_connection_id_ulong",
      "streams",
      "__url",
      "message",
      "event_id",
      "ctx_current_path",
      "__agent",
      "correlation_id",
      "ctx_request_path",
      "_id",
      "time"
    ],
    "complete": true,
    "field_count": 31,
    "validation_errors": "",
    "fields_entries": [
      {
        "ctx_route_url": "/l/{postId}"
      },
      {
        "__method": "GET"
      },
      {
        "gl2_remote_ip": "10.3.1.149"
      },
      {
        "gl2_remote_port": 58523
      },
      {
        "source": "BACKEND-P8:/LM/W3SVC/3/ROOT"
      },
      {
        "__version_int": 2
      },
      {
        "gl2_source_input": "5a0c4e9162832a0cb06a3e6a"
      },
      {
        "ctx_device_ClientType": "Browser"
      },
      {
        "ctx_request_ip": "217.66.159.185"
      },
      {
        "thread_id": "91"
      },
      {
        "identity": "Dnet.Unauthenticated:fsA5-knCAlI"
      },
      {
        "from": "Bru.Web.LayeredHttpApplication"
      },
      {
        "gl2_source_node": "8991e66e-0104-4555-8e93-dbc84958fc5b"
      },
      {
        "ctx_device_Platform": "Android"
      },
      {
        "timestamp": "2018-03-15T13:49:44.373Z"
      },
      {
        "authentication": "Dnet.Unauthenticated"
      },
      {
        "ctx_device_FormFactor": "Smartphone"
      },
      {
        "process": "4512:2"
      },
      {
        "level": 5
      },
      {
        "ctx_connection_id_ulong": "1214558930"
      },
      {
        "streams": [
          "5a0ef85762832a790443693a",
          "5a8d857de1b1f3511350e078"
        ]
      },
      {
        "__url": "/l/6375658/"
      },
      {
        "message": "?"
      },
      {
        "event_id": "8"
      },
      {
        "ctx_current_path": "/l/6375658/"
      },
      {
        "__agent": "Mozilla/5.0 (Linux; U; Android 6.0; zh-CN; MZ-U10 Build/MRA58K) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/40.0.2214.89 MZBrowser/6.9.410-2017122215 UWS/2.11.0.33 Mobile Safari/537.36"
      },
      {
        "correlation_id": "02b45de4457b97ce2d992ca5447b1987"
      },
      {
        "ctx_request_path": "/l/6375658/"
      },
      {
        "_id": "b7a4d965-2857-11e8-8999-00259061c4c5"
      },
      {
        "time": 1521121784372.5862
      }
    ],
    "stream_ids": [
      "5a0ef85762832a790443693a",
      "5a8d857de1b1f3511350e078"
    ],
    "is_source_inet_address": true,
    "inet_address": "10.3.1.149",
    "message": "?",
    "id": "b7a4d965-2857-11e8-8999-00259061c4c5",
    "size": 1192
  },
  "index": "graylog_2136",
  "highlight_ranges": {}
}

http://XXX/api/messages/lt_graylog_177/b7a4d965-2857-11e8-8999-00259061c4c5

{
  "message": {
    "fields": {
      "ctx_route_url": "/l/{postId}",
      "__method": "GET",
      "gl2_remote_ip": "10.3.1.149",
      "gl2_remote_port": 58523,
      "source": "BACKEND-P8:/LM/W3SVC/3/ROOT",
      "__version_int": 2,
      "gl2_source_input": "5a0c4e9162832a0cb06a3e6a",
      "ctx_device_ClientType": "Browser",
      "ctx_request_ip": "217.66.159.185",
      "thread_id": "91",
      "identity": "Dnet.Unauthenticated:fsA5-knCAlI",
      "from": "Bru.Web.LayeredHttpApplication",
      "gl2_source_node": "8991e66e-0104-4555-8e93-dbc84958fc5b",
      "ctx_device_Platform": "Android",
      "timestamp": "2018-03-15T13:49:44.373Z",
      "authentication": "Dnet.Unauthenticated",
      "ctx_device_FormFactor": "Smartphone",
      "process": "4512:2",
      "level": 5,
      "ctx_connection_id_ulong": "1214558930",
      "streams": [
        "5a0ef85762832a790443693a",
        "5a8d857de1b1f3511350e078"
      ],
      "__url": "/l/6375658/",
      "message": "?",
      "event_id": "8",
      "ctx_current_path": "/l/6375658/",
      "__agent": "Mozilla/5.0 (Linux; U; Android 6.0; zh-CN; MZ-U10 Build/MRA58K) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/40.0.2214.89 MZBrowser/6.9.410-2017122215 UWS/2.11.0.33 Mobile Safari/537.36",
      "correlation_id": "02b45de4457b97ce2d992ca5447b1987",
      "ctx_request_path": "/l/6375658/",
      "_id": "b7a4d965-2857-11e8-8999-00259061c4c5",
      "time": 1521121784372.5862
    },
    "streams": [],
    "index_sets": [],
    "source_input_id": null,
    "filter_out": false,
    "journal_offset": -9223372036854776000,
    "source": "BACKEND-P8:/LM/W3SVC/3/ROOT",
    "timestamp": "2018-03-15T13:49:44.373Z",
    "field_names": [
      "ctx_route_url",
      "__method",
      "gl2_remote_ip",
      "gl2_remote_port",
      "source",
      "__version_int",
      "gl2_source_input",
      "ctx_device_ClientType",
      "ctx_request_ip",
      "thread_id",
      "identity",
      "from",
      "gl2_source_node",
      "ctx_device_Platform",
      "timestamp",
      "authentication",
      "ctx_device_FormFactor",
      "process",
      "level",
      "ctx_connection_id_ulong",
      "streams",
      "__url",
      "message",
      "event_id",
      "ctx_current_path",
      "__agent",
      "correlation_id",
      "ctx_request_path",
      "_id",
      "time"
    ],
    "complete": true,
    "field_count": 31,
    "validation_errors": "",
    "fields_entries": [
      {
        "ctx_route_url": "/l/{postId}"
      },
      {
        "__method": "GET"
      },
      {
        "gl2_remote_ip": "10.3.1.149"
      },
      {
        "gl2_remote_port": 58523
      },
      {
        "source": "BACKEND-P8:/LM/W3SVC/3/ROOT"
      },
      {
        "__version_int": 2
      },
      {
        "gl2_source_input": "5a0c4e9162832a0cb06a3e6a"
      },
      {
        "ctx_device_ClientType": "Browser"
      },
      {
        "ctx_request_ip": "217.66.159.185"
      },
      {
        "thread_id": "91"
      },
      {
        "identity": "Dnet.Unauthenticated:fsA5-knCAlI"
      },
      {
        "from": "Bru.Web.LayeredHttpApplication"
      },
      {
        "gl2_source_node": "8991e66e-0104-4555-8e93-dbc84958fc5b"
      },
      {
        "ctx_device_Platform": "Android"
      },
      {
        "timestamp": "2018-03-15T13:49:44.373Z"
      },
      {
        "authentication": "Dnet.Unauthenticated"
      },
      {
        "ctx_device_FormFactor": "Smartphone"
      },
      {
        "process": "4512:2"
      },
      {
        "level": 5
      },
      {
        "ctx_connection_id_ulong": "1214558930"
      },
      {
        "streams": [
          "5a0ef85762832a790443693a",
          "5a8d857de1b1f3511350e078"
        ]
      },
      {
        "__url": "/l/6375658/"
      },
      {
        "message": "?"
      },
      {
        "event_id": "8"
      },
      {
        "ctx_current_path": "/l/6375658/"
      },
      {
        "__agent": "Mozilla/5.0 (Linux; U; Android 6.0; zh-CN; MZ-U10 Build/MRA58K) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/40.0.2214.89 MZBrowser/6.9.410-2017122215 UWS/2.11.0.33 Mobile Safari/537.36"
      },
      {
        "correlation_id": "02b45de4457b97ce2d992ca5447b1987"
      },
      {
        "ctx_request_path": "/l/6375658/"
      },
      {
        "_id": "b7a4d965-2857-11e8-8999-00259061c4c5"
      },
      {
        "time": 1521121784372.5862
      }
    ],
    "stream_ids": [
      "5a0ef85762832a790443693a",
      "5a8d857de1b1f3511350e078"
    ],
    "is_source_inet_address": true,
    "inet_address": "10.3.1.149",
    "message": "?",
    "id": "b7a4d965-2857-11e8-8999-00259061c4c5",
    "size": 1192
  },
  "index": "lt_graylog_177",
  "highlight_ranges": {}
}

If there's any additional info you need -- feel free to ask.

2fa avatar Mar 15 '18 15:03 2fa

Streams settings could be helpful as well (Alert section removed):

{
  "id": "5a0ef85762832a790443693a",
  "creator_user_id": "admin",
  "outputs": [],
  "matching_type": "AND",
  "description": "DRIVE2 requests",
  "created_at": "2017-11-17T14:55:19.641Z",
  "disabled": false,
  "rules": [
    {
      "field": "event_id",
      "stream_id": "5a0ef85762832a790443693a",
      "description": "",
      "id": "5a0ef85762832a790443693d",
      "type": 1,
      "inverted": false,
      "value": "8"
    },
    {
      "field": "from",
      "stream_id": "5a0ef85762832a790443693a",
      "description": "",
      "id": "5a0ef85762832a790443693c",
      "type": 1,
      "inverted": false,
      "value": "Bru.Web.LayeredHttpApplication"
    },
    {
      "field": "gl2_source_input",
      "stream_id": "5a0ef85762832a790443693a",
      "description": "",
      "id": "5a0ef85762832a790443693b",
      "type": 1,
      "inverted": false,
      "value": "5a0c4e9162832a0cb06a3e6a"
    }
  ],
  "title": "D2Requests",
  "content_pack": null,
  "remove_matches_from_default_stream": false,
  "index_set_id": "5a04825be1b1f3117085b17c",
  "is_default": false
}
{
  "id": "5a8d857de1b1f3511350e078",
  "creator_user_id": "xm",
  "outputs": [],
  "matching_type": "AND",
  "description": "DRIVE2 user requests",
  "created_at": "2018-02-21T14:43:09.590Z",
  "disabled": false,
  "rules": [
    {
      "field": "event_id",
      "stream_id": "5a8d857de1b1f3511350e078",
      "description": "",
      "id": "5a8d857de1b1f3511350e079",
      "type": 1,
      "inverted": false,
      "value": "8"
    },
    {
      "field": "from",
      "stream_id": "5a8d857de1b1f3511350e078",
      "description": "",
      "id": "5a8d857de1b1f3511350e07a",
      "type": 1,
      "inverted": false,
      "value": "Bru.Web.LayeredHttpApplication"
    },
    {
      "field": "gl2_source_input",
      "stream_id": "5a8d857de1b1f3511350e078",
      "description": "",
      "id": "5a8d857de1b1f3511350e07b",
      "type": 1,
      "inverted": false,
      "value": "5a0c4e9162832a0cb06a3e6a"
    },
    {
      "field": "ctx_crawler",
      "stream_id": "5a8d857de1b1f3511350e078",
      "description": "",
      "id": "5a8d86d6e1b1f3511350e202",
      "type": 5,
      "inverted": true,
      "value": ""
    },
    {
      "field": "identity",
      "stream_id": "5a8d857de1b1f3511350e078",
      "description": "",
      "id": "5a8d8762e1b1f3511350e2a2",
      "type": 5,
      "inverted": false,
      "value": ""
    }
  ],
  "title": "D2UserRequests",
  "content_pack": null,
  "remove_matches_from_default_stream": true,
  "index_set_id": "5a0c4d3362832a0cb06a3cfd",
  "is_default": false
}

2fa avatar Mar 15 '18 15:03 2fa

Any updates on this one?

2fa avatar Mar 28 '19 09:03 2fa

@2fa: We have made fundamental changes to the search in 3.2. Can you retry this issue in the most recent version and update this issue if it is valid or not?

dennisoelkers avatar Feb 17 '20 18:02 dennisoelkers

Good day.

We're currently using version 3.1.4. Today i'll update to the latest version and check the search.

2fa avatar Feb 18 '20 08:02 2fa

We tested the search in 3.2.1 and problem still exists, i even did forced index rotation to check described behavior -- still the same.

P.S. i've downgraded graylog version to 3.1.4, cuz we cannot use 3.2.1-- it's a bloody mess due to "n.find" and "t.find" bugs.

2fa avatar Feb 18 '20 13:02 2fa

For completeness and confirmation that this behaviour is still present in more recent versions: I have ran into the same behavior on version 4.1.9, as described in this post on the community.graylog.org

MaxVanDeursen avatar Jan 14 '22 09:01 MaxVanDeursen

The problem is also still present on v 4.3.1

MarkoSulamagi avatar Jun 17 '22 07:06 MarkoSulamagi

Hey @MarkoSulamagi,

thanks for the heads up. Can you explain your local environment a little bit? We are particularly interested in:

  • which streams do you use
  • for which streams does this happen/for which not
  • which index sets exist
  • what are the prefixes for the index sets
  • what is the mapping between streams and index sets

We have not been able to reproduce this issue yet, so anything that would help us to reproduce this issue would get this solved much faster/at all.

dennisoelkers avatar Jun 20 '22 12:06 dennisoelkers

Hello, i've got the same issue on 4.3.9 One input, two index sets, two streams connected to corresponding index set. Some of messages go to streamA and to streamB. While indexA and indexB not rotated, all seems good. But after indexes rotate, search returns the same message twice - IMHO one time per index. It's the same message, the same time, ID and so on. And more - even if i search only on StreamB, it returns the same messages twice.

makovkindenis avatar Jan 17 '23 13:01 makovkindenis

The issue is present in 4.3.9 for us as well.

onyxmaster avatar Jan 17 '23 13:01 onyxmaster

Hey @makovkindenis, can you check to which streams both of the duplicate messages are being routed? What stream rules are you using for both streams? Do you have any pipeline rules configured?

dennisoelkers avatar Mar 03 '23 15:03 dennisoelkers

Hey @onyxmaster, can you help us by answering the questions from https://github.com/Graylog2/graylog2-server/issues/4638#issuecomment-1160384493 to narrow down this issue?

dennisoelkers avatar Mar 03 '23 15:03 dennisoelkers

Hey @onyxmaster, can you help us by answering the questions from #4638 (comment) to narrow down this issue?

I will answer, we both have the same cluster =)

can you check to which streams both of the duplicate messages are being routed? What stream rules are you using for both streams?

My answer on both questions can be found here and here: both messages have the same ID and routed into the same sets of streams but stored in different indexes. There are two streams which uses different index sets. When you search in any of them they will find duplicate messages stored in index set that doesn't belong to that stream.

I will add that if you have, for example, 4 streams that message routes to (3 of which belongs to an index set X and 1 to an index set Y) then there will be only one duplicate message, so it depends on the total number of index sets that message belongs to.

Also as a reminder: duplicates appear only for a rotated indexes. There are no problem if you trying to find recent messages that belongs to an active index.

Do you have any pipeline rules configured?

I checked and I don't think we have any particular pipeline rules for this streams.

2fa avatar Mar 03 '23 16:03 2fa

Hey @2fa,

thank you very much for the input. I have not yet managed to reproduce this issue. Can you specify the exact steps how to reproduce it in a minimal scenario? That would help a lot debugging and fixing this issue. I feel like I am missing a crucial step. Thanks a lot!

dennisoelkers avatar May 08 '23 08:05 dennisoelkers

Hey @dennisoelkers,

I specified steps to reproduce in the issue description, don't think much have changed since then. I can try to recreate it again in the fresh install to confirm that it is not a bug in our old instance. Don't know the ETA though. I'm on a holiday right now.

2fa avatar May 08 '23 21:05 2fa

Well, i try to reproduce it on clear installation. Ran graylog5.0.7 in docker with default settings. Created one TCPSyslog input, created two index sets IndexA and IndexB, created two streams ForA and ForB, connected it to apropriate Index set. Before Indexes are rotated (i.e. when messages are in active indexes) i see my single message twice - it's the same message with the same ID (i think one per stream) and may be it is correct. By the way i can select ForA only and in search result it will be only one message from ForA. But, after i rotate index, filter by stream not working anymore. Search always returns 2 results, even if i select only ForA and search by MessageID

Screenshot 2023-05-10 151548

makovkindenis avatar May 10 '23 12:05 makovkindenis

Hi there,

I can confirm the same issue exists for me in v5.0.5 and v5.0.7.

Is there any progress? I'd try v5.1 but I did not find anything related in the corresponding release notes.

diborodin avatar May 31 '23 21:05 diborodin

Hey @diborodin & @makovkindenis,

thanks for helping with this issue! What are the index set prefixes of the index sets the messages are going into?

dennisoelkers avatar Jun 16 '23 07:06 dennisoelkers

Hey @2fa, @diborodin, @makovkindenis:

Okay, after years I have now been able to reproduce the issue and found out it is working as designed. The reason for this is rather simple:

  • When a message is being routed to two streams, both stream ids will be in the streams field of both messages
  • When an index is rotated, a job is running over all documents in it, extracting the stream ids present
  • This set of stream ids will be stored as metadata for that index
  • When a search is performed, we will iterate over all metadata and find all indices which could have data for the streams which the user is searching, plus the currently open index (deflector) of each stream

This means that when a user searches in Stream A, the currently open index for stream A will be used. In addition, all closed indices which have stream A in their metadata will be used. If a message went in previously, that went to both Stream A & B, it will end up in indices of both their index sets (let's say Index Set A & B). In the future, also the index from Index Set B that contains messages which have ended up in Stream A will be picked up during search, even when only Stream A is searched.

This might seem counter-intuitive, but the reasoning is this: We do not know if Stream A was previously attached to Index Set B. If it was, and we would only take indices from the stream's current index set into account, then messages will become inaccessible. The trade-off is to include duplicates in cases like this.

We are open for improvements to this though. It would be interesting to hear what problems this behavior is causing you and if you have ideas on how we could make it better without sacrificing access to messages in scenarios as previously described.

dennisoelkers avatar Jun 16 '23 08:06 dennisoelkers

Hello @dennisoelkers .

Thank you for investigating this. We have a case when our logs go to 2 indices: one dedicated for e.g. nginx logs and one like "all messages for last day" for all sources (not only nginx logs). So after the index is rotated and when I search in the nginx stream for messages older than the point of index rotation, I see duplicate messages what is confusing, especially when you need to gather some analytic. I don't feel it should work this way as I have to add something like AND index:nginx* in the search query.

diborodin avatar Jun 16 '23 08:06 diborodin

We are open for improvements to this though. It would be interesting to hear what problems this behavior is causing you and if you have ideas on how we could make it better without sacrificing access to messages in scenarios as previously described.

In our dashboard we count message and since the messages are found multiples times the count value is not correct.

I tried to workaround group by timestamp (so two message count as one) but it produced too many bucked. grouping by gl2_message_id was too slow (and I think might encounter the same issue).

I thought about cardinality instead of count (but for the global count it can't work because it need a field to carnial). Since I count a subpattern the cardinality yield much lower result that actual count/2 (so not same timestamp/message id but same extracted field)

One possible way is a hashmap containing already seens message id. I don't know if it could fit in memory for large gl cluster.

Another possibility is to sort message by timestamp message id then two following message (if timestamps is used we can have AbCACb if A, b and C arrived at exactly the same time) might be a double so we can remove one.

another posibility is a filter currently_in_index:"X" so if users know he didn't moved the message it can works as expected.

Et7f3 avatar Aug 27 '24 10:08 Et7f3

Just ran into this ourselves on 4.2x version.

We have log streams created to be ingested by different teams for analysis. Say a dev team is looking at a specific workload and a platform team looking at all workloads.

Stream rules will take the same message and route it into 2 different index sets.

Message arrives and rules route it into 2 different streams:

Stream A looks at Index-Set A Stream B looks at Index-Set B

Search limited to just Stream A will shows results from Index-Set A and B.

Same message ID. Different Indexes. But my search is limited to a single stream that is attached to a single index-set and that index-set has a single instance of this message. Yet I see 2.

Very counterintuitive imo.

Open to being told my pattern is counterintuitive as well. :)

robermar23 avatar Feb 27 '25 17:02 robermar23

When a message is being routed to two streams, both stream ids will be in the streams field of both messages

Shouldn't each copy of the message only have the stream id for the stream that copy was routed to?

If I understand the problem correctly that would solve the issue (for future messages at least). However I know little about the internal working of graylog, so maybe that is not feasible.

innominateAtWork avatar Mar 03 '25 20:03 innominateAtWork

Shouldn't each copy of the message only have the stream id for the stream that copy was routed to?

This would mean that when viewing the message, it would not be visible to the user anymore that the message has been routed into both streams, implying some kind of routing issue when the user actually expects this.

dennisoelkers avatar Oct 13 '25 09:10 dennisoelkers