
[BUG] User agent processor gives java.lang.NullPointerException if entry does not contain source field

Open StrategiosP opened this issue 8 months ago • 4 comments

Describe the bug With Data Prepper 2.12.1 running as a Docker container, I get a java.lang.NullPointerException from the User agent processor whenever an entry does not contain the configured source field.

To Reproduce Steps to reproduce the behavior:

  1. Have a pipeline like this configured
  source:
    http:
      port: 2021
      ssl: false
      health_check_service: true

  processor:
    - user_agent:
         source: agent
         target: parsed_agent

  sink:
    - opensearch:
        hosts:
          - https://HOST:PORT
          - https://HOST:PORT
          - https://HOST:PORT
        insecure: true
        username: USER
        password: PASS
        index: TEST
  2. Ingest data which may or may not contain the field agent
  3. The following errors are logged (once for each ingested entry without the agent field, I believe)
Reading pipelines and data-prepper configuration files from Data Prepper home directory.
/usr/bin/java
Found openjdk version  of 17.0
2025-08-22T15:33:33,583 [main] INFO  org.opensearch.dataprepper.pipeline.parser.transformer.DynamicConfigTransformer - No transformation needed
2025-08-22T15:33:35,782 [main] INFO  org.opensearch.dataprepper.plugins.kafka.extension.KafkaClusterConfigExtension - Applying Kafka Cluster Config Extension.
2025-08-22T15:33:37,104 [main] WARN  org.opensearch.dataprepper.plugins.source.loghttp.HTTPSource - Creating http source without authentication. This is not secure.
2025-08-22T15:33:37,105 [main] WARN  org.opensearch.dataprepper.plugins.source.loghttp.HTTPSource - In order to set up Http Basic authentication for the http source, go here: https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/http-source#authentication-configurations
2025-08-22T15:33:37,798 [main] INFO  org.opensearch.dataprepper.plugins.geoip.extension.GeoIPDatabaseManager - Downloading GeoIP database to /usr/share/data-prepper/data/geoip/blue_database
2025-08-22T15:33:44,367 [main] WARN  org.opensearch.dataprepper.core.pipeline.server.config.DataPrepperServerConfiguration - Creating data prepper server without authentication. This is not secure.
2025-08-22T15:33:44,372 [main] WARN  org.opensearch.dataprepper.core.pipeline.server.config.DataPrepperServerConfiguration - In order to set up Http Basic authentication for the data prepper server, go here: https://github.com/opensearch-project/data-prepper/blob/main/docs/core_apis.md#authentication
2025-08-22T15:33:44,831 [log-ingest-pipeline-sink-worker-2-thread-1] INFO  org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink - Initializing OpenSearch sink
2025-08-22T15:33:44,848 [main] WARN  org.opensearch.dataprepper.core.pipeline.server.HttpServerProvider - Creating Data Prepper server without TLS. This is not secure.
2025-08-22T15:33:44,852 [main] WARN  org.opensearch.dataprepper.core.pipeline.server.HttpServerProvider - In order to set up TLS for the Data Prepper server, go here: https://github.com/opensearch-project/data-prepper/blob/main/docs/configuration.md#server-configuration
2025-08-22T15:33:44,867 [log-ingest-pipeline-sink-worker-2-thread-1] INFO  org.opensearch.dataprepper.plugins.sink.opensearch.ConnectionConfiguration - Using the username provided in the config.
2025-08-22T15:33:44,907 [log-ingest-pipeline-sink-worker-2-thread-1] INFO  org.opensearch.dataprepper.plugins.sink.opensearch.ConnectionConfiguration - Using the trust all strategy
2025-08-22T15:33:45,480 [log-ingest-pipeline-sink-worker-2-thread-1] INFO  org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink - Initialized OpenSearch sink
2025-08-22T15:33:45,801 [log-ingest-pipeline-sink-worker-2-thread-1] WARN   com.linecorp.armeria.common.CommonPools - Failed to register the common worker group as non-blocking for Reactor. Please consider upgrading Reactor to 3.7.0 or newer.
2025-08-22T15:33:45,948 [log-ingest-pipeline-sink-worker-2-thread-1] WARN  org.opensearch.dataprepper.plugins.server.CreateServer - Creating http without SSL/TLS. This is not secure.
2025-08-22T15:33:45,949 [log-ingest-pipeline-sink-worker-2-thread-1] WARN  org.opensearch.dataprepper.plugins.server.CreateServer - In order to set up TLS for the http, go here: https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/http-source#ssl
2025-08-22T15:33:46,017 [log-ingest-pipeline-sink-worker-2-thread-1] INFO  org.opensearch.dataprepper.plugins.server.CreateServer - HTTP source health check is enabled
2025-08-22T15:33:46,545 [log-ingest-pipeline-sink-worker-2-thread-1] INFO  org.opensearch.dataprepper.plugins.source.loghttp.HTTPSource - Started http source on port 2021...
2025-08-22T15:33:50,241 [log-ingest-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.useragent.UserAgentProcessor - An exception occurred when parsing user agent data from event [org.opensearch.dataprepper.model.log.JacksonLog@79ad1a39] with source key [agent]
java.lang.NullPointerException: null
        at java.base/java.util.Objects.requireNonNull(Objects.java:209) ~[?:?]
        at org.opensearch.dataprepper.plugins.processor.useragent.UserAgentProcessor.doExecute(UserAgentProcessor.java:57) ~[data-prepper-user-agent-processor-2.12.1.jar:?]
        at org.opensearch.dataprepper.model.processor.AbstractProcessor.lambda$execute$0(AbstractProcessor.java:54) ~[data-prepper-api-2.12.1.jar:?]
        at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:69) [micrometer-core-1.14.4.jar:1.14.4]
        at org.opensearch.dataprepper.model.processor.AbstractProcessor.execute(AbstractProcessor.java:54) [data-prepper-api-2.12.1.jar:?]
        at org.opensearch.dataprepper.core.pipeline.PipelineRunnerImpl.runProcessorsAndProcessAcknowledgements(PipelineRunnerImpl.java:105) [data-prepper-core-2.12.1.jar:?]
        at org.opensearch.dataprepper.core.pipeline.PipelineRunnerImpl.runAllProcessorsAndPublishToSinks(PipelineRunnerImpl.java:55) [data-prepper-core-2.12.1.jar:?]
        at org.opensearch.dataprepper.core.pipeline.ProcessWorker.doRun(ProcessWorker.java:80) [data-prepper-core-2.12.1.jar:?]
        at org.opensearch.dataprepper.core.pipeline.ProcessWorker.run(ProcessWorker.java:40) [data-prepper-core-2.12.1.jar:?]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
2025-08-22T15:33:50,251 [log-ingest-pipeline-processor-worker-1-thread-1] ERROR org.opensearch.dataprepper.plugins.processor.useragent.UserAgentProcessor - An exception occurred when parsing user agent data from event [org.opensearch.dataprepper.model.log.JacksonLog@54ba2d01] with source key [agent]
java.lang.NullPointerException: null
        at java.base/java.util.Objects.requireNonNull(Objects.java:209) ~[?:?]
        at org.opensearch.dataprepper.plugins.processor.useragent.UserAgentProcessor.doExecute(UserAgentProcessor.java:57) ~[data-prepper-user-agent-processor-2.12.1.jar:?]
        at org.opensearch.dataprepper.model.processor.AbstractProcessor.lambda$execute$0(AbstractProcessor.java:54) ~[data-prepper-api-2.12.1.jar:?]
        at io.micrometer.core.instrument.composite.CompositeTimer.record(CompositeTimer.java:69) [micrometer-core-1.14.4.jar:1.14.4]
        at org.opensearch.dataprepper.model.processor.AbstractProcessor.execute(AbstractProcessor.java:54) [data-prepper-api-2.12.1.jar:?]
        at org.opensearch.dataprepper.core.pipeline.PipelineRunnerImpl.runProcessorsAndProcessAcknowledgements(PipelineRunnerImpl.java:105) [data-prepper-core-2.12.1.jar:?]
        at org.opensearch.dataprepper.core.pipeline.PipelineRunnerImpl.runAllProcessorsAndPublishToSinks(PipelineRunnerImpl.java:55) [data-prepper-core-2.12.1.jar:?]
        at org.opensearch.dataprepper.core.pipeline.ProcessWorker.doRun(ProcessWorker.java:80) [data-prepper-core-2.12.1.jar:?]
        at org.opensearch.dataprepper.core.pipeline.ProcessWorker.run(ProcessWorker.java:40) [data-prepper-core-2.12.1.jar:?]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.base/java.lang.Thread.run(Thread.java:840) [?:?]

Environment (please complete the following information):

  • OS: Ubuntu 24.04 LTS
  • Version: Data Prepper 2.12.1

Additional context In my use case I ingest log entries which may or may not contain a user agent. I could not figure out any way to pass only the entries containing agent information to the User agent processor, so I thought I could pass all entries to it without getting errors.

StrategiosP avatar Aug 22 '25 16:08 StrategiosP

@StrategiosP, Thanks for making note of this.

Do you mean that the events coming in do not have the source field present?

Would you be able to produce a fix for this? Or perhaps contribute a unit test that replicates the behavior?

dlvenable avatar Aug 26 '25 19:08 dlvenable

@dlvenable Thanks for the response.

Correct, some incoming events do not have the source field present.

Unfortunately I have never programmed in Java so I am not up to the task of contributing code.

StrategiosP avatar Aug 27 '25 20:08 StrategiosP

Hi @dlvenable! 👋

I'd be happy to take a look at this issue and work on a fix. It seems like we need to add a null check before accessing the source field in the user agent processor.

I'm relatively new to contributing to Data Prepper, but I've worked with Java before. Would you mind if I give it a try? I'll make sure to include unit tests that replicate the behavior as you suggested.

Let me know if this issue is still available or if there's anything specific I should keep in mind while working on it!
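For illustration, here is a minimal sketch of the kind of null check described above. It is not Data Prepper's actual code: the class `UserAgentGuardSketch`, its methods, and the use of a plain `Map` in place of the real `Event` type are all hypothetical stand-ins. It contrasts the `Objects.requireNonNull` pattern visible in the stack trace with a guard that skips events missing the source field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the processor logic; a plain Map plays the role of an event.
class UserAgentGuardSketch {

    // Mirrors the failing pattern from the stack trace:
    // requireNonNull throws NullPointerException when the field is absent.
    static String readStrict(Map<String, Object> event, String sourceKey) {
        return Objects.requireNonNull((String) event.get(sourceKey));
    }

    // Guarded variant: returns false and leaves the event untouched
    // when the source field is missing or is not a string.
    static boolean parseIfPresent(Map<String, Object> event, String sourceKey, String targetKey) {
        Object raw = event.get(sourceKey);
        if (!(raw instanceof String)) {
            return false;
        }
        // Placeholder for real user agent parsing (assumption: we only store the raw value).
        event.put(targetKey, Map.of("original", raw));
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> withAgent = new HashMap<>();
        withAgent.put("agent", "Mozilla/5.0");
        Map<String, Object> withoutAgent = new HashMap<>();
        withoutAgent.put("message", "no agent here");

        System.out.println(parseIfPresent(withAgent, "agent", "parsed_agent"));
        System.out.println(parseIfPresent(withoutAgent, "agent", "parsed_agent"));
    }
}
```

With a guard like this, events without the source field pass through unchanged instead of producing an error per event. Whether such events should also be tagged or counted as failures is a design decision for the maintainers.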

gmartincor avatar Nov 02 '25 04:11 gmartincor

Hi @dlvenable,

I have gone through the test cases for UserAgentProcessor, and I can see that this scenario is already covered by the test case below.

Regarding the exception: it is only logged by the processor (I have added the test case output as well). Please let me know if any changes are required.

Test Case:

    @Test
    public void testTagsAddedOnParseFailure() {
        when(mockConfig.getSource()).thenReturn(eventKeyFactory.createEventKey("bad_source"));
        when(mockConfig.getCacheSize()).thenReturn(TEST_CACHE_SIZE);
        when(mockConfig.getTarget()).thenReturn("user_agent");

        final String tagOnFailure1 = UUID.randomUUID().toString();
        final String tagOnFailure2 = UUID.randomUUID().toString();
        when(mockConfig.getTagsOnParseFailure()).thenReturn(List.of(tagOnFailure1, tagOnFailure2));

        final UserAgentProcessor processor = createObjectUnderTest();
        final Record<Event> testRecord = createTestRecord(UUID.randomUUID().toString());
        final List<Record<Event>> resultRecord = (List<Record<Event>>) processor.doExecute(Collections.singletonList(testRecord));
        final Event resultEvent = resultRecord.get(0).getData();

        assertThat(resultEvent.containsKey("user_agent"), is(false));
        assertThat(resultEvent.getMetadata().getTags().contains(tagOnFailure1), is(true));
        assertThat(resultEvent.getMetadata().getTags().contains(tagOnFailure2), is(true));
    }

Test case log

2025-11-18T05:01:52.921082Z Test worker ERROR An exception occurred when parsing user agent data from event [org.opensearch.dataprepper.model.event.JacksonEvent@57cc535] with source key [bad_source]
java.lang.NullPointerException
	at java.base/java.util.Objects.requireNonNull(Objects.java:209)
	at org.opensearch.dataprepper.plugins.processor.useragent.UserAgentProcessor.doExecute(UserAgentProcessor.java:57)
	at org.opensearch.dataprepper.plugins.processor.useragent.UserAgentProcessorTest.testTagsAddedOnParseFailure(UserAgentProcessorTest.java:138)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
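
The behavior this test exercises can be sketched roughly as follows. This is an approximation inferred from the log output and the assertions above, not the actual processor source; `ParseFailureTaggingSketch`, `SketchEvent`, and the placeholder parsing step are hypothetical. The point is that the NullPointerException is caught and logged, the failure tags are added, and the event continues through the pipeline without the target field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of "catch, log, and tag" handling; not the real UserAgentProcessor.
class ParseFailureTaggingSketch {

    // Minimal stand-in for an event with data and metadata tags.
    static final class SketchEvent {
        final Map<String, Object> data = new HashMap<>();
        final List<String> tags = new ArrayList<>();
    }

    static void process(SketchEvent event, String sourceKey, String targetKey,
                        List<String> tagsOnParseFailure) {
        try {
            // Throws NullPointerException when the source field is absent.
            String userAgent = (String) Objects.requireNonNull(event.data.get(sourceKey));
            // Placeholder for real parsing (assumption: store the trimmed raw string).
            event.data.put(targetKey, userAgent.trim());
        } catch (Exception e) {
            // The exception is logged rather than rethrown, so the pipeline keeps running.
            System.err.println("An exception occurred when parsing user agent data with source key ["
                    + sourceKey + "]: " + e);
            event.tags.addAll(tagsOnParseFailure);
        }
    }

    public static void main(String[] args) {
        SketchEvent event = new SketchEvent();
        event.data.put("message", "entry without an agent field");
        process(event, "agent", "parsed_agent", List.of("ua_parse_failure"));
        System.out.println("target present: " + event.data.containsKey("parsed_agent"));
        System.out.println("tags: " + event.tags);
    }
}
```

Under this reading, the events are not lost, but every event without the source field still produces an ERROR log entry with a full stack trace.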

sabarinathan590 avatar Nov 18 '25 06:11 sabarinathan590

+1 on this issue. I'm running into the same problem, where I'm ingesting data that may or may not have a useragent field populated. I'm running tests with the opensearchproject/data-prepper:2.13.0 (latest) Docker image to deploy data-prepper to possibly replace an existing logstash implementation.

As an end user, my options are:

  1. Drop using the user_agent processor altogether, or parse it myself
  2. Create pipelines and routes to avoid sending null values to user_agent
  3. Use user_agent as-is and accept that I'm going to spam my logs with java.lang.NullPointerException

Option 1 is where I'm at right now. Option 2 is more complicated than I really want to implement, and option 3 is a non-starter since this is a project to replace existing functionality and I can't really justify promoting a solution with so many expected errors to ignore.

I know for a lot of the DP processors, there's a "*_when" conditional option to skip processing. My expertise is on the systems side so I don't know how hard it would be or what time would be required to implement, but I think it would work around the issue (at least in some use cases including mine) to have a ua_when option where we could just use ua_when: /source != null.

doombirdAD avatar Dec 30 '25 22:12 doombirdAD