ApplicationInsights-Java

Sampling override with http.response.status_code doesn't work

Open zwilling79 opened this issue 1 year ago • 5 comments

Expected behavior

If you add a sampling override that filters out all requests with a specific HTTP response status, those requests shouldn't be shown in Application Insights.

Actual behavior

HTTP requests with the specified status code are shown in Application Insights.

To Reproduce

  • Create a simple Spring Boot application with the health actuator endpoint enabled
  • Create an applicationinsights.json file that includes the sampling setting below:
  "sampling": {
    "percentage": 100,
    "overrides": [
      {
        "telemetryType": "request",
        "attributes": [
          {
            "key": "http.response.status_code",
            "value": 200,
            "matchType": "strict"
          }
        ],
        "percentage": 0
      }
    ]
  },
  • Do a GET request to the http://localhost:8080/actuator/health endpoint
    • It should return a 200 response code with the following payload: { "status": "UP" }
    • This request shouldn't be shown in Application Insights
  • Do a GET request to the http://localhost:8080/actuator/invalid endpoint
    • This request should be shown in Application Insights because you get a 404 error

System information

Please provide the following information:

  • SDK Version 3.5.1 (Telemetry SDK Version: 1.35.0)
  • OS type and version: Windows 11
  • Application Server type and version (if applicable): Tomcat
  • Using spring-boot? Yes
  • Additional relevant libraries (with version, if applicable): n/a

Logs

2024-04-22 10:28:52.795+02:00 DEBUG c.m.a.a.i.exporter.AgentSpanExporter - exporting span: SpanData{spanContext=ImmutableSpanContext{traceId=0dad507a37d7da13c3a81e9139723846, spanId=4eadcbc1fe5c4e8b, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=00000000000000000000000000000000, spanId=0000000000000000, traceFlags=00, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=false}, resource=Resource{schemaUrl=null, attributes={service.name="appinsights", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.35.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.tomcat-10.0, version=2.1.0-alpha, schemaUrl=null, attributes={}}, name=GET /actuator/health, kind=SERVER, startEpochNanos=1713774532712869100, endEpochNanos=1713774532774249100, attributes=AttributesMap{data={thread.id=65, http.request.method=GET, http.route=/actuator/health, http.response.status_code=200, network.peer.address=127.0.0.1, server.address=localhost, client.address=127.0.0.1, url.path=/actuator/health, server.port=8080, network.protocol.version=1.1, user_agent.original=Apache-HttpClient/4.5.14 (Java/17.0.10), network.peer.port=60098, url.scheme=http, thread.name=http-nio-8080-exec-4, applicationinsights.internal.is_pre_aggregated=true}, capacity=128, totalAddedValues=15}, totalAttributeCount=15, events=[], totalRecordedEvents=0, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=UNSET, description=}, hasEnded=true}
2024-04-22 10:28:57.251+02:00 DEBUG c.a.m.o.e.i.p.TelemetryItemExporter - sending telemetry to ingestion service:
{"ver":1,"name":"Metric","time":"2024-04-22T08:28:57.251Z","iKey":"ec7d4b96-3d1e-405a-8d5f-0d90258b5785","tags":{"ai.internal.sdkVersion":"java:3.5.1","ai.cloud.roleInstance":"...","ai.cloud.role":"appinsights"},"data":{"baseType":"MetricData","baseData":{"ver":2,"metrics":[{"name":"_OTELRESOURCE_","value":0.0}],"properties":{"telemetry.sdk.language":"java","service.name":"appinsights","service.instance.id":"...","telemetry.sdk.version":"1.35.0","telemetry.sdk.name":"opentelemetry"}}}}
{"ver":1,"name":"Request","time":"2024-04-22T08:28:52.712Z","iKey":"ec7d4b96-3d1e-405a-8d5f-0d90258b5785","tags":{"ai.internal.sdkVersion":"java:3.5.1","ai.operation.id":"0dad507a37d7da13c3a81e9139723846","ai.cloud.roleInstance":"...","ai.operation.name":"GET /actuator/health","ai.location.ip":"127.0.0.1","ai.cloud.role":"appinsights","ai.user.userAgent":"Apache-HttpClient/4.5.14 (Java/17.0.10)"},"data":{"baseType":"RequestData","baseData":{"ver":2,"id":"4eadcbc1fe5c4e8b","name":"GET /actuator/health","duration":"00:00:00.061380","success":true,"responseCode":"200","url":"http://localhost:8080/actuator/health","properties":{"_MS.ProcessedByMetricExtractors":"True"}}}}

zwilling79 avatar Apr 22 '24 09:04 zwilling79

@zwilling79 you can use an OpenTelemetry extension to filter telemetry based on http.response.status_code. Here is an example of how to filter out telemetry based on duration. You can do something similar.
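The idea behind this suggestion can be sketched without the agent: once a span has ended, all of its attributes are known, including the status code, so a filter can drop it before export. This is only a minimal illustration. `FinishedSpan` and `dropByStatus` are hypothetical names rather than OpenTelemetry types, the span names are sample data, and a real extension would implement the same check against the OpenTelemetry SDK's own interfaces.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ExportFilterDemo {

    // Hypothetical stand-in for a finished span; not an OpenTelemetry type.
    record FinishedSpan(String name, Map<String, Object> attributes) {}

    // Keeps every span except those whose final status code equals statusCode.
    static List<FinishedSpan> dropByStatus(List<FinishedSpan> batch, long statusCode) {
        return batch.stream()
            .filter(span -> !Long.valueOf(statusCode)
                .equals(span.attributes().get("http.response.status_code")))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data only: two finished spans with their final attributes.
        List<FinishedSpan> batch = List.of(
            new FinishedSpan("GET /actuator/health",
                Map.of("http.response.status_code", 200L)),
            new FinishedSpan("GET /actuator/invalid",
                Map.of("http.response.status_code", 404L)));

        // Print the names of the spans that survive the filter.
        dropByStatus(batch, 200L).forEach(span -> System.out.println(span.name()));
    }
}
```

Because the check runs on finished spans, it can use attributes that only exist at the end of a request, at the cost of living in code instead of in applicationinsights.json.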

heyams avatar Apr 22 '24 18:04 heyams

Hm, this may work. Nonetheless, I would prefer to have this in the configuration file so that it can be easily adjusted, especially if it is specific to certain environments. For instance, today I just want to filter out the health check and Prometheus endpoint requests that have a response code of 200. Tomorrow I may want to filter out some additional business application endpoints that have a response code of 200. Compiling, packaging, and distributing the OTel extension JAR for such changes seems like overkill. Furthermore, if you want different configurations for different environments, you have to maintain different OTel extension JARs or add more complexity to read and evaluate further configuration files.

I think the problem in the code is that the values of the sampling override attributes are always treated as strings, but the actual attribute is of type integer. So it is perhaps similar to #3378.
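To make this hypothesis concrete, here is a small sketch of the kind of mismatch meant. It is not the agent's code: `naiveStrictMatch` and `stringStrictMatch` are hypothetical helpers, and the sketch only shows how a text value and an integer attribute can fail to compare equal. It says nothing about whether this is what actually happens in the agent.

```java
public class StrictMatchDemo {

    // Compares the configured value with the attribute value as-is.
    static boolean naiveStrictMatch(Object configured, Object actual) {
        return configured.equals(actual);
    }

    // Compares the string forms, so "200" and the number 200 match each other.
    static boolean stringStrictMatch(Object configured, Object actual) {
        return String.valueOf(configured).equals(String.valueOf(actual));
    }

    public static void main(String[] args) {
        Object configured = "200"; // assumption: override value handled as text
        Object actual = 200L;      // assumption: attribute stored as an integer

        System.out.println(naiveStrictMatch(configured, actual));  // false
        System.out.println(stringStrictMatch(configured, actual)); // true
    }
}
```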

zwilling79 avatar Apr 23 '24 06:04 zwilling79

Only attributes set at the start of the span are available for sampling, so attributes such as http.response.status_code or request duration won't work for sampling.
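A minimal simulation of why this happens, assuming the sampling decision is made once, when the span starts. `OverrideRule` and `percentageFor` are hypothetical names for illustration, not the agent's implementation, and the set of attributes present at span start is likewise an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeadSamplingDemo {

    // Hypothetical stand-in for one "strict" override: if the attribute has
    // this value when the decision is made, the rule's percentage applies.
    record OverrideRule(String key, String value, double percentage) {
        boolean matches(Map<String, Object> attributes) {
            Object actual = attributes.get(key);
            return actual != null && value.equals(String.valueOf(actual));
        }
    }

    // Returns the sampling percentage for a span, looking only at the
    // attributes that exist at the moment of the call.
    static double percentageFor(Map<String, Object> attributes,
                                List<OverrideRule> rules,
                                double defaultPercentage) {
        for (OverrideRule rule : rules) {
            if (rule.matches(attributes)) {
                return rule.percentage();
            }
        }
        return defaultPercentage;
    }

    public static void main(String[] args) {
        List<OverrideRule> rules =
            List.of(new OverrideRule("http.response.status_code", "200", 0));

        // Assumed attributes at span start: the request is known, the response is not.
        Map<String, Object> span = new HashMap<>();
        span.put("http.request.method", "GET");
        span.put("url.path", "/actuator/health");

        // The decision is made here, so the status-code rule cannot match.
        double atStart = percentageFor(span, rules, 100);

        // The status code only appears once the response has been produced.
        span.put("http.response.status_code", 200L);
        double atEnd = percentageFor(span, rules, 100);

        System.out.println("decision at span start: " + atStart + "%");
        System.out.println("same rule evaluated at span end: " + atEnd + "%");
    }
}
```

In this sketch the rule would only match if it were evaluated after the response exists, which is too late for a decision taken at span start.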

Alternatively, you could try a data collection rule (DCR) transformation. A tutorial: https://learn.microsoft.com/en-us/azure/azure-monitor/logs/tutorial-workspace-transformations-portal

jeanbisutti avatar Apr 24 '24 12:04 jeanbisutti

> Only attributes set at the start of the span are available for sampling, so attributes such as http.response.status_code or request duration won't work for sampling.

Since 3.5.0 it has been very confusing which attributes are available for sampling. The docs point you to the "exporting span" line, but that line is basically useless: it includes http.status_code, and it does not print, for example, url.full, which I am able to use even though it is not included in the "exporting span" line. The next line in the docs does warn you that only attributes set at the start of the span are available for sampling, but it would be great to see which attributes those are when I set my log level to debug.

potzkovge avatar Apr 26 '24 15:04 potzkovge

I did exactly the same: I enabled debug logging to see which fields I can filter on. Because I saw http.response.status_code=200 in the attributes list, I thought I could filter on it.

zwilling79 avatar Apr 29 '24 12:04 zwilling79

We are thinking of adding a warning during startup if any sampling override uses attributes that are known not to be available at span start, such as http.response.status_code.

trask avatar May 13 '24 17:05 trask