
Validate user data depth to prevent Elasticsearch issues

Open bhalsey opened this issue 1 year ago • 3 comments

Validate user data depth to prevent Elasticsearch issues

Description

If the JSON data in a User.data field has a depth greater than 20, Elasticsearch will not index it. This causes reindex failures and high CPU usage. We should reject data with a depth greater than 20 to prevent this. There may be additional improvements to make in the reindex operation (I suspect it is continuously retrying the failed indexing).

The max depth limit is similar to Elasticsearch's limit on the total number of fields. See also:

  • https://github.com/FusionAuth/fusionauth-issues/issues/2457
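A validation check like the one proposed could be sketched as follows. This is a hypothetical sketch, not FusionAuth's actual implementation: the class and method names are invented, and the limit of 20 mirrors Elasticsearch's default index.mapping.depth.limit. Note that Elasticsearch measures depth from the document root, so a data field that is itself nested inside the user mapping would effectively have a smaller budget; the exact offset would need verifying against the real mapping.

```java
import java.util.List;
import java.util.Map;

public class DataDepthValidator {
    // Mirrors Elasticsearch's default index.mapping.depth.limit of 20.
    static final int MAX_DEPTH = 20;

    // Returns the nesting depth of a decoded JSON value:
    // scalars count as 0, each enclosing object or array adds 1.
    static int depth(Object node) {
        if (node instanceof Map<?, ?> map) {
            int max = 0;
            for (Object value : map.values()) {
                max = Math.max(max, depth(value));
            }
            return max + 1;
        }
        if (node instanceof List<?> list) {
            int max = 0;
            for (Object value : list) {
                max = Math.max(max, depth(value));
            }
            return max + 1;
        }
        return 0;
    }

    // Reject data whose nesting exceeds what Elasticsearch will index.
    static boolean withinLimit(Object data) {
        return depth(data) <= MAX_DEPTH;
    }

    public static void main(String[] args) {
        Object shallow = Map.of("a", Map.of("b", 1));

        // Build a structure nested 21 levels deep to trip the check.
        Object deep = 1;
        for (int i = 0; i < 21; i++) {
            deep = Map.of("k", deep);
        }

        System.out.println(withinLimit(shallow)); // true
        System.out.println(withinLimit(deep));    // false
    }
}
```

A recursive walk like this is O(n) in the size of the data blob, so running it once at write time should be cheap relative to the cost of a failed reindex loop.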

Observed versions

Observed in 1.46.0


Platform

Linux and Elasticsearch 7.6.1

Related

  • https://github.com/FusionAuth/fusionauth-issues/issues/1640
  • https://github.com/FusionAuth/fusionauth-issues/issues/2457

Community guidelines

All issues filed in this repository must abide by the FusionAuth community guidelines.


bhalsey avatar Mar 06 '24 17:03 bhalsey

Error from the search logs for context:

java.lang.IllegalArgumentException: Limit of total fields [1000] in index [fusionauth_user] has been exceeded
	at org.elasticsearch.index.mapper.MapperService.checkTotalFieldsLimit(MapperService.java:614) ~[elasticsearch-7.6.1.jar:7.6.1]

Will a check for a depth of 20 properly mitigate this? I'm not sure the issue is restricted to the data column, so we may need to validate the whole user object to ensure that it fits. We will need to mind the performance of any such approach, though, so a depth check of 20 might be a reasonable approximation.

lyleschemmerling avatar Mar 06 '24 17:03 lyleschemmerling

> Will a check for a depth of 20 properly mitigate this? I'm not sure that this is restricted to an issue in the data column and so we may need to validate the whole user object to ensure that it fits, although we will need to mind the performance of any such approach so a depth of 20 might be a reasonable approximation

The log for the depth of 20 being exceeded was

java.lang.IllegalArgumentException: Limit of mapping depth [20] in index [fusionauth_user] has been exceeded due to object field [redacted]

Elasticsearch threw an IllegalArgumentException for two different (but related) violations.

bhalsey avatar Mar 06 '24 17:03 bhalsey

Gotcha. In that case we may want to examine the full set of mapping restrictions at https://www.elastic.co/guide/en/elasticsearch/reference/7.17/mapping-settings-limit.html
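For reference, the two limits seen in the logs above are dynamic index settings, so the values in play here would correspond to something like the following (illustrative only; these are the Elasticsearch defaults, and the index name is taken from the error messages):

```
PUT /fusionauth_user/_settings
{
  "index.mapping.depth.limit": 20,
  "index.mapping.total_fields.limit": 1000
}
```

Whatever validation we add should presumably read these settings (or at least match their defaults) rather than hard-code 20 and 1000.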

I'm not sure whether there are equivalent settings in OpenSearch. The closest I could find was https://opensearch.org/docs/latest/install-and-configure/install-opensearch/index/

lyleschemmerling avatar Mar 06 '24 17:03 lyleschemmerling