dynamoid icon indicating copy to clipboard operation
dynamoid copied to clipboard

Performance issues when getting all records.

Open bruno-trindade opened this issue 4 years ago • 9 comments

Hello, we're new to using Dynamoid, and DynamoDB in general. I'm hoping someone can explain why there is a large difference in latency between these calls:

MyDynamoidModel.where(the_hash_key: key).all.to_a

-VS-

@aws_ddb_resouce_table.query(
  table_name: "dynamoid_my_dynamoid_models",
  key_condition_expression: "the_hash_key = :the_hash_key",
  expression_attribute_values: {
    ":the_hash_key" => the_hash_key
   }
)

The first version runs in about 28s. The second one about 1.5s. Having to call to_a is a little weird, maybe there's a preferred way?

bruno-trindade avatar Feb 11 '21 14:02 bruno-trindade

I would say there is only one valid case when this situation can happen. Query result is paginated with 1 MB size limit (documentation). #where method loads lazily by demand all the pages. I assume that #query does not.

So please check result of each call and compare number of returned records. If they equal - we have performance issue in Dynamoid. Otherwise it's expected result.

andrykonchin avatar Feb 11 '21 14:02 andrykonchin

I would say there is only one valid case when this situation can happen. Query result is paginated with 1 MB size limit (documentation). #where method loads lazily by demand all the pages. I assume that #query does not.

So please check result of each call and compare number of returned records. If they equal - we have performance issue in Dynamoid. Otherwise it's expected result.

The number of returned records is exactly the same, a total of 1113

bruno-trindade avatar Feb 11 '21 15:02 bruno-trindade

Well, I would like to check logs and compare API requests in both cases to ensure the performance issue is related to Dynamoid internals and not related to generated API requests to DynamoDB.

Could you please provide logged requests to DynamoDB for both cases?

For Dynamoid you can enable logging this way:

Dynamoid::Config.logger.level = :debug

andrykonchin avatar Feb 11 '21 17:02 andrykonchin

I have another idea what could happen. If a hash key name in a table is not default id then it should be specified explicitly in a model class:

table key: :the_hash_key

Dynamoid doesn't analyze a DynamoDB table schema (maybe we have to change this) and relies on a schema declared explicitly in code. If the_hash_key name isn't recognized as a hash key in the expression where(the_hash_key: key) the Scan operation will be used. If hash key is recognized - then will be used more efficient Query operation.

It will be clear when we have logged API requests to DynamoDB.

andrykonchin avatar Feb 12 '21 08:02 andrykonchin

I'm not seeing any log output even after adding that configuration. But maybe it's helpful to note that I have a range key as well as a hash key, and my hash key is also declared as a field which maybe it doesn't need to be - could this be the problem? (I'll keep trying for some log output.)

table capacity_mode:      :provisioned,
          key:                :the_hash_key,
          read_capacity:      350,
          write_capacity:     100
range :the_sort_key,  :string
field :the_hash_key, :integer

bruno-trindade avatar Feb 12 '21 12:02 bruno-trindade

Confirming that removing that unneeded field declaration seems to fix the problem, and now it runs blazing fast. This is obviously a bug in my code, but I wonder if there are other dummies like me who might benefit from the docs saying you shouldn't declare the field for the hash key or something. :)

bruno-trindade avatar Feb 12 '21 13:02 bruno-trindade

Actually, without that I'm getting this error:

Aws::DynamoDB::Errors::ValidationException: One or more parameter values were invalid: Condition parameter type does not match schema type

Edit I think here's a clue. The field I had was field :the_hash_key, :integer. Now that I removed the field, I can use find to locate the record, and I can see that the attributes before typecast have the hash ID as a decimal, but after it's a string:

@attributes=
  {:created_at=>Mon, 18 Jan 2021 09:12:30 +0000,
   :updated_at=>Mon, 18 Jan 2021 09:12:30 +0000,
   :the_hash_key=>"209963.0",
  },
   @attributes_before_type_cast=
  {:created_at=>Mon, 18 Jan 2021 09:12:30 +0000,
   :updated_at=>Mon, 18 Jan 2021 09:12:30 +0000,
   :the_hash_key=>0.209963e6,
}

Is there any way I can drop the field :user_id and not lose my ability to query via where?

bruno-trindade avatar Feb 12 '21 18:02 bruno-trindade

Yeah so removing the field just means that Dynamoid now just assumes the key is of the "string" type, which causes the mismatch with DynamoDB. My initial tests were flawed, so I think this long query lag is still a problem. I'll continue to try to get log output.

bruno-trindade avatar Feb 12 '21 19:02 bruno-trindade

Regarding logging. This code snippet

class MyDynamoidModel
  include Dynamoid::Document

  table capacity_mode:      :provisioned,
            key:                :the_hash_key,
            read_capacity:      350,
            write_capacity:     100
  range :the_sort_key,  :string
  field :the_hash_key, :integer
end

Dynamoid.logger.level = :debug
MyDynamoidModel.create_table
MyDynamoidModel.where(the_hash_key: 1).all.to_a

# => []

produces the following logs:

D, [2021-02-12T23:28:58.087077 #43894] DEBUG -- : [Aws::DynamoDB::Client 200 0.007128 0 retries] list_tables(exclusive_start_table_name:nil)

D, [2021-02-12T23:28:58.087569 #43894] DEBUG -- : (18.69 ms) LIST TABLES
D, [2021-02-12T23:28:58.087618 #43894] DEBUG -- : (19.08 ms) CACHE TABLES
I, [2021-02-12T23:28:58.087669 #43894]  INFO -- : Creating dynamoid_mydynamoidmodels table. This could take a while.
D, [2021-02-12T23:28:58.110728 #43894] DEBUG -- : [Aws::DynamoDB::Client 200 0.022555 0 retries] create_table(table_name:"dynamoid_mydynamoidmodels",key_schema:[{attribute_name:"the_hash_key",key_type:"HASH"},{attribute_name:"the_sort_key",key_type:"RANGE"}],attribute_definitions:[{attribute_name:"the_hash_key",attribute_type:"N"},{attribute_name:"the_sort_key",attribute_type:"S"}],billing_mode:"PROVISIONED",provisioned_throughput:{read_capacity_units:350,write_capacity_units:100})

D, [2021-02-12T23:28:58.110923 #43894] DEBUG -- : (23.25 ms) CREATE TABLE
D, [2021-02-12T23:28:58.119548 #43894] DEBUG -- : [Aws::DynamoDB::Client 200 0.008031 0 retries] describe_table(table_name:"dynamoid_mydynamoidmodels")

D, [2021-02-12T23:28:58.135503 #43894] DEBUG -- : [Aws::DynamoDB::Client 200 0.01518 0 retries] query(consistent_read:false,scan_index_forward:true,table_name:"dynamoid_mydynamoidmodels",key_conditions:{"the_hash_key"=>{comparison_operator:"EQ",attribute_value_list:[{n:"1"}]}},query_filter:{},attributes_to_get:nil)

I assume that AWS client log formatter could not be set so please try to set any from the list described here https://github.com/Dynamoid/dynamoid/pull/459.

Anyway, you can just provide complete code snippets to reproduce the issue - both for Dynamoid (with model class declaration) and AWS SDK.

andrykonchin avatar Feb 12 '21 21:02 andrykonchin