efcore.pg icon indicating copy to clipboard operation
efcore.pg copied to clipboard

Adding support for postgres-xl table distribution

Open jonmorgs opened this issue 4 years ago • 4 comments

This adds support for specifying a distribution strategy (docs) when creating tables against postgres-xl.

[ 
  DISTRIBUTE BY { REPLICATION | ROUNDROBIN | { [HASH | MODULO ] ( column_name ) } } |
  DISTRIBUTED { { BY ( column_name ) } | { RANDOMLY } |
  DISTSTYLE { EVEN | KEY | ALL } DISTKEY ( column_name )
]

jonmorgs avatar Feb 13 '21 13:02 jonmorgs

Thanks for your feedback here! I did some further testing against a local postgres-xl instance and saw that the following queries were possible:

CREATE TABLE distribute_test.test_distribute_by_replication (id int, first_name varchar(100), last_name varchar(100), age int) DISTRIBUTE BY REPLICATION ;
CREATE TABLE distribute_test.test_distribute_by_roundrobin (id int, first_name varchar(100), last_name varchar(100), age int) DISTRIBUTE BY ROUNDROBIN ;
CREATE TABLE distribute_test.test_distribute_by_hash_id (id int, first_name varchar(100), last_name varchar(100), age int) DISTRIBUTE BY HASH (id);
CREATE TABLE distribute_test.test_distribute_by_modulo_id (id int, first_name varchar(100), last_name varchar(100), age int) DISTRIBUTE BY MODULO (id);

-- DOES NOT WORK: CREATE TABLE distribute_test.test_distribute_by_id (id int, first_name varchar(100), last_name varchar(100), age int) DISTRIBUTE BY (id);

CREATE TABLE distribute_test.test_distributed_by_id (id int, first_name varchar(100), last_name varchar(100), age int) DISTRIBUTED BY (id);
CREATE TABLE distribute_test.test_distributed_randomly (id int, first_name varchar(100), last_name varchar(100), age int) DISTRIBUTED RANDOMLY;

CREATE TABLE distribute_test.test_diststyle_even (id int, first_name varchar(100), last_name varchar(100), age int) DISTSTYLE EVEN;
CREATE TABLE distribute_test.test_diststyle_all (id int, first_name varchar(100), last_name varchar(100), age int) DISTSTYLE ALL;
CREATE TABLE distribute_test.test_diststyle_key_distkey_id (id int, first_name varchar(100), last_name varchar(100), age int) DISTSTYLE KEY DISTKEY (id);

Looks like DISTRIBUTED BY ({columnname}) is valid, while DISTRIBUTE BY ({columnname}) is invalid, as well as DISTSTYLE EVEN or ALL also being invalid with a distribution key.

jonmorgs avatar Feb 28 '21 19:02 jonmorgs

I've answered some questions, let me know if you're blocking on anything from my side (note there's still also my questions in https://github.com/npgsql/efcore.pg/pull/1697#pullrequestreview-590620542 which may need addressing).

roji avatar Mar 05 '21 11:03 roji

Looks like DISTRIBUTED BY ({columnname}) is valid, while DISTRIBUTE BY ({columnname}) is invalid, as well as DISTSTYLE EVEN or ALL also being invalid with a distribution key.

Yeah, it seems... complicated :) But I think the fluent APIs we expose should correspond to what is actually supported on the database side.

roji avatar Mar 05 '21 11:03 roji

@Jmorjsm just to say that when this is ready for another look, please re-request a review (preferably after this passes the build too).

roji avatar Apr 07 '21 12:04 roji