Support associations with composite foreign keys
This PR introduces support for associations on composite foreign keys.
Motivation
It's possible to define composite foreign keys in migrations using refences(..., with: [...]), but Ecto doesn't currently support multi-column references out-of-the-box. Splitting the primary key into multiple columns allows using a part of the primary key for partitioning, constraints/validation, etc.
Rough idea
- Allow lists of atoms (in addition to just an atom) for
:reference,:foreign_keys,:join_keysand:typein schema macros, and - Allow only lists of atoms
:owner_keyand:related_keyin association structs
Status
This is a work in progress with very limited functionality. Early feedback is highly appreciated! What has been worked on so far:
- [x] Schema support
- [x] has_one
- [x] has_many
- [x] has_through
- [x] belongs_to
- [x] many_to_many
- [x] Changeset support
- [x] Integration tests
- [x] has_one
- [x] has_many
- [x] has_through
- [x] belongs_to
- [x] many_to_many
- [x] Update docs
Might be related to #3351
Ping. If this is ready for another review, please let me know. :)
I'll have time to work on this next week. Thank you for the reminder!
Two days of work later, the PR is somewhat further into the woods but quite some work is still needed around supporting composite keys for many-to-many associations. The most puzzling thing so far has been finding a good way to join tables on a variable list of columns. Tried and works (at least with Postgres):
- Join on just the first column, then add the rest in a where clause (e.g. with
Enum.reduce/3). Postgres' planner doesn't care if the join condition is defined in the JOIN clause or in a WHERE clause, but I'm not sure if that's the case with other databases. - Use a recursive function together with
Ecto.Query.dynamic/2to generate theon:option for the join. I'm not sure about the runtime overhead of doing this; it's also hard to generalize this because the bindings used bydynamicare often in different positions ("first and last" vs. "the last two"). However, it is much easier to understand what's happening and there's no "hd(keys)goes into the join,tl(keys)goes into a reduce withwhere" shenanigans.
I'm still very interested in getting this finished and would appreciate your thoughts on:
- is this going in the right direction, at all?
- what's the best way to handle a slow contributor? 😅 I'm committed to eventually finishing this up, but I'm not sure on timelines.
@soundmonster my suggestion is to not support composite foreign keys on many_to_many for now. What do you think? it will allow us to merge and review part of the work and you can take your own pace. :)
Thank you for your reply, José. I've tried backtracking on ManyToMany, but the amount of changes needed to get the query planner and the HasThrough associations to work with both atoms and lists of atoms is more work than what I think is needed to complete support in ManyToMany. The current state of this branch passes all unit tests and only fails 6 integration tests (only checked with Postgres so far). I would have liked to split it into better reviewable chunks but I'm afraid that would be 2x more work 😅
we look forward to completion!
Status update:
- composite keys work for all associations 🥳
- unit tests are green (on my machine, workflows need approval)
- integration tests are green with Postgres; I haven't yet gotten Earthly to run properly to test MySQL and MSSQL, maybe it's worth trying it out on CI?
- there's still quite a few TODOs for cleanup, esp. removing
List.wrap/1everywhere - docs are still open.
Where I need help:
- I'm using dynamic fragments to generate fields for association
joins andwheres. This doesn't only affect composite keys, but all associations.- Is this expected to have a significantly negative performance impact?
- Is there a better/other way to do this, apart from returning plain AST?
- It looks to me like all detailed association docs are in
association.exand there's no dedicated guide, but I might be missing something. Could you please point me to documentation / examples on associations outside orassociation.ex? - Probably workflow approval here and maybe for the companion PR in ecto_sql.
I am looking forward to completion as well!
That’s great! There are many ways you can help:
- review the code and give feedback
- run it locally. Do all tests pass too?
- use it in a project and give it a try! Is the behavior what you expect?
Looking forward to your thoughts!
--
José Valimhttps://dashbit.co/ https://dashbit.co/
Just to say I am hugely looking forward to this feature and very happy to test it out!
Thank you very much for the review! It's been super helpful!
A quick update:
- CI flagged that the pre-2022 state of this brunch didn't work on MySQL as neither the DB engine nor the driver supports the
arraydata type. I've put in some work into making it run on both MySQL and Postgres, at least to the point where all integration and unit tests pass, including the new ones written for this specific feature. I'll update on MSSQL when I have it running, too. - I have produced quite a heap of spaghetti while trying to get everything to work. I'll have to refactor large parts of
association.exwith these goals in mind:- address the issue with binding positions being incorrect
- either get rid of
dynamicby generating AST directly, or at least not calldynamicfor the "normal" case of single-field associations.
- I'll also go over the other places I've changed and try to minimize all the "wrap/unwrap" dances I've produced.
Hence explicitly not asking for a review at this point. This is very much a work in progress still.
Another quick update: I've finally managed to run the whole test matrix with Earthly locally and it is passiing all tests so far 😌
I’ve recently been testing this out with some great results so far!
Some things I noticed when introducing this branch as a dependency in my project:
-
owner_keyandrelated_keyonEcto.Association.BelongsTo,Ecto.Association.Hasare always lists- it looks like in the current ecto version they are atoms
- this caused a regression in absinthe’s
dataloader, which expects atoms
Was the decision to always force owner_key and related_key to be lists intentional? I saw a bit of discussion about it above, but was curious for more insight.
Great work on this, it’s been a huge unlock and let me know how I can help to test things out more.
I’ve recently been testing this out with some great results so far!
Thank you for looking into it! This is awesome!
Some things I noticed when introducing this branch as a dependency in my project:
owner_keyandrelated_keyonEcto.Association.BelongsTo,Ecto.Association.Hasare always lists
- it looks like in the current ecto version they are atoms
- this caused a regression in absinthe’s
dataloader, which expects atomsWas the decision to always force
owner_keyandrelated_keyto be lists intentional? I saw a bit of discussion about it above, but was curious for more insight.
My understanding of the decision is that the two fields are written once and read all over the place; it's easier and less bug-prone to write them in the expected format once, instead of having to List.wrap them everywhere they're accessed.
Great work on this, it’s been a huge unlock and let me know how I can help to test things out more.
Happy to hear this, and hopefully we'll have new iterations to test soon!
@josevalim I've cleaned up a bit and also corrected the assumptions about bindings/offsets. I would like to tackle removing dynamic and using something better instead; I'm not sure how to proceed though. One straightforward option I see would be supporting up to three columns per association, and have hard-coded cases for that. A middle ground would be hard-coding up to three or four, and using dynamic for everything above. What do you think?
I will give updating docs a try in the meanwhile.
I've managed to get rid of dynamic for constructing association queries. Will work on docs next. As always, testing is very welcome!
Hey there @soundmonster any progress on this by any chance? Super excited to see this upstream!
I've managed to get rid of
dynamicfor constructing association queries. Will work on docs next. As always, testing is very welcome!
Hey @soundmonster, yeah, me too. I've arrived at a point in my project where I have a very clear use-case for a composite primary key in all several associations and I'd love to help get your PR code tested so it can get marged into the main branch.
Unfortunately, I've extremely limited bandwidth and have never even built any version of erlang, OTP, Elixir, Phoenix or such on my in my own environment, let alone dabbled with alternative versions. I'm willing to help and offer my user-case as real-world validation of the need and implementation. In short, please talk me through what I'd need to do to get your version on my server so I can put it to the test in a way that helps you and your case get this important adaptation into the main-stream code.
I'm close to taking a first version of my new service live and I could conceivably do that without composite keys since I'd be running it in a single region whereas my use-case comes into play where I have parts of the data replicating glocally using the database itself and parts of the data remaining mastered in a specific region unless it becomes cached as such in one or more other regions. If possile I'd love to build the region login into the first release already to save on migration implications down the line so my first objective is to get a feel for how close to "production quality" this PR is in terms of time and otherwise.
Hi @MarthinL,
To test his branch you can point your Ecto and Ecto SQL dependencies at his branches in mix.exs and then update your deps:
{:ecto, git: "https://github.com/soundmonster/ecto.git", branch: "composite_foreign_keys"} {:ecto_sql, git: "https://github.com/soundmonster/ecto_sql.git", branch: "composite_foreign_keys"}
As a general comment, I'm not sure you would need to block features in your app because of this. You could always have a unique id column that's used for the association.
Hi @MarthinL,
To test his branch you can point your Ecto and Ecto SQL dependencies at his branches in
mix.exsand then update your deps:{:ecto, git: "https://github.com/soundmonster/ecto.git", branch: "composite_foreign_keys"} {:ecto_sql, git: "https://github.com/soundmonster/ecto_sql.git", branch: "composite_foreign_keys"}
As a general comment, I'm not sure you would need to block features in your app because of this. You could always have a unique id column that's used for the association.
Thanks, I was unaware that the deps format in mix.exs allowed for such syntax
However, when I try to use your two lines as is, I run into a series of problems starting with the inconvenient fact defp deps do in mix.exs has {:phoenix_ecto, "~> 4.4"}, {:ecto_sql, "~> 3.6"},
i.e. :ecto isn't a direct dependency but brought in by :ecto_sql but :phoenix_ecto is a direct dependency that is incompatible with the :ecto direct dependency.
If I substitute your two lines for the two I've listed above, the result of running mix deps.get is:
Dependencies have diverged:
-
ecto (https://github.com/soundmonster/ecto.git - origin/composite_foreign_keys) the dependency ecto in mix.exs is overriding a child dependency:
In mix.exs: {:ecto, [env: :prod, git: "https://github.com/soundmonster/ecto.git", branch: "composite_foreign_keys"]}
In deps/ecto_sql/mix.exs: {:ecto, [env: :prod, path: "../ecto"]}
Ensure they match or specify one of the above in your deps and set "override: true" ** (Mix) Can't continue due to errors on dependencies
If I only substitute the :ecto_sql lines mix deps.get reports:
Dependencies have diverged:
-
ecto (../ecto) different specs were given for the ecto app:
In deps/ecto_sql/mix.exs: {:ecto, [env: :prod, path: "../ecto"]}
In deps/phoenix_ecto/mix.exs: {:ecto, "~> 3.3", [env: :prod, hex: "ecto", repo: "hexpm", optional: false]}
Ensure they match or specify one of the above in your deps and set "override: true" ** (Mix) Can't continue due to errors on dependencies
LIke I stated before, I'm quite unfamiliar with working with anything but the latest and greatest released versions of anything, so I'm afraid I'm going to need a little more help to get this done. Thank you.
As for your suggestion of a unique id for the association, and provided I understand you correctly, I have considered that, but the benefit of having region_id as an explicit part of the composite key is significant as it allows any region encountering a reference to such a piece of data to determine which region to fetch it from. If I used a UUID the region would need to send each request for data not known locally off to a central directory (which would nullify distributed data benefits) to determine the region or each region would need a pre-emptive local copy of at least a record that maps the UUID to a region. Since the data items are small, that would mean having all data everywhere just like it would be if the database was doing to replication.
You can still have foreign keys while maintaining a unique ID for the association itself.
For your deps issue it looks like it's asking you to put override: true so it overrides those nested deps.
You can still have foreign keys while maintaining a unique ID for the association itself.
I suspect we're talking cross-purposes with regards to my use case. If it is of particular interest to you and/or would be valuable to others, I can lay out the use case involging a mixture of database replication based gloval data and regional data stored also in database though not replicated by the database engine but "manually" in the distributed server which requests copies directly from the region where the data originates. The bottom line is that I have a number of relationships, including a self-referencing many-to-many association where the association entity has additional fields in my project. Given that the entity involved in the many-to-many is also the one with the composite primary key, it should be quite a stress test for @soundmonster's code.
I'm sure your use case would benefit from this a lot and would be a good test. I'd just hate to see your app be limited if this is not merged anytime soon. I don't believe there is any situation that can't be solved by introducing a unique field in addition to the primary keys you currently have. But I'll leave it up to you what you think is the best way forward.
For your deps issue it looks like it's asking you to put
override: trueso it overrides those nested deps.
I solved it another way:
I cloned phoenix_ecto from the main branch, ecto_sql and ecto from soundmonster's pr and then used the path: "/local/dir/to/code" deps syntax to draw in the code I needed which meant that I could update the mix.exs in the now local phoenix_ecto directory to refer to the right version of ecto using the technique involving the environment variable ECTO_PATH I found in ecto_sql/mix.exs. I don't know how well this would play if I tried building a release with it, but it gets the job done.
For reference:
In my ~/.zshrc I have
export ECTO_PATH=/Users/marthin/projects/pr-workdir/ecto
In my (test project called Bennie)'s mix.exs i have
defp deps do
[
{:phoenix, "~> 1.6.15"},
{:phoenix_ecto, path: "/Users/marthin/projects/pr-workdir/phoenix_ecto"},
{:ecto_sql, path: "/Users/marthin/projects/pr-workdir/ecto_sql"},
...
and in .../phoenix_ecto/mix.exs I have:
defp deps do
[
{:phoenix_html, "~> 2.14.2 or ~> 3.0", optional: true},
ecto_dep(),
# {:ecto, "~> 3.3"},
....
]
end
defp ecto_dep do
if path = System.get_env("ECTO_PATH") do
{:ecto, path: path}
else
{:ecto, "~> 3.8.0-dev", github: "elixir-ecto/ecto"}
end
end
which has been copied verbatim from .../ecto_sql/mix.exs
mix deps.get and mix deps.compile runs without issues and in the one test so far I've cooked up the PR seems to work as expected. I'd be nice to chat with Leo B, Jose Valim or anyone involved to know more definitively what how the PR is intended to get activated. I'll write another comment asking more directly to clarify my understanding.
Hi @soundmonster and @josevalim,
So far so good, I have set up this PR's code on my machine and it seems to be running as expected. I'm not sure I fully understand under what conditions it comes into play though. Is there a specific syntax or set of options I need to specify somewhere in the migration and/or the schema where I define the primary keys and the associations? If so, what are those?
My current understanding is as follows: a) In the migration file I need to ensure that all the fields comprising the composite primary key is present. For my case, where i have an items table with the usual autonumbering id field as well as a region_id field which is in fact a foreign key to another table, I'd have this in the migration
create table(:items) do
add :region_id, :integer, [primary_key: true, references: "regions"]
add :name, :string
timestamps()
end
b) In the schema file, something similar is happening: Again the usual id field is kept as is and the region_id is added to that as in:
schema "items" do
field :region_id, :integer, [primary_key: true]
field :name, :string
belongs_to :region, Region, [define_field: false]
timestamps()
end
c) With the above in place, Item now has a composite key. When I now introduce a third schema Note (table notes) I need the migration to contain a region_id as well as an item_id field
def change do
create table(:notes) do
add :line, :string
add :region_id, :integer
add :item_id, references("items", with: [region_id: :region_id])
timestamps()
end
and I need the schema to read:
schema "notes" do
field :line, :string
belongs_to :region, Region, [primary_key: true]
belongs_to :item, Item
timestamps()
end
It works as expected, but is gets a little fuzzy there. Do I need the belongs_to :region, Region, [primary_key: true] in schema "notes" or is it optional?
It is my current understanding that the belongs_to :item, Item line in Note has the effect of adding the fields item_id (derived from the association name) as well as region_id because region_id is part of Item's primary key. Is that an accurate understanding, is the region_id field present as a result of the belongs_to :region, Region line or do I have the whole thing backwards?
Once I've have the belongs_to association sorted, I want to move onto the other associations and build my way up towards the self-referencing many-to-many with added fields for the associative entity (i.e. where the many-to-many association is read-only and one uses two has-many/belongs_to pairs to work with the associative entity's content) .
Perhaps my little journey of discovery can help the efforts to write documentation for this feature. I suspect it is going to get a lot more hairy with options to set foreign keys and source fields as the use case ramps up.
I'm sure your use case would benefit from this a lot and would be a good test. I'd just hate to see your app be limited if this is not merged anytime soon. I don't believe there is any situation that can't be solved by introducing a unique field in addition to the primary keys you currently have. But I'll leave it up to you what you think is the best way forward.
Yes, adding a unique field in addition to the existing primary key is exactly what I am doing. (I thought you wanted to replace the existing id plus the unique field with a single globally unique field, which I don't think would work for the reasons I've explained). The only thing is that I make the unique additional field a little more meaningful by having it as a foreign key to a regions table where I can store pertinent information about the region such as the ip addresses and/or dns names where I the region's backend servers can be reached. The region table would form part of what the database replicates to all the distributed nodes, and I can then also arrange the regions into a hierarchy of regions by adding a parent_id reference.
I'm sure your use case would benefit from this a lot and would be a good test. I'd just hate to see your app be limited if this is not merged anytime soon. I don't believe there is any situation that can't be solved by introducing a unique field in addition to the primary keys you currently have. But I'll leave it up to you what you think is the best way forward.
On that: It is my understanding that it isn't a new feature to have additional fields as part of primary keys. The purpose and essence of @soundmonster's PR is to have associations honour those composite primary keys more or less transparently. Is that a fair assessment?
On that: It is my understanding that it isn't a new feature to have additional fields as part of primary keys. The purpose and essence of @soundmonster's PR is to have associations honour those composite primary keys more or less transparently. Is that a fair assessment?
I'm using composite primary keys in a mutli-tenant app currently and the whole ecto-sql portion of it already works. You can prepare a database with composite primary keys and composite foreign keys today, and it works as you would expect. I believe this PR is just for teaching the association functionality built into ecto to understand those composite keys so it can join and preload associations using the correct primary key (instead of just one of the fields).
I'm stuck on syntax and semantics and need help.
My little exploratory project came along just fine as long as I stick to simple belongs_to :owner, Owner and has_many :children, Child relationships, i.e. where the field names derived from the association name and the (composite) primary key fields defined on the Child schema.
The relevant parts of my test code is as follows:
from the migration files
create table(:items) do
add :region_id, :integer, [primary_key: true, references: :regions]
add :name, :string
end
create index(:items, [:region_id])
create table(:items) do
add :region_id, :integer, [primary_key: true, references: :regions]
add :name, :string
end
create index(:items, [:region_id])
create table(:notes) do
add :line, :string
add :region_id, :integer
add :item_id, references("items", with: [region_id: :region_id])
end
And from the schema files:
Region:
schema "regions" do
field :label, :string
has_many :items, Item
end
Item:
schema "items" do
field :region_id, :integer, primary_key: true
field :name, :string
belongs_to :region, Region, [define_field: false]
has_many :notes, Note
end
Note:
schema "notes" do
field :line, :string
belongs_to :region, Region, [primary_key: true]
belongs_to :item, Item
end
I've put all of that through its paces and it all seems to work well.
But I get stuck when I need to start using options of the associations such as foreign_key: and references: to override the default field names. It might be required in other situations as well, but in my specific use case the situation where I have to refer to different field names as foreign keys is when I try to implement a self-referencing many-to-many relationship such as I have in my read system. Now, because my associative entity has attributes of its own, the many_to_many association I use is a :through association (and thus read-only) and I hardly ever actually refer to it, so if it needs to go it's no big deal.
What I use instead is to create two has_many relationships in the schema I aim to have the self-referencing many-to-many behaviour for with two (now sets of) foreign key fields and in the associative entity I define two belongs_to associations referring to those same fields. So far in my real project that has left me with very functional relationships that responds well to all the preload and build_assoc calls I throw at it.
Having now introduced the composite primary key into the mix, I cannot seem to figure out how to modify the foreign_key: options I've used before so they would refer to both primary key fields and two differently named fields in the associative entity.
Let's call the associative entity Relship. Before declaring Item's region_id as part of its primary key, Relship's code looked like this:
create table(:relships) do
add :label, :string
add :parent_item_id, references(:items)
add :child_item_id, references(:items)
end
create index(:relships, [:parent_item_id])
create index(:relships, [:child_item_id])
and the schema like this:
schema "relships" do
field :label, :string
field :parent_item_id, :id
field :child_item_id, :id
belongs_to :parent_item,
Item,
foreign_key: :parent_item_id,
define_field: false
belongs_to :child_item,
Item,
foreign_key: :child_item_id,
define_field: false
end
which (i.e. the above code for Relship) is obviously broken as it would only work for the one default region because the relships themselves have id's that doesn't include region_id and the foreign keys in Relship doesn't store each Item's region_id. That's the problem I set out to try solve before I go live to avoid the hassle of having to migrate the database and code before I can introduce a second region.
So, adding region support in Relship, I get this far:
In migration:
create table(:relships) do
add :region_id, :integer, [references: :regions, primary_key: true]
add :label, :string
add :parent_item_region_id, :integer, [references: :regions]
add :parent_item_id, references(:items, with: [parent_item_region_id: :region_id])
add :child_item_region_id, :integer, [references: :regions]
add :child_item_id, references(:items, with: [child_item_region_id: :region_id])
end
create index(:relships, [:parent_item_region_id, :parent_item_id])
create index(:relships, [:child_item_region_id, :child_item_id])
and the schema like this:
Relship:
schema "relships" do
field :region_id, :integer, [primary_key: true]
field :label, :string
field :parent_item_region_id, :integer
field :parent_item_id, :id
field :child_item_region_id, :integer
field :child_item_id, :id
belongs_to :parent_item,
Item,
foreign_key: [parent_item_region_id: :region_id, parent_item_id: :id],
define_field: false
belongs_to :child_item,
Item,
foreign_key: [child_item_region_id: :region_id, child_item_id: :id],
define_field: false
end
Item: (added to schema)
has_many :child_rels,
Relship,
foreign_key: [parent_item_region_id: :region_id, parent_item_id: :id]
has_many :parent_rels,
Relship,
foreign_key: [child_item_region_id: :region_id, child_item_id: :id]
which is obviously invalid syntax and the compiler tells me so, loudly. But I haven't been able to figure out what the approriate syntax would be, or if it's more a matter of semantics than syntax, how I would get ecto to refer to the correct sets of primary and foreign key fields.
Please, anyone, what is the intended way to specify foreign keys in associations when using composite primary keys?
Hi @soundmonster and @josevalim,
So far so good, ... except, not really.
When I couldn't seem to get the more complex variants of associations working I went back to trace through the simpler ones I reported on earlier. I know I'm running the PR code - my IO.inspect statements I insert in the code in my local copy of the code cloned from soundmonster's repository tells me so.
I have a schema called Items, which has been set up with an additional field as its primary key called region_id. Each Item belongs_to a Region, and it works correct in that I can have the Items with the same id in different regions in my data and it associates 100% correctly. What threw me off at first was that when I generate those items on the same machine using the same database sequence to autogenerate the id field for Item, I naturally got unique id values. That mean that the association with the next schema, Notes, defined in the Item schema as has_many :notes, Notes, would relate the notes of each item based on the item id which was unique through the system.
But as soon as I changed the ids of the Items so the id parts were duplicates and the primary keys only made unique by the region_id, the association code broke and returned the same notes for all Items with the same id regardless of the region_id.
My assumption is that I am still doing something wrong here, but I cannot figure out what that would be. There is also a chance that I do have the correct version of the code and correct association definitions, but that the code simply doesn't take the whole primary key into account when resolving an association.
Can we please figure this out? I'm very keen on using associations with composite key schemas in my project.
Just a note about my use-case which differs significatly from a multi-tenancy use-case in that with multi-tenancy one would expect a single tenant-id to identify all the data of a tenant with no chance for cross-tenant references. My use case is closer to data partitioning than multi-tenancy, and I strongly rely on references which cross partition boundaries. Each partition, which I call a region, has its own database with own sequence from which id values are generated using nextval (PostgreSQL) and a "home" region configured in the runtime.exs for that region. Only a portion, mostly administrative data, of the database is replicated from a master region to the others at the database level. Beyond that, each region operates independently and requests data from other regions when such reference is encoutered. this is merely context, and not part of the problem I am facing.