fluent-kit Partial models

FluentKit currently allows a few ways to get instances of a model with uninitialized properties:

1: Non eager-loaded relations (i.e., not using with(\.$star))
2: Partial query read (i.e., Planet.query(on: db).field(\.$name).all())
3: Not setting all properties in model's convenience init (or calling empty init())

Uninitialized properties can be a footgun if not handled correctly as access yields a fatalError. There's not much we can do about (3) given the limitations of property wrappers. However, we can make (1) and (2) easier to avoid. I believe these are also the most common source of misuse.

I present the following API as a possible solution:

import Fluent

final class Planet: Model {
    static let schema = "planets"

    @ID
    var id: UUID?

    @Field(key: "name")
    var name: String

    @Parent(key: "star_id")
    var star: Star

    init() { }

    init(id: UUID? = nil, name: String, star: Star) {
        self.id = id
        self.name = name
        self.star = star
    }
}

// A subset of the Planet model
final class PlanetWithoutStar: PartialModel {
    typealias Base = Planet

    @Partial(id: \.$id)
    var id: UUID?

    @Partial(field: \.$name)
    var name: String

    init() { }
}

// SELECT * FROM planets
// SELECT * FROM stars WHERE id IN (...)
let planets = try Planet.query(on: db).all().wait()
for planet in planets {
    print(planet.id)
    print(planet.name)
    print(planet.star) // Eager loaded by default
}

// SELECT id, name FROM planets
let planetsWithoutStars = try PlanetWithoutStar.query(on: db).all().wait()
for planet in planets {
    print(planet.id)
    print(planet.name)
    print(planet.star) // Compiler error
}

Note that in addition to the introduction of PartialModel and @Partial (naming aside for now), this proposal also makes query builder eager load all relations by default. To opt out of eager loading, you create a partial model. (There could of course be something like unsafeWithout(\.$relation) but that's beside the point).

This change solves (1) and (2) by allowing Fluent to require that any model it gives you has been fully initialized. However, the cost is added verbosity. Complex projects may need to define several "partial models" for each model, and for large models this could be a pain.

The change as proposed is major breaking, so it would need to go in the next release. That is still a ways off, but it doesn't hurt to talk about it now. Thoughts?

Edit: Some additional ideas for how this could look:

`Query` protocol

User defines `Query` and `QueryResult` conformers. `Query` specifies `QueryResult` that will define which data is fetched. `Query` also defines things like `filters` and `sorts` which determine which data is referenced by the query. Function builders are used to create simple syntax. Joins / eager loads are determined and decided automatically by the framework.

struct PlanetWithStar: QueryResult {
    typealias Model = Planet 

    @Fetch(\.$id)
    var id: UUID

    @Fetch(\.$name)
    var name: String

    // fetch entire star
    @Fetch(\.$star)
    var star: Star

    // or just some fields
    @Fetch(\.$star)
    var star: StarNameOnly

    struct StarNameOnly: QueryResult {
        typealias Model = Star
        @Fetch(\.$name)
        var name: String
    }
}

struct EarthLikePlanets: Query {
    @FilterBuilder
    var filters: Filters {
        \.$type == .smallRocky
        \.$numMoons >= 1 && \.$numMoons <= 2
        \.$name == "Earth" || \.$star.$name == "Sun"
    }

    @SortBuilder
    var sorts: Sorts {
        \.$name
    }

    typealias Result = PlanetWithStar
}

Aug 19 '20 20:08 tanner0101

I really like this idea, but I am really dislike the idea of auto-eager loading relations. To me, it feels like trading the danger of shooting your foot off with a handgun and instead giving everyone the ability to shoot their foot off with a bazooka.

Here's a concrete example - we have a model called Caseload which defines which users a certain administrator can access. For example, administrator 1 might only be able to access Student 1, Student 3, Student 6, etc.

final class Caseload: Content {
    @ID(custom: .id)
    var id: Int?

    /// The user (non-student) who can access a student
    @Parent(key: "user_id")
    var user: User

    /// The student who belongs to this caseload
    @Parent(key: "student_id")
    var student: User
}

Now, imagine a case where an administrator has access to thousands of students (a frequent case in our dataset). A simple query now becomes absolutely detrimental to our performance and stability:

Caseload.query(on: db).filter(\.$user.$id == 1).all()

In Vapor 4, this runs a simple SQL select query:

select * from caseloads where user_id = 1

whereas in Vapor 5 I assume it would do something like:

select * from caseloads 
join users students on students.id = caseloads.student_id
join users admins on admins.id = caseloads.user_id
where caseloads.user_id = 1

Obviously case 2 is way worse. And as a result of the design there is no type safety built in to warn us about that impact - it falls on the developer/code review to remember to exclude every relationship. And what happens if you add a new relationship? Every query that doesn't have a partial model needs to be updated to exclude it? Granted, our project is really large (hundreds of models) so a change like this would certainly disproportionately hurt us, but I also think it's bad for beginners.

All that being said, I am a big +1 for explicit partial model selects, that would clean up a ton of repetitive code for us.

Aug 19 '20 20:08 jdmcd

And what happens if you add a new relationship? Every query that doesn't have a partial model needs to be updated to exclude it?

This is a very good point. Either model should not eager load everything by default or it should not be possible to use model to make a query directly. It seems like allowing model to be used directly without eager loading (unsafe) and allowing partial models to be declared that require all properties (safe) would be a good compromise.

Another thing worth considering here is joins. Maybe we should think about this less as PartialModel and more as QueryResult where the contents can span multiple models.

(More context from Discord discussion: https://discordapp.com/channels/431917998102675485/438857881970933780/745741871594340518)

Aug 19 '20 21:08 tanner0101

It seems like allowing model to be used directly without eager loading (unsafe) and allowing partial models to be declared that require all properties (safe) would be a good compromise.

I love this!! Feels like a great middle ground.

Aug 19 '20 22:08 jdmcd

I wonder about the ergonomics of the property being a Future<Model> instead. With async/await on the way with Swift 6 (hopefully) it would be quite elegant in use. The model could still be eager loaded (defaulting to not eager loading), you just care less about it in practice. And no more fatals.

Making the developer have to declare partial models seems acceptable also, but less pleasant in use overall.

Though admittedly we only use https://github.com/candor/sublimate so futures don’t irk us in an ergonomics sense.

Edit: I guess it would suck in use for most people. But would be fine if Swift 6 has async/await.

Aug 20 '20 20:08 mxcl

I wonder about the ergonomics of the property being a Future<Model> instead.

@mxcl that's a good point and something I've considered as well. The relation could be an async func (or maybe even an async property if Swift ever allowed that).

Hypothetically, this could look like:

let earth = await Planet.find(...)
let stars = await earth.stars

With this syntax, you're always using the await keyword. If the relation was eager loaded (or cached from a previous load), it's synchronous. Otherwise it does the fetch right there.

The problem with this is that forgetting with in the query can still be a bug. In this case its moved from being a fatalError bug to a performance issue which can range from mild to server melting. So really the problem has not been solved.

Aug 28 '20 18:08 tanner0101