Implement iterator to handle large amounts of rows
- [ ] better-sqlite3
- [ ] bun-sqlite
- [ ] d1
- [ ] libsql
- [x] mysql2
- [ ] neon-serverless
- [ ] node-postgres
- [ ] planetscale-serverless
- [ ] postgres-js
- [ ] sql-js
- [ ] sqlite-proxy
Any ETA on adding this to postgres?
EDIT: my pg implementation https://github.com/drizzle-team/drizzle-orm/issues/456#issuecomment-2466655794 and @christophemarois's postgres implementation https://github.com/drizzle-team/drizzle-orm/issues/456#issuecomment-2299294351
This seems very useful for v1
I built a pretty straightforward higher-order implementation of result streaming with the postgres-js driver.
Posting this here so people can use it while waiting for native .iterator() support. Maybe this can even help someone put up a PR? I'm too unfamiliar with drizzle internals to add it myself.
```ts
import { PgSelect } from 'drizzle-orm/pg-core'
import postgres from 'postgres'

/** Allow using a drizzle query with postgres.js' cursor-based AsyncIterator. To
 * be removed once a postgres iterator is added to drizzle
 * https://github.com/drizzle-team/drizzle-orm/issues/456
 *
 * @example
 * const streamQuery = createPostgresDrizzleStreamer(postgresClient)
 * const documentsReader = streamQuery(db.select().from(documents).$dynamic(), 2000)
 * for await (const documentsChunk of documentsReader) {}
 */
export function createPostgresDrizzleStreamer(postgresClient: postgres.Sql) {
  return async function* streamQuery<
    Qb extends PgSelect,
    Result extends Qb['_']['result'][number],
  >(qb: Qb, chunkSize?: number): AsyncGenerator<Result[]> {
    const q = qb.toSQL()
    const chunkReader = postgresClient
      .unsafe<Result[]>(
        q.sql,
        q.params as Parameters<typeof postgresClient.unsafe>[1],
      )
      .cursor(chunkSize)
    for await (const chunk of chunkReader) {
      yield chunk.map((result) => {
        return Object.fromEntries(
          Object.entries(result).map(([k, v]) => {
            // Note: assumes every selected field is a plain column; SQL
            // expressions and subselects won't have mapFromDriverValue.
            const col = qb._.selectedFields[k]
            if (col === undefined) {
              throw new Error(`col ${k} does not exist in schema`)
            }
            return [k, col.mapFromDriverValue(v)] as const
          }),
        ) as Result
      })
    }
  }
}
```
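Since the streamer above yields whole chunks (`Result[]`), a small helper (hypothetical, not part of drizzle or postgres.js) can flatten it into single rows when per-row iteration is more convenient:

```typescript
// Hypothetical helper: flatten an async iterable of chunks (as produced by
// the streamer above) into an async generator of individual items.
async function* flattenChunks<T>(chunks: AsyncIterable<T[]>): AsyncGenerator<T> {
  for await (const chunk of chunks) {
    yield* chunk;
  }
}

// Self-contained demo with a fake chunked source standing in for the streamer.
async function* fakeChunks(): AsyncGenerator<number[]> {
  yield [1, 2];
  yield [3];
}

async function collectFlattened(): Promise<number[]> {
  const out: number[] = [];
  for await (const n of flattenChunks(fakeChunks())) out.push(n);
  return out;
}
```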
Please implement the iterator not only for db.select().from(table).iterator() but also for db.query.table.findMany().
Since this issue is quite old, I'll update the status of this feature:
MySQL
- [x] mysql2
planetscale and tidb don't support streaming.
Postgres
- [ ] pg - Requires the pg-cursor package.
- [ ] postgres - Built-in cursor API.
- [ ] neon-websockets - Since it's pg-based, it might support it, but there's no official resource on streaming or a cursor API.
- [ ] vercel-postgres - Same comment as neon-websockets.
neon-http, xata, pglite and aws-data-api don't have any docs about streaming data.
SQLite
- [ ] expo-sqlite - Has a getEachAsync method.
- [ ] bun-sqlite - Built-in iterate method.
op-sqlite, turso and d1 don't have any docs about streaming.
@christophemarois Hey, first off, thanks so much for your example implementation! I'm curious whether you managed to use this for more complex queries as well? I'm getting errors like "Can't find mapFromDriverValue on undefined" when using subselects and sql`` expressions.
@L-Mario564 Do you happen to have a rough idea whether this is something we can hope to see land in drizzle postgres soon? Thank you!
Currently, this is not that high of a priority, but it will be done at some point. Not much else I can say at the moment.
Been hacking away all day with pg-cursor trying to get the column mapping working when I figured: why not just use drizzle and cursor pagination? I got to this point; it seems like a start.
You need to make sure the column you're using as a cursor is at the lowest level in your object.
The benefit is that this should be DB-agnostic (assuming you change some of the types).
```ts
import { asc, gt } from "drizzle-orm";
import { PgColumn, PgSelect } from "drizzle-orm/pg-core";

export async function* createPostgresDrizzleStreamer<
  OrderColumn extends PgColumn,
  Query extends PgSelect,
  Result = Query["_"]["result"][number],
>(
  col: OrderColumn,
  query: Query,
  fetchBatchSize: number = 50
): AsyncGenerator<Result> {
  let rows: object[];
  let lastSeenOrderCol: unknown;
  do {
    const r = await query
      .orderBy(asc(col))
      // Keyset pagination: only fetch rows past the last seen cursor value.
      .where(lastSeenOrderCol !== undefined ? gt(col, lastSeenOrderCol) : undefined)
      .limit(fetchBatchSize);
    for (const row of r) yield row as Result;
    // Assumes the cursor column's DB name matches the result key.
    lastSeenOrderCol = (r[r.length - 1] as Record<string, unknown> | undefined)?.[col.name];
    rows = r;
  } while (rows.length > 0);
}
```
Maybe someone who knows drizzle internals could say if this is a decent start?
EDIT: Multiple where doesn't work in Drizzle? #1644 -- maybe a subquery is the next step?
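The keyset-pagination idea above can be sketched driver-free. This is a minimal illustration, not drizzle code: fetchPage is a hypothetical stand-in for the drizzle query, returning rows ordered by id, strictly greater than the cursor, capped at the batch size:

```typescript
// Hypothetical, driver-free sketch of keyset pagination.
type Row = { id: number; name: string };

async function* streamByKeyset(
  fetchPage: (after: number | undefined, limit: number) => Promise<Row[]>,
  batchSize = 2
): AsyncGenerator<Row> {
  let lastSeenId: number | undefined;
  let page: Row[];
  do {
    // Fetch the next batch strictly after the last seen cursor value.
    page = await fetchPage(lastSeenId, batchSize);
    for (const row of page) yield row;
    lastSeenId = page[page.length - 1]?.id ?? lastSeenId;
  } while (page.length > 0);
}

// In-memory "table" so the sketch runs without a database.
const table: Row[] = [1, 2, 3, 4, 5].map((id) => ({ id, name: `row${id}` }));
const fetchPage = async (after: number | undefined, limit: number) =>
  table.filter((r) => after === undefined || r.id > after).slice(0, limit);

async function collectIds(): Promise<number[]> {
  const ids: number[] = [];
  for await (const row of streamByKeyset(fetchPage)) ids.push(row.id);
  return ids;
}
```

The trade-off versus a real cursor is one query per batch, but it works with any driver and survives connection churn between batches.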
A pg alternative to @christophemarois's code; it also works with aliased columns and yields individual rows rather than chunks of rows. I haven't tested it heavily, but this is what I'm using in my codebase for the time being.
Issues:
- Relies on selectedFields and the returned columns being in the same order (should be fine?). I couldn't figure out how to extract the absolute column name from the rows pg returns; the pg and pg-cursor documentation is not only inaccurate but also incomplete.
- Doesn't work with nested results.
```ts
import { PgSelect } from "drizzle-orm/pg-core";
import Cursor from "pg-cursor";

/**
 * @example
 * const documentsReader = createPostgresDrizzleStreamer(db.select().from(documents).$dynamic(), 2000)
 * for await (const documentsChunk of documentsReader) {}
 */
export async function* createPostgresDrizzleStreamer<
  Q extends PgSelect,
  R = Q["_"]["result"][number],
>(qb: Q, fetchBatchSize: number = 50): AsyncGenerator<R> {
  let rows: string[][];
  const { sql, params } = qb.toSQL();
  // dbConnectionWithCleanup() is the author's own helper returning a
  // disposable wrapper around a pg connection.
  using c = await dbConnectionWithCleanup();
  const chunkReader = c.connection.query(
    new Cursor(sql, params, { rowMode: "array" })
  );
  try {
    while (true) {
      rows = await chunkReader.read(fetchBatchSize);
      if (rows.length === 0) break;
      yield* rows.map(
        (row) =>
          Object.fromEntries(
            // Maps array-mode values back to field names by position.
            Object.keys(qb._.selectedFields).map((schemaKey, index) => [
              schemaKey,
              qb._.selectedFields[schemaKey].mapFromDriverValue
                ? qb._.selectedFields[schemaKey].mapFromDriverValue(row[index])
                : row[index],
            ])
          ) as R
      );
    }
  } finally {
    await chunkReader.close();
  }
}
```
edit: been running this in prod for months with no problems.
ChatGPT hallucinated this API for me, but I kind of like it:

```ts
const stream = db
  .select({
    id: DDUser.id,
    clerkUserId: DDUser.clerkUserId,
  })
  .from(DDUser)
  .stream(); // fake API

for await (const user of stream) {
  // ...
}
```
following because I want this feature...for Postgres...
Does the newly added bun-sql driver for PostgreSQL support this? It would be great to use iterators with PostgreSQL.
I've been using https://github.com/drizzle-team/drizzle-orm/issues/456#issuecomment-2466655794 in prod for months now with no problems. If you'd like bun-sql you will have to adapt one of our examples to your driver.
Drizzle already has iterator syntax: https://orm.drizzle.team/docs/select#iterator. It's just a matter of implementing it on all pg connectors.
I don't know what the internals of drizzle look like, and I haven't seen anyone mention this yet, but if this feature is to be made GA (MySQL is the only dialect with some support for it), it should return a ReadableStream and not be a generator function.
Using the iterator() generator function does not allow easy implementation of backpressure (without something hacky), which arguably defeats the entire reason to stream results from the DB in the first place: reducing memory usage by not loading the entire result set into memory. Without backpressure, overwhelming the GC is trivial; I've experienced it first-hand with iterator() while trying to stream ~2M rows. The program OOMs and I'm forced to increase my instance's RAM allocation.
The hacky solution as a drizzle user is to just stop reading from the generator periodically to mimic backpressure. However, this is a workaround rather than a proper solution.
edit: I realized that I can just create a Readable wrapper around the generator to solve the backpressure problem, but I still believe that drizzle should expose a stream interface itself.
Why would an async generator function not handle backpressure? The JS runtime pauses and resumes the generator depending on how often new results are fetched from the consumer, so if you only fetch results slowly, then the generator is not going to fetch any more items into memory than needed (apart from probably some chunk size you should set for performance)
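As a rough illustration of the Readable wrapper mentioned above: Node's stream.Readable.from() accepts an async iterable and only resumes it when the consumer drains, which gives you backpressure for free. The generator here is a hypothetical stand-in for drizzle's .iterator():

```typescript
import { Readable } from "node:stream";

// Hypothetical stand-in for drizzle's .iterator(): an async generator of rows.
async function* fakeRowIterator(): AsyncGenerator<{ id: number }> {
  for (let id = 1; id <= 3; id++) yield { id };
}

// Readable.from() pulls from the generator lazily: the generator body only
// resumes when the stream's internal buffer has room, so a slow consumer
// naturally throttles how fast rows are fetched from the source.
function rowStream(): Readable {
  return Readable.from(fakeRowIterator());
}

async function collectRows(): Promise<number[]> {
  const ids: number[] = [];
  // for await over a Readable also respects the stream's buffering.
  for await (const row of rowStream()) {
    ids.push((row as { id: number }).id);
  }
  return ids;
}
```

This doesn't contradict the previous comment: the runtime does pause the generator between pulls, and Readable.from simply adds a bounded buffer (highWaterMark) between producer and consumer.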
I see this warning with mysql2 driver
Calling conn.end() to release a pooled connection is deprecated. In next version calling conn.end() will be restored to default conn.end() behavior. Use conn.release() instead.
Hey everyone!
I've created this message to send in a batch to all open issues we have, just because there are a lot of them and I want to update all of you on our current work, why issues haven't been responded to, and the amount of work done by our team over the past ~8 months.
I saw a lot of issues with suggestions on how to fix something while we were not responding – so thanks everyone. Also, thanks to everyone patiently waiting for a response from us and continuing to use Drizzle!
We currently have 4 major branches with a lot of work done. Each branch was handled by different devs and teams to make sure we could make all the changes in parallel.
First branch is drizzle-kit rewrite
All of the work can be found on the alternation-engine branch. Here is a PR with the work done: https://github.com/drizzle-team/drizzle-orm/pull/4439
As you can see, it has 167k added lines of code and 67k removed, which means we've completely rewritten the drizzle-kit alternation engine, the way we handle diffs for each dialect, together with expanding our test suite from 600 tests to ~9k test units for all different types of actions you can do with kit. More importantly, we changed the migration folder structure and made commutative migrations, so you won't face complex conflicts on migrations when working in a team.
What's left here:
- We are finishing handling defaults for Postgres, the last being geometry (yes, we fixed the srid issue here as well).
- We are finishing commutative migrations for all dialects.
- We are finishing up the command, so the migration flow will be as simple as drizzle-kit up for you.
Where it brings us:
- We are getting drizzle-kit into a new good shape where we can call it drizzle-kit v1!
Timeline:
- We need ~2 weeks to finish all of the above and send this branch to beta for testing.
Second big branch is a complex one with several HUGE updates
- Bringing Relational Queries v2 finally live. We've done a lot of work here to actually make it faster than RQBv1 and much better from a DX point of view. But in implementing it, we had to make another big rewrite, so we completely rewrote the drizzle-orm type system, which made it much simpler and improved type performance by ~21.4x:
(types instantiations for 3300 lines production drizzle schema + 990 lines relations)
TS v5.8.3: 728.8k -> 34.1k
TS v5.9.2: 553.7k -> 25.4k
You can read more about it here.
What's left here:
- We have 1 issue with TS that is already in progress of being fixed. The issue and Post about fixing.
Where it brings us:
- We are getting drizzle-orm into a new good shape where we can call it drizzle-orm v1!
Breaking changes:
- We will have them, but we will have open channels for everyone building on top of drizzle types, so we can guide you through all the changes.
Third branch is adding support for CockroachDB and MSSQL dialects
Support for them is already in the alternation-engine branch and will be available together with the drizzle-kit rewrite.
Summary
All of the work we are doing is crucial and should be done sooner rather than later. We've received a lot of feedback and worked really hard to find the best strategies and decisions for API, DX, architecture, etc., so we can confidently mark it as v1 and be sure we can improve it and remain flexible for all the features you are asking for, while becoming even better for everyone building on top of the drizzle API as well.
We didn't want to stay with some legacy decisions and solutions we had, and instead wanted to shape Drizzle in a way that will be best looking ahead to 2025–2026 trends (v1 will get proper effect support, etc.).
We believe that all of the effort we've put in will boost Drizzle and benefit everyone using it.
Thanks everyone, as we said, we are here to stay for a long time to build a great tool together!
Timelines
We are hoping to get Drizzle v1 into beta this fall, with the same timeline for latest. Right after that, we can go through all of the issues and PRs and respond to everyone. v1 should close ~70% of all the bug tickets we have, so on the beta release we will start marking them as closed!
The backlog says "considered to be implemented" and it's not even in the "prioritized" column. Is there anything one could do to help here or get it prioritized?
I guess everyone has settled for workarounds for now, but TBH they are ugly as hell.