deadlock detected
I went to run specs on my app locally, and I got a ton of deadlock errors:
Unhandled exception in spawn: deadlock detected (PQ::PQError)
from lib/pg/src/pq/connection.cr:203:7 in 'handle_error'
from lib/pg/src/pq/connection.cr:186:7 in 'handle_async_frames'
from lib/pg/src/pq/connection.cr:162:7 in 'read'
from lib/pg/src/pq/connection.cr:414:18 in 'expect_frame'
from lib/pg/src/pq/connection.cr:398:9 in 'read_next_row_start'
from lib/pg/src/pg/result_set.cr:39:8 in 'move_next'
from lib/db/src/db/result_set.cr:39:13 in 'from_rs'
from lib/avram/src/avram/save_operation.cr:367:17 in 'insert'
from lib/avram/src/avram/save_operation.cr:349:7 in 'insert_or_update'
from lib/avram/src/avram/save_operation.cr:297:9 in 'save'
from lib/avram/src/avram/save_operation.cr:321:8 in 'save!'
from lib/breeze/src/breeze/operations/save_breeze_sql_statement.cr:1:1 in 'create!:breeze_request_id:statement:args:model:elapsed_text'
from lib/breeze/src/breeze.cr:14:3 in '->'
from /usr/local/Cellar/crystal/0.36.1_2/src/primitives.cr:255:3 in 'run'
from /usr/local/Cellar/crystal/0.36.1_2/src/fiber.cr:92:34 in '->'
I have it set to only enable while in development, so in theory breeze should be skipped during tests... I'll try to dig in more to see what specifically is causing this.
Ok, my issue may not actually be directly related to Breeze. However, it does worry me that it was so easy to hit this error. I'll leave it open for now so we can track it. Maybe someone can think of a fix.
Ok, I just ran into something similar, and this time I was doing something completely different:
Unhandled exception in spawn: (DB::ConnectionRefused)
from Exception::CallStack::unwind:Array(Pointer(Void))
from Exception::CallStack#initialize:Array(Pointer(Void))
from Exception::CallStack::new:Exception::CallStack
from raise<DB::ConnectionRefused>:NoReturn
from PG::Connection#initialize<DB::Database>:Bool
from PG::Connection::new<DB::Database>:PG::Connection
from PG::Driver#build_connection<DB::Database>:PG::Connection
from ~procProc(DB::Connection)@lib/db/src/db/database.cr:56
from DB::Pool(DB::Connection+)@DB::Pool(T)#build_resource:DB::Connection+
from DB::Pool(DB::Connection+)@DB::Pool(T)#checkout:DB::Connection+
from DB::Database#checkout:DB::Connection+
from Breeze::SaveBreezeSqlStatement@Avram::SaveOperation(T)#save:Bool
from Breeze::SaveBreezeSqlStatement@Avram::SaveOperation(T)#save!:Breeze::BreezeSqlStatement
from Breeze::SaveBreezeSqlStatement::create!:breeze_request_id:statement:args:model:elapsed_text<(Int64 | Nil), String, (String | Nil), (String | Nil), String>:Breeze::BreezeSqlStatement
from ~procProc(Nil)@lib/breeze/src/breeze.cr:14
from Fiber#run:(IO::FileDescriptor | Nil)
from ~proc2Proc(Fiber, (IO::FileDescriptor | Nil))@/usr/local/Cellar/crystal/0.36.1_2/src/fiber.cr:92
Caused by: no PostgreSQL user name specified in startup packet (PQ::PQError)
from Exception::CallStack::unwind:Array(Pointer(Void))
from Exception::CallStack#initialize:Array(Pointer(Void))
from Exception::CallStack::new:Exception::CallStack
from raise<PQ::PQError>:NoReturn
from PQ::Connection#handle_error<PQ::Frame::ErrorResponse>:NoReturn
from PQ::Connection#handle_async_frames<(PQ::Frame+ | PQ::Frame::Unknown)>:Bool
from PQ::Connection#read<(Char | Nil)>:(PQ::Frame+ | PQ::Frame::Unknown)
from PQ::Connection#read:(PQ::Frame+ | PQ::Frame::Unknown)
from PQ::Connection#expect_frame<PQ::Frame::Authentication.class, Nil>:PQ::Frame::Authentication
from PQ::Connection#expect_frame<PQ::Frame::Authentication.class>:PQ::Frame::Authentication
from PQ::Connection#connect:Bool
from PG::Connection#initialize<DB::Database>:Bool
from PG::Connection::new<DB::Database>:PG::Connection
from PG::Driver#build_connection<DB::Database>:PG::Connection
from ~procProc(DB::Connection)@lib/db/src/db/database.cr:56
from DB::Pool(DB::Connection+)@DB::Pool(T)#build_re
Unhandled exception in spawn: (DB::ConnectionRefused)
from Exception::CallStack::unwind:Array(Pointer(Void))
from Exception::CallStack#initialize:Array(Pointer(Void))
from Exception::CallStack::new:Exception::CallStack
from raise<DB::ConnectionRefused>:NoReturn
from PG::Connection#initialize<DB::Database>:Bool
from PG::Connection::new<DB::Database>:PG::Connection
from PG::Driver#build_connection<DB::Database>:PG::Connection
from ~procProc(DB::Connection)@lib/db/src/db/database.cr:56
from DB::Pool(DB::Connection+)@DB::Pool(T)#build_resource:DB::Connection+
from DB::Pool(DB::Connection+)@DB::Pool(T)#checkout:DB::Connection+
from DB::Database#checkout:DB::Connection+
from Breeze::SaveBreezeSqlStatement@Avram::SaveOperation(T)#save:Bool
from Breeze::SaveBreezeSqlStatement@Avram::SaveOperation(T)#save!:Breeze::BreezeSqlStatement
from Breeze::SaveBreezeSqlStatement::create!:breeze_request_id:statement:args:model:elapsed_text<(Int64 | Nil), String, (String | Nil), (String | Nil), String>:Breeze::BreezeSqlStatement
from ~procProc(Nil)@lib/breeze/src/breeze.cr:14
from Fiber#run:(IO::FileDescriptor | Nil)
from ~proc2Proc(Fiber, (IO::FileDescriptor | Nil))@/usr/local/Cellar/crystal/0.36.1_2/src/fiber.cr:92
It seems it's pretty easy to hit DB errors with this. In this case I was calling this code:
File.read_lines(filename).each do |domain|
  SaveRestrictedDomain.create(text: domain.strip) do |_o, _d|
    # ignore if it fails
  end
end
This code was in a task that I was running locally, and filename is a file with about 120,000 lines in it. Looks like it's coming from this file. It doesn't really matter where the subscribe is; it's basically global and will run from anywhere once it's been defined. If I'm blasting my database and it has to run this block for every query, I'm assuming the fibers are just backing up and dogpiling because I'm pushing queries faster than this block can run. Too many spawn calls...
The original post was caused by me running specs where I was essentially doing the same thing: pushing more queries than could be handled. These are probably edge cases, but the fact that they are preventing me from doing what I need to do is an issue.
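To illustrate the pile-up I'm guessing at, here is a minimal sketch using only the Crystal standard library. It is not Breeze's code: the body of the spawn is a hypothetical stand-in for the per-query save, and it assumes the default single-threaded scheduler (no -Dpreview_mt). The point is that spawn only queues a fiber, so a tight producer loop can queue thousands before any of them gets to run.

```crystal
pending = 0 # fibers spawned but not yet finished
peak = 0    # highest value `pending` reached

5_000.times do
  pending += 1
  peak = pending if pending > peak
  spawn do
    # Hypothetical stand-in for the per-query save that runs in the background.
    pending -= 1
  end
end

puts "queued before any fiber ran: #{peak}"

# Hand control to the scheduler so the queued fibers get a chance to run.
Fiber.yield
puts "still pending after yielding: #{pending}"
```

In the real task each save does I/O, so the fibers interleave instead of running back to back, but the producer can still get far ahead of them.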
For now, I was able to get around the first part because when I ran specs earlier, it thought I was in development. For this case I am in development, but we have a Lucky::Env.task? method that I can use to disable breeze in tasks.
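For reference, this is roughly what that workaround could look like in the Breeze config. Treat it as a sketch under assumptions: the file path, the settings.enabled and settings.database names, and AppDatabase are what I believe a typical app has, so check them against your own config.

```crystal
# config/breeze.cr (path and setting names are assumptions, see above)
Breeze.configure do |settings|
  settings.database = AppDatabase
  # Only record in development, and never while running a task.
  settings.enabled = Lucky::Env.development? && !Lucky::Env.task?
end
```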
So are you saying that you think it's because we are saving things to the database using spawn?
https://github.com/luckyframework/breeze/blob/5a98f72cd0d5282b6aeb71d1e02fe70ae7c06456/src/breeze/actions/mixins/action_helpers.cr#L16-L25
That's my assumption, yeah. It seems like it's kicking off spawns faster than it can do the saves, and they are piling up. I guess since it's easy to reproduce, I can take them out of the spawn to see if it still does it. I'll give that a shot tomorrow and see if that makes a difference.
Here's at least one thing: we tie the connection to the current fiber https://github.com/luckyframework/avram/blob/5e29f75371dca5a2bc16858163dc3790cabfbfcb/src/avram/database.cr#L6
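If taking the saves out of the spawn turns out to be too slow, a middle ground might be a single worker fiber fed by a bounded queue. The sketch below is not how Breeze works today; record_query, the queue capacity, and the fake statements are all hypothetical. It relies on a buffered Channel blocking the sender once it is full, which puts a ceiling on how much work can pile up.

```crystal
# Hypothetical stand-in for the database write done for each query.
def record_query(statement : String, saved : Array(String))
  saved << statement
end

# Buffered channel used as a bounded queue: `send` blocks when it is full,
# so the producer is forced to wait for the worker.
queue = Channel(String).new(capacity: 100)
done = Channel(Nil).new
saved = [] of String

# One long-lived worker fiber instead of one spawn per query.
spawn do
  # `receive?` returns nil once the channel is closed and drained.
  while statement = queue.receive?
    record_query(statement, saved)
  end
  done.send(nil)
end

1_000.times { |i| queue.send("statement #{i}") }
queue.close
done.receive # wait for the worker to finish draining the queue

puts "recorded #{saved.size} statements"
```

The tradeoff is that a slow worker now slows down the code issuing the queries, instead of letting fibers and connection checkouts stack up without limit.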
I'm not too familiar with doing multi-threading stuff. I wonder if that holds us back in some way by restricting to a single thread 🤔