Should not visit pages that have already been visited
How can I make it not visit the same page multiple times?
And how can I make sure it doesn't visit any pages outside the domain?
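Roughly the shape I have in mind is a visited set plus a host check (example.com below is just a placeholder for the real site):

```ruby
require "set"
require "uri"
require "ferrum"

start   = URI("https://example.com/") # placeholder start URL
visited = Set.new
queue   = [start.to_s]
browser = Ferrum::Browser.new

until queue.empty?
  url = queue.shift
  next unless visited.add?(url) # Set#add? returns nil if url was already present
  browser.go_to(url)
  browser.css("a").each do |a|
    href = a.attribute("href") or next
    link = URI.join(url, href) rescue next
    link.fragment = nil # treat page#a and page#b as the same page
    # enqueue only same-host links we haven't seen yet
    queue << link.to_s if link.host == start.host && !visited.include?(link.to_s)
  end
end

browser.quit
```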
Also, when I ran it with a memo, I eventually got an error:
.../gems/ruby-3.1.2/gems/ferrum-0.11/lib/ferrum/browser/web_socket.rb:19:in `initialize': Too many open files - socket(2) for "127.0.0.1" port 65073 (Errno::EMFILE)
I think you should tune your OS limits; on Linux, for example, raise the open-files limit (ulimit -n).
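You can also raise the limit from Ruby itself before starting the crawl. A minimal stdlib sketch, nothing Ferrum-specific:

```ruby
# Raise this process's file-descriptor soft limit (the resource behind
# Errno::EMFILE) up to whatever hard limit the OS allows.
soft, hard = Process.getrlimit(:NOFILE)
Process.setrlimit(:NOFILE, hard, hard) if soft < hard
```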
As for the issue, I have a plan to introduce an option for requests, but unfortunately it won't work for all websites, so it's going to be very optional.
What is the root cause of this? It seems to me that when opening a TCP socket connection, Ferrum opens a file descriptor but never closes it. Shouldn't that be impossible, since the number of pages being processed at once is at most the number of processors (unless overridden)?
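For what it's worth, one way to watch descriptors accumulate from inside the process (counting Ruby's live IO objects; `lsof -p <pid>` from outside shows the same picture):

```ruby
# Count the IO objects this Ruby process still holds open; if the number
# climbs by one per crawled URL, a page/socket is being leaked somewhere.
open_ios = ObjectSpace.each_object(IO).count { |io| !io.closed? }
puts "open IOs: #{open_ios}"
```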
Ferrum opens only one connection per page and closes it when the page is processed, releasing both the page and the connection. So most likely something is wrong with the crawler.
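Concretely, if the crawler creates pages itself, each page needs to be closed once it's processed. A sketch of that lifecycle, assuming the usual create_page/close pairing:

```ruby
# One page means one WebSocket connection to Chrome; closing the page
# releases both. The ensure runs even when navigation or parsing raises,
# which is the typical way a crawler leaks sockets until Errno::EMFILE.
def process(browser, url)
  page = browser.create_page
  page.go_to(url)
  page.css("a").map { |a| a.attribute("href") }
ensure
  page&.close
end
```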