
Is it possible to configure this with a remote service like Browserless.io?

Open jklina opened this issue 4 years ago • 6 comments

I was curious if it was possible to configure the protocol to work with a remote service rather than a local instance of Chrome. Thanks so much!

jklina avatar Nov 12 '21 02:11 jklina

@jklina There's a :url option (https://github.com/rubycdp/ferrum#customization); it should work like browserWSEndpoint for Browserless. Try it and let us know!

route avatar Nov 12 '21 12:11 route

Thanks for the tip! I had tried that, but I think it expects an http URL, not the wss url:

irb(main):006:0> b = Ferrum::Browser.new(url: "wss://chrome.browserless.io?token=sometoken")          
Traceback (most recent call last):
       16: from (irb):6:in `rescue in irb_binding'
       15: from (irb):6:in `new'
       14: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/gems/2.7.0/gems/ferrum-0.11/lib/ferrum/browser.rb:63:in `initialize'
       13: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/gems/2.7.0/gems/ferrum-0.11/lib/ferrum/browser.rb:125:in `start'
       12: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/gems/2.7.0/gems/ferrum-0.11/lib/ferrum/browser/process.rb:30:in `start'
       11: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/gems/2.7.0/gems/ferrum-0.11/lib/ferrum/browser/process.rb:30:in `new'
       10: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/gems/2.7.0/gems/ferrum-0.11/lib/ferrum/browser/process.rb:61:in `initialize'
        9: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http.rb:458:in `get'
        8: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http.rb:481:in `get_response'
        7: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http.rb:606:in `start'
        6: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http.rb:933:in `start'
        5: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http.rb:483:in `block in get_response'
        4: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http.rb:1393:in `request_get'
        3: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http.rb:1393:in `new'
        2: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http/request.rb:15:in `initialize'
        1: from /home/joshklina/.asdf/installs/ruby/2.7.2/lib/ruby/2.7.0/net/http/generic_request.rb:17:in `initialize'
ArgumentError (not an HTTP URI)
irb(main):007:0> 

It looks like the library uses an HTTP URL to look up the wss URL. The host and port options seem like they might be what I'm looking for, but they appear to be ignored. I'll keep poking around in the meantime!
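The ArgumentError in the traceback above comes from Net::HTTP, which only accepts http:// and https:// URIs. Here is a minimal sketch of that behavior using only Ruby's standard library. The helpers `to_http_url` and `websocket_url_from` are hypothetical names for illustration, not part of Ferrum, and the sample JSON body is made up to resemble what Chrome's /json/version endpoint returns.

```ruby
require "uri"
require "json"
require "net/http"

# True only for http:// and https:// URLs (URI::HTTPS is a subclass of URI::HTTP).
def http_uri?(url)
  URI.parse(url).is_a?(URI::HTTP)
end

# Net::HTTP request objects raise ArgumentError ("not an HTTP URI") for any
# other scheme, before any network traffic happens.
def net_http_rejects?(url)
  Net::HTTP::Get.new(URI.parse(url))
  false
rescue ArgumentError
  true
end

# Hypothetical helper (not part of Ferrum): map ws -> http and wss -> https,
# leaving the host, path, and query string untouched.
def to_http_url(url)
  url.sub(%r{\Aws(s?)://}, 'http\1://')
end

# Pull the WebSocket endpoint out of a /json/version style response body.
def websocket_url_from(body)
  JSON.parse(body).fetch("webSocketDebuggerUrl")
end

# Illustrative response body, not captured from a real browser.
sample_body = '{"Browser":"Chrome/96.0","webSocketDebuggerUrl":"ws://127.0.0.1:9222/devtools/browser/abc123"}'

puts net_http_rejects?("wss://chrome.browserless.io?token=sometoken")
puts to_http_url("wss://chrome.browserless.io?token=sometoken")
puts websocket_url_from(sample_body)
```

Swapping the scheme only gets past the ArgumentError. Whether the remote service answers /json/version over HTTPS is a separate question.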

jklina avatar Nov 12 '21 12:11 jklina

Oh, I'm afraid that's currently not possible, because we expect an HTTP URL for the browser rather than a WebSocket one. Take a look at https://github.com/rubycdp/ferrum/blob/master/lib/ferrum/browser/process.rb#L67. I think we might also accept ws_url. Just try to fix it locally, and if it works we can merge a PR.

route avatar Nov 12 '21 13:11 route

We were able to use a remote Chrome running in a separate Docker container (from Browserless' images) with Cuprite by passing its URL to the url option when configuring the driver.

geoffharcourt avatar Dec 02 '21 18:12 geoffharcourt

It looks like this might be possible via host and port, which this section of their documentation mentions. I don't currently use Browserless, though.

nickhammond avatar Dec 14 '21 20:12 nickhammond

I was also using capybara + cuprite + ferrum with the browserless.io Docker image. It worked; I just had to specify the browser URL for the ferrum (cuprite) driver.

ktimothy avatar Aug 17 '22 11:08 ktimothy

Self-hosted docker images from Browserless do not experience the problem the OP is describing. I spent some time on this today and found myself out of my depth, but here's what I know:

  • Browserless requires SSL connections, but Ferrum enforces HTTP in Ferrum::Browser::Process#parse_browser_versions and also builds a non-SSL socket in Ferrum::Browser::WebSocket#initialize.

  • Both HTTPS and WSS connections to Browserless require you to pass a query param with your API token, which is stripped out during Ferrum::Browser::Process#initialize, in Ferrum::Browser::Process#parse_browser_versions, and various other places.
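The second point is easy to reproduce with Ruby's standard library alone. The sketch below is not Ferrum's code. `endpoint_dropping_query` and `endpoint_keeping_query` are hypothetical helpers showing how rebuilding a URL from only its scheme, host, and port loses a `?token=...` parameter, and one way to carry it along instead.

```ruby
require "uri"

# Rebuilds the URL from scheme, host, and port only, so any query string
# (such as an API token) on the original URL is silently lost.
def endpoint_dropping_query(base_url, path)
  uri = URI.parse(base_url)
  "#{uri.scheme}://#{uri.host}:#{uri.port}#{path}"
end

# Replaces only the path on the parsed URI, which keeps the query string.
def endpoint_keeping_query(base_url, path)
  uri = URI.parse(base_url)
  uri.path = path
  uri.to_s
end

base = "https://chrome.browserless.io?token=sometoken"
puts endpoint_dropping_query(base, "/json/version")
puts endpoint_keeping_query(base, "/json/version")
```

Any code path that rebuilds the URL the first way would need to be changed for a token-authenticated service to work.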

I tried hacking something together but this is out of my nerd comfort zone. Hope this helps someone else figure it out.

dhnaranjo avatar Nov 09 '22 19:11 dhnaranjo

I'll implement this when I find time, but the main issue now is that Browserless uses only one connection, while Ferrum uses many (one per browser and one per page). That seemed like a good design decision to me at the time, and it still does.
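To make the "one per browser and one per page" point concrete: a locally launched Chrome exposes a browser-level WebSocket endpoint under /devtools/browser/ and a separate endpoint per page target under /devtools/page/. The sketch below only illustrates that endpoint layout. `page_ws_url` is a hypothetical helper, the IDs are made up, and it is not meant to mirror Ferrum's internals.

```ruby
require "uri"

# Derive a page-level WebSocket URL from the browser-level one by swapping
# the path; host and port stay the same.
def page_ws_url(browser_ws_url, target_id)
  uri = URI.parse(browser_ws_url)
  uri.path = "/devtools/page/#{target_id}"
  uri.to_s
end

browser_ws = "ws://127.0.0.1:9222/devtools/browser/abc123" # illustrative ID
%w[TARGET-1 TARGET-2].each do |target_id|
  # Each page target gets its own socket in this model.
  puts page_ws_url(browser_ws, target_id)
end
```

A hosted service that hands out a single WebSocket URL does not necessarily expose these per-page endpoints, which is why this model does not carry over directly.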

route avatar Nov 10 '22 05:11 route

For now I'm just managing my own lil Chrome services, but I will definitely appreciate it when I can turn an infrastructure problem into a money problem.

Ferrum already simplifies things plenty by letting me skip Puppeteer. I appreciate y'all's work.

dhnaranjo avatar Nov 10 '22 15:11 dhnaranjo

First of all, thanks to @route for this amazing library. I've been using it for years and the API is great.

I came to this issue after trying to connect Ferrum to a Chrome instance provided by a third-party provider (Bright Data's 'Scraping Browser' product).

I have modified some methods to include basic authentication in the first communication with the browser (json/version) and to use an OpenSSL socket to open a connection to their WebSockets (I don't know why the plain TCP connection did not work).

After getting the WebSockets connection working, it returns an exception when trying to enable the page (by calling the "Page.enable" method)

I have compared the commands that Puppeteer sends over WebSockets and tried to patch the Ferrum code to call the same methods in the same order (using sessionId in addition to contextId and targetId), but the "Page.enable" call still times out and raises a DeadBrowserError exception.

I would love to understand how ferrum uses many connections to try to work with third party providers, I could even take charge of developing it and add a pull request, but I am lost on how to do this.

borlafdev avatar Sep 14 '23 14:09 borlafdev

@borlafdev I think the difference is that Ferrum opens a new connection for every page, which is not the case for Puppeteer. To multiplex over the same WebSocket, they used to call something like sendMessageToTarget (https://chromedevtools.github.io/devtools-protocol/tot/Target/#method-sendMessageToTarget), which is now deprecated. Instead of using a dedicated WebSocket, we might do the same thing they do with sessions, but I'm afraid it requires R&D and things like Target.setAutoAttach or Target.autoAttachRelated to start using sessions.
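For readers unfamiliar with the session-based approach: in the Chrome DevTools Protocol's flattened mode, page-level commands travel over the one browser WebSocket and carry a top-level sessionId that routes them to the attached target. The sketch below only builds the JSON messages. `SessionMultiplexer` is a hypothetical class for illustration, not part of Ferrum, and the session ID is a placeholder for the value the browser reports when a target is attached.

```ruby
require "json"

# Builds CDP messages meant to share a single WebSocket connection.
class SessionMultiplexer
  def initialize
    @next_id = 0
  end

  # Browser-level command: no sessionId field.
  def browser_command(method, params = {})
    build(method, params)
  end

  # Page-level command: routed to a target by its top-level sessionId.
  def session_command(session_id, method, params = {})
    build(method, params).merge("sessionId" => session_id)
  end

  private

  # Every message on the connection gets a unique, increasing id.
  def build(method, params)
    @next_id += 1
    { "id" => @next_id, "method" => method, "params" => params }
  end
end

mux = SessionMultiplexer.new
# Ask the browser to attach to targets automatically, in flattened mode.
attach = mux.browser_command(
  "Target.setAutoAttach",
  { "autoAttach" => true, "waitForDebuggerOnStart" => false, "flatten" => true }
)
# "SESSION-1" is a placeholder for a session ID reported by the browser.
enable = mux.session_command("SESSION-1", "Page.enable")

puts JSON.generate(attach)
puts JSON.generate(enable)
```

Responses and events for a session come back on the same socket tagged with the same sessionId, so a client built this way would also need to dispatch incoming messages by that field.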

> but also when calling "Page.enable", the call times out and raises DeadBrowserError exception.

I think this is due to the page not being available on the websocket because it's behind the service.

route avatar Sep 14 '23 14:09 route

> Both HTTPS and WSS connections to Browserless require you to pass a query param with your API token, which is stripped out during Ferrum::Browser::Process#initialize, in Ferrum::Browser::Process#parse_browser_versions, and various other places.

Is this still the case? I am struggling to get Ferrum to pass on the query/URL params; it seems to be stripping them.

Aubermean avatar Feb 22 '24 14:02 Aubermean