Bypassing scrape protection on Etherscan
Recently, Etherscan deployed a Cloudflare script that returns a 403 if you access the website from a script.
This is such a dick move from Etherscan, but it doesn't really matter as Etherscan scraping has been disabled for a while with the current Etherface deployment. That being said, I'd be happy to accept a PR for this issue if you're interested in working on this.
Do you have any recommendations on where to begin @volsa ? Off the top of my head, this situation could be handled with Selenium. Not sure if there's a workaround for Rust.
Yeah, Selenium was the first solution that popped into my mind. The other was embedding Python code using PyO3 to use cloudscraper because no such Rust libraries exist, but I'm not sure if the library is even working atm. Long-term, Selenium is probably the better solution though.
I did some quick research to see if this can be accomplished in Rust using ChromeDriver, and it kind of works. Key findings were:
- ChromeDriver has to be patched before it can be used, because Cloudflare otherwise blocks the request. To do that, download it from https://chromedriver.storage.googleapis.com/index.html?path=112.0.5615.49/ and then apply the patch from https://github.com/ultrafunkamsterdam/undetected-chromedriver/blob/bf7dcf8b5713020de7454844fb80036b8c456503/undetected_chromedriver/patcher.py#L217-L239
- The flags --disable-blink-features and --disable-blink-features=AutomationControlled must be set; I haven't tested whether either one alone is sufficient, but it should be?
- (macOS ARM only) Patching the ARM ChromeDriver results in panics, so the x86_64 version has to be run using Rosetta
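For anyone who wants to do the patch step from the first bullet without Python, here is a rough Rust sketch. It assumes the linked patcher works by overwriting the injected `{window.cdc...;}` block in the ChromeDriver binary with a harmless payload padded to the same length, which is my reading of the linked lines and not something I've verified against every ChromeDriver build. The names `patch_bytes` and `patch_file` and the payload string are made up for illustration, and unlike the Python version this only touches the first matching block.

```rust
use std::path::Path;

/// Bytes that open the injected block (assumption based on the linked patcher).
const MARKER: &[u8] = b"{window.cdc";
/// Bytes that close the injected block.
const TERMINATOR: &[u8] = b";}";
/// Harmless replacement payload (hypothetical, any valid JS block would do).
const PAYLOAD: &[u8] = b"{console.log(\"patched\")}";

/// Index of the first occurrence of `needle` in `haystack` at or after `from`.
fn find(haystack: &[u8], needle: &[u8], from: usize) -> Option<usize> {
    if from > haystack.len() {
        return None;
    }
    haystack[from..]
        .windows(needle.len())
        .position(|window| window == needle)
        .map(|index| index + from)
}

/// Overwrites the first `{window.cdc...;}` block in place with the payload,
/// padded with spaces so the length of the binary does not change.
/// Returns true if a block was found and patched.
fn patch_bytes(content: &mut [u8]) -> bool {
    let Some(start) = find(content, MARKER, 0) else {
        return false;
    };
    let Some(terminator) = find(content, TERMINATOR, start) else {
        return false;
    };
    let end = terminator + TERMINATOR.len();
    if end - start < PAYLOAD.len() {
        // Block is too short to hold the payload; leave the binary untouched.
        return false;
    }
    let block = &mut content[start..end];
    block.fill(b' ');
    block[..PAYLOAD.len()].copy_from_slice(PAYLOAD);
    true
}

/// Reads the driver binary, patches it in memory, and writes it back
/// only if something was changed.
fn patch_file(path: &Path) -> std::io::Result<bool> {
    let mut content = std::fs::read(path)?;
    let patched = patch_bytes(&mut content);
    if patched {
        std::fs::write(path, &content)?;
    }
    Ok(patched)
}
```

Calling `patch_file(Path::new("./chromedriver"))` once after downloading should be enough. Keeping the length identical matters because the binary's internal offsets must not shift.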
Calling the following code using the fantoccini library should then bypass the CF protection.
use fantoccini::{ClientBuilder, Locator};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Chrome flags from the findings above.
    let mut caps = serde_json::map::Map::new();
    caps.insert(
        "goog:chromeOptions".to_string(),
        serde_json::json!({
            "args": [
                // "--headless=new",
                "--disable-blink-features",
                "--disable-blink-features=AutomationControlled",
            ]
        }),
    );

    // Connect to the patched ChromeDriver, assumed to listen on port 4444.
    let client = ClientBuilder::native()
        .capabilities(caps)
        .connect("http://localhost:4444")
        .await?;
    client.goto("https://etherscan.io/contractsVerified").await?;

    // Wait until the main content section has rendered, then dump its inner HTML.
    let res = client
        .wait()
        .for_element(Locator::Css("#content > section.container-xxl.pt-5.pb-12"))
        .await?;
    let html = res.html(true).await?;
    println!("{html}");

    // End the WebDriver session so the browser is not left running.
    client.close().await?;
    Ok(())
}
https://user-images.githubusercontent.com/29666622/233849890-57bd2463-0079-46d9-b945-c4101e346ca2.mov
Ideally this can be merged with https://github.com/volsa/etherface/blob/master/etherface-lib/src/api/etherscan.rs