Not working with NixCraft

Open simonhughxyz opened this issue 2 years ago • 5 comments

rdrview does not work with NixCraft articles (https://www.cyberciti.biz/). It fails with this output: rdrview: no content could be extracted.

It would be nice to be able to read NixCraft articles on my terminal.

simonhughxyz avatar Apr 07 '23 11:04 simonhughxyz

I'm using a function that downloads the article first and then passes it to rdrview, something like this:

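function rdr {
  # local rather than readonly: in bash, readonly here creates a global
  # read-only variable, so a second call in the same shell would fail.
  local u=${1:?"The url must be specified."}
  curl -A "Mozilla Firefox" -sL "$u" | rdrview -B lynx --disable-sandbox
}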

AbeEstrada avatar Apr 07 '23 17:04 AbeEstrada

I also found that cyberciti.biz is using CloudFlare and you need to pass a JavaScript challenge before loading the content:

<!DOCTYPE html>
<html lang="en-US">
<head>
    <title>Just a moment...</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
    <meta name="robots" content="noindex,nofollow">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <link href="/cdn-cgi/styles/challenges.css" rel="stylesheet">


</head>
<body class="no-js">
    <div class="main-wrapper" role="main">
    <div class="main-content">
        <noscript>
            <div id="challenge-error-title">
                <div class="h2">
                    <span class="icon-wrapper">
                        <div class="heading-icon warning-icon"></div>
                    </span>
                    <span id="challenge-error-text">
                        Enable JavaScript and cookies to continue
                    </span>
                </div>
            </div>
        </noscript>
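
A quick way to confirm that this interstitial, rather than the article, is what rdrview receives is to check the downloaded HTML for the markers visible in the excerpt above. This is only a sketch: the function names is_cf_challenge and rdr_checked are made up for illustration, and it assumes the challenge page keeps the "Just a moment..." title or the /cdn-cgi/styles/challenges.css link, which Cloudflare may change at any time.

```shell
# Succeeds if the HTML on stdin looks like the Cloudflare challenge page.
# ASSUMPTION: the markers below are taken from the excerpt in this thread
# and are not a stable or documented interface.
is_cf_challenge() {
  grep -qE '<title>Just a moment\.\.\.</title>|/cdn-cgi/styles/challenges\.css'
}

# Fetch once, and only run rdrview if the page is not the interstitial.
rdr_checked() {
  local url=${1:?"The url must be specified."}
  local html
  html=$(curl -A "Mozilla Firefox" -sL "$url") || return 1
  if printf '%s\n' "$html" | is_cf_challenge; then
    echo "rdr: got a Cloudflare challenge page instead of the article" >&2
    return 2
  fi
  printf '%s\n' "$html" | rdrview -B lynx --disable-sandbox
}
```

With this, rdr_checked returns 2 and prints a clearer message when the download was blocked, instead of rdrview's generic "no content could be extracted".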

AbeEstrada avatar Apr 07 '23 17:04 AbeEstrada

> curl -A "Mozilla Firefox" -sL "$u" | rdrview -B lynx --disable-sandbox

This does not work. rdrview still can't extract the content.

simonhughxyz avatar Apr 07 '23 17:04 simonhughxyz

> I also found that cyberciti.biz is using CloudFlare and you need to pass a JavaScript challenge before loading the content:

Is there a way to pass the CloudFlare challenge? Or at least circumvent it?

simonhughxyz avatar Apr 07 '23 17:04 simonhughxyz

I found a workaround for Cloudflare: I used curl-impersonate, and that seems to work.
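
For anyone landing here, the workaround might look roughly like the sketch below. The thread does not say which command was actually used, so treat the details as assumptions: curl_chrome116 is one of the wrapper scripts that some curl-impersonate releases install, the name differs between versions, and RDR_FETCH is a made-up variable for overriding it. The rdrview flags are the ones from the function earlier in this thread.

```shell
# Sketch: fetch with a curl-impersonate wrapper, then hand the HTML to rdrview.
# ASSUMPTION: curl_chrome116 is only an example wrapper name; set RDR_FETCH
# to whichever curl-impersonate wrapper is installed on your system.
rdr() {
  local url=${1:?"The url must be specified."}
  local fetch=${RDR_FETCH:-curl_chrome116}
  "$fetch" -sL "$url" | rdrview -B lynx --disable-sandbox
}
```

Usage would be rdr followed by the article URL; whether the challenge is actually bypassed depends on how the site's Cloudflare protection is configured.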

simonhughxyz avatar Apr 07 '23 18:04 simonhughxyz

Sorry for the long delay, but I guess you fixed this yourself and there's nothing much for me to say here.

eafer avatar Mar 02 '24 22:03 eafer