php-article-extractor
A PHP library to extract article text from web pages
I've added php-article-extractor via `"crscheid/php-article-extractor": "2.5.1"` and also have `"fivefilters/readability.php": "dev-master",` in my composer.json. I am getting:
- Root composer.json requires crscheid/php-article-extractor 2.5.1 -> satisfiable by crscheid/php-article-extractor[2.5.1].
- crscheid/php-article-extractor 2.5.1...
This occurs in version 2.4, not 2.3, which has the upgraded HTML parser. Needs investigation.
Example site: https://newsroom.bmo.com/2021-07-21-BMO-Congratulates-the-Milwaukee-Bucks-on-Winning-the-2021-NBA-Championship
```
[2021-07-21 13:14:48] production.ERROR: Call to a member function name() on string...
```
Example: https://www.businesstimes.com.sg/life-culture/olympics-first-covid-case-found-at-athletes-village-stoking-fears-ahead-of-games
https://arstechnica.com/?p=1700858 should resolve to https://arstechnica.com/gaming/2020/08/microsoft-backs-epic-against-apple-in-legal-fight-over-unreal-engine-on-ios/ as it does in a browser, but it does not.
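The redirect behaviour described above can be sketched as a small loop that keeps requesting the `Location` target until a non-redirect status comes back. This is only an illustration, not the library's code: `resolveFinalUrl` and the `$fetch` callable (assumed to return an array with `status` and `location` keys) are hypothetical names, and the sketch assumes every `Location` value is an absolute URL.

```php
<?php
// Hypothetical helper, not part of php-article-extractor.
// $fetch is an assumed callable: fn(string $url): array with keys
//   'status'   => int HTTP status code
//   'location' => ?string value of the Location header (assumed absolute)
function resolveFinalUrl(string $url, callable $fetch, int $maxHops = 10): string
{
    // Status codes that ask the client to request a different URL.
    $redirectCodes = [301, 302, 303, 307, 308];

    for ($hop = 0; $hop < $maxHops; $hop++) {
        $response = $fetch($url);
        $location = $response['location'] ?? null;

        // Stop as soon as the response is not a redirect we can follow.
        if (!in_array($response['status'], $redirectCodes, true) || $location === null) {
            return $url;
        }

        $url = $location;
    }

    // Guard against redirect loops.
    throw new RuntimeException("Too many redirects (limit: $maxHops)");
}
```

Injecting the fetcher keeps the loop testable without network access. With PHP's cURL extension, a similar result can be obtained by setting `CURLOPT_FOLLOWLOCATION` and reading `CURLINFO_EFFECTIVE_URL` after the request.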
It would be nice to be able to detect PDFs ahead of time and read PDF content into text. The library should be able to follow redirect links and determine when...
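One possible way to detect a PDF before handing the response to the HTML parser is to check the `Content-Type` header and, as a fallback, the `%PDF-` signature that PDF files start with. The function below is a minimal sketch under that assumption; `looksLikePdf` is a hypothetical name and does not exist in the library.

```php
<?php
// Hypothetical helper, not part of php-article-extractor.
// $contentType: raw Content-Type header value, or null if the server sent none.
// $bodyPrefix:  the first bytes of the response body.
function looksLikePdf(?string $contentType, string $bodyPrefix): bool
{
    if ($contentType !== null) {
        // Drop parameters such as "; charset=binary" and normalise case.
        $mediaType = strtolower(trim(explode(';', $contentType, 2)[0]));
        if ($mediaType === 'application/pdf') {
            return true;
        }
    }

    // Fallback for servers that send a generic or wrong Content-Type:
    // PDF files begin with the signature "%PDF-".
    return strncmp($bodyPrefix, '%PDF-', 5) === 0;
}
```

Checking the signature as well as the header covers servers that label PDFs as `application/octet-stream`.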
Occurs with this piece of content: https://www.marketscreener.com/news/Reserve-Bank-of-Fiji-Quarterly-Review-March-2020--30599543/
* https://markets.businessinsider.com/news/stocks/western-union-and-the-western-union-foundation-expand-funding-for-global-covid-19-relief-1029134457
* http://www.digitaljournal.com/pr/4663553
For example: https://www.hln.be/nieuws/binnenland/prins-laurent-in-beroep-tegen-dotatiesanctie-opgelegd-door-regering~a81f63c8/ has a pre-screen to accept cookies, so the library tries to parse that rather than the actual article. Can anything be done about this?
Temporary redirect 307
```
Greylift:php-html-parser cscheide$ more /tmp/out.txt
307 Temporary Redirect
307 Temporary Redirect
nginx
```
https://www.bloomberg.com/news/articles/2018-07-12/jpmorgan-wells-fargo-may-go-back-to-basics-with-loans-in-focus
https://tech.slashdot.org/story/17/04/30/1821230/e-commerce-is-clogging-city-streets-with-delivery-trucks