Better "Generic" parser support.
@Belldandu, @typhoon71, @toshiya44, @dreamer2908 Some ideas I've been having/thinking of stealing along the lines of being able to parse arbitrary sites. What do you think?
- On "advanced" settings, have a drop down to allow user to explicitly select the parser to be used.
- Have a set of controls for the "Default" parser, so user can specify how to parse the data using a set of XPATH expressions.
- How to harvest the hyperlinks from the page for the set of chapters.
- Where to get the Story title from on the first page
- Where to get the Author's name from on the first page
- Element holding title of each chapter.
- Element holding the story content on each chapter
- Additional garbage elements to remove from the story content. (e.g. Next/Previous chapter.)
Is XPath going to work, or is there a better way to specify these rules? (Would allowing JavaScript as well be a good idea?) Are there any other configuration options I should supply?
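To make the question concrete, here is a rough sketch of what such a rule set might look like if it were XPath based. Every key name and XPath string is made up for illustration and is not WebToEpub's actual format; `selectNodes` only assumes the standard DOM `document.evaluate` call.

```javascript
// Hypothetical rule set for the "Default" parser. Every key name and
// XPath string is an illustrative assumption, not WebToEpub's format.
const exampleRules = {
    chapterLinks: "//div[@id='toc']//a[@href]",
    storyTitle: "//h1[1]",
    author: "//span[@class='author']",
    chapterTitle: "//h2[1]",
    content: "//div[@class='chapter-body']",
    remove: ["//div[@class='nav']"]
};

// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in the DOM XPath API.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Return all nodes matching an XPath expression, in document order.
function selectNodes(doc, xpath) {
    const result = doc.evaluate(xpath, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
    const nodes = [];
    for (let i = 0; i < result.snapshotLength; ++i) {
        nodes.push(result.snapshotItem(i));
    }
    return nodes;
}

// List required rule names that are missing or empty, so a settings UI can flag them.
function missingRules(rules) {
    const required = ["chapterLinks", "storyTitle", "author", "chapterTitle", "content"];
    return required.filter(
        key => typeof rules[key] !== "string" || rules[key].trim() === ""
    );
}
```

In the extension, `selectNodes(dom, exampleRules.chapterLinks)` would give the candidate chapter links, and `missingRules` could run before the user is allowed to save a rule set.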
- About the "drop down to allow user to explicitly select the parser to be used", I would have it in the main page, not on the "advanced" settings.
As I see it, the addon should guess the right parser to be used, and let the user change it in the dropdown (this would happen more often where the default parser gets chosen by the addon, obv)
- Since you're talking about customizing the default parser... you could go all the way and make something like a config / parser editor, so users will be able to create customized parsers. Those "customized parsers" should be selectable from that dropdown in point 1.
The main problem I see with this is that I don't think you can actually create new parsers and store them, so it would be limited to configs for the default parser.
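The "guess, then let the user override" behaviour from the first point could look roughly like this. It is a minimal sketch: the registry contents, the parser names, and the function names are all hypothetical.

```javascript
// Hypothetical registry mapping hostname -> parser name.
const parsersByHost = new Map([
    ["example-novels.test", "ExampleNovelsParser"]
]);

// Name used when no site-specific parser is registered (assumed label).
const DEFAULT_PARSER = "Default";

// Guess a parser from the URL's hostname, falling back to the default one.
function guessParser(url) {
    const host = new URL(url).hostname;
    return parsersByHost.get(host) || DEFAULT_PARSER;
}

// The dropdown value, when set, wins over the guess.
function chooseParser(url, userSelection) {
    return userSelection || guessParser(url);
}
```

The dropdown would be pre-filled with `guessParser(url)`, and whatever the user picks is passed back in as `userSelection`.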
OK, as an expert on databases, I would think of saving the parser settings to a file or database. I don't know whether we can read and write a DB file or JSON file from JavaScript, but if it's possible, we could add a new tab that contains every parser and its settings. With this we would be able to delete the parser folder and keep only a generic parser that reads the settings for the specific site from the DB/JSON file.
If we use the DB file, then tables something like these would be created:
Parsers contains
1- SiteUrl/Name, e.g. https://m.wuxiaworld.co or m.wuxiaworld.co, to identify the parser's site
ParserSettings contains
1- TitleCss, e.g. the CSS path for the title
2- AuthorCss, e.g. the CSS path for the author
-- etc.
and lastly
Parser_Id, a foreign key to the Parsers table
We could also use a JSON file instead.
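For the JSON variant, something like the following is what I have in mind. This is only a sketch: the selector values are placeholders (not the real markup of that site), and the function names and camel-cased keys are assumptions.

```javascript
// Hypothetical JSON file content, keyed by hostname.
// The CSS selectors are placeholders, not the site's real markup.
const parserConfigJson = `{
    "m.wuxiaworld.co": {
        "titleCss": "h1.title",
        "authorCss": "span.author"
    }
}`;

// Parse the JSON text into a Map of hostname -> settings.
function loadParserConfigs(jsonText) {
    return new Map(Object.entries(JSON.parse(jsonText)));
}

// Look up the settings for a URL by its hostname; null when the site is unknown.
function settingsForUrl(configs, url) {
    return configs.get(new URL(url).hostname) || null;
}
```

Keying by hostname plays the role of the Parsers table, and each nested object plays the role of a ParserSettings row, so no separate foreign key is needed.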
Now, on GUI/design:
We could have a tab called Parsers that presents all the available parsers, with the ability to edit, delete or create a parser.
We could also add a button to restore the defaults in case an idiot makes too many mistakes.
Take into account that the DB file or the JSON file should be local and not on a server, so we don't need a hosting service.
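The edit, delete and restore buttons could sit on top of a few small helpers like these. It is a sketch only: the names and the default entry are hypothetical, and persistence is left out. In an extension the resulting object could probably be saved with the browser's local extension storage rather than a real file, but that part is an assumption.

```javascript
// Hypothetical built-in defaults shipped with the extension.
const DEFAULT_CONFIGS = Object.freeze({
    "example.test": { titleCss: "h1", authorCss: ".author" }
});

// Create or edit a parser entry; returns a new object, leaving the input untouched.
function saveParser(configs, host, settings) {
    return { ...configs, [host]: { ...settings } };
}

// Delete a parser entry; returns a new object without that host.
function deleteParser(configs, host) {
    const { [host]: removed, ...rest } = configs;
    return rest;
}

// "Restore to default": hand back a fresh, editable deep copy of the defaults.
function restoreDefaults() {
    return JSON.parse(JSON.stringify(DEFAULT_CONFIGS));
}
```

Returning copies instead of mutating in place means a bad edit never damages the shipped defaults, so the restore button always works.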
For my part, I could definitely help if @dteviot assigns me to something. I really have too much time to kill, so just ask 👍
@AlenToma You're welcome to have a go at any of the issues. Note, most of them are still around because they're not easy. So might pay to ask if I can remember anything about them before you start.
As regards your suggestion of putting config in a file: sorry, that doesn't work. There are a number of sites (lnmtl immediately comes to mind) where the content isn't in the page. Instead they make REST calls to get the content, and populate the HTML that way. So you need JavaScript logic to get the content. And Mozilla and Google get VERY upset when an extension uses eval() or similar to compile code dynamically, because it bypasses their checks and so is used by malware extensions.
Additionally, reading this, it seems to be describing the "Default" parser. This has been implemented.
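To illustrate the kind of logic a pure config file can't express, here is a rough sketch of a hand-written parser step for a site that loads chapters over REST. The endpoint URL and the `{ title, paragraphs }` payload shape are invented for the example and are not any real site's API; `fetchJson` is a hypothetical helper passed in by the caller.

```javascript
// Escape text so it is safe to place inside HTML.
function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// Turn a hypothetical REST payload { title, paragraphs } into chapter HTML.
function chapterPayloadToHtml(payload) {
    const body = payload.paragraphs
        .map(p => `<p>${escapeHtml(p)}</p>`)
        .join("");
    return `<h1>${escapeHtml(payload.title)}</h1>${body}`;
}

// Fetch one chapter through an injected fetchJson(url) helper.
// The endpoint below is made up for illustration.
async function fetchChapterHtml(chapterId, fetchJson) {
    const payload = await fetchJson(`https://site.example/api/chapters/${chapterId}`);
    return chapterPayloadToHtml(payload);
}
```

None of this can be written as a selector in a settings file, and loading it as text to run later would need dynamic code evaluation, which is the part the extension stores object to.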
I see your point. Well it was worth a try.
I would very much like to help. I will ask before I get started on something.