Provide a way to pass in the custom parser during add_resource so it can be used in stream_remote_resources
In order to submit an issue, please ensure you can check the following. Thanks!

- [x] Declare which version of Python you are using (`python --version`): 3.6
- [x] Declare which operating system you are using: Ubuntu
Currently I have to create a separate task in order to use a custom parser, like this:
https://github.com/strets123/frictionless-pres/blob/master/smdataproject/stream_remote_resources_custom.py
This breaks PEP 8.
Hey @strets123 - can you explain your use case here?
I would like to be able to build datapackage pipelines that connect to many disparate JSON, XML and HTML data sources. This often requires custom tabulator parsers, but I cannot then easily re-use the stream_remote_resources code. I do not want to copy and paste a whole module just to change one line, so I resorted to the above hack of importing the dpp module at the bottom of the file.
In the above case I had a JSON parser for the JSON API spec that also did pagination. This follows a similar pattern to the SQL data parser. I also have a similar one for SPARQL endpoints.
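To illustrate the pagination pattern described above, here is a minimal sketch of the kind of logic such a custom parser wraps. All names here (`PAGES`, `fetch`, `iter_rows`) are illustrative, not part of the tabulator API; a real parser would issue HTTP requests instead of reading a dict.

```python
# Stand-in for a paginated JSON API: each page carries its rows
# plus a "next" link, as in the JSON API spec.
PAGES = {
    "/items?page=1": {"data": [{"id": 1}, {"id": 2}], "next": "/items?page=2"},
    "/items?page=2": {"data": [{"id": 3}], "next": None},
}

def fetch(url):
    # Stand-in for an HTTP GET that returns parsed JSON.
    return PAGES[url]

def iter_rows(start_url):
    """Yield rows across all pages, following the 'next' link."""
    url = start_url
    while url:
        page = fetch(url)
        yield from page["data"]
        url = page["next"]

rows = list(iter_rows("/items?page=1"))
```

A custom tabulator parser would expose an iterator like `iter_rows` as its row stream, which is exactly the piece that cannot currently be swapped into stream_remote_resources without duplicating the module.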
To frame the issue another way: the problem when reusing code from the dpp project is that the CSV dump code is class-based and can easily be overridden, whereas the stream_remote_resources module has import-time logic that makes its re-use difficult.
If there is a desire to retain the simplicity of the functional approach for dpp modules, might it be possible to have a magic "run" function? This would retain backwards compatibility, but users who wanted to could put their import-time logic in the run function instead.
This would allow users to import specific functions and override others without a full class-based approach.
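A rough sketch of what that convention might look like, assuming a hypothetical processor module (the helper names below are illustrative, not the actual dpp API):

```python
# Hypothetical processor module: the reusable helper is a plain
# top-level function, and all former import-time logic lives in run().
def get_resource_iterator(resource):
    # Reusable per-resource row processing that a downstream user
    # might want to import and keep unchanged.
    for row in resource:
        yield {"value": row * 2}

def run():
    # Executed only when the module runs as a pipeline step,
    # not as a side effect of importing it.
    resources = [[1, 2], [3]]
    return [list(get_resource_iterator(r)) for r in resources]

if __name__ == "__main__":
    run()
```

With this shape, a user's custom processor could `from stream_remote_resources import get_resource_iterator` and define only its own `run()`, instead of copying the whole module or resorting to import-time hacks.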