
Shorten project structure

Open hugolpz opened this issue 4 years ago • 3 comments

Related to #80. Suggestion: move the core code up one level so it is more visible, and keep the crawlers in their own folder.

  • [ ] Reorganize project structure from:
corpuscrawler
├─ README.md
├─ LICENSE
├─ LICENSE.md
├─ CONTRIBUTING.md
├─ corpuscrawler
└─ Lib
   └─ corpuscrawler
      ├─ *.py : utilities
      └─ crawl_{iso}.py : crawlers

to

corpuscrawler
├─ README.md
├─ LICENSE
├─ LICENSE.md
├─ CONTRIBUTING.md
├─ corpuscrawler
├─ *.py : utilities
└─ crawlers
   └─ crawl_{iso}.py : crawlers

Would such changes disrupt any complementary toolchain?

hugolpz avatar Mar 01 '21 17:03 hugolpz

Hello @sffc. I noticed you made a Python change (https://github.com/google/corpuscrawler/commit/10adaecf4ed5a7d0557c8e692c186023746eb001) and are active on this project, so allow me to cc you on this minor issue.

hugolpz avatar Feb 15 '24 10:02 hugolpz

The project is currently structured as a pip module, and it should stay a pip module. However, I would support reorganizing the utilities and crawlers into separate directories, more along the lines of:

corpuscrawler
├─ README.md
├─ LICENSE
├─ LICENSE.md
├─ CONTRIBUTING.md
├─ corpuscrawler
└─ Lib
   └─ corpuscrawler
      ├─ util
      │  └─ *.py : utilities
      └─ crawlers
         └─ crawl_{iso}.py : crawlers
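
As a rough illustration of what such a move means for anything that imports the crawlers, the sketch below builds the dotted module path for a language code under either layout. The helper name `crawler_module_name`, the `nested` flag, and the hyphen-to-underscore mapping are assumptions made for this example, not existing project code.

```python
def crawler_module_name(lang: str, nested: bool = True) -> str:
    """Return the dotted module path for a language's crawler.

    Hypothetical helper: assumes crawler files are named crawl_{iso}.py,
    as in the trees above, and that hyphens in a language code are
    replaced by underscores (an assumption for this sketch).
    """
    stem = "crawl_" + lang.replace("-", "_")
    # nested=True: crawlers live in a `crawlers` subpackage.
    # nested=False: crawlers sit directly in the `corpuscrawler` package.
    package = "corpuscrawler.crawlers" if nested else "corpuscrawler"
    return f"{package}.{stem}"


if __name__ == "__main__":
    print(crawler_module_name("fr", nested=False))
    print(crawler_module_name("fr", nested=True))
```

Any tool that imports a crawler by its dotted path would need the extra `crawlers` component after the move, which is the kind of toolchain impact to check for.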

sffc avatar Feb 26 '24 18:02 sffc

This would add clarity, yes. The project currently lacks clear onboarding documentation and pointers. A clean structure that separates the few utilities from the 1000+ crawler files would improve both clarity and onboarding.

hugolpz avatar Feb 27 '24 09:02 hugolpz