beast2 Knowledge-based package manager

@alexeid @rbouckaert @tgvaughan Alexei and I are proposing a knowledge-based package manager, and I think you should join us. The current package manager is still not very informative: it does not provide enough information about the features and models in each package. We already have the features table on beast2.org, but the table and the package manager are not synchronised with each other.

The idea is to eventually build a system that retrieves this information from packages (e.g. from CBAN and version.xml), displays it on the website (e.g. in the new features table), and also takes over the package management function (and maybe package submission as well?).

This is complex, so I have split it into 4 stages:

Stage 1. Create a Jekyll framework to generate the BEAST features table.

We now use *.md files to hold the information, and the table rows are generated from them. This is mostly done, but I still need to correct some data. You can have a look at http://127.0.0.1:4000/features/ (served by a local Jekyll instance). The "rule" is stated in the first paragraph of that page.

Stage 2. Automatically update the BEAST features.

Initially we propose a (Python) script that reads CBAN and version.xml and creates the *.md files. I will run it manually at first and then automate it (perhaps using Git hooks: https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks ?).
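
A minimal sketch of what such a script might do for a single package, assuming version.xml carries the package name and version as attributes on its root element and the BEAST 2 requirement as a `<depends on='beast2' atleast='...'/>` child. The front-matter keys (`name`, `version`, `beast2_min`) and the function names are hypothetical; they would have to match whatever the Jekyll features-table template actually reads.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def package_to_markdown(version_xml: str) -> tuple[str, str]:
    """Convert the text of a version.xml into (package name, *.md content).

    The front-matter keys below are hypothetical placeholders for whatever
    the Jekyll features-table template expects.
    """
    root = ET.fromstring(version_xml)
    name = root.get("name", "")
    version = root.get("version", "")
    # Minimum BEAST 2 version, assumed to come from <depends on='beast2' atleast='...'/>
    beast2_min = ""
    for dep in root.findall("depends"):
        if dep.get("on") == "beast2":
            beast2_min = dep.get("atleast", "")
    front_matter = [
        "---",
        f"name: {name}",
        f"version: {version}",
        f"beast2_min: {beast2_min}",
        "---",
        "",
    ]
    return name, "\n".join(front_matter)


def write_markdown(version_xml_path: Path, out_dir: Path) -> Path:
    """Write one <name>.md file for the package into out_dir (e.g. a Jekyll folder)."""
    name, text = package_to_markdown(version_xml_path.read_text(encoding="utf-8"))
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{name}.md"
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

A Git hook (for example post-merge) could then call `write_markdown` for each version.xml that changed, which is one possible way to turn the manual run into an automatic one.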

Stage 3. Introduce new entries into CBAN and version.xml.

Any ideas on what else is essential, besides the entries we already have and those I list below?

Java version (#769)

The entries below are required by the features table. A package could have multiple values for each of them, so they could go under a <features> tag.

url_beast2_imp
label_beast2_imp
pr_beast2_imp
url_theory
label_theory
url_source
label_source
url_example_xml
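
One way the proposed entries could be stored and read back, assuming a hypothetical layout in which each feature is a `<feature>` element inside `<features>` and each entry above is an attribute. This is not an existing version.xml schema; it only illustrates how multiple values per package could be handled, with `read_features` as a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Field names taken from the list above; the <features>/<feature> layout is hypothetical.
FEATURE_FIELDS = (
    "url_beast2_imp", "label_beast2_imp", "pr_beast2_imp",
    "url_theory", "label_theory",
    "url_source", "label_source",
    "url_example_xml",
)


def read_features(version_xml: str) -> list[dict[str, str]]:
    """Return one dict per <feature> element, with missing fields set to ""."""
    root = ET.fromstring(version_xml)
    return [
        {field: feature.get(field, "") for field in FEATURE_FIELDS}
        for feature in root.iterfind("features/feature")
    ]


if __name__ == "__main__":
    example = """<package name='MyPackage' version='1.0.0'>
  <features>
    <feature label_theory='Paper label' url_theory='https://example.org/paper'
             url_example_xml='examples/model.xml'/>
  </features>
</package>"""
    for row in read_features(example):
        print(row)
```

Each dict would then map to one row of the features table, so a package with two features would contribute two rows.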

Stage 4. Replace the Java package manager.

Apr 11 '18 00:04 walterxie

While I like these ideas, I wonder whether it might be more useful (while probably requiring a similar amount of effort) to set up a proper package repository database + server. I think static page generators like Jekyll aren't really the best fit for this: you really do want something dynamic that's backed by a real database and that can automatically field submissions from third-party package developers.

(I actually developed a proof-of-concept repository app using the Python Django framework a while ago: https://github.com/tgvaughan/BeastPackageRepository)

Apr 12 '18 09:04 tgvaughan

@tgvaughan The only tangible benefit of a database mentioned is automatic updating of fields when the package is updated, but having some script run, say, once a day on flat files will lead to a lag of at most 24 hours in updating. Keeping all the information in flat files means there is no database to maintain, and it keeps the data easily accessible to any other technology; that argues for leaving out a database. Perhaps I am missing some other benefits of employing a database for what seems to me a rather straightforward application?

Apr 12 '18 18:04 rbouckaert

Hey Remco, a few thoughts: it just feels to me like we're basically hacking together a database ourselves based on our custom packages.xml file format. That file still needs to be maintained and modified by hand by one of us whenever new packages are submitted. Yes, we can accept PRs from outside developers, but we have to check these over by eye to ensure the changes don't break anything. If we turned this all upside down and used a proper database managed by a web app that automatically handles submissions, these problems would go away. Furthermore, once we have an automatic submission system, it becomes feasible to actually store the package binaries (and source, etc.) centrally, which is really the only way we can make sure older versions of packages are always available. (Not everybody uses GitHub.)

Just my 2c. I'm not opposed to the Jekyll + Python solution, just wondering whether there's maybe a better way.

Apr 12 '18 19:04 tgvaughan

Hi Tim, the 3-tier (web + server + database) solution has some advantages compared with what we are doing now, but it will take a lot of effort to develop and maintain. We only have about 30 packages at the moment; it may be worth doing once that number has doubled.

Apr 13 '18 00:04 walterxie

Thanks for your thoughts, Tim! I'm not yet convinced that we need a full-blown DBMS, so here are some alternatives. I agree that having a central place to manage package binaries would be good to ensure old packages don't disappear. Based on the CBAN packages file, we could perhaps store them centrally in a GitHub repo? A cron script could check for changes on a regular basis and notify us if packages go missing.
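
A rough sketch of such a cron check, assuming a CBAN-style packages file in which each `<package>` element has `name`, `version` and `url` attributes. The script and function names are hypothetical, and "missing" is approximated here as an HTTP HEAD request that fails or returns an error status.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable


def package_urls(packages_xml: str) -> dict[str, str]:
    """Map 'name version' to the download url of each <package> element."""
    root = ET.fromstring(packages_xml)
    return {
        f"{pkg.get('name')} {pkg.get('version')}": pkg.get("url", "")
        for pkg in root.iter("package")
    }


def url_is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send an HTTP HEAD request; treat any network, HTTP or url error as missing."""
    try:
        # Request() raises ValueError for malformed urls, so build it inside the try
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses
        return False


def missing_packages(
    packages_xml: str, is_reachable: Callable[[str], bool] = url_is_reachable
) -> list[str]:
    """List the packages whose url is empty or cannot be fetched."""
    return [
        key
        for key, url in package_urls(packages_xml).items()
        if not url or not is_reachable(url)
    ]


if __name__ == "__main__":
    import sys

    # e.g. from a daily crontab entry: python check_packages.py packages.xml
    with open(sys.argv[1], encoding="utf-8") as f:
        missing = missing_packages(f.read())
    for key in missing:
        print(f"MISSING: {key}")
    sys.exit(1 if missing else 0)
```

On systems where cron mails the output of a job, printing the missing packages may already be enough to serve as the notification; the reachability check is injectable so it can be replaced (for example by a check against a central mirror).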

As far as package validation is concerned, at the moment there is a bit more going on than just checking the XML format. New developers in particular tend to misplace things in the zip file, so the layout of the package zip file also needs checking. In addition, only published methods are allowed in the main repository, so that needs checking as well. The first two checks can be automated; the last one requires a bit of human intervention. Currently this process is manual and quite manageable. An HTML form (hosted on beast2.org) with a mailto option that is handled by a script could help with the first two. I think that should be sufficient for the next few years.
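
The two automatable checks might look roughly like this. The expected layout (version.xml at the top level of the zip, at least one jar under lib/) is an assumption for illustration rather than the official package specification, `check_package_zip` is a hypothetical name, and the published-methods check stays with a human, as described above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def check_package_zip(data: bytes) -> list[str]:
    """Return a list of problems found in a package zip (empty list = looks OK).

    Assumed layout (a simplification): version.xml at the top level of the
    zip, with name and version attributes, and at least one .jar under lib/.
    """
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return ["not a valid zip file"]

    problems = []
    with archive:
        names = archive.namelist()
        if "version.xml" not in names:
            problems.append("version.xml is not at the top level of the zip")
        else:
            try:
                root = ET.fromstring(archive.read("version.xml"))
            except ET.ParseError as err:
                problems.append(f"version.xml is not well-formed XML: {err}")
            else:
                for attr in ("name", "version"):
                    if not root.get(attr):
                        problems.append(f"version.xml root element has no '{attr}' attribute")
        if not any(n.startswith("lib/") and n.endswith(".jar") for n in names):
            problems.append("no .jar file under lib/")
    return problems


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        for problem in check_package_zip(f.read()):
            print(problem)
```

Returning a list of readable messages, rather than stopping at the first error, means a submission script could send all the problems back to the developer in one reply.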

Apr 13 '18 03:04 rbouckaert

If you are both looking for a central place to manage package binaries, I think you should look at Zenodo first. It retains every released version, including binaries, and gives you a DOI to cite. For example, I did this for the codon substitution model: https://zenodo.org/record/1217169#.WtAmVy-B0Qk (login may be required)

Apr 13 '18 03:04 walterxie

@rbouckaert Fair enough, as long as it's manageable and mostly automated, it should be okay I guess. Regarding validation, I didn't just mean XML syntax checking: checking ZIP files and version.xml structure is of course also easy to automate if you have a server. I was even doing this in my Django mock-up.

Btw, just curious: when was the "published methods only" rule introduced? I didn't know about it. What does it mean for utility packages that don't provide a model or algorithm, or for packages that aren't published yet but are under review?

@walterxie Zenodo looks interesting, thanks!

Apr 13 '18 08:04 tgvaughan

@tgvaughan The idea of only having published packages in the main repository is to avoid cluttering it too much. If it were better organised (e.g. with packages grouped by function and marked as published or experimental), that would solve some of this.

So for now, existing packages can stay, while packages in development or under review should go into a separate package repository; I put mine in packages-extra.xml on CBAN.

If a published package requires a utility package, both of them should be in the main repository. If no published package requires it, but it will be required soon, it is a bit of a grey area. What utility packages do you have in mind?

Apr 15 '18 20:04 rbouckaert

@tgvaughan Python is not easy to maintain, while BEAST is still in Java. I created a simple JavaScript UI (your favourite? :-)) in CompEvol/CBAN#30. You are welcome to improve it if you are interested.

May 16 '18 03:05 walterxie

@walterxie Nice! Looks good.

(BTW, I don't know what you mean by "python is not easy to maintain". Perhaps you're talking about the difficulty of deploying Python desktop applications? I completely agree: Python's packaging system is a mess. However, using an established Python framework to serve a website is completely different.)

May 16 '18 07:05 tgvaughan