
Outstanding issues not specific to any tips

Open · SiminaB opened this issue 5 years ago · 10 comments

This issue is for discussing anything we think is not currently covered adequately. For issues related to specific tips, use #242 #243 #244 #245 #246 #247 #248 #249 #250 #251

SiminaB, Oct 07 '20 18:10

In rereading this, I thought of 2 issues that we may want to cover. At minimum, I think many people reading this paper will expect them to be covered. They could go in the Intro or Conclusion, or into existing tips:

  1. How does one go about fitting these models, and is special software always required? We can at least give some good references on how to do this and note the main packages and computational requirements. I know this isn't a "getting started with DL" paper, but we can still spend 2-3 sentences on it.
  2. Can DL inadvertently perpetuate existing stereotypes, e.g., racist and sexist ones? We know this can happen either because of the training set (e.g., the training set consists exclusively of individuals of European descent, and the model is then used on a more diverse population) or because the predictions are misinterpreted due to confounding (e.g., the training set contains doctors and nurses, most doctors are men and most nurses are women, so gender ends up playing an outsized role, explicitly or implicitly, in predicting career choice). The paper focuses on biology, so one good example might be the performance of face recognition approaches on individuals of European vs. non-European descent.

SiminaB, Oct 07 '20 18:10

Some thoughts in response:

  1. We should mention them, as well as auto-ML tools like TPOT.
  2. DL fairness should probably be mentioned in either the interpretation tip or the privacy tip. Which do you think is better?

Benjamin-Lee, Oct 08 '20 14:10

We could change Tip 10 to be about ethics, I guess? That way both fairness and privacy would fit.

SiminaB, Oct 08 '20 15:10

@SiminaB I just addressed your first point in the PR for #241. Specifically, I mentioned TF and PyTorch as well as Keras, AutoKeras, Turi Create, and TPOT. If there are any other tools you think are worth mentioning, do let me know.

Benjamin-Lee, Oct 11 '20 00:10

Looks good! One question from someone who doesn't use DL in research: can you actually run meaningful DL models on a laptop? The text implies it would be hard to do so, e.g., in:

In contrast, traditional ML training can often be done on a laptop (or even a $5 computer [@arxiv:1809.00238]) in seconds to minutes.

SiminaB, Oct 11 '20 02:10

It's doable in some cases, but not really ideal. In my experience, I've always ended up using a cloud machine to train all but the simplest models. I've never done transfer learning, so I can't say whether it brings the requirements down to the level of a consumer-grade laptop. @rasbt probably knows more than I do about that.

Benjamin-Lee, Oct 11 '20 02:10

I think it would be helpful to clarify this, since it would help readers decide whether they can actually do DL. If DL suits their problem but isn't feasible on their own device, they can of course look into using the cloud or starting a collaboration.

SiminaB, Oct 11 '20 02:10

Definitely a good idea to state explicitly what DL requires.

Benjamin-Lee, Oct 11 '20 02:10

I'm copying my comment from https://github.com/Benjamin-Lee/deep-rules/pull/313#issuecomment-760316895 here so we don't lose track of it.

  • There is a lot of existing guidance about best practices for machine learning and deep learning that we do not reference
  • The examples we provide in the intro and elsewhere are fairly arbitrary and not necessarily representative or the most impressive applications
  • Some tips still have no biology examples
  • Second person is not used consistently (#237)
  • Some tips (e.g. 4) aren't very specific to deep learning
  • There is some redundancy across tips

These are all minor enough to address after the initial submission.

agitter, Jan 26 '21 15:01

Thanks for adding it here, and I'm glad to see nothing else is blocking. I'll work on #237 once we have the content freeze, since that change is cosmetic.

Benjamin-Lee, Jan 27 '21 04:01