one-container limitation
The current implementation has a limitation we've discussed: because a user is linked to a specific container, if I try to run two tabs with LOLAweb at the same time, it doesn't really work; one is unresponsive while the other is running. This is actually a substantial limitation... is there a way to relax it?
One thought is that when you click "Run LOLA", instead of doing all the processing there in that session, it would just immediately forward you to the result page, which would display "waiting" until it could find the result. Then we wouldn't need sessions at all.
I feel this would really improve the user experience.
Just some more thoughts on this...
This idea is circling back to some of our original thoughts about splitting the app into the "processing" component and the "visualization" component.
It seems to me the processing should happen in the background, not in a process tied to a user web session. That would solve this issue.
Proposal: the LOLAweb landing page uploads the file, assigns the cache ID (example: C2CO3K8 or whatever), and creates a temporary "processing" file (in a different folder) named with that cache ID: `processing/C2CO3K8.txt`. It then forwards the user to the results URL (`?key=C2CO3K8`) immediately. This is almost what it's already doing; it just changes one thing: instead of doing any actual processing, it just registers the job in that file and then forwards the user as it already does. This is much faster, of course.
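A minimal sketch of that submission step (`registerJob` and the folder paths are hypothetical, and I'm assuming the job file is YAML so the worker sketched below can parse it):

```r
# Record the job on disk and return the key for the redirect; no actual
# processing happens here.
registerJob = function(uploadedFile, genome, jobID,
                       jobFolder = "/path/to/job/folder/") {
  # Record everything the worker will need to run the job later
  job = list(jobID = jobID, query = uploadedFile, genome = genome)
  yaml::write_yaml(job, file.path(jobFolder, paste0(jobID, ".yaml")))
  # The app now immediately forwards the user to ?key=<jobID>
  invisible(jobID)
}
```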
The results page checks for the `C2CO3K8.Rdata` cache to load and display if it exists; if not, it checks for `processing/C2CO3K8.txt` and displays a "still processing" message.
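On the results side, the check could be as simple as this (`jobStatus` is a hypothetical helper; folder paths are placeholders):

```r
# Decide what the results page should render for a given cache key.
jobStatus = function(key, cacheFolder = "/path/to/results/",
                     processingFolder = "/path/to/processing/") {
  if (file.exists(file.path(cacheFolder, paste0(key, ".Rdata")))) {
    "done"        # load the .Rdata cache and display it
  } else if (file.exists(file.path(processingFolder, paste0(key, ".txt")))) {
    "processing"  # display "still processing"
  } else {
    "unknown"     # no record of this job
  }
}
```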
Meanwhile, a separate container with an R process is running a while loop that is constantly looking in `/processing` for new "jobs". When it finds one, it identifies the uploaded file(s) (which could be recorded in the `C2CO3K8.txt` file) and processes them. It just stays running, waiting for the next "job".
This is a very simple job submission cache that has the following advantages:
- We can use one (or more) R containers with all databases pre-loaded. No more duplicated memory at all... this also solves the problem we're trying to use `redis` for.
- The user is never interfacing with a long computation, which is what is causing UI response issues.
Disadvantages:
- web page no longer displays "loading cache... calculating overlaps..." etc... Not a big deal.
- anything else?
@vpnagraj @nmagee any thoughts on this?
Here's some code that actually implements a good chunk of this:
```r
#' Function that monitors a folder and runs a function when a file arrives in
#' the monitored folder.
#' @param folder The folder to watch for new job files
#' @param FUN The function to run on a new job file when found. Should take the
#'   absolute file name as the sole argument.
lurk = function(folder, FUN) {
  while (TRUE) {
    jobs = list.files(folder, pattern = "\\.yaml$", full.names = TRUE)
    # If no new jobs were found, just wait a minute
    if (length(jobs) < 1) {
      Sys.sleep(60)
      next
    }
    # If new jobs were found, process in time order
    details = file.info(jobs)
    jobs = jobs[order(details$mtime)]
    for (job in jobs) {
      # Run the function that will process that job, then remove the job
      # file so it isn't picked up again on the next pass
      FUN(job)
      file.remove(job)
    }
  }
}

# Creates a temp file so the HTML form can say "processing"
initiateJob = function(jobID, processingFolder = "/path/to/processing/") {
  file.create(file.path(processingFolder, paste0(jobID, ".txt")))
}

# A function that will run the LOLAweb process for a given "job" file.
# The job file should specify the path to the user-uploaded file and
# any other user selections provided.
processLOLAwebJob = function(file, outfolder = "/path/to/results/",
                             resources = LWResources) {
  job = yaml::yaml.load_file(file)
  initiateJob(job$jobID)
  # Access the pre-loaded data
  regionDB = resources$regionDBs[[job$genome]]
  universe = resources$universes[[job$universe]]
  query = LOLA::readBed(job$query)
  result = LOLA::runLOLA(query, universe, regionDB)
  # Now, store that result in the output folder,
  # which should then render correctly
  save(result, file = file.path(outfolder, paste0(job$jobID, ".Rdata")))
}

# First, load up some data to save in this container
genomes = c("hg19", "hg38", "mm10")      # example: whatever genomes we support
universes = c("universe1", "universe2")  # example: whatever universes we offer
LWResources = list()
LWResources$regionDBs = list()
for (genome in genomes) {
  LWResources$regionDBs[[genome]] = LOLA::loadRegionDB(file.path("/path/to/regionDBs", genome))
}
LWResources$universes = list()
for (universe in universes) {
  LWResources$universes[[universe]] = LOLA::readBed(file.path("/path/to/universes", universe))
}

# Now, just lurk, waiting for new jobs:
lurk("/path/to/job/folder", FUN = processLOLAwebJob)
```
@nsheff thanks for checking back in and providing some code to illustrate the idea ... very interesting
i've been working through some of the other issues on the dev branch and haven't had a chance to think too carefully about this
i may have missed something in the comments above ... but this would require us to store the region sets (and custom universes, if any) that users submit, correct?
> this would require us to store the region sets (and custom universes, if any) that users submit, correct?
well -- for about 60 seconds, yes... those can be deleted at the end of the job run (or even after loading them), though.
This is also related to #30... it would be a full implementation of that idea. Our original approach was to use tabs, but we never fully completed the idea in #30. It would really require a full decoupling to solve the container limitation issue, I think.
I am formalizing this with a new R package that will abstract away the splitting, called shinyDepot
@nsheff have you seen this post:
http://blog.rstudio.com/2018/06/26/shiny-1-1-0/
not sure if this is exactly what you have in mind, but may be useful
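if i understand it right, the async support in that release builds on the `promises` and `future` packages. a rough sketch of the pattern (the computation here is just a placeholder, not our actual code):

```r
library(shiny)
library(promises)
library(future)
plan(multisession)

server = function(input, output, session) {
  output$results = renderTable({
    # Run the long computation in a separate process; the promise pipe
    # (%...>%) handles the result once it resolves, so other sessions
    # stay responsive in the meantime.
    future({
      longRunningLOLACall()  # placeholder for the real work
    }) %...>%
      head()
  })
}
```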
yeah, that's very very similar... I am approaching it in a different way, which is a little more complicated but more powerful. That will get us part of the way there, but for our needs I think the shinyDepot idea will be superior. I updated the README there, which explains it in a bit more detail now.
@vpnagraj do you want to take a look and see if you can figure out what I'm trying to do there? This would really open the door to higher-performance apps, and I think it may even negate the need for the `redis` concept, among other advantages. Let me know if something is confusing. There's a lot of not-quite-working pseudocode and ideas there, but I think it's a good start.