echoed GET data comes back corrupted

Open dtenenba opened this issue 12 years ago • 1 comments

Hello, when I POST a bunch of data to a simple FastRWeb script that echoes it back, it sometimes comes back slightly altered, depending on how I did the POST.

Example output:

$ diff cmdline.txt rcurl.txt
178c178
< ensembl/release-74/fasta/erinaceus_europaeus/dna/Erinaceus_europaeus.HEDGEHOG.74.dna.toplevel.fa.rz
---
> ensembl/release-74/fasta/erinaceus_europaeus/dna/EErinaceus_europaeus.HEDGEHOG.74.dna.toplevel.fa.rz
350c350
< ensembl/release-74/fasta/pelodiscus_sinensis/dna/Pelodiscus_sinensis.PelSin_1.0.74.dna_rm.toplevel.fa.rz
---
> ensembl/release-74/fasta/pelodiscus_sinensis/dna/Pelodiscus_sinensis.PelSin_11.0.74.dna_rm.toplevel.fa.rz

See how Erinaceus becomes EErinaceus and 1.0 becomes 11.0.

Here's the simple FastRWeb script (save this as test.R in your web.R directory):

run <- function(...) {
  params <- list(...)
  out(params$RDataPath)
}

I compared the results with an equivalent python CGI script:

import cgi
print "Content-Type: text/html"
print   
form = cgi.FieldStorage()
print(form["RDataPath"].value)

and the problem does not occur with Python, so it seems to be happening in FastRWeb.

How to reproduce:

Download these files: https://s3.amazonaws.com/fastrweb-problem/paths.txt https://s3.amazonaws.com/fastrweb-problem/paths2.txt

They are identical except paths2.txt has "RDataPath=" at the beginning so it's suitable for POSTing from the curl command line.

1: POSTing from the command line:

curl -X POST --data @paths2.txt http://localhost/cgi-bin/R/test.R > cmdline.txt

2: POSTing from RCurl:

library(RCurl)
pathStr <- readLines("paths.txt")
res <- postForm("http://localhost/cgi-bin/test.cgi", RDataPath=pathStr)
cat(res, file="rcurl.txt")

3: evaluate results:

Edit both cmdline.txt and rcurl.txt, replacing all spaces with newlines. Then diff them:

diff cmdline.txt rcurl.txt
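
The manual edit-then-diff step can also be scripted. Below is a minimal sketch in Python, assuming the two response files contain whitespace-separated paths as described above; the helper names `tokens`, `token_diff` and `compare_files` are made up for illustration and are not part of FastRWeb or RCurl.

```python
import difflib


def tokens(text):
    """Split on any whitespace, so spaces and newlines are treated alike."""
    return text.split()


def token_diff(expected, actual):
    """Return unified-diff lines between two whitespace-separated bodies."""
    return list(
        difflib.unified_diff(
            tokens(expected),
            tokens(actual),
            fromfile="cmdline.txt",
            tofile="rcurl.txt",
            lineterm="",
            n=0,  # no context lines: show only the tokens that differ
        )
    )


def compare_files(path_a, path_b):
    """Read both response files and print only the differing tokens."""
    with open(path_a) as fa, open(path_b) as fb:
        for line in token_diff(fa.read(), fb.read()):
            print(line)
```

Calling `compare_files("cmdline.txt", "rcurl.txt")` in the directory holding both files should print only the paths that were altered, without having to edit either file first.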

You should see something like the diff output at the beginning of this email. It seems to corrupt the data in slightly different ways each time.

I hope you can fix this. It isn't clear to me how to work around it because I can't predict where the corruption will occur. Unfortunately, using command-line curl is not a good workaround, since I need to POST from a Bioconductor package which may be running on systems (e.g. Windows) where command-line curl is not available. I'm not an RCurl expert, but maybe there's a way to make my RCurl post look more like the command-line curl post and thereby not confuse FastRWeb so much?

Thanks! Dan

BTW, the bugzilla link on http://rforge.net/ is broken, it points to http://rforge.net/bugzilla/ which is not found.

@mtmorgan @mrjc42

dtenenba, Jan 24 '14 17:01

I think this is mainly an issue caused by RCurl: it does NOT actually use POST in the example above; instead it attempts to encode the payload into a GET request. Simply setting style="POST" in postForm() seems to fix any problems I encountered with postForm() (the original files are gone, so I just used 62k bytes of payload).
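
For readers unfamiliar with the distinction, here is a rough Python sketch of a form-encoded POST of the kind `curl --data` sends, with the fields carried in the request body rather than in the URL. This is not RCurl code; `build_form_post` is a hypothetical helper, the two paths are placeholders, and the request is only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(url, fields):
    """Build (but do not send) an application/x-www-form-urlencoded POST."""
    body = urlencode(fields).encode("ascii")  # fields go into the body
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder payload: two made-up paths separated by a space.
req = build_form_post(
    "http://localhost/cgi-bin/R/test.R",
    {"RDataPath": "ensembl/release-74/x.fa.rz ensembl/release-74/y.fa.rz"},
)

print(req.get_method())  # HTTP method that would be used
print(req.full_url)      # the URL carries no query string
print(req.data)          # the encoded fields travel in the body
```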

I cannot actually replicate the problem as described, because GETs with too large a payload fail in other ways, as CGI imposes limits on them. POST is the preferred way, since the body is passed as a raw vector and is thus guaranteed to be intact, while query strings are passed as a string that is escaped and then parsed in R.
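
To illustrate the escape-and-parse round trip that a query string goes through, here is a small Python sketch using only the standard library. It does not reproduce FastRWeb's own parsing; the payload is a placeholder made of two invented paths.

```python
from urllib.parse import parse_qs, urlencode

# Placeholder payload: two made-up paths separated by a space.
payload = "ensembl/release-74/x.fa.rz ensembl/release-74/y.fa.rz"

# GET style: the payload must be percent-escaped into the query string.
query = urlencode({"RDataPath": payload})

# The receiving side then has to parse and unescape it again.
decoded = parse_qs(query)["RDataPath"][0]

# Every "/" becomes "%2F", so the escaped form is longer than the raw one.
overhead = len(query) - len("RDataPath=" + payload)

print(query)
print(decoded == payload)
print(overhead)
```

The escaped form is both longer than the raw payload and dependent on a correct decode step on the server, which is the extra work a POST body avoids.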

s-u, Nov 02 '21 23:11