panoptes-python-client

Correct mime types for json subjects

Open srallen opened this issue 6 years ago • 6 comments

The TESS project should be using json files as one of their subject locations, with a json file extension and a mime type of application/json. Currently libmagic is not correctly detecting these file types and staging subjects have been uploaded as txt files with a mime type of text/plain.

We can set workflow configs to load a particular subject viewer, but I would still like to validate the expected JSON structure so we don't attempt to render a subject whose data is malformed. We typically expect text file subjects to be rendered as plain text, and they usually belong to transcription projects rather than containing data that should be plotted.

srallen avatar Feb 26 '19 15:02 srallen

We'll need to think about how best to implement this. Presumably we'll need to check the filenames for a .json extension.

I think we have three options:

  1. Add a list of known file extensions/mime types. A lot of people seem to be having trouble installing libmagic, so maybe it would be best to only use it as a fallback if the file extension is unknown.
  2. Specifically add an exception for JSON. i.e. if the type is text/plain, check if the filename ends in .json.
  3. Add a way to manually override the mime type.

What do people think?
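
A rough sketch of how options 2 and 3 might fit together, for discussion only. The function name `resolve_mime_type` and its parameters are hypothetical and are not part of the client; `detected_type` stands for whatever libmagic reported.

```python
def resolve_mime_type(filename, detected_type, override=None):
    """Pick the MIME type to record for a subject location.

    Hypothetical helper: detected_type is the result of content
    sniffing, override is an optional caller-supplied type.
    """
    # Option 3: an explicit override always wins.
    if override is not None:
        return override
    # Option 2: treat "plain text with a .json filename" as JSON.
    if detected_type == 'text/plain' and filename.lower().endswith('.json'):
        return 'application/json'
    # Otherwise trust the detected type.
    return detected_type
```

With this shape, a `.txt` file stays `text/plain` unless the caller overrides it, while a `.json` file that was sniffed as plain text is recorded as `application/json`.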

adammcmaster avatar Feb 27 '19 11:02 adammcmaster

The problem files for the TESS project have a .txt extension, so we should try this with .json and see if that extension causes problems. I think it's correct behaviour to have text/plain when the extension is .txt.

eatyourgreens avatar Feb 27 '19 11:02 eatyourgreens

I think option 1 is the better option, then falling back to libmagic if it's installed. Looks like the mimetypes package could handle the extension lookup? https://docs.python.org/3/library/mimetypes.html#module-mimetypes
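
To make option 1 concrete, here is a minimal sketch of an extension-first lookup built on the standard library's `mimetypes.guess_type`. The names `KNOWN_TYPES`, `guess_mime_type`, and the `sniff` parameter are assumptions for illustration, not existing client code; `sniff` stands in for a libmagic-based detector that is only called when the extension tells us nothing.

```python
import mimetypes
import os

# Hypothetical table of extensions we trust over content sniffing.
KNOWN_TYPES = {
    '.json': 'application/json',
    '.txt': 'text/plain',
}


def guess_mime_type(filename, sniff=None):
    """Return a MIME type for filename, preferring the file extension.

    sniff is an optional callable taking the filename (for example a
    libmagic wrapper); it is used only as a last resort.
    """
    ext = os.path.splitext(filename)[1].lower()
    # 1. Explicit list of known extensions.
    if ext in KNOWN_TYPES:
        return KNOWN_TYPES[ext]
    # 2. The standard library's extension map.
    guessed, _encoding = mimetypes.guess_type(filename)
    if guessed is not None:
        return guessed
    # 3. Content sniffing, only if a detector was supplied.
    if sniff is not None:
        return sniff(filename)
    return None
```

Because the detector is passed in rather than imported, this version would still work for people who cannot install libmagic; they would simply get `None` for files with unknown extensions.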

camallen avatar Feb 27 '19 11:02 camallen

I’ve run into this again today for SLSN. My workaround is to explicitly set the MIME type and file contents (apologies for my terrible Python):

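# Workaround: set the MIME type and file contents directly instead of calling add_location.
subject.locations.append('application/json')
# The with block closes the file even if the read fails.
with open('data/subject-1234.json', 'rb') as json_data:
    # _media_files is a private attribute of the subject.
    subject._media_files.append(json_data.read())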

eatyourgreens avatar Feb 21 '23 02:02 eatyourgreens

This same problem also occurs with .svg files. They’re converted to .txt.

eatyourgreens avatar Feb 21 '23 02:02 eatyourgreens

The Python CLI uses subject.add_location to add file names from a manifest to an upload, which also runs into this bug when libmagic generates the wrong type.

eatyourgreens avatar Feb 21 '23 09:02 eatyourgreens