
Some questions from a specific use case

Open ieguinoa opened this issue 6 years ago • 4 comments

Hi all,

I'm working on the use case described in https://github.com/ResearchObject/ro-crate/issues/20. Basically, I want to wrap a reference workflow obtained from a Galaxy instance, so the main file is a .ga file, named here "workflow_Galaxy.ga". Besides this file, I will include a CWL-abstract version of it in a file named here "workflow_abstract.cwl". This is more of an extra metadata file, as it is not really executable or interpretable by software. Those are the 2 payload files included in the RO-Crate, and both would be in the root dir.
I built a template for the ro-crate-metadata.jsonld file of this crate:

{ "@context": "https://w3id.org/ro/crate/1.0/context",
  "@graph": [
    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.jsonld",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.0"},
      "about": {"@id": "./"}
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "hasPart": [
        {"@id": "workflow_Galaxy.ga"},
        {"@id": "workflow_abstract.cwl"}
      ]
    },
    {
      "@id": "workflow_Galaxy.ga",
      "@type": ["File", "SoftwareSourceCode", "Workflow"],
      "contentSize": "****Fill at runtime***",
      "description": "Workflow description in Galaxy format 2",
      "encodingFormat": "text/yaml",
      "programmingLanguage": {"@id": "https://galaxyproject.org/"}
    },
    {
      "@id": "workflow_abstract.cwl",
      "@type": ["File", "Workflow"],
      "contentSize": "****Fill at runtime***",
      "description": "Workflow description in CWL-abstract format",
      "encodingFormat": "text/yaml",
      "programmingLanguage": {"@id": "https://w3id.org/cwl/v1.1/"}
    },
    {
      "@id": "#history-01",
      "@type": "CreateAction",
      "object": {"@id": "workflow_Galaxy.ga"},
      "name": "Workflow file created",
      "endTime": "2020-01-27",
      "agent": {"@id": "human agent responsible for this"},
      "instrument": {"@id": "https://usegalaxy.be"},
      "actionStatus": {"@id": "http://schema.org/CompletedActionStatus"}
    },
    {
      "@id": "https://usegalaxy.be",
      "@type": "SoftwareApplication",
      "name": "The Belgian Galaxy instance",
      "url": "http://usegalaxy.be",
      "version": "2020-01-27"
    }
  ]
}
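The contentSize values in the template are placeholders meant to be filled at runtime. A minimal sketch of that step, assuming the template has already been loaded as a Python dict (it must be valid JSON for json.load to accept it) and that each File's @id is a path relative to the crate root. `fill_content_sizes` and `PLACEHOLDER` are hypothetical names, and sizes are written as byte counts in strings.

```python
import os

# The placeholder string used in the template above.
PLACEHOLDER = "****Fill at runtime***"


def fill_content_sizes(metadata: dict, crate_dir: str) -> dict:
    """Replace contentSize placeholders with each File's size in bytes (hypothetical helper)."""
    for entity in metadata["@graph"]:
        types = entity.get("@type", [])
        if isinstance(types, str):  # "@type" may be a single string or a list
            types = [types]
        if "File" in types and entity.get("contentSize") == PLACEHOLDER:
            # Assumes "@id" is a path relative to the crate root directory.
            path = os.path.join(crate_dir, entity["@id"])
            entity["contentSize"] = str(os.path.getsize(path))
    return metadata
```

Entities that are not Files (the root Dataset, the CreateAction, the SoftwareApplication) are left untouched; serializing the result with json.dump would then give the final metadata file.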

But I still have a few specific questions that I couldn't find answers to in the specification:

  • Is it OK to use the encodingFormat property for the workflow file(s)? The specification refers to the programmingLanguage property to describe the software that creates/runs the workflow, but I think it's also useful to define the format itself where possible (YAML in the case of Galaxy format 2).
  • I would like to represent the fact that the workflow file was created on a specific server instance (in this case usegalaxy.be) but could, in theory, be run on any server running Galaxy. Is it correct to have a separate entity for each? Or, in the case of web services, should I create a single entity with the software name (Galaxy) and the URL of the specific instance as a property? Also, for web services, would it make sense to use the date on which the service was used as the version?
  • Would it make sense to add the "SoftwareSourceCode" type to the abstract CWL file? It's not really executable/interpretable by any software.
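The first and third questions can be made concrete with a small sketch. Assuming the crate entities are assembled as Python dicts, the hypothetical helper `workflow_file_entity` below puts encodingFormat (the serialization format) next to programmingLanguage (the workflow language), and treats the SoftwareSourceCode type as an explicit toggle, since whether abstract CWL deserves it is exactly what is being asked. The type lists mirror the template; this is not an RO-Crate library API.

```python
def workflow_file_entity(file_id: str, description: str, encoding_format: str,
                         language_id: str, source_code: bool = True) -> dict:
    """Build a workflow File entity shaped like the ones in the template (hypothetical helper)."""
    # Toggle for the open question: should a non-executable workflow be SoftwareSourceCode?
    if source_code:
        types = ["File", "SoftwareSourceCode", "Workflow"]
    else:
        types = ["File", "Workflow"]
    return {
        "@id": file_id,
        "@type": types,
        "description": description,
        "encodingFormat": encoding_format,                # format of the bytes, e.g. YAML
        "programmingLanguage": {"@id": language_id},      # workflow language / engine
    }
```

Calling it with `source_code=False` for workflow_abstract.cwl reproduces the template's `["File", "Workflow"]`, while the Galaxy file keeps all three types.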

Hope someone can help me with these details. Thanks, Ignacio

ieguinoa, Jan 27 '20 16:01

@stain can you help with this?

ptsefton, Feb 12 '20 23:02

Hi @ieguinoa, apologies for the late reply.

I think you are somewhat right not to call the abstract CWL a SoftwareSourceCode in this case, as it is not executable while it has abstract steps; but if we think of it like a *.h header file rather than a *.c file, wouldn't both still be source code? Perhaps we should not dwell on that: there could be other non-executable workflow formats that are mainly diagrammatic/descriptive.

stain, Mar 06 '20 11:03

I'm wondering whether there is a CWL validator that checks whether a CWL workflow is abstract or runnable. In other words, which elements must not appear in a CWL workflow for it to be considered truly abstract?

jmfernandez, Mar 06 '20 12:03

@jmfernandez, cwltool version 2.0.20200224214940 or later currently supports the abstract Operation with cwlVersion: v1.2.0-dev1 (example); this will be included in v1.2.

Both variants will pass --validate, but execution will fail if an Operation is part of the workflow.

(cwltool) stain@biggie:~/src/cwltool/cwltool$ cwltool --enable-dev --validate https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/master/tests/operation.cwl
INFO /home/stain/miniconda3/envs/cwltool/bin/cwltool 2.0.20200303141624
https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/master/tests/operation.cwl is valid CWL.

Having to attempt to run the workflow would be a bit awkward; perhaps the simplest detection is to run --print-rdf and grep for cwl:Operation?

(cwltool) stain@biggie:~/src/cwltool/cwltool$ cwltool --quiet --enable-dev --print-rdf https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/master/tests/operation.cwl |grep cwl:Operation
<https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/master/tests/operation.cwl#reverse> a cwl:Operation ;
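A minimal Python sketch of the same --print-rdf check, assuming cwltool is installed on PATH and using the flags from the command above. `cwl_print_rdf` and `operation_lines` are hypothetical names. The regex uses word boundaries so that, unlike a plain grep, it does not also match longer identifiers that merely start with cwl:Operation.

```python
import re
import subprocess

# Match the class name cwl:Operation exactly, not longer names that start with it.
OPERATION = re.compile(r"\bcwl:Operation\b")


def cwl_print_rdf(cwl_path: str) -> str:
    """Run cwltool (assumed to be installed) and return its --print-rdf output."""
    result = subprocess.run(
        ["cwltool", "--quiet", "--enable-dev", "--print-rdf", cwl_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def operation_lines(rdf_text: str) -> list:
    """Return the lines mentioning cwl:Operation, similar to `grep cwl:Operation`."""
    return [line for line in rdf_text.splitlines() if OPERATION.search(line)]
```

For example, `operation_lines(cwl_print_rdf("https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/master/tests/operation.cwl"))` should return a non-empty list when the workflow contains at least one abstract Operation step.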

Determining whether it is "fully abstract" would, I guess, be trickier: would you then disallow both CommandLineTool and ExpressionTool?
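Following the heuristic floated above (treat a workflow as fully abstract only if it uses Operation and no CommandLineTool or ExpressionTool), a minimal sketch over the same --print-rdf Turtle text might look like this. It assumes concrete tools show up in that output as cwl:CommandLineTool or cwl:ExpressionTool, which is an assumption rather than documented cwltool behavior; `is_fully_abstract` is a hypothetical helper.

```python
import re

# Word boundaries stop longer names that merely start with these class names from matching.
CONCRETE_TOOL = re.compile(r"\bcwl:(CommandLineTool|ExpressionTool)\b")
OPERATION = re.compile(r"\bcwl:Operation\b")


def is_fully_abstract(rdf_text: str) -> bool:
    """True if the RDF mentions at least one cwl:Operation and no concrete tool class."""
    has_operation = OPERATION.search(rdf_text) is not None
    has_concrete_tool = CONCRETE_TOOL.search(rdf_text) is not None
    return has_operation and not has_concrete_tool
```

Requiring at least one Operation is a design choice: a workflow with no steps of either kind is not reported as abstract.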

stain, Mar 06 '20 14:03