I have the following workflow:
```groovy
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process ALIGN {
    input:
    val(sample)

    output:
    tuple val(sample), path("*.bam")

    script:
    """
    touch ${sample}-1.bam
    touch ${sample}-2.bam
    """
}

process CALL_VARIANTS {
    output:
    path("*.vcf.gz")

    script:
    """
    touch final.vcf.gz
    """
}

workflow {
    // Run processes.
    Channel.of('SAMPLE1', 'SAMPLE2')
        | ALIGN
    CALL_VARIANTS()

    // Create output manifest channel.
    ALIGN.out
        .map { sample, bam ->
            [
                sample: sample,
                bam: bam
            ]
        }
        .collect()
        .merge(CALL_VARIANTS.out) { per_sample, jointcall -> [per_sample: per_sample, jointcall: jointcall] }
        .dump(pretty: true, tag: 'results_ch')
        .set { results_ch }
}
```
When I run `nextflow run main.nf -dump-channels results_ch`, I get the following channel dump:

```
[DUMP: results_ch] {
    "per_sample": [
        {
            "sample": "SAMPLE1",
            "bam": [
                "/path/to/SAMPLE1-1.bam",
                "/path/to/SAMPLE1-2.bam"
            ]
        },
        {
            "sample": "SAMPLE2",
            "bam": [
                "/path/to/SAMPLE2-1.bam",
                "/path/to/SAMPLE2-2.bam"
            ]
        }
    ],
    "jointcall": "/path/to/final.vcf.gz"
}
```
I would like to use this channel in workflow outputs to create a single JSON manifest (index) file and to publish the outputs to the appropriate target directories. However, this nested structure causes a problem when publishing, because the `result.per_sample.bam >> "bam/"` statement expects each entry in the list to be a file path, not another list:
```groovy
workflow {
    // ...workflow continued from above...

    publish:
    results = results_ch
}

output {
    results {
        path { result ->
            result.per_sample.bam >> "bam/"
        }
        index {
            path "manifest.json"
        }
    }
}
```

This fails with:

```
ERROR ~ Cannot cast object '["/path/to/SAMPLE1-1.bam", "/path/to/SAMPLE1-2.bam"]' with class 'java.util.ArrayList' to class 'java.nio.file.Path' due to: groovy.lang.GroovyRuntimeException: Could not find matching constructor for: java.nio.file.Path(sun.nio.fs.UnixPath, sun.nio.fs.UnixPath)
```
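The tail of that error message appears to be Groovy's ordinary list coercion rather than anything Nextflow-specific: when a `List` is cast to a type it is not, Groovy tries to call a constructor of the target type with the list elements as arguments, and `java.nio.file.Path` is an interface with no constructors. A minimal plain-Groovy reproduction, run outside Nextflow with paths made up to match the dump, might look like this:

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// One sample's `bam` entry: a list of two Path objects, as in the dump above.
def inner = [Paths.get('/path/to/SAMPLE1-1.bam'), Paths.get('/path/to/SAMPLE1-2.bam')]

def failed = false
try {
    // Assigning a List to a Path-typed variable makes Groovy attempt list
    // coercion, i.e. a constructor call Path(elem1, elem2). This cannot succeed
    // because Path is an interface, so Groovy throws a GroovyCastException.
    Path p = inner
} catch (Exception e) {
    failed = true
    println e.message
}
assert failed
```

If that reading is right, only the inner two-file list shows up in the message because the outer list has already been iterated and each of its elements is being cast to `Path` individually.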
The problem is that, inside `path { result -> result.per_sample.bam >> "bam/" }`, the expression `result.per_sample.bam` evaluates to a single nested list:

```
[
    [
        "/path/to/SAMPLE1-1.bam",
        "/path/to/SAMPLE1-2.bam"
    ],
    [
        "/path/to/SAMPLE2-1.bam",
        "/path/to/SAMPLE2-2.bam"
    ]
]
```

instead of what I need, which is two separate lists:

```
[
    "/path/to/SAMPLE1-1.bam",
    "/path/to/SAMPLE1-2.bam"
]
[
    "/path/to/SAMPLE2-1.bam",
    "/path/to/SAMPLE2-2.bam"
]
```
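The nesting comes from Groovy's GPath behaviour: reading a property on a `List` reads that property from every element and collects the results, so `per_sample.bam` returns one `bam` list per sample. The sketch below rebuilds the record from the dump in plain Groovy (strings stand in for `Path` objects, an assumption made only to keep it standalone) and compares the nested result with what `Collection.flatten()` produces:

```groovy
// Record shaped like the results_ch dump; strings stand in for Path objects.
def result = [
    per_sample: [
        [sample: 'SAMPLE1', bam: ['/path/to/SAMPLE1-1.bam', '/path/to/SAMPLE1-2.bam']],
        [sample: 'SAMPLE2', bam: ['/path/to/SAMPLE2-1.bam', '/path/to/SAMPLE2-2.bam']]
    ],
    jointcall: '/path/to/final.vcf.gz'
]

// Property access on a List is applied to every element (GPath), so this
// collects each sample's `bam` list into an outer list: a list of lists.
def nested = result.per_sample.bam
assert nested == [
    ['/path/to/SAMPLE1-1.bam', '/path/to/SAMPLE1-2.bam'],
    ['/path/to/SAMPLE2-1.bam', '/path/to/SAMPLE2-2.bam']
]

// Collection.flatten() removes the nesting, leaving one flat list of files.
def flat = result.per_sample.bam.flatten()
assert flat == [
    '/path/to/SAMPLE1-1.bam', '/path/to/SAMPLE1-2.bam',
    '/path/to/SAMPLE2-1.bam', '/path/to/SAMPLE2-2.bam'
]
println flat
```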
In the main body of the workflow, I would use the `flatMap()` operator to turn the list of lists into independent items:

```groovy
results_ch
    .flatMap { result ->
        result.per_sample.bam
    }
```

which emits two separate items:

```
[
    "/path/to/SAMPLE1-1.bam",
    "/path/to/SAMPLE1-2.bam"
]
[
    "/path/to/SAMPLE2-1.bam",
    "/path/to/SAMPLE2-2.bam"
]
```

However, I don't think `flatMap()` can be used in the output block. Is there another way to achieve this?
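One option that may be worth testing (an untested sketch, not a confirmed fix): the error message shows the inner two-file list, not the outer list, being cast to `Path`, which suggests the publish statement already iterates over a list of files. If so, flattening the nested list inside the `path` closure might be enough. That closure receives an ordinary value rather than a channel, so plain Groovy collection methods such as `flatten()` can be called on it even though channel operators like `flatMap()` cannot:

```groovy
output {
    results {
        path { result ->
            // Untested assumption: `>>` accepts a flat list of files.
            // flatten() turns [[S1-1.bam, S1-2.bam], [S2-1.bam, S2-2.bam]]
            // into [S1-1.bam, S1-2.bam, S2-1.bam, S2-2.bam].
            result.per_sample.bam.flatten() >> "bam/"
        }
        index {
            path "manifest.json"
        }
    }
}
```

Whether this works may depend on the Nextflow version, since workflow outputs were still a preview feature in recent releases.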