Using workflow outputs to publish files from a nested map channel

I have the following workflow:

#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process ALIGN {

  input:
  val(sample)

  output:
  tuple val(sample), path("*.bam")

  script:
  """
  touch ${sample}-1.bam
  touch ${sample}-2.bam
  """
}

process CALL_VARIANTS {

  output:
  path ("*.vcf.gz")

  script:
  """
  touch final.vcf.gz
  """
}

workflow {

  // Run processes.
  Channel.of('SAMPLE1', 'SAMPLE2')
  | ALIGN

  CALL_VARIANTS()

  // Create output manifest channel.
  ALIGN.out
  .map { sample, bam ->
    [
      sample: sample,
      bam: bam
    ]
  }
  .collect()
  .merge(CALL_VARIANTS.out) { per_sample, jointcall -> [per_sample: per_sample, jointcall: jointcall]}
  .dump(pretty: true, tag: 'results_ch')
  .set { results_ch }
}

When I run nextflow run main.nf -dump-channels results_ch, I get the following channel dump:

[DUMP: results_ch] {
    "per_sample": [
        {
            "sample": "SAMPLE1",
            "bam": [
              "/path/to/SAMPLE1-1.bam",
              "/path/to/SAMPLE1-2.bam"
            ]
        },
        {
            "sample": "SAMPLE2",
            "bam": [
              "/path/to/SAMPLE2-1.bam",
              "/path/to/SAMPLE2-2.bam"
            ]
        }
    ],
    "jointcall": "/path/to/final.vcf.gz"
}

I would like to use this channel in workflow outputs to create a single JSON manifest/index file and to publish outputs to the correct target directories. However, this nested structure presents a problem when attempting to publish outputs, because the result.per_sample.bam >> "bam/" statement expects each entry in the list to be a file path, not another list:

workflow {

  ...workflow continued from above...

  publish:
  results = results_ch
}

output {
  results {
    path { result ->
      result.per_sample.bam >> "bam/"
    }
    index {
      path "manifest.json"
    }
  }
}

Running this fails with:

ERROR ~ Cannot cast object '["/path/to/SAMPLE1-1.bam", "/path/to/SAMPLE1-2.bam"]' with class 'java.util.ArrayList' to class 'java.nio.file.Path' due to: groovy.lang.GroovyRuntimeException: Could not find matching constructor for: java.nio.file.Path(sun.nio.fs.UnixPath, sun.nio.fs.UnixPath)

The problem is that path { result -> result.per_sample.bam >> "bam/" } yields this output structure:

[
  [
    "/path/to/SAMPLE1-1.bam",
    "/path/to/SAMPLE1-2.bam"
  ],
  [
    "/path/to/SAMPLE2-1.bam",
    "/path/to/SAMPLE2-2.bam"
  ]
]

instead of the structure I need, which is:

[
  "/path/to/SAMPLE1-1.bam",
  "/path/to/SAMPLE1-2.bam"
]
[
  "/path/to/SAMPLE2-1.bam",
  "/path/to/SAMPLE2-2.bam"
]

In the main body of the workflow, I would use the flatMap() operator to obtain independent lists from a list of lists:

results_ch
.flatMap { result ->
  result.per_sample.bam
}

which emits:

[
  "/path/to/SAMPLE1-1.bam",
  "/path/to/SAMPLE1-2.bam"
]
[
  "/path/to/SAMPLE2-1.bam",
  "/path/to/SAMPLE2-2.bam"
]

but I don’t think I can use flatMap() in the output block. Is there another way to achieve this?

It looks like you are using result.per_sample.bam, but this is not correct: per_sample is a list of maps, where each map has a bam property, so the expression returns a list of lists.
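
To illustrate, here is a minimal plain-Groovy sketch that can be run outside Nextflow. The result map is a hypothetical stand-in for one item of results_ch, with strings in place of real Path objects:

```groovy
// Hypothetical stand-in for a single item emitted by results_ch.
// Strings are used instead of java.nio.file.Path objects for brevity.
def result = [
    per_sample: [
        [sample: 'SAMPLE1', bam: ['/path/to/SAMPLE1-1.bam', '/path/to/SAMPLE1-2.bam']],
        [sample: 'SAMPLE2', bam: ['/path/to/SAMPLE2-1.bam', '/path/to/SAMPLE2-2.bam']]
    ],
    jointcall: '/path/to/final.vcf.gz'
]

// Property access on a List is applied to each element and the results
// are gathered into a new list, so this yields one inner list per sample.
def nested = result.per_sample.bam

assert nested.size() == 2
assert nested.every { it instanceof List }
println nested
```

Each element of nested is itself a list, which matches the ArrayList-to-Path cast error in the question.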

You can use the collectMany method for lists; it works the same way as flatMap:

result.per_sample.collectMany { s -> s.bam } >> "bam/"
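
As a rough check of what that expression produces, the following plain-Groovy sketch applies collectMany to a hypothetical perSample list (strings stand in for Path objects; the variable names are made up for illustration):

```groovy
// Hypothetical per-sample entries, mirroring the per_sample list in the channel dump.
def perSample = [
    [sample: 'SAMPLE1', bam: ['/path/to/SAMPLE1-1.bam', '/path/to/SAMPLE1-2.bam']],
    [sample: 'SAMPLE2', bam: ['/path/to/SAMPLE2-1.bam', '/path/to/SAMPLE2-2.bam']]
]

// collectMany calls the closure on each element and concatenates the
// returned collections into a single flat list (one level of flattening).
def flat = perSample.collectMany { s -> s.bam }

assert flat.size() == 4
assert flat.every { it instanceof String }
println flat
```

Note that this gives one flat list of four entries rather than two separate lists, which should be enough for publishing since every entry is now a single file.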

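As a side note, the use of merge here is a disaster waiting to happen. merge pairs items purely by emission order, and the Nextflow documentation discourages it because that order is not guaranteed when the channels come from different processes.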
