There is an excellent presentation by the Snakemake authors, which you should read, since the code below closely follows it. The official tutorial describes a more advanced example.
First, install Snakemake:
    conda activate envname
    # In case you don't have it yet
    conda install snakemake
Basic example
Let's start with a very basic example: sorting some files.
    mkdir snake_test
    cd snake_test
    # Following lines create two files with numbers randomly generated
    # in a range from 1 to 10
    python -c $'import random\nfor i in range(5): print(random.randint(1,10))' > A.txt
    python -c $'import random\nfor i in range(5): print(random.randint(1,10))' > B.txt
Now, let's modify the Snakefile so that it processes both the A and B files. By default, Snakemake executes the first rule in the Snakefile. This gives rise to pseudo-rules at the beginning of the file that can be used to define build targets, similar to GNU Make. So, in the all rule we request that all the output files be present, and Snakemake automatically recognizes that these can be created by multiple applications of the rule sort.
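As a rough illustration of that inference, the sketch below models in plain Python how a requested file name can be matched against a rule's output pattern to recover the wildcard value and the corresponding input. The patterns and the rule shown in the comment are assumptions made for this example (a hypothetical sort rule with a `{sample}` wildcard), and the matching logic is a simplification, not Snakemake's actual implementation.

```python
import re

# Hypothetical patterns, assumed to mirror a `sort` rule such as:
#   rule sort:
#       input:  "{sample}.txt"
#       output: "{sample}.sorted.txt"
#       shell:  "sort -n {input} > {output}"
INPUT_PATTERN = "{sample}.txt"
OUTPUT_PATTERN = "{sample}.sorted.txt"


def pattern_to_regex(pattern):
    """Turn "{name}.sorted.txt" into a regex with one named group per wildcard."""
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        # re.split with a capture group alternates literal text and wildcard names
        regex += f"(?P<{part}>.+)" if i % 2 else re.escape(part)
    return re.compile(f"^{regex}$")


def resolve(target):
    """Return (wildcards, input_file) for a target, or None if the rule cannot make it."""
    match = pattern_to_regex(OUTPUT_PATTERN).match(target)
    if match is None:
        return None
    wildcards = match.groupdict()
    return wildcards, INPUT_PATTERN.format(**wildcards)


# The `all` rule requests both sorted files; each one maps to a separate job.
for target in ["A.sorted.txt", "B.sorted.txt"]:
    wildcards, source = resolve(target)
    print(f"{target}: wildcards={wildcards}, input={source}")
```

Each requested target yields its own wildcard value, which is why one rule definition is enough to produce a separate sort job per input file.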
But what is peculiar about this output? The rule sort sorted only the B file, right? That is because A was already sorted and its output is already in our folder. We can force execution of all tasks with snakemake -F and see whether the output is different.
So, now we see that sort processed both A and B files.
NB: the -f flag forces execution of the first rule (or of the selected target) regardless of existing output, while -F forces execution of all the rules.
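The skip-then-force behaviour described above can be modelled with a simple make-style check. This is a minimal sketch under stated assumptions: the function name `needs_run` and the file contents are made up for illustration, and only the "output missing or older than input" condition is modelled, whereas recent Snakemake versions may also consider other rerun triggers.

```python
import os
import tempfile


def needs_run(input_path, output_path, force=False):
    """Simplified make-style check: run if forced, if the output is missing,
    or if the input is newer than the output."""
    if force or not os.path.exists(output_path):
        return True
    return os.path.getmtime(input_path) > os.path.getmtime(output_path)


with tempfile.TemporaryDirectory() as workdir:
    # Create both inputs, but only A already has a sorted output.
    for name in ("A", "B"):
        with open(os.path.join(workdir, f"{name}.txt"), "w") as handle:
            handle.write("3\n1\n2\n")
    a_in = os.path.join(workdir, "A.txt")
    a_out = os.path.join(workdir, "A.sorted.txt")
    with open(a_out, "w") as handle:
        handle.write("1\n2\n3\n")
    # Make the existing output clearly newer than its input.
    os.utime(a_out, (os.path.getmtime(a_in) + 10,) * 2)

    for force in (False, True):
        jobs = [
            name
            for name in ("A", "B")
            if needs_run(
                os.path.join(workdir, f"{name}.txt"),
                os.path.join(workdir, f"{name}.sorted.txt"),
                force=force,
            )
        ]
        print(f"force={force}: jobs to run -> {jobs}")
```

Without forcing, only B is scheduled because A.sorted.txt already exists and is up to date; with forcing, both jobs are scheduled.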
Some more useful commands:
    # execute the workflow with target A.sorted.txt
    snakemake A.sorted.txt
    # dry-run
    snakemake -n
    # dry-run, print shell commands
    snakemake -n -p
    # dry-run, print execution reason for each job
    snakemake -n -r
An amazing feature of Snakemake is pipeline diagram plotting, for example with snakemake --dag | dot -Tsvg > dag.svg (this requires Graphviz).
Then we can start snakemake with the following parameters:
    # Execute the workflow with 8 cores
    snakemake --cores 8
    # Prioritize the creation of a certain file
    snakemake --prioritize A.sorted.txt --cores 8
    # Execute the workflow with 8 cores and 100MB memory.
    # Will execute only one job at a time, since in the Snakefile we set memory limit.
    snakemake --cores 8 --resources mem_mb=100
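To see why a memory limit can serialize jobs even when plenty of cores are available, here is a small greedy scheduling sketch in plain Python. The per-job requirements (one core and 100 MB for each sort job) are assumptions for this example, and the `schedule` function is a toy model; Snakemake's real scheduler is more sophisticated.

```python
def schedule(jobs, limits):
    """Group jobs into batches that run concurrently without exceeding any limit."""
    batches = []
    pending = list(jobs)
    while pending:
        used = dict.fromkeys(limits, 0)
        batch, waiting = [], []
        for job in pending:
            needs = job["resources"]
            if all(used[key] + needs.get(key, 0) <= limits[key] for key in limits):
                for key in limits:
                    used[key] += needs.get(key, 0)
                batch.append(job["name"])
            else:
                waiting.append(job)
        if not batch:
            raise ValueError("a job requests more than the available resources")
        batches.append(batch)
        pending = waiting
    return batches


# Assumed per-job requirements: one core and 100 MB for each sort job.
jobs = [
    {"name": "sort A", "resources": {"cores": 1, "mem_mb": 100}},
    {"name": "sort B", "resources": {"cores": 1, "mem_mb": 100}},
]

print(schedule(jobs, {"cores": 8, "mem_mb": 100}))  # memory allows one job per batch
print(schedule(jobs, {"cores": 8, "mem_mb": 200}))  # both jobs fit together
```

With a global budget of 100 MB, the two jobs end up in separate batches; doubling the budget lets them run side by side.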
Finally, we want to read the input list from an external file, since the current design is not customizable enough. First, create a config file that lists the inputs.
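A minimal sketch of the idea, in plain Python: the list of samples lives in a config file, and the target file names are derived from it. The file name config.json, the "samples" key, and the helper `targets_from_config` are assumptions made for this example. In a Snakefile, a comparable effect is usually achieved with the `configfile:` directive and `expand("{sample}.sorted.txt", sample=config["samples"])`.

```python
import json
import os
import tempfile


def targets_from_config(config, pattern="{sample}.sorted.txt"):
    """Build one target file name per sample listed in the config."""
    return [pattern.format(sample=sample) for sample in config["samples"]]


with tempfile.TemporaryDirectory() as workdir:
    config_path = os.path.join(workdir, "config.json")
    # Hypothetical config layout: a single "samples" list naming the inputs.
    with open(config_path, "w") as handle:
        json.dump({"samples": ["A", "B"]}, handle, indent=2)

    # Read the config back, as a workflow would at start-up.
    with open(config_path) as handle:
        config = json.load(handle)

print(targets_from_config(config))
```

Adding a new input then only requires editing the config file, not the workflow definition.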
    conda activate envname
    conda install fastqc subversion
    svn export https://github.com/sysbio-vo/sysbio-course/trunk/examples/snake_qc/
    cd snake_qc/
    mkdir raws
    mkdir fastqc
    cd raws
    # From http://www.ebi.ac.uk/ena/data/view/SRR1750053
    wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR175/003/SRR1750053/SRR1750053_1.fastq.gz
    wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR175/003/SRR1750053/SRR1750053_2.fastq.gz
    cd ..
    snakemake -s workflows/qc.wf -p --configfile configs/config.yaml --cores 4 -r
    ls fastqc/
    snakemake --dag -s workflows/qc.wf --configfile configs/config.yaml | dot -Tsvg > dag.svg
    eog dag.svg
Check the FastQC HTML report. What can you tell about it? Closely examine the rules and workflows folders. What is different about this Snakemake workflow design compared with the previous simple example?