PAP Functional Workflow v2020.09

Annotation: Functional phage annotation workflow. For help, see the guide at https://cpt.tamu.edu/training-material/topics/phage-annotation-pipeline/

Step / Annotation
Step 1: Input dataset
select at runtime
Step 2: Input dataset
select at runtime
Step 3: Split GFF3+Fasta into separate parts
Output dataset 'output' from step 1
Step 4: Retrieve JBrowse
Autodetect from Apollo JSON
Output dataset 'output' from step 2
Step 5: Remove Description
Output dataset 'fasta_out' from step 3
Step 6: Reformat GFF3 File
Output dataset 'gff_out' from step 3
Step 7: Get open reading frames (ORFs) or coding sequences (CDSs)
Output dataset 'out' from step 5
11. Bacterial
Look for ORFs (check for stop codons only, ignore start codons)
30
Search both the forward and reverse strand
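The settings in Step 7 (table 11, stop codons only, minimum length 30, both strands) describe a stop-to-stop ORF scan. The following is a minimal sketch of that idea, not the tool's actual implementation. The function names are hypothetical, and it assumes the minimum length is counted in codons and that coordinates are 0-based, half-open, and relative to the scanned strand.

```python
# Stop codons shared by translation table 11 and the standard code.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stop_to_stop_orfs(seq, min_codons=30):
    """Yield (strand, frame, start, end) for stop-free stretches.

    Start codons are ignored: a stretch begins right after a stop codon
    (or at the frame offset) and ends just before the next stop codon.
    Coordinates are 0-based, half-open, on the strand being scanned.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = frame
            for pos in range(frame, len(s) - 2, 3):
                if s[pos:pos + 3] in STOPS:
                    if (pos - orf_start) // 3 >= min_codons:
                        yield (strand, frame, orf_start, pos)
                    orf_start = pos + 3
            # Stretch that runs off the end of the sequence without a stop.
            end = frame + ((len(s) - frame) // 3) * 3
            if (end - orf_start) // 3 >= min_codons:
                yield (strand, frame, orf_start, end)
```

Calling `list(stop_to_stop_orfs(genome, min_codons=30))` on a genome string would list candidate regions on both strands. A real run should use the tool itself, which also handles FASTA parsing and output formats.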
Step 8: NCBI BLAST+ blastn (CPT Latest)
Output dataset 'out' from step 5
Locally installed BLAST database
nt_current
Empty.
Empty.
dc-megablast - Discontiguous megablast used to find more distant (e.g., interspecies) sequences
0.001
Tabular (extended 25 columns)
12333 1714270 10656 10472 10659 10841 10860 28883 11989 10877 79205 2
Show Advanced Options
True
Both
0
Not available.
0.0
Not available.
False
False
No restriction, search the entire database
0.0
Not available.
Not available.
Not available.
Step 9: GFF3 Feature Sequence Export
From History
Output dataset 'out' from step 5
Output dataset 'default' from step 6
CDS
False
Step 10: GFF3 Add Gene to CDS
Output dataset 'out_gff3_file' from step 7
Step 11: Blast Tabular Dice Filter
Output dataset 'output1' from step 8
0.2
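Step 11 filters the 25-column BLAST table from Step 8 with a threshold of 0.2. The sketch below shows one common way to compute a Dice score per hit. It assumes the score is 2 * nident / (qlen + slen), that the threshold is a fraction between 0 and 1, and that the columns follow the standard extended 25-column order (nident at index 14, qlen at 22, slen at 23). The function name is hypothetical, and the actual tool may compute the score differently.

```python
def dice_filter(lines, min_dice=0.2):
    """Yield BLAST tabular lines whose Dice score is at least min_dice.

    Assumed formula: dice = 2 * nident / (qlen + slen).
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        nident = float(cols[14])
        qlen = float(cols[22])
        slen = float(cols[23])
        dice = 2 * nident / (qlen + slen)
        if dice >= min_dice:
            yield line
```

A hit with 50 identical positions between a 100 bp query and a 100 bp subject scores 0.5 and is kept. One with 10 identical positions scores 0.1 and is dropped.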
Step 12: Start Codon Statistics
Output dataset 'default' from step 9
Step 13: Stop Codon Statistics
Output dataset 'default' from step 9
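Steps 12 and 13 summarize the start and stop codons of the CDS sequences exported in Step 9. A minimal sketch of that kind of tally follows. It assumes each FASTA record is a complete CDS, so that its first and last three bases are the start and stop codons. The helper names are hypothetical and are not taken from the tools.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into a dict of header -> uppercase sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v).upper() for k, v in records.items()}


def codon_stats(records):
    """Count the first and last codon of every sequence of length >= 3."""
    seqs = [s for s in records.values() if len(s) >= 3]
    starts = Counter(s[:3] for s in seqs)
    stops = Counter(s[-3:] for s in seqs)
    return starts, stops
```

An unexpected count in either tally (for example, many sequences ending in a non-stop codon) is usually a sign that feature boundaries need review.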
Step 14: Fasta Translate
Output dataset 'default' from step 9
Protein
[11] The Bacterial, Archaeal and Plant Plastid Code
True
True
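Step 14 translates the exported CDS sequences with translation table 11. The sketch below illustrates that translation under stated assumptions: the amino acid string and start codon set are those of NCBI table 11, while the two keyword flags are illustrative only and are not claimed to correspond to the two unlabeled boolean options in the listing.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for NCBI translation table 11, in TCAG codon order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
# Alternative initiation codons listed for table 11.
STARTS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate(seq, first_as_met=True, strip_stop=True):
    """Translate a CDS; unknown or ambiguous codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    protein = [CODON_TABLE.get(c, "X") for c in codons]
    if first_as_met and codons and codons[0] in STARTS:
        protein[0] = "M"  # an initiator codon is read as methionine
    result = "".join(protein)
    if strip_stop and result.endswith("*"):
        result = result[:-1]
    return result
```

For example, `translate("GTGAAATAA")` gives `"MK"`, while `translate("GTGAAATAA", first_as_met=False, strip_stop=False)` gives `"VK*"`.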
Step 15: GFF3 Filter: Require Phage Start
From History
Output dataset 'out' from step 5
Output dataset 'output' from step 10
Step 16: BlastN Results to GFF3
Blast 25 Column Table
Output dataset 'output' from step 11
Step 17: TMHMM (GFF3)
Output dataset 'default' from step 14
Step 18: LipoP
Output dataset 'default' from step 14
False
Step 19: Interproscan functional predictions of ORFs
Output dataset 'default' from step 14
True
True
True
TIGRFAM: protein families based on Hidden Markov Models or HMMs; PIRSF: non-overlapping clustering of UniProtKB sequences into a hierarchical order (evolutionary relationships); Panther: Protein ANalysis THrough Evolutionary Relationships; SMART: identification and analysis of domain architectures based on Hidden Markov Models or HMMs; PROSITE Profiles: protein domains, families and functional sites as well as associated profiles to identify them; PROSITE Pattern: protein domains, families and functional sites as well as associated patterns to identify them; HAMAP: High-quality Automated Annotation of Microbial Proteomes; PfamA: protein families, each represented by multiple sequence alignments and hidden Markov models; PRINTS: group of conserved motifs (fingerprints) used to characterise a protein family; SUPERFAMILY: database of structural and functional annotation; Coils: Prediction of Coiled Coil Regions in Proteins; Gene3d: Structural assignment for whole genes and genomes using the CATH domain structure database; SignalP Gram Positive Bacteria; SignalP Gram Negative Bacteria; Phobius: combined transmembrane topology and signal peptide predictor
Step 20: NCBI BLAST+ blastp (CPT Latest)
Output dataset 'default' from step 14
Locally installed BLAST database
nr
Empty.
Empty.
blastp - Traditional BLASTP to compare a protein query to a protein database
0.001
BLAST XML
12333 1714270 10656 10472 10659 10841 10860 28883 11989 10877 79205
Hide Advanced Options
Step 21: NCBI BLAST+ blastp (CPT Latest)
Output dataset 'default' from step 14
Locally installed BLAST database
canonical_2021
Empty.
Empty.
blastp - Traditional BLASTP to compare a protein query to a protein database
0.001
BLAST XML
Empty.
Hide Advanced Options
Step 22: NCBI BLAST+ blastp (CPT Latest)
Output dataset 'default' from step 14
Locally installed BLAST database
sprot
Empty.
Empty.
blastp - Traditional BLASTP to compare a protein query to a protein database
0.001
BLAST XML
Empty.
Hide Advanced Options
Step 23: Gff3 Filter: Require SD
From History
Output dataset 'out' from step 5
Output dataset 'stdout' from step 15
Step 24: Rebase Wig Analysis Results
Output dataset 'default' from step 6
Output dataset 'bw_i' from step 17
True
ID
Step 25: Rebase GFF3 features
Output dataset 'default' from step 6
Output dataset 'output' from step 17
False
True
ID
Step 26: Rebase Wig Analysis Results
Output dataset 'default' from step 6
Output dataset 'bw_o' from step 17
True
ID
Step 27: Rebase Wig Analysis Results
Output dataset 'default' from step 6
Output dataset 'bw_m' from step 17
True
ID
Step 28: LipoP to GFF3
Output dataset 'html_file' from step 18
Output dataset 'default' from step 6
True
True
Step 29: Rebase GFF3 features
Output dataset 'default' from step 6
Output dataset 'output_gff3' from step 19
True
True
ID
Step 30: Interrupted gene detection tool
Output dataset 'default' from step 6
Output dataset 'output1' from step 20
10
2000
0.3
Step 31: BlastP Results to GFF3
BlastXML
Output dataset 'output1' from step 20
True
Step 32: BlastP Results to GFF3
BlastXML
Output dataset 'output1' from step 21
True
Step 33: BlastP Results to GFF3
BlastXML
Output dataset 'output1' from step 22
True
Step 34: Identify Lipoboxes
Output dataset 'stdout' from step 23
Output dataset 'out' from step 5
10
60
Step 35: GFF3 Feature Sequence Export
From History
Output dataset 'out' from step 5
Output dataset 'stdout' from step 23
CDS
True
Step 36: Wig to BigWig
Output dataset 'output' from step 24
Output dataset 'out' from step 5
Step 37: Wig to BigWig
Output dataset 'output' from step 26
Output dataset 'out' from step 5
Step 38: Wig to BigWig
Output dataset 'output' from step 27
Output dataset 'out' from step 5
Step 39: Remove Annotation Feature
Output dataset 'stdout' from step 28
True
True
Step 40: Rebase GFF3 features
Output dataset 'default' from step 6
Output dataset 'output' from step 31
False
True
ID
Step 41: Rebase GFF3 features
Output dataset 'default' from step 6
Output dataset 'output' from step 32
False
True
ID
Step 42: Rebase GFF3 features
Output dataset 'default' from step 6
Output dataset 'output' from step 33
False
True
ID
Step 43: Fasta Translate
Output dataset 'default' from step 35
Protein
[11] The Bacterial, Archaeal and Plant Plastid Code
True
True
Step 44: LipoP
Output dataset 'default' from step 43
False
Step 45: NCBI BLAST+ blastp (CPT Latest)
Output dataset 'default' from step 43
Locally installed BLAST database
spanindbv2
Empty.
Empty.
blastp - Traditional BLASTP to compare a protein query to a protein database
0.001
BLAST XML
Empty.
Hide Advanced Options
Step 46: TMHMM (GFF3)
Output dataset 'default' from step 43
Step 47: LipoP to GFF3
Output dataset 'html_file' from step 44
Output dataset 'stdout' from step 23
True
True
Step 48: Rebase GFF3 features
Output dataset 'stdout' from step 23
Output dataset 'output' from step 46
False
True
ID
Step 49: Intersect and Adjacent
Output dataset 'default' from step 48
Output dataset 'default' from step 34
50
True
Step 50: Intersect and Adjacent
Output dataset 'default' from step 48
Output dataset 'stdout' from step 47
50
True
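Steps 49 and 50 pair features from two GFF3 inputs using a window of 50. The following sketch shows one way to read that setting, assuming it means "keep features from each input that overlap or lie within 50 bp of a feature in the other input". Features are reduced to hypothetical (start, end) tuples with 1-based inclusive coordinates. Strand handling and GFF3 parsing are omitted, and the unlabeled boolean option is not modeled.

```python
def within_window(a, b, window=50):
    """True if intervals a and b overlap or are at most `window` bp apart."""
    return a[0] <= b[1] + window and b[0] <= a[1] + window


def intersect_and_adjacent(feats_a, feats_b, window=50):
    """Return (kept_a, kept_b): features with a neighbor in the other set."""
    kept_a = [a for a in feats_a
              if any(within_window(a, b, window) for b in feats_b)]
    kept_b = [b for b in feats_b
              if any(within_window(a, b, window) for a in feats_a)]
    return kept_a, kept_b
```

The two returned lists play the role of the 'oa' and 'ob' outputs consumed by Steps 51 to 54. The pairwise comparison is quadratic, which is fine for a phage-sized genome but would need an interval index for larger inputs.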
Step 51: Remove Annotation Feature
Output dataset 'oa' from step 49
True
True
Step 52: Remove Annotation Feature
Output dataset 'ob' from step 49
True
True
Step 53: Remove Annotation Feature
Output dataset 'ob' from step 50
True
True
Step 54: Remove Annotation Feature
Output dataset 'oa' from step 50
True
True
Step 55: JBrowse
Use a genome from history
Output dataset 'fasta_out' from step 3
False
11. The Bacterial, Archaeal and Plant Plastid Code
Update existing JBrowse Instance
Output dataset 'jbrowse' from step 4
Track Groups
Track Group 1
#date# Functional Annotation / Sequence Analysis / Phage
Annotation Tracks
Annotation Track 1
GFF/GFF3/BED Features
Output dataset 'output' from step 30
False
2000000
False
Canvas Features
CanvasFeatures Options [Advanced]:
Empty.
Empty.
False
JBrowse Styling Options [Advanced]:
feature
Empty.
note,description
10px
600
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Track Group 2
#date# Functional Annotation / Sequence Analysis / Structural
Annotation Tracks
Annotation Track 1
GFF/GFF3/BED Features
Output dataset 'default' from step 25
False
2000000
False
HTML Features
HTMLFeatures Options [Advanced]:
Empty.
JBrowse Styling Options [Advanced]:
feature
product,name,id
note,description
10px
600
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Annotation Track 2
GFF/GFF3/BED Features
Output dataset 'default' from step 29
False
2000000
False
Canvas Features
CanvasFeatures Options [Advanced]:
Empty.
Empty.
False
JBrowse Styling Options [Advanced]:
feature
signature_desc,product,name,id
note,description,name
10px
600
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Annotation Track 3
BigWig XY
Output dataset 'bigwig' from step 36, Output dataset 'bigwig' from step 37
True
False
Specify Min/Max
0
1
Linear
False
JBrowse Color Options [Advanced]:
Automatically selected
Zero
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
Off for new users
No - Do not Override
No - Do not Override
Annotation Track 4
BigWig XY
Output dataset 'bigwig' from step 38
True
False
Specify Min/Max
0
1
Linear
False
JBrowse Color Options [Advanced]:
Automatically selected
Zero
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
Off for new users
No - Do not Override
No - Do not Override
Annotation Track 5
GFF/GFF3/BED Features
Output dataset 'default' from step 39
False
2000000
False
Canvas Features
CanvasFeatures Options [Advanced]:
Empty.
Empty.
False
JBrowse Styling Options [Advanced]:
feature
product,name,id
note,description
10px
600
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Track Group 3
#date# Functional Annotation / Blast / Nucleotide
Annotation Tracks
Annotation Track 1
GFF/GFF3/BED Features
Output dataset 'output' from step 16
True
match
2000000
False
Blast Features
JBrowse Styling Options [Advanced]:
feature
product,name,id
note,description
10px
600
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Track Group 4
#date# Functional Annotation / Blast / Protein
Annotation Tracks
Annotation Track 1
GFF/GFF3/BED Features
Output dataset 'default' from step 40, Output dataset 'default' from step 42, Output dataset 'default' from step 41
True
match
5000000
False
Blast Features
JBrowse Styling Options [Advanced]:
feature
product,name,id
note,description
10px
600
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Track Group 5
#date# Functional Annotation / Sequence Analysis / Spanin
Annotation Tracks
Annotation Track 1
GFF/GFF3/BED Features
Output dataset 'default' from step 52, Output dataset 'default' from step 53
False
2000000
False
Canvas Features
CanvasFeatures Options [Advanced]:
Empty.
Empty.
False
JBrowse Styling Options [Advanced]:
feature
product,name,id
note,description
10px
600
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Annotation Track 2
Blast XML
Output dataset 'output1' from step 45
Output dataset 'stdout' from step 23
2
True
False
JBrowse Styling Options [Advanced]:
feature
description
Hit_titles
600px
600
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Annotation Track 3
GFF/GFF3/BED Features
Output dataset 'default' from step 51, Output dataset 'default' from step 54
False
2000000
False
Canvas Features
CanvasFeatures Options [Advanced]:
Empty.
Empty.
False
JBrowse Styling Options [Advanced]:
feature
product,name,id
note,description
10px
600
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
General JBrowse Options [Advanced]:
Empty.
20
True
Empty.
True
True
True
True
False
Plugins:
True
False
False
Empty.
Step 56: Create or Update Organism
Output dataset 'output' from step 55
Autodetect from Apollo JSON
Output dataset 'output' from step 2
Empty.
Empty.
False
None
False
False
no
Step 57: Annotate
Output dataset 'output' from step 56
False