PAP Functional Workflow v2022.01

Annotation: Phage functional annotation workflow. Please see the guide at https://cpt.tamu.edu/training-material/topics/phage-annotation-pipeline/ for help. Updated 2022-03-02.

Step / Annotation
Step 1: Input dataset
select at runtime
Step 2: Input dataset
select at runtime
Step 3: Split GFF3+Fasta into separate parts
Output dataset 'output' from step 1
Step 4: Retrieve JBrowse
Autodetect from Apollo JSON
Output dataset 'output' from step 2
Step 5: Remove Description
Output dataset 'fasta_out' from step 3
Step 6: Reformat GFF3 File
Output dataset 'gff_out' from step 3
Step 7: Get open reading frames (ORFs) or coding sequences (CDSs)
Output dataset 'out' from step 5
11. Bacterial
Look for ORFs (check for stop codons only, ignore start codons)
30
Search both the forward and reverse strand
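Step 7 runs in stop-codon-only mode: an ORF is any stretch between two in-frame stop codons, scanned on both strands with table 11. The Python sketch below illustrates that scan. It assumes the minimum length of 30 is counted in codons, and the function names are hypothetical rather than the tool's own code.

```python
from typing import Iterator, Tuple

# Stop codons under NCBI translation table 11 (identical to the standard code).
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def stop_to_stop_orfs(seq: str, min_codons: int = 30) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (strand, frame, start, end) for ORFs delimited by stop codons.

    Start codons are ignored. Coordinates are 0-based and half-open on the
    strand that was scanned, and the closing stop codon is included.
    Fragments running off the end of the sequence without a stop are skipped.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOPS:
                    # Length in codons, not counting the stop itself.
                    if (i - orf_start) // 3 >= min_codons:
                        yield strand, frame, orf_start, i + 3
                    orf_start = i + 3
```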
Step 8: NCBI BLAST+ blastn (CPT Latest)
Output dataset 'out' from step 5
Locally installed BLAST database
nt_current
Empty.
Empty.
dc-megablast - Discontiguous megablast used to find more distant (e.g., interspecies) sequences
0.001
Tabular (extended 25 columns)
12333 1714270 10656 10472 10659 10841 10860 28883 11989 10877 79205 2
Show Advanced Options
True
Both
0
Not available.
0.0
Not available.
False
False
No restriction, search the entire database
0.0
Not available.
Not available.
Not available.
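The blastn hits from step 8 are written as 25-column extended tabular output. A minimal row parser, assuming the column order used by the Galaxy NCBI BLAST+ wrappers (the 12 standard columns followed by sallseqid through salltitles):

```python
from typing import Dict

# Assumed column order: 12 standard BLAST columns, then 13 extended ones.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq", "qlen", "slen", "salltitles",
]
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart",
               "send", "score", "nident", "positive", "gaps", "qframe",
               "sframe", "qlen", "slen"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore", "ppos"}


def parse_blast25(line: str) -> Dict[str, object]:
    """Split one tab-separated hit line into a dict with typed values."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    row: Dict[str, object] = {}
    for name, value in zip(COLUMNS, fields):
        if name in INT_COLUMNS:
            row[name] = int(value)
        elif name in FLOAT_COLUMNS:
            row[name] = float(value)
        else:
            row[name] = value
    return row
```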
Step 9: GFF3 Feature Sequence Export
From History
Output dataset 'out' from step 5
Output dataset 'default' from step 6
CDS
False
Step 10: GFF3 Add Gene to CDS
Output dataset 'out_gff3_file' from step 7
Step 11: Blast Tabular Dice Filter
Output dataset 'output1' from step 8
0.2
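Step 11 discards BLAST hits whose Dice score falls below 0.2. Assuming the score is twice the number of identical positions divided by the sum of query and subject lengths (the nident, qlen and slen columns of the 25-column table), and that a hit exactly at the threshold is kept, the filter reduces to:

```python
from typing import Iterable, Iterator, Mapping


def dice(nident: int, qlen: int, slen: int) -> float:
    """Dice coefficient: twice the identical positions over the summed sequence lengths."""
    return 2.0 * nident / (qlen + slen)


def dice_filter(rows: Iterable[Mapping[str, int]], threshold: float = 0.2) -> Iterator[Mapping[str, int]]:
    """Yield only the hits whose Dice coefficient is at least `threshold`."""
    for row in rows:
        if dice(row["nident"], row["qlen"], row["slen"]) >= threshold:
            yield row
```

Normalizing by both lengths penalizes short local hits against long proteins, which a plain percent identity would not.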
Step 12: Start Codon Statistics
Output dataset 'default' from step 9
Step 13: Stop Codon Statistics
Output dataset 'default' from step 9
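Steps 12 and 13 summarize which codons the exported CDS sequences begin and end with. A minimal equivalent, assuming the input is a list of CDS nucleotide strings read from the FASTA produced in step 9:

```python
from collections import Counter
from typing import Iterable, Tuple


def codon_usage_at_ends(cds_seqs: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count first codons (starts) and last codons (stops) across CDS sequences."""
    starts, stops = Counter(), Counter()
    for seq in cds_seqs:
        seq = seq.upper()
        if len(seq) < 6:
            continue  # too short to hold a distinct start and stop codon
        starts[seq[:3]] += 1
        stops[seq[-3:]] += 1
    return starts, stops
```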
Step 14: Fasta Translate
Output dataset 'default' from step 9
Protein
[11] The Bacterial, Archaeal and Plant Plastid Code
True
True
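Step 14 translates the CDS sequences with table 11, which uses the same amino acids as the standard code but accepts TTG, CTG, ATT, ATC, ATA, ATG and GTG as initiators. The two `True` settings are unlabelled in this export; the sketch assumes they mean "read an alternative start codon as M" and "drop the terminal stop", which may not match the tool exactly.

```python
from itertools import product

# NCBI table 11 amino acids, codons ordered TTT, TTC, TTA, TTG, TCT, ... (bases T, C, A, G).
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AAS)}
# Initiation codons allowed by table 11.
TABLE11_STARTS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate_cds(seq: str, strip_stop: bool = True) -> str:
    """Translate a CDS with table 11; unknown codons become X."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    protein = [CODON_TABLE.get(codon, "X") for codon in codons]
    # Assumed behaviour: any table 11 initiator is read as methionine.
    if codons and codons[0] in TABLE11_STARTS:
        protein[0] = "M"
    # Assumed behaviour: drop a single terminal stop.
    if strip_stop and protein and protein[-1] == "*":
        protein.pop()
    return "".join(protein)
```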
Step 15: GFF3 Filter: Require Phage Start
From History
Output dataset 'out' from step 5
Output dataset 'output' from step 10
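Step 15 keeps only CDS features whose start codon is typical of phages. The sketch assumes that set is ATG, GTG and TTG and that features arrive as (start, end, strand) tuples in GFF3 coordinates (1-based, inclusive); `PHAGE_STARTS` and the helper names are hypothetical.

```python
from typing import List, Tuple

Feature = Tuple[int, int, str]  # (start, end, strand), 1-based inclusive as in GFF3

# Assumed set of acceptable phage start codons.
PHAGE_STARTS = {"ATG", "GTG", "TTG"}
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def cds_sequence(genome: str, feature: Feature) -> str:
    """Extract a feature's sequence, reverse-complementing minus-strand features."""
    start, end, strand = feature
    seq = genome[start - 1:end]
    return seq.translate(_COMPLEMENT)[::-1] if strand == "-" else seq


def require_phage_start(genome: str, features: List[Feature]) -> List[Feature]:
    """Keep only features whose first codon is in PHAGE_STARTS."""
    return [f for f in features if cds_sequence(genome, f)[:3].upper() in PHAGE_STARTS]
```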
Step 16: BlastN Results to GFF3
Blast 25 Column Table
Output dataset 'output' from step 11
Step 17: TMHMM (GFF3)
Output dataset 'default' from step 14
Step 18: Interproscan functional predictions of ORFs
Output dataset 'default' from step 14
True
True
True
TIGRFAM: protein families based on Hidden Markov Models (HMMs)
PIRSF: non-overlapping clustering of UniProtKB sequences into a hierarchical order (evolutionary relationships)
Panther: Protein ANalysis THrough Evolutionary Relationships
SMART: identification and analysis of domain architectures based on Hidden Markov Models (HMMs)
PROSITE Profiles: protein domains, families and functional sites, with associated profiles to identify them
PROSITE Pattern: protein domains, families and functional sites, with associated patterns to identify them
HAMAP: High-quality Automated Annotation of Microbial Proteomes
PfamA: protein families, each represented by multiple sequence alignments and hidden Markov models
PRINTS: groups of conserved motifs (fingerprints) used to characterise a protein family
SUPERFAMILY: database of structural and functional annotation
Coils: prediction of coiled-coil regions in proteins
Gene3d: structural assignment for whole genes and genomes using the CATH domain structure database
SignalP Gram Positive Bacteria
SignalP Gram Negative Bacteria
Phobius: combined transmembrane topology and signal peptide predictor
Step 19: NCBI BLAST+ blastp (CPT Latest)
Output dataset 'default' from step 14
Locally installed BLAST database
nr
Empty.
Empty.
blastp - Traditional BLASTP to compare a protein query to a protein database
0.001
BLAST XML
12333 1714270 10656 10472 10659 10841 10860 28883 11989 10877 79205
Hide Advanced Options
Step 20: NCBI BLAST+ blastp (CPT Latest)
Output dataset 'default' from step 14
Locally installed BLAST database
canonical_2021
Empty.
Empty.
blastp - Traditional BLASTP to compare a protein query to a protein database
0.001
BLAST XML
Empty.
Hide Advanced Options
Step 21: NCBI BLAST+ blastp (CPT Latest)
Output dataset 'default' from step 14
Locally installed BLAST database
sprot
Empty.
Empty.
blastp - Traditional BLASTP to compare a protein query to a protein database
0.001
BLAST XML
Empty.
Hide Advanced Options
Step 22: LipoP
Output dataset 'default' from step 14
False
Step 23: SignalP
Output dataset 'default' from step 14
Gram-negative bacteria
True
Step 24: SignalP
Output dataset 'default' from step 14
Archaea
True
Step 25: Gff3 Filter: Require SD
From History
Output dataset 'out' from step 5
Output dataset 'stdout' from step 15
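Step 25 retains CDSs that have a Shine-Dalgarno (SD) site upstream. The motif list and the 3 to 17 nt spacing below are illustrative assumptions, not the tool's documented parameters, and only plus-strand CDSs are handled (a minus-strand CDS would be searched on the reverse complement).

```python
from typing import Optional, Tuple

# Hypothetical SD motifs, longest first so the strongest match is reported.
SD_MOTIFS = ("AGGAGG", "GGAGG", "AGGAG", "GGAG", "GAGG", "AGGA")


def find_sd(genome: str, cds_start: int, min_gap: int = 3,
            max_gap: int = 17) -> Optional[Tuple[str, int]]:
    """Return (motif, gap) for the first SD motif upstream of a plus-strand CDS.

    cds_start is 1-based (GFF3); gap is the number of bases between the end of
    the motif and the first base of the start codon.
    """
    start0 = cds_start - 1
    for motif in SD_MOTIFS:
        for gap in range(min_gap, max_gap + 1):
            motif_end = start0 - gap
            motif_begin = motif_end - len(motif)
            if motif_begin < 0:
                break  # larger gaps would run off the start of the genome
            if genome[motif_begin:motif_end].upper() == motif:
                return motif, gap
    return None
```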
Step 26: Rebase Wig Analysis Results
Output dataset 'default' from step 6
Output dataset 'bw_i' from step 17
True
ID
Step 27: Rebase GFF3 features
Output dataset 'default' from step 6
Output dataset 'output' from step 17
False
True
ID
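The Rebase steps (for example step 27) move features predicted on translated proteins, such as TMHMM helices, back onto the genome. Assuming the parent CDS is given in GFF3 coordinates and each residue spans exactly three nucleotides, the mapping can be sketched as follows; the function name is hypothetical.

```python
from typing import Tuple


def protein_to_genome(cds_start: int, cds_end: int, strand: str,
                      aa_start: int, aa_end: int) -> Tuple[int, int]:
    """Map a residue range on a translated CDS to genome coordinates.

    All coordinates are 1-based and inclusive, as in GFF3.
    """
    if strand == "+":
        return cds_start + 3 * (aa_start - 1), cds_start + 3 * aa_end - 1
    # On the minus strand residue 1 begins at the CDS end, so the mapping runs backwards.
    return cds_end - 3 * aa_end + 1, cds_end - 3 * (aa_start - 1)
```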
Step 28: Rebase Wig Analysis Results
Output dataset 'default' from step 6
Output dataset 'bw_o' from step 17
True
ID
Step 29: Rebase Wig Analysis Results
Output dataset 'default' from step 6
Output dataset 'bw_m' from step 17
True
ID
Step 30: Rebase GFF3 features
Output dataset 'default' from step 6
Output dataset 'output_gff3' from step 18
True
True
ID
Step 31: Interrupted gene detection tool
Output dataset 'default' from step 6
Output dataset 'output1' from step 19
10
2000
0.3
Step 32: BlastP Results to GFF3
BlastXML
Output dataset 'output1' from step 19
True
Step 33: BlastP Results to GFF3
BlastXML
Output dataset 'output1' from step 20
True
Step 34: BlastP Results to GFF3
BlastXML
Output dataset 'output1' from step 21
True
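Steps 32 to 34 convert the BLAST XML reports from steps 19 to 21 into GFF3. The sketch below covers only the parsing half, reading each HSP with Python's standard `xml.etree.ElementTree` and the standard BLAST XML element names; the e-value cutoff mirrors the 0.001 used in the blastp steps, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

HspRecord = Tuple[str, str, float, int, int]  # (query, hit description, e-value, query from, query to)


def hsps_from_blast_xml(xml_text: str, max_evalue: float = 0.001) -> List[HspRecord]:
    """Collect HSPs at or below `max_evalue` from a BLAST XML report."""
    root = ET.fromstring(xml_text)
    results: List[HspRecord] = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="")
        for hit in iteration.iter("Hit"):
            description = hit.findtext("Hit_def", default="")
            for hsp in hit.iter("Hsp"):
                evalue = float(hsp.findtext("Hsp_evalue"))
                if evalue <= max_evalue:
                    results.append((query, description, evalue,
                                    int(hsp.findtext("Hsp_query-from")),
                                    int(hsp.findtext("Hsp_query-to"))))
    return results
```

The query coordinates are in residues, so placing a hit on the genome would still need a protein-to-genome mapping like the one used by the Rebase steps.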
Step 35: LipoP to GFF3
Output dataset 'html_file' from step 22
Output dataset 'default' from step 6
True
True
Step 36: Remove Annotation Feature
Output dataset 'gffOutput' from step 23
True
True
Step 37: Remove Annotation Feature
Output dataset 'gffOutput' from step 24
True
True
Step 38: GFF3 Feature Sequence Export
From History
Output dataset 'out' from step 5
Output dataset 'stdout' from step 25
CDS
True
Step 39: Identify Lipoboxes
Output dataset 'stdout' from step 25
Output dataset 'out' from step 5
10
60
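Step 39 searches for lipoboxes, the signal peptidase II cleavage motif that ends in an invariant cysteine. The sketch assumes the commonly cited consensus [LVI][ASTVI][GAS]C and reads the step's 10 and 60 as the residue window in which that cysteine must fall; both are assumptions about the tool, not its documented behaviour.

```python
import re
from typing import List, Tuple

# Commonly cited simplified lipobox consensus; the lookahead allows overlapping matches.
LIPOBOX = re.compile(r"(?=([LVI][ASTVI][GAS]C))")


def find_lipoboxes(protein: str, min_pos: int = 10, max_pos: int = 60) -> List[Tuple[int, str]]:
    """Return (1-based position of the Cys, motif) for matches inside the window."""
    hits = []
    for m in LIPOBOX.finditer(protein.upper()):
        cys_pos = m.start() + 4  # the Cys is the fourth residue of the motif
        if min_pos <= cys_pos <= max_pos:
            hits.append((cys_pos, m.group(1)))
    return hits
```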
Step 40: Wig to BigWig
Output dataset 'output' from step 26
Output dataset 'out' from step 5
Step 41: Wig to BigWig
Output dataset 'output' from step 28
Output dataset 'out' from step 5
Step 42: Wig to BigWig
Output dataset 'output' from step 29
Output dataset 'out' from step 5
Step 43: Rebase GFF3 features
Output dataset 'default' from step 6
Output dataset 'output' from step 32
False
True
ID
Step 44: Rebase GFF3 features
Output dataset 'default' from step 6
Output dataset 'output' from step 33
False
True
ID
Step 45: Rebase GFF3 features
Output dataset 'default' from step 6
Output dataset 'output' from step 34
False
True
ID
Step 46: Remove Annotation Feature
Output dataset 'stdout' from step 35
True
True
Step 47: Rebase GFF3 features
Output dataset 'default' from step 6
Output dataset 'default' from step 36
False
True
ID
Step 48: Rebase GFF3 features
Output dataset 'default' from step 6
Output dataset 'default' from step 37
False
True
ID
Step 49: Fasta Translate
Output dataset 'default' from step 38
Protein
[11] The Bacterial, Archaeal and Plant Plastid Code
True
True
Step 50: LipoP
Output dataset 'default' from step 49
False
Step 51: TMHMM (GFF3)
Output dataset 'default' from step 49
Step 52: NCBI BLAST+ blastp (CPT Latest)
Output dataset 'default' from step 49
Locally installed BLAST database
spanindbv2
Empty.
Empty.
blastp - Traditional BLASTP to compare a protein query to a protein database
0.001
BLAST XML
Empty.
Hide Advanced Options
Step 53: LipoP to GFF3
Output dataset 'html_file' from step 50
Output dataset 'stdout' from step 25
True
True
Step 54: Rebase GFF3 features
Output dataset 'stdout' from step 25
Output dataset 'output' from step 51
False
True
ID
Step 55: Intersect and Adjacent
Output dataset 'default' from step 54
Output dataset 'default' from step 39
50
True
Step 56: Intersect and Adjacent
Output dataset 'default' from step 54
Output dataset 'stdout' from step 53
50
True
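Steps 55 and 56 pair features from two GFF3 inputs that overlap or lie within 50 bp of each other, writing the partnered features of each input to `oa` and `ob`. Assuming 1-based inclusive intervals and ignoring strand, the test can be sketched as:

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def near(a: Interval, b: Interval, window: int) -> bool:
    """True if a and b overlap or are separated by at most `window` bases."""
    gap = max(a[0], b[0]) - min(a[1], b[1]) - 1  # negative when they overlap
    return gap <= window


def intersect_adjacent(set_a: Sequence[Interval], set_b: Sequence[Interval],
                       window: int = 50) -> Tuple[List[Interval], List[Interval]]:
    """Return the members of each set that have at least one partner in the other."""
    oa = [a for a in set_a if any(near(a, b, window) for b in set_b)]
    ob = [b for b in set_b if any(near(a, b, window) for a in set_a)]
    return oa, ob
```

The pairwise loop is quadratic, which is acceptable for a single phage genome; a sorted sweep would scale better for larger inputs.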
Step 57: Remove Annotation Feature
Output dataset 'oa' from step 55
True
True
Step 58: Remove Annotation Feature
Output dataset 'ob' from step 55
True
True
Step 59: Remove Annotation Feature
Output dataset 'ob' from step 56
True
True
Step 60: Remove Annotation Feature
Output dataset 'oa' from step 56
True
True
Step 61: JBrowse
Use a genome from history
Output dataset 'fasta_out' from step 3
False
11. The Bacterial, Archaeal and Plant Plastid Code
Update existing JBrowse Instance
Output dataset 'jbrowse' from step 4
Track Groups
Track Group 1
#date# Functional Annotation / Sequence Analysis / Interruptions
Annotation Tracks
Annotation Track 1
GFF/GFF3/BED Features
Output dataset 'output' from step 31
False
2000000
False
Canvas Features
CanvasFeatures Options [Advanced]:
Empty.
Empty.
False
JBrowse Styling Options [Advanced]:
feature
Empty.
note,description
10px
600
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Track Group 2
#date# Functional Annotation / Sequence Analysis / Domains
Annotation Tracks
Annotation Track 1
GFF/GFF3/BED Features
Output dataset 'default' from step 27
False
2000000
False
HTML Features
HTMLFeatures Options [Advanced]:
Empty.
JBrowse Styling Options [Advanced]:
feature
product,name,id
note,description
10px
600
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Annotation Track 2
GFF/GFF3/BED Features
Output dataset 'default' from step 30
False
2000000
False
Canvas Features
CanvasFeatures Options [Advanced]:
Empty.
Empty.
False
JBrowse Styling Options [Advanced]:
feature
signature_desc,product,name,id
note,description,name
10px
600
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Annotation Track 3
BigWig XY
Output dataset 'bigwig' from step 41, Output dataset 'bigwig' from step 40
True
False
Specify Min/Max
0
1
Linear
False
JBrowse Color Options [Advanced]:
Automatically selected
Zero
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
Off for new users
No - Do not Override
No - Do not Override
Annotation Track 4
BigWig XY
Output dataset 'bigwig' from step 42
True
False
Specify Min/Max
0
1
Linear
False
JBrowse Color Options [Advanced]:
Automatically selected
Zero
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
Off for new users
No - Do not Override
No - Do not Override
Track Group 3
#date# Functional Annotation / BLAST / Nucleotide
Annotation Tracks
Annotation Track 1
GFF/GFF3/BED Features
Output dataset 'output' from step 16
True
match
2000000
False
Blast Features
JBrowse Styling Options [Advanced]:
feature
product,name,id
note,description
10px
600
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Track Group 4
#date# Functional Annotation / BLAST / Protein
Annotation Tracks
Annotation Track 1
GFF/GFF3/BED Features
Output dataset 'default' from step 44, Output dataset 'default' from step 45, Output dataset 'default' from step 43
True
match
5000000
False
Blast Features
JBrowse Styling Options [Advanced]:
feature
product,name,id
note,description
10px
2400
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Track Group 5
#date# Functional Annotation / Sequence Analysis / Spanin
Annotation Tracks
Annotation Track 1
GFF/GFF3/BED Features
Output dataset 'default' from step 59, Output dataset 'default' from step 58
False
2000000
False
Canvas Features
CanvasFeatures Options [Advanced]:
Empty.
Empty.
False
JBrowse Styling Options [Advanced]:
feature
product,name,id
note,description
10px
2400
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Annotation Track 2
Blast XML
Output dataset 'output1' from step 52
Output dataset 'stdout' from step 25
2
True
False
JBrowse Styling Options [Advanced]:
feature
description
Hit_titles
600px
2400
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Annotation Track 3
GFF/GFF3/BED Features
Output dataset 'default' from step 60, Output dataset 'default' from step 57
False
2000000
False
HTML Features
HTMLFeatures Options [Advanced]:
Empty.
JBrowse Styling Options [Advanced]:
feature
product,name,id
note,description
10px
2400
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
Track Group 6
#date# Functional Annotation / Sequence Analysis / Signals
Annotation Tracks
Annotation Track 1
GFF/GFF3/BED Features
Output dataset 'default' from step 48, Output dataset 'default' from step 47, Output dataset 'default' from step 46
False
2000000
False
Canvas Features
CanvasFeatures Options [Advanced]:
Empty.
Empty.
False
JBrowse Styling Options [Advanced]:
feature
product,name,id
note,description
10px
600
JBrowse Feature Score Scaling & Coloring Options [Advanced]:
Ignore score
Automatically selected
JBrowse Custom Track Config [Advanced]:
Custom Track Config Options
JBrowse Contextual Menu options [Advanced]:
Track Menus
Off for new users
No - Do not Override
No - Do not Override
General JBrowse Options [Advanced]:
Empty.
20
True
Empty.
True
True
True
True
False
Plugins:
True
False
False
Empty.
Step 62: Create or Update Organism
Output dataset 'output' from step 61
Autodetect from Apollo JSON
Output dataset 'output' from step 2
Empty.
Empty.
False
None
False
False
no
Step 63: Annotate
Output dataset 'output' from step 62
False