Galaxy | CPT Public | Published Workflow | Shine-Dalgarno from single protein ID

Shine-Dalgarno from single protein ID

Annotation: Input a protein accession and receive the upstream genomic sequence
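
Outside Galaxy, the lookup done by the first four steps listed below (ELink from Protein to Nucleotide, the XPath query `//Id/text()`, keeping the last ID, then EFetch from Nuccore) can be approximated against the NCBI E-utilities. This is a minimal sketch, not the Galaxy tools' own code: the function names and the sample XML are hypothetical, `rettype=gbwithparts` is assumed to correspond to the "GenBank with Contig Sequences" option, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def elink_url(protein_accession):
    """Step 1: ELink 'neighbor' query from the protein db to nuccore."""
    params = {
        "dbfrom": "protein",
        "db": "nuccore",
        "cmd": "neighbor",
        "id": protein_accession,
    }
    return f"{EUTILS}/elink.fcgi?{urlencode(params)}"


def last_id(elink_xml):
    """Steps 2-3: gather every <Id> text node (like //Id/text()), keep the last."""
    ids = [el.text for el in ET.fromstring(elink_xml).iter("Id")]
    if not ids:
        raise ValueError("no <Id> elements in ELink response")
    return ids[-1]


def efetch_url(nuccore_id, api_key=None):
    """Step 4: EFetch URL for a GenBank text record with contig sequences."""
    params = {
        "db": "nuccore",
        "id": nuccore_id,
        "rettype": "gbwithparts",  # assumed equivalent of the Galaxy option
        "retmode": "text",
    }
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


# Hypothetical, hand-written response shaped like an ELink result.
# The first <Id> is the input protein UID; the later ones are linked records.
SAMPLE_XML = """
<eLinkResult><LinkSet>
  <DbFrom>protein</DbFrom>
  <IdList><Id>111</Id></IdList>
  <LinkSetDb>
    <DbTo>nuccore</DbTo>
    <Link><Id>222</Id></Link>
    <Link><Id>333</Id></Link>
  </LinkSetDb>
</LinkSet></eLinkResult>
"""

if __name__ == "__main__":
    print(elink_url("YP_000000.1"))  # placeholder accession
    print(efetch_url(last_id(SAMPLE_XML)))
```

Because `//Id` also matches the input UID echoed back in `IdList`, taking the last match is what selects a linked nucleotide record rather than the protein's own ID.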

Steps

Step 1: NCBI ELink
  From NCBI Database: Protein
  ELink command to execute: Neighbor: Fetch a set of UIDs in DB linked to input UIDs in DBFROM
  To NCBI Database: Nucleotide
  Select source for IDs: Direct Entry
  ID List: Empty.
  Annotation: Input NCBI Protein Accession

Step 2: XPath
  Input XML data: Output dataset 'default' from step 1
  XPath Query: //Id/text()

Step 3: Select last
  Select last: 1
  from: Output dataset 'output' from step 2

Step 4: NCBI EFetch
  NCBI Database to Use: Nuccore
  Output Format: GenBank with Contig Sequences (text)
  Select source for IDs: File containing IDs (one per line)
  ID List: Output dataset 'out_file1' from step 3
  NCBI API Key: 4d4c37fdab732a93bbab2f748d4ba63d9309

Step 5: Collapse Collection
  Collection of files to collapse into single dataset: Output dataset 'output1' from step 4
  Keep one header line: False
  Prepend File name: False

Step 6: Split Genbank On Qualifier
  Genbank file: Output dataset 'output' from step 5
  Which qualifier(s) to check against (Space separated list): protein_id
  Value(s) that qualifier must match for feature to be extracted (Space separated list): Not available.
  Number of additional bases upstream to extract: 0
  Number of additional bases downstream to extract: 0
  Annotation: Enter the same protein accession here as used in step 1

Step 7: (CPT) Genbank to GFF3
  GenBank file: Output dataset 'output' from step 5
  Automatically generate any missing Gene features if CDS/RBS has none: True
  Automatically generate missing mRNA features for genes: True
  Qualifier to derive GFF ID from: protein_id

Step 8: (CPT) Genbank to GFF3
  GenBank file: Output dataset 'output' from step 6
  Automatically generate any missing Gene features if CDS/RBS has none: True
  Automatically generate missing mRNA features for genes: True
  Qualifier to derive GFF ID from: protein_id

Step 9: Rebase GFF feature tree
  GenBank file: Output dataset 'default' from step 8
  Types to change (space separated): CDS
  New tree (space separated, first type will be top-most feature): gene mRNA

Step 10: Shine Find
  Reference Genome: From History
  Source FASTA Sequence: Output dataset 'fastaOut' from step 7
  GFF3 Annotations: Output dataset 'output' from step 9
  Minimum number of bases upstream of CDS for gap (--lookahead_min): 3
  Maximum number of bases upstream of CDS for gap (--lookahead_max): 17
  Automatically add RBSs to input GFF3: True
  Only report best hits (--top_only): True

About this Workflow

Author

jolenerr
