User guide

Pre-process the input BAM file

This step filters out unwanted reads.

For a stranded protocol, it also splits the input BAM file into two output BAM files, one for each strand.

usage: bt-flor-separate-bam-strands.py [-h] --input-file INPUT_FILE

Pre-process the input bam file

optional arguments:
-h, --help show this help message and exit
--input-file INPUT_FILE
 Full path to input .bam or .sam file (default: None)
--min-mapq MIN_MAPQ
 Minimum quality of the read mapping (default: 10)
--max-gene-length MAX_GENE_LENGTH
 Maximum length of a gene. Reads whose mapping spans more bases than this are discarded (default: 100000)
--filter-cigar FILTER_CIGAR
 Filter out reads with these characters in the CIGAR string. For example, DSHI filters out reads with deletions, soft clipping, hard clipping, or insertions (default: '')
--filter-tags TAG_NAME,VALUE TAG_NAME,VALUE ... [TAG_NAME,VALUE TAG_NAME,VALUE ... ...]
 Filter out reads whose tag value contains the given substring. For example, --filter-tags XF,__ filters out reads with "__" in the value of the XF tag (htseq-count uses this to indicate that the read mapped to a non-genomic region) (default: None)
--is-stranded-protocol
 Is stranded protocol (default: False)
--log-file LOG_FILE
 Log File (default: None)
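
The filters above can be pictured as a single per-read test. The following is a minimal sketch in plain Python, not the tool's actual implementation. The Read class, the function names, the exact boundary behaviour (for example, whether a read with MAPQ equal to the minimum is kept), and the use of SAM flag bit 0x10 to pick the strand are all assumptions made for illustration. The only facts relied on are from the SAM format: CIGAR operations M, D, N, = and X consume reference bases, and flag bit 0x10 marks a reverse-strand alignment.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified stand-in for an aligned read (not a bt-flor or pysam type).
@dataclass
class Read:
    flag: int    # SAM FLAG field
    mapq: int    # SAM MAPQ field
    cigar: str   # SAM CIGAR string, e.g. "50M1000N50M"
    tags: dict = field(default_factory=dict)  # e.g. {"XF": "__no_feature"}

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_span(cigar):
    """Reference bases covered by the alignment (ops M, D, N, = and X)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")

def keep_read(read, min_mapq=10, max_gene_length=100000,
              filter_cigar="", filter_tags=()):
    """Return True if the read passes every filter (assumed semantics)."""
    if read.mapq < min_mapq:
        return False  # mapping quality too low
    if reference_span(read.cigar) > max_gene_length:
        return False  # alignment spans more bases than allowed
    ops = {op for _, op in CIGAR_OP.findall(read.cigar)}
    if ops & set(filter_cigar):
        return False  # CIGAR contains an unwanted operation
    for tag_name, substring in filter_tags:
        if substring in str(read.tags.get(tag_name, "")):
            return False  # tag value contains the unwanted substring
    return True

def strand(read):
    """Assumed strand assignment based on SAM flag bit 0x10."""
    return "minus" if read.flag & 0x10 else "plus"

reads = [
    Read(flag=0, mapq=60, cigar="50M1000N50M"),
    Read(flag=16, mapq=5, cigar="100M"),                                # low MAPQ
    Read(flag=16, mapq=60, cigar="10S90M"),                             # soft-clipped
    Read(flag=0, mapq=60, cigar="100M", tags={"XF": "__no_feature"}),   # XF contains "__"
]
kept = [r for r in reads
        if keep_read(r, filter_cigar="S", filter_tags=[("XF", "__")])]
print(len(kept), strand(kept[0]))  # prints "1 plus"
```

In this sketch, --filter-cigar is treated as a set of single-character operations and --filter-tags as (TAG_NAME, VALUE) pairs, mirroring the option descriptions above.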

Run the main program

This step runs the main program, which assembles transcripts from the mapped reads.

For a stranded protocol, run the command twice, once on each of the two BAM files created in the previous step (plus strand and minus strand).

usage: bt-flor.py [-h] --input-file INPUT_FILE --gtf-output-file

Accurate assembly of transcripts according to mapped reads

optional arguments:
-h, --help show this help message and exit
--input-file INPUT_FILE
 Full path to input .bam or .sam file (default: None)
--gtf-output-file GTF_OUTPUT_FILE
 Full path to output file name (default: None)
--is-stranded-protocol
 Is stranded protocol (default: False)
--max-dist-internal-edge-from-average MAX_DIST_INTERNAL_EDGE_FROM_AVERAGE
 Maximum distance between reads at the start and end of the internal exons of the transcript (that is, excluding the start of the first exon and the end of the last exon) (default: 3)
--max-dist-external-edge-from-average MAX_DIST_EXTERNAL_EDGE_FROM_AVERAGE
 For a non-stranded protocol: maximum distance between reads at the start of the first exon and at the end of the last exon of the transcript (default: 3)
--max-dist-first-edge-from-average MAX_DIST_FIRST_EDGE_FROM_AVERAGE
 For a stranded protocol: maximum distance between reads at the start of the first exon of the transcript (default: 3)
--max-dist-last-edge-from-average MAX_DIST_LAST_EDGE_FROM_AVERAGE
 For a stranded protocol: maximum distance between reads at the end of the last exon of the transcript (sometimes the enzyme drops off before the end of the transcript) (default: 50)
--local-average-max-num-positions LOCAL_AVERAGE_MAX_NUM_POSITIONS
 Maximum number of neighboring positions over which the average is calculated; distances are measured from this average (default: 5)
--known-sorted-gtf-file KNOWN_SORTED_GTF_FILE
 Full path to the known GTF file, sorted by chromosome and then by start position in ascending order (you can use the command: bedtools sort -i <gtf-file>). The program will create *.gz (compressed) and *.gz.tbi (index) files in the same location as the known GTF file if they don't exist. (default: None)
--threads THREADS
 Number of threads. Chromosomes can be processed in parallel. (default: 1)
--log-file LOG_FILE
 Log File (default: None)
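
For a stranded protocol, the two steps chain together as one pre-processing run followed by two assembly runs. The sketch below only builds and prints the command lines; it does not execute anything. The function name build_commands and all file names are hypothetical, since this guide does not state how the per-strand BAM files are named, and passing --is-stranded-protocol to both scripts is an assumption based on the option lists above.

```python
import shlex

def build_commands(input_bam, plus_bam, minus_bam, known_gtf=None, threads=1):
    """Return argument lists for the pre-processing run and the two bt-flor.py runs."""
    preprocess = [
        "bt-flor-separate-bam-strands.py",
        "--input-file", input_bam,
        "--is-stranded-protocol",
    ]
    commands = [preprocess]
    # One assembly run per strand-specific BAM file.
    for strand_bam in (plus_bam, minus_bam):
        gtf_out = strand_bam.rsplit(".", 1)[0] + ".gtf"  # hypothetical naming
        cmd = [
            "bt-flor.py",
            "--input-file", strand_bam,
            "--gtf-output-file", gtf_out,
            "--is-stranded-protocol",
            "--threads", str(threads),
        ]
        if known_gtf is not None:
            cmd += ["--known-sorted-gtf-file", known_gtf]
        commands.append(cmd)
    return commands

# Hypothetical file names, for illustration only.
commands = build_commands("sample.bam", "sample.plus.bam", "sample.minus.bam", threads=4)
for cmd in commands:
    print(shlex.join(cmd))  # shell-quoted command line (Python 3.8+)
```

Each printed line can be checked by eye before running it, and each argument list could be passed to subprocess.run unchanged once the real per-strand file names are known.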