User guide

Pre-process the input BAM file

This step filters out unwanted reads.

For a stranded protocol, it also splits the input BAM file into two output BAM files, one for each strand.
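As a rough illustration of what filtering reads by mapping quality, alignment span, CIGAR operations, and tag values can look like, here is a minimal sketch. It is not the tool's implementation: the `Read` class and `keep_read` function are hypothetical, and the way each threshold is applied (for example, dropping reads strictly below the minimum mapping quality) is an assumption based on the option descriptions in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class Read:
    """Hypothetical, simplified stand-in for one aligned read."""
    mapq: int
    cigar: str        # CIGAR string, e.g. "50M2D48M"
    ref_start: int    # start position on the reference
    ref_end: int      # end position on the reference (exclusive)
    tags: dict = field(default_factory=dict)


def keep_read(read, min_mapq=10, max_gene_length=100000,
              filter_cigar="", filter_tags=()):
    """Return True if the read passes all of the (assumed) filters."""
    # Assumption: reads below the minimum mapping quality are dropped.
    if read.mapq < min_mapq:
        return False
    # Assumption: the span of the read on the reference is compared
    # against the maximum gene length.
    if read.ref_end - read.ref_start > max_gene_length:
        return False
    # Drop reads whose CIGAR string contains any of the listed letters.
    if any(op in read.cigar for op in filter_cigar):
        return False
    # Drop reads whose tag value contains the given substring.
    for tag_name, substring in filter_tags:
        if substring in str(read.tags.get(tag_name, "")):
            return False
    return True
```

With these assumptions, a read with a deletion in its CIGAR string is rejected when `filter_cigar="DSHI"`, and a read whose XF tag contains "__" is rejected when `filter_tags=[("XF", "__")]`.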

usage: bt-flor-separate-bam-strands.py [-h] --input-file INPUT_FILE

Pre-process the input bam file

optional arguments:

`-h, --help`	show this help message and exit
`--input-file INPUT_FILE`
	Full path to input .bam or .sam file (default: None)
`--min-mapq MIN_MAPQ`
	Minimum quality of the read mapping (default: 10)
`--max-gene-length MAX_GENE_LENGTH`
	Maximum length of a gene. Reads that map across a span longer than this number of bases are discarded (default: 100000)
`--filter-cigar FILTER_CIGAR`
	Filter out reads with any of these characters in the CIGAR string. For example: DSHI filters out reads with a deletion, soft clipping, hard clipping, or an insertion (default: '')
`--filter-tags TAG_NAME,VALUE [TAG_NAME,VALUE ...]`
	Filter out reads whose value for the given tag contains the given substring. For example: --filter-tags XF,__ filters out reads with "__" in the value of the XF tag (htseq-count uses this to indicate that the read mapped to a non-genomic region) (default: None)
`--is-stranded-protocol`
	Is stranded protocol (default: False)
`--log-file LOG_FILE`
	Log File (default: None)
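A pre-processing call can be assembled from the options listed above. The sketch below builds the argument list in Python; the helper name `build_preprocess_command` and the file paths are hypothetical, and it assumes the script is executable and on your PATH (otherwise prefix the list with your Python interpreter and the script's full path).

```python
import shlex


def build_preprocess_command(input_file, min_mapq=None, filter_cigar=None,
                             filter_tags=(), stranded=False, log_file=None):
    """Build the argument list for bt-flor-separate-bam-strands.py."""
    cmd = ["bt-flor-separate-bam-strands.py", "--input-file", input_file]
    if min_mapq is not None:
        cmd += ["--min-mapq", str(min_mapq)]
    if filter_cigar:
        cmd += ["--filter-cigar", filter_cigar]
    if filter_tags:
        # Each entry is passed as a single TAG_NAME,VALUE argument.
        cmd.append("--filter-tags")
        cmd += [f"{name},{value}" for name, value in filter_tags]
    if stranded:
        cmd.append("--is-stranded-protocol")
    if log_file:
        cmd += ["--log-file", log_file]
    return cmd


cmd = build_preprocess_command(
    "/data/sample.bam",                # hypothetical input path
    min_mapq=10,
    filter_cigar="DSHI",
    filter_tags=[("XF", "__")],
    stranded=True,
    log_file="/data/preprocess.log",   # hypothetical log path
)
# Print the command as a shell-quoted string for inspection.
print(shlex.join(cmd))
```

To actually run the step, pass the list to `subprocess.run(cmd, check=True)`.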

Run the main program

This step runs the main program, which assembles transcripts from the mapped reads.

For a stranded protocol, you need to run the command twice, once on each of the two BAM files created in the previous step (plus strand and minus strand).

usage: bt-flor.py [-h] --input-file INPUT_FILE --gtf-output-file GTF_OUTPUT_FILE

Accurate assembly of transcripts according to mapped reads

optional arguments:

`-h, --help`	show this help message and exit
`--input-file INPUT_FILE`
	Full path to input .bam or .sam file (default: None)
`--gtf-output-file GTF_OUTPUT_FILE`
	Full path to the output GTF file (default: None)
`--is-stranded-protocol`
	Is stranded protocol (default: False)
`--max-dist-internal-edge-from-average MAX_DIST_INTERNAL_EDGE_FROM_AVERAGE`
	Maximum distance between reads at the start and end of the internal exons of the transcript (that is, excluding the start of the first exon and the end of the last exon) (default: 3)
`--max-dist-external-edge-from-average MAX_DIST_EXTERNAL_EDGE_FROM_AVERAGE`
	For a non-stranded protocol: maximum distance between reads at the start of the first exon and at the end of the last exon of the transcript (default: 3)
`--max-dist-first-edge-from-average MAX_DIST_FIRST_EDGE_FROM_AVERAGE`
	For a stranded protocol: maximum distance between reads at the start of the first exon of the transcript (default: 3)
`--max-dist-last-edge-from-average MAX_DIST_LAST_EDGE_FROM_AVERAGE`
	For a stranded protocol: maximum distance between reads at the end of the last exon of the transcript (the enzyme sometimes drops off before the end of the transcript) (default: 50)
`--local-average-max-num-positions LOCAL_AVERAGE_MAX_NUM_POSITIONS`
	Maximum number of neighboring positions over which the average is calculated; the distance from this average is then considered (default: 5)
`--known-sorted-gtf-file KNOWN_SORTED_GTF_FILE`
	Full path to the known GTF file, sorted by chromosome and then by start position in ascending order (you can use the command: bedtools sort -i <gtf-file>). The program will create .gz (compressed) and .gz.tbi (index) files in the same location as the known GTF file if they don't exist. (default: None)
`--threads THREADS`
	Number of threads. Chromosomes can be processed in parallel. (default: 1)
`--log-file LOG_FILE`
	Log File (default: None)
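For a stranded protocol, the main program is run once per strand. The sketch below builds one command per strand from the options listed above. The helper name `build_assembly_command` and all file paths are hypothetical; in particular, this guide does not state how the pre-processing step names its per-strand output files, so substitute the actual paths. It also assumes the script is executable and on your PATH.

```python
import shlex


def build_assembly_command(input_file, gtf_output_file, stranded=False,
                           known_sorted_gtf_file=None, threads=1):
    """Build the argument list for bt-flor.py."""
    cmd = ["bt-flor.py", "--input-file", input_file,
           "--gtf-output-file", gtf_output_file]
    if stranded:
        cmd.append("--is-stranded-protocol")
    if known_sorted_gtf_file:
        cmd += ["--known-sorted-gtf-file", known_sorted_gtf_file]
    if threads != 1:
        cmd += ["--threads", str(threads)]
    return cmd


# Hypothetical per-strand BAM files produced by the pre-processing step.
strand_inputs = {
    "plus": "/data/sample.plus.bam",
    "minus": "/data/sample.minus.bam",
}

# One command per strand, each writing its own GTF file.
commands = [
    build_assembly_command(bam, f"/data/sample.{strand}.gtf",
                           stranded=True, threads=4)
    for strand, bam in strand_inputs.items()
]

for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be executed with `subprocess.run(cmd, check=True)`. Writing to a separate GTF file per strand is a choice made for this example, not something the guide prescribes.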