User guide
Pre-process the input BAM file
This step filters out unwanted reads.
With a stranded protocol, it also splits the input BAM file into two output BAM files, one for each strand.
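The strand split can be pictured with the sketch below. It is not bt-flor's code, and this guide does not say which library convention the tool assumes. The sketch assumes a dUTP-style (reverse-stranded) library, where read 1, or a single-end read, aligns antisense to the transcript and read 2 aligns sense. The `Read` record and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Read:
    # Hypothetical minimal read record; real BAM records carry many more fields.
    name: str
    is_reverse: bool  # read aligned to the reverse strand of the reference
    is_read1: bool    # first mate of a pair (use True for single-end reads)


def transcript_strand(read: Read) -> str:
    """Return '+' or '-' for the transcript a read came from.

    ASSUMPTION: dUTP-style library. Read 1 is antisense to the transcript,
    so its alignment strand is flipped; read 2 keeps its alignment strand.
    """
    aligned_forward = not read.is_reverse
    # Plus strand when exactly one of "aligned forward" and "is read 1" holds.
    return "+" if aligned_forward != read.is_read1 else "-"


def split_by_strand(reads):
    """Split reads into (plus, minus) lists by inferred transcript strand."""
    plus, minus = [], []
    for read in reads:
        (plus if transcript_strand(read) == "+" else minus).append(read)
    return plus, minus


plus, minus = split_by_strand([
    Read("a", is_reverse=False, is_read1=True),
    Read("b", is_reverse=True, is_read1=True),
    Read("c", is_reverse=False, is_read1=False),
])
print([r.name for r in plus], [r.name for r in minus])
```

For a forward-stranded library the flip would be reversed, so check your library preparation before relying on any fixed rule.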
usage: bt-flor-separate-bam-strands.py [-h] --input-file INPUT_FILE
Pre-process the input bam file
- optional arguments:
  -h, --help
      Show this help message and exit.
  --input-file INPUT_FILE
      Full path to the input .bam or .sam file (default: None)
  --min-mapq MIN_MAPQ
      Minimum mapping quality of a read (default: 10)
  --max-gene-length MAX_GENE_LENGTH
      Maximum length of a gene. Reads whose mapping spans more bases than this are discarded (default: 100000)
  --filter-cigar FILTER_CIGAR
      Filter out reads whose CIGAR string contains any of these characters. For example, DSHI filters out reads with deletions, soft clipping, hard clipping, or insertions (default: '')
  --filter-tags TAG_NAME,VALUE TAG_NAME,VALUE … [TAG_NAME,VALUE TAG_NAME,VALUE … …]
      Filter out reads in which the value of the given tag contains the given substring. For example, --filter-tags XF,__ filters out reads whose XF tag value contains "__" (htseq-count uses such values, e.g. __no_feature, to mark reads that were not assigned to a gene) (default: None)
  --is-stranded-protocol
      Set this flag if the library was prepared with a stranded protocol (default: False)
  --log-file LOG_FILE
      Full path to the log file (default: None)
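The filtering options above can be read as a single keep-or-drop decision per read. The sketch below illustrates that reading; it is not the tool's implementation. The `AlignedRead` record and the `keep_read` and `parse_filter_tags` helpers are hypothetical, and the boundary rules (a read is kept when MAPQ >= `--min-mapq` and dropped when its span exceeds `--max-gene-length`) are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class AlignedRead:
    # Hypothetical read record used only for this illustration.
    mapq: int
    cigar: str            # CIGAR string, e.g. "50M1000N50M"
    reference_span: int   # bases covered on the reference, introns included
    tags: dict = field(default_factory=dict)


def parse_filter_tags(items):
    """Turn 'TAG_NAME,VALUE' strings into (tag_name, substring) pairs."""
    return [tuple(item.split(",", 1)) for item in items]


def keep_read(read, min_mapq=10, max_gene_length=100000,
              filter_cigar="", filter_tags=()):
    """Return True if the read passes every filter (assumed semantics)."""
    if read.mapq < min_mapq:
        return False
    if read.reference_span > max_gene_length:
        return False
    # Drop the read if its CIGAR contains any of the listed operation letters.
    if any(op in read.cigar for op in filter_cigar):
        return False
    # Drop the read if a listed tag's value contains the given substring.
    for tag_name, substring in filter_tags:
        value = read.tags.get(tag_name)
        if value is not None and substring in str(value):
            return False
    return True


tags = parse_filter_tags(["XF,__"])
read = AlignedRead(mapq=60, cigar="100M", reference_span=100,
                   tags={"XF": "__no_feature"})
print(keep_read(read, filter_cigar="DSHI", filter_tags=tags))
```

Because `N` (a spliced gap) is not among the default filtered letters, spliced reads are only removed by the span limit in this sketch.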
Run the main program
This step runs the main program, which assembles transcripts from the mapped reads and writes them to a GTF file.
With a stranded protocol, you need to run the command twice, once on each of the two BAM files created in the previous step (plus and minus strand).
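For a stranded run, the two invocations can be built programmatically. In the sketch below, the helper `stranded_commands`, the input paths, and the `<prefix>.plus.gtf` / `<prefix>.minus.gtf` output names are all hypothetical. Passing `--is-stranded-protocol` to both runs is an assumption based on the option list. Only flags documented in this guide are used.

```python
import shlex


def stranded_commands(plus_bam, minus_bam, output_prefix, extra_args=()):
    """Build one bt-flor.py invocation per strand-specific BAM file.

    File names are hypothetical; substitute the files produced by
    bt-flor-separate-bam-strands.py.
    """
    commands = []
    for strand, bam in (("plus", plus_bam), ("minus", minus_bam)):
        commands.append([
            "bt-flor.py",
            "--input-file", bam,
            "--gtf-output-file", f"{output_prefix}.{strand}.gtf",
            "--is-stranded-protocol",
            *extra_args,
        ])
    return commands


for cmd in stranded_commands("/data/sample.plus.bam",
                             "/data/sample.minus.bam",
                             "/data/sample",
                             extra_args=["--threads", "4"]):
    # Print a shell-ready command line; use subprocess.run(cmd, check=True)
    # instead to actually execute it.
    print(shlex.join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems when paths contain spaces.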
usage: bt-flor.py [-h] --input-file INPUT_FILE --gtf-output-file
Accurate assembly of transcripts according to mapped reads
- optional arguments:
  -h, --help
      Show this help message and exit.
  --input-file INPUT_FILE
      Full path to the input .bam or .sam file (default: None)
  --gtf-output-file GTF_OUTPUT_FILE
      Full path to the output GTF file (default: None)
  --is-stranded-protocol
      Set this flag if the library was prepared with a stranded protocol (default: False)
  --max-dist-internal-edge-from-average MAX_DIST_INTERNAL_EDGE_FROM_AVERAGE
      Maximum distance between read start and end positions at the internal exon edges of the transcript, that is, every edge except the start of the first exon and the end of the last exon (default: 3)
  --max-dist-external-edge-from-average MAX_DIST_EXTERNAL_EDGE_FROM_AVERAGE
      For a non-stranded protocol: maximum distance between read positions at the start of the first exon and at the end of the last exon of the transcript (default: 3)
  --max-dist-first-edge-from-average MAX_DIST_FIRST_EDGE_FROM_AVERAGE
      For a stranded protocol: maximum distance between read positions at the start of the first exon of the transcript (default: 3)
  --max-dist-last-edge-from-average MAX_DIST_LAST_EDGE_FROM_AVERAGE
      For a stranded protocol: maximum distance between read positions at the end of the last exon of the transcript. This is larger by default because the enzyme sometimes drops off before the end of the transcript (default: 50)
  --local-average-max-num-positions LOCAL_AVERAGE_MAX_NUM_POSITIONS
      Maximum number of neighboring positions over which the average is calculated; distances are measured from this average (default: 5)
  --known-sorted-gtf-file KNOWN_SORTED_GTF_FILE
      Full path to the known GTF file, sorted by chromosome and then by start position in ascending order (you can sort it with: bedtools sort -i <gtf-file>). If they do not already exist, the program creates *.gz (compressed) and *.gz.tbi (index) files in the same directory as the known GTF file (default: None)
  --threads THREADS
      Number of threads; chromosomes can be processed in parallel (default: 1)
  --log-file LOG_FILE
      Full path to the log file (default: None)
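The `--max-dist-*-from-average` and `--local-average-max-num-positions` options suggest a tolerance check against a local average. The sketch below shows one plausible reading: average the edge positions of up to N neighboring reads and accept a candidate edge if it lies within the allowed distance. How bt-flor actually chooses the neighboring positions is not described here, so `within_local_average` and its neighbor selection (the N positions closest to the candidate) are hypothetical. The defaults mirror the documented values of 3 and 5.

```python
def within_local_average(candidate, neighbor_positions,
                         max_dist=3, max_num_positions=5):
    """Check whether an edge position is close to its local average.

    HYPOTHETICAL: the local average is taken over the max_num_positions
    neighbor positions closest to the candidate.
    """
    if not neighbor_positions:
        # Nothing to compare against; accept the edge.
        return True
    nearest = sorted(neighbor_positions,
                     key=lambda pos: abs(pos - candidate))[:max_num_positions]
    local_average = sum(nearest) / len(nearest)
    return abs(candidate - local_average) <= max_dist


neighbors = [100, 100, 101, 99, 100, 250]
print(within_local_average(102, neighbors))               # internal edge, tolerance 3
print(within_local_average(110, neighbors, max_dist=50))  # last edge, stranded default
```

The looser default for the last edge in a stranded protocol lets a read ending well before the transcript end (when the enzyme drops off early) still match.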