SAM-TB
## FAQ

### Questions about website functions

#### 1. What are the functions of this website? What can it do?

SAM-TB is a user-friendly, function-rich platform for analyzing Mycobacterium tuberculosis whole-genome sequencing data derived from Illumina paired-end sequencing. The website currently has the following functions:

(1) Quality control for the sequencing data: the website evaluates the sequencing quality of samples and improves data and analysis quality through quality control.

(2) Species identification for NTM: the website performs nontuberculous mycobacteria (NTM) species identification for samples in which ≤ 95% of reads map to the M. tuberculosis reference strain. If the sample belongs to one of the 175 NTM species that can be detected by the mlstverse software [1], that species is reported; otherwise, the species of the sample is reported as unknown. If the sample is detected as a mixture of MTB and NTM, its species is reported as MTB together with the mixed NTM species.

(3) Identification of mixed Mycobacterium tuberculosis (MTB) infections: for samples belonging to MTB, the website uses the MixInfect software [2] to detect mixed infections with two or more MTB strains. If a mixed infection is identified, the estimated proportion of the predominant strain in the sample is reported. Note that when the proportion of the minor strain in the sample is below 10%, the software's ability to detect mixed infections may decrease.

(4) Variant detection and annotation: the website performs variant detection and annotation for samples that contain complete MTBC-specific sequences and have an average depth ≥ 5. SAM-TB reports the genome-wide variants detected in the analyzed samples, including SNPs, short indels, and long deletions.

(5) Core genome multilocus sequence typing (cgMLST) analysis: the website determines the cgMLST allelic profile of MTB strains based on the published cgMLST scheme, which includes 2891 core genes [3].

(6) Prediction of drug susceptibility and resistance: the website can predict resistance to 17 anti-tuberculosis drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, streptomycin, ethionamide, amikacin, kanamycin, capreomycin, ofloxacin, moxifloxacin, para-aminosalicylic acid, cycloserine, linezolid, clofazimine, bedaquiline, and delamanid) for MTBC samples. It reports drug-resistance mutations and their frequency, and flags mutations with low confidence for predicting drug resistance (including both low-confidence [4] and unknown-confidence mutations). The website can also assess the susceptibility of samples to the four first-line drugs (see “Prediction of susceptibility and resistance to first-line drugs” in Help).

(7) MTBC lineage classification: the website classifies MTBC samples into L1-9, M. bovis, M. caprae, and M. orygis.

(8) Pairwise SNP distance analysis: the website can perform pairwise SNP distance analysis for multiple samples.

(9) Phylogenetic tree construction: the website can construct phylogenetic trees for 4 to 2000 samples. Genomic clusters can be identified as strains whose genetic distance is less than a specified threshold (e.g. 12 SNPs).

Note:

① The website also provides variant detection and annotation, drug-resistance prediction, and lineage classification for mixed MTB and NTM samples, but the results may not be accurate.

② If you include samples that are not suitable for genetic relationship analysis in the pairwise SNP distance analysis or the phylogenetic tree construction, the results may be affected.

#### 2. I have a large amount of sequencing data. Can I use the website for the analysis?

Currently, each user can analyze up to 100 samples. When you reach this limit, you can download the results and delete the analyzed data to perform additional analyses.
If you need a higher limit, you can request an increase by sending an email to samtb2018@163.com.

#### 3. After the data analysis is completed, can the results be exported?

Yes. The website can export the results of quality control, drug resistance prediction, drug resistance mutations, and genome-wide variants for the analyzed samples. In addition, the results for multiple samples (quality control, genome-wide variants, and mDST by detection of drug-resistance mutations) can be exported in batch. In the batch export, the variants of multiple samples are integrated to show their distribution across the samples.

### Questions about the input

#### 1. What are the requirements for the format of sequencing data?

Currently, the website only analyzes paired-end sequencing data and only supports uploading “fastq/fastq.gz” files. If your sequencing data is in “bam” format, you can convert it to “fastq” format with the bedtools [bamtofastq](http://bedtools.readthedocs.io/en/latest/content/tools/bamtofastq.html) tool.

#### 2. Are there any requirements for phylogenetic tree reconstruction?

Phylogenetic tree reconstruction requires at least 4 samples and no more than 2000 samples. If the number of samples exceeds 500, SNP-based tree construction will be very slow with the maximum likelihood method and only slightly faster with the maximum parsimony method. The website provides the “fasta” file used for constructing the tree, so you can download it and build the phylogenetic tree with other software. Additionally, the website offers a tree-building function based on cgMLST, which can significantly speed up phylogenetic tree reconstruction.

### Questions about the website usage

#### 1. How do I use the website?

(1) Create an account: after entering the SAM-TB homepage, click “sign in”. Enter your registered email address and password on the login page.
If you haven't applied for an account yet, click “sign up” and fill in the information requested. You will receive an email with the initial password for the account. The password can be changed in “User setting”.

(2) In “Help” you will find information on how to upload data, create an analysis, and view or download results.

#### 2. Can you provide a progress bar to indicate how the analysis is progressing, so we can get an idea of how long it will be until we can see the results?

Sorry, we cannot provide a progress bar because the analysis process is affected by many factors, such as the cluster status and system task scheduling. Usually, once a “single sample variants analysis” has started, the results for genome-wide variants, drug resistance, drug resistance mutations, and lineage classification are available in about 35 minutes.

#### 3. If I accidentally close the webpage halfway, will the running analysis be affected?

If an analysis has begun, it will not be affected by closing or refreshing the webpage. However, if sequencing data is still uploading, closing or refreshing the webpage will interrupt the upload. If an upload is interrupted, re-uploading the file resumes from the point where it stopped. While data is uploading, you can still open other web pages to search and browse.

#### 4. Can I only link samples and data one at a time? I have too much data.

The website provides two methods to link data with the corresponding samples:

① Go to the “My data” page. Click “Batch operations” and select “Batch upload Metadata”. Click “Download template” to download the template file, fill in the information requested, and then upload the edited file to the website. If the upload is successful, all samples in the file will be linked to the corresponding data.

② Go to the “My data” page. Click “Create batch analysis”.
Click “Download sample input template” to download the template file, fill in the information requested, and then upload the edited file to the website. If the upload is successful, all samples in the file will be linked to the corresponding data and the website will automatically perform “single sample variants analysis” for them.

#### 5. How can I retrieve a sample I accidentally deleted?

Go to the “My samples” page. Click “Recycle bin” and choose the sample to restore it.

#### 6. Why can't the browser export the result tables?

We recommend using Google Chrome, Safari, or Firefox. Browsers such as QQ do not support exporting the result tables.

### Questions about website security

#### 1. How do I change the password?

Click the person avatar at the far right of the top-level navigation bar. Select “User setting” and open the “Reset password” page to change your password. The password cannot be viewed in the system back end, so please make sure to **save your password** yourself.

#### 2. What should I do if I forget my password?

Click “Forget Password?” on the login page. Enter your email in the floating window and click “Retrieve password”. Enter your registered email in the pull-down menu. You will receive an email with a new password.

#### 3. My data are all unpublished sequencing data. Will the website leak the data?

You do not need to worry about data security. When you register your account, we provide a detailed “User agreement” that includes “Rules of use”, “Information protection”, “Risk warning”, “Disclaimer”, and other content. No patient information is required when you create an analysis.

### Questions about the analysis results

#### 1. What does each item of quality control mean? If the result of quality control is “failed”, will the analysis results be completely useless?

(1) For the specific meaning of each quality control item, please refer to: [Use FastQC to check the quality of the raw data of high-throughput sequencing](https://www.cnblogs.com/longjianggu/p/5078782.html).

(2) The quality control result is determined by the values of “average depth” and “10X coverage” (see the “QC” page). However, if a sample passes quality control but several quality control items are rated “bad”, the predictions of susceptibility or resistance to the antibiotics may not be accurate.

(3) For samples that fail quality control, if the “average depth” is ≥ 20 (or only slightly below 20) and the 10X coverage is ≥ 97% (or only slightly below 97%), the variant detection results can still be used, but the predictions of resistance and susceptibility to antibiotics may not be accurate.

#### 2. Why do some variants not provide specific sequence data in the “ref” column, such as C...T1358del? What does it mean?

For deletions larger than 50 bp, the website does not show the whole deleted reference sequence. Such deletions are annotated as “start base” + “...” + “end base” + “deletion length” + “del”. For instance, “C...T1358del” means that there is a 1358 bp deletion and that the bases at its start and end positions are “C” and “T”, respectively. If a large deletion spans genes and/or intergenic regions, the “Location of gene” column is annotated with the positions on the chromosome, such as Chromosome:1541947_1543304. The “Gene” and “Symbol” columns list all the genes or intergenic regions involved, such as Rv1369c, Rv1370c, lprF-Rv1369c, Rv1370c-Rv1371.

#### 3. I want to know all the mutations in a certain gene, such as katG, or a certain type of mutation in the sample, such as all the non-synonymous mutations. Can the website help me?

Yes. Click “Analysis ID” on the “My Analysis” page to open the “Dashboard” page of the analysis, then switch to the “Variants” page. You can search or filter variants on this page.

① Enter the gene name in the search box in the upper right corner of the page to view all variants in that gene. For example, if you enter “katG”, you will get all the variants detected in the katG gene.

② Click the filter icon on the right side of the page to filter the variants by position, variation type, etc. There are 11 types of variants: Delete, Delete_frameshift, Delete_non_frameshift, Delete_SV (structural variation; here, deletions of chromosomal segments larger than 50 bp), Insert, Insert_frameshift, Insert_non_frameshift, Intergenic, Nonsynonymous, Small RNA, and Synonymous. Among these, indels in intergenic regions are annotated as Delete or Insert, and indels in genes are annotated as Delete_frameshift, Delete_non_frameshift, Insert_frameshift, or Insert_non_frameshift. Deletions spanning genes and/or intergenic regions are annotated as Delete_SV. Enter a variation type in the “Variation type” box to see all variations of this type in the genome. For example, enter “Nonsynonymous” in the box to get all non-synonymous mutations in the sample.

#### 4. Why are some drug-resistance mutations not found on the “Variants” page?

Only fixed mutations, with a frequency ≥ 75%, are reported on the “Variants” page. Drug-resistance mutations with a frequency ≥ 10% are reported on the “DR mutations” page in order to detect heterogeneous drug resistance (heteroresistance) in the sample. A drug-resistance mutation with a frequency between 10% and 75% therefore appears only on the “DR mutations” page.

#### 5. What is the use of the batch-exported genome-wide variants?

This file integrates the genome-wide variants of multiple samples. Each row in the file represents a variant. The first 12 columns contain the mutation information, the 13th column gives the number of samples that carry the variant, and the following columns list the samples with the variant and their mapping results.
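
The column layout described above can be read programmatically. The following is a minimal Python sketch, not the website's own tooling. It assumes the export is a tab-delimited text file with a header row, and that each column after the 13th corresponds to one sample and is left empty when that sample lacks the variant. Both points are assumptions, as are the function names `read_batch_variants` and `variants_shared_by`; check them against your actual export and adjust the delimiter if needed.

```python
import csv
import io

N_INFO_COLS = 12  # per the FAQ: the first 12 columns describe the variant


def read_batch_variants(text, delimiter="\t"):
    """Parse batch-exported variants text into one record per variant.

    Assumed layout (hypothetical beyond what the FAQ states): a header row,
    then one variant per row; columns 1-12 hold the variant information,
    column 13 the number of samples carrying the variant, and each later
    column holds one sample's entry, empty when the sample lacks the variant.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    info_names = header[:N_INFO_COLS]
    sample_names = header[N_INFO_COLS + 1:]
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        info = dict(zip(info_names, row[:N_INFO_COLS]))
        n_samples = int(row[N_INFO_COLS])
        # Keep only the samples whose cell is non-empty for this variant.
        carriers = {
            name: cell
            for name, cell in zip(sample_names, row[N_INFO_COLS + 1:])
            if cell.strip()
        }
        records.append(
            {"info": info, "n_samples": n_samples, "carriers": carriers}
        )
    return records


def variants_shared_by(records, min_samples):
    """Return the variants reported in at least `min_samples` samples."""
    return [r for r in records if r["n_samples"] >= min_samples]
```

To use it, read the exported file into a string (for example with `open(path, encoding="utf-8").read()`), pass it to `read_batch_variants`, and then filter the records, for instance with `variants_shared_by(records, 2)` to keep variants found in at least two samples.
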

You can view the distribution of particular variants across the samples and perform specific analyses by combining that distribution with the phenotypes of the samples.

#### 6. How do I use the downloaded phylogenetic tree file?

The phylogenetic tree file downloaded from the SAM-TB website is in “.nwk” format, a text format that encodes the branches and nodes of the tree rather than a picture of it. Because it is a plain text file, you can view it with Notepad. To display the tree, go to the official MEGA website and download and install the appropriate version of MEGA 7/10. Open MEGA 7/10, click “File” in the upper left corner, and select “Open file” to import the nwk file. The resulting phylogenetic tree should be consistent with the tree shown on the SAM-TB website. You can browse the official MEGA website to learn more. The “.nwk” file can also be imported into tools such as FigTree, ggtree, and iTOL to annotate the tree and improve its presentation.

#### 7. Why are two strains belonging to different lineages identified as clustered strains, with SNP distances < 12, in the “Pairwise SNP distance” analysis?

Generally, this happens because a sample contains mixed MTB complex strains. First check whether the sample passed quality control in “Single sample variants analysis”. Then check whether its number of variants differs substantially from that of the other samples. Strains belonging to the same lineage usually accumulate a similar number of SNPs. Strains with an unusually large or small number of SNPs (i.e., outliers) can be problematic and should be excluded from the analysis. Such problematic samples may result from sequencing errors, or the sample itself may be contaminated with another strain.

### References

[1] Yuki Matsumoto, Takeshi Kinjo, Daisuke Motooka, Daijiro Nabeya, Nicolas Jung, Kohei Uechi, Toshihiro Horii, Tetsuya Iida, Jiro Fujita & Shota Nakamura.
Comprehensive subspecies identification of 175 nontuberculous mycobacteria species based on 7547 genomic profiles. Emerging Microbes & Infections. 2019;8(1):1043-1053. doi: 10.1080/22221751.2019.1637702. PubMed PMID: 31287781; PubMed Central PMCID: PMC6691804.

[2] Sobkowiak, B., Glynn, J.R., Houben, R.M.G.J. et al. Identifying mixed Mycobacterium tuberculosis infections from whole genome sequence data. BMC Genomics 19, 613 (2018). https://doi.org/10.1186/s12864-018-4988-z

[3] Kohl TA, Harmsen D, Rothgänger J, Walker T, Diel R, Niemann S (2018) Harmonized genome wide typing of tubercle bacilli using a web-based gene-by-gene nomenclature system. EBioMedicine 34:131–138. https://doi.org/10.1016/j.ebiom.2018.07.030

[4] Paolo Miotto, Belay Tessema, Elisa Tagliani, Leonid Chindelevitch, Angela M Starks, Claudia Emerson, Debra Hanna, Peter S Kim, Richard Liwski, Matteo Zignol, Christopher Gilpin, Stefan Niemann, Claudia M Denkinger, Joy Fleming, Robin M Warren, Derrick Crook, James Posey, Sebastien Gagneux, Sven Hoffner, Camilla Rodrigues, Iñaki Comas, David M Engelthaler, Megan Murray, David Alland, Leen Rigouts, Christoph Lange, Keertan Dheda, Rumina Hasan, Uma Devi K Ranganathan, Ruth McNerney, Matthew Ezewudo, Daniela M Cirillo, Marco Schito, Claudio U Köser, Timothy C Rodwell. A standardised method for interpreting the association between mutations and phenotypic drug resistance in Mycobacterium tuberculosis. European Respiratory Journal. 2017;50(6):1701354. doi: 10.1183/13993003.01354-2017. PubMed PMID: 29284687; PubMed Central PMCID: PMC5898944.