NGS QC Toolkit: A toolkit for the quality control (QC) of next generation sequencing (NGS) data. The toolkit comprises user-friendly standalone tools for quality control of sequence data generated using the Illumina and Roche 454 platforms, with detailed results in the form of tables and graphs, and for filtering of high-quality sequence data. It also includes a few other tools that are helpful in NGS data quality control and analysis.

Tools in NGS QC Toolkit

  • QC Tools
    • IlluQC.pl
      Tool for quality control of sequencing data generated using Illumina platform (FASTQ format)
    • IlluQC_PRLL.pl
      Tool for quality control of sequencing data generated using Illumina platform (uses multiple CPUs)
    • 454QC.pl
      Tool for quality control of sequencing data generated using 454 platform (read and quality in FASTA format)
    • 454QC_PRLL.pl
      Tool for quality control of sequencing data generated using 454 platform (uses multiple CPUs)
    • 454QC_PE.pl
      Tool for quality control of paired-end sequencing data generated using 454 platform (read and quality in FASTA format)

  • Format-converter Tools
    • SangerFastqToIlluFastq.pl
      Tool to convert fastq-sanger variant to fastq-illumina variant of FASTQ format
    • SolexaFastqToIlluFastq.pl
      Tool to convert fastq-solexa variant to fastq-illumina variant of FASTQ format
    • FastqTo454.pl
      Tool to convert FASTQ format (any variant) to 454 format (two files in FASTA format: one for reads/sequences (.fna) and another for quality (.qual))
    • FastqToFasta.pl
      Tool to convert FASTQ format file to FASTA format file for reads/sequences
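
The fastq-sanger and fastq-illumina variants differ only in how the quality line is encoded: Sanger stores each Phred score as the ASCII character with code Phred + 33, while Illumina 1.3+ uses Phred + 64. The Python sketch below illustrates that re-encoding for a single quality line. It is not taken from SangerFastqToIlluFastq.pl (which is written in Perl); the function name and the choice to clamp scores above 62 are assumptions made for illustration.

```python
SANGER_OFFSET = 33        # fastq-sanger: ASCII code = Phred + 33
ILLUMINA_OFFSET = 64      # fastq-illumina (1.3+): ASCII code = Phred + 64
MAX_ILLUMINA_PHRED = 62   # 64 + 62 = 126, the last printable ASCII character


def sanger_to_illumina_quality(qual: str) -> str:
    """Re-encode one FASTQ quality line from Phred+33 to Phred+64.

    Hypothetical helper for illustration; not the toolkit's own code.
    """
    out = []
    for ch in qual:
        phred = ord(ch) - SANGER_OFFSET
        if phred < 0:
            raise ValueError(f"not a Sanger quality character: {ch!r}")
        # Assumption: scores that do not fit the Phred+64 range are clamped.
        phred = min(phred, MAX_ILLUMINA_PHRED)
        out.append(chr(phred + ILLUMINA_OFFSET))
    return "".join(out)
```

Only the quality line of each FASTQ record changes; the header, sequence and "+" lines are copied through unchanged.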

  • Trimming Tools
    • TrimmingReads.pl
      Tool for trimming reads from 5' and/or 3' end of the read (FASTQ or FASTA format)
    • HomoPolymerTrimming.pl
      Tool for trimming the 3' end of reads, starting from the first base of a homopolymer of a given length
    • AmbiguityFiltering.pl
      Tool for filtering reads containing ambiguous bases or trimming flanking ambiguous bases
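
To make the homopolymer trimming rule concrete, here is a minimal Python sketch of one reading of it: find the first run of a single repeated base that is at least the given length, and keep only the part of the read before that run. The function name, the regular expression and this interpretation are assumptions; HomoPolymerTrimming.pl itself may handle details differently.

```python
import re


def trim_at_homopolymer(read: str, min_run: int) -> str:
    """Cut a read at the first base of the first homopolymer run of
    at least `min_run` bases; return the read unchanged if there is none.

    Hypothetical helper for illustration; not the toolkit's own code.
    """
    if min_run < 2:
        raise ValueError("min_run must be at least 2")
    # One letter followed by at least (min_run - 1) copies of the same letter.
    pattern = re.compile(r"([A-Za-z])\1{%d,}" % (min_run - 1))
    match = pattern.search(read)
    return read if match is None else read[:match.start()]
```

For example, `trim_at_homopolymer("ACGTAAAAAAGCT", 6)` keeps only the four bases before the run of six A's. When quality values accompany the read, they would need to be cut to the same length.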

  • Statistics Tools
    • AvgQuality.pl
      Tool to calculate the average quality score for each read and the overall quality score for the given FASTA quality file
    • N50Stat.pl
      Tool to generate statistics for read/sequence data given in FASTA format (total number of reads/sequences, total bases and minimum, maximum, average, median, N25, N50, N75, N90 and N95 read/sequence length)
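
Of the statistics listed for N50Stat.pl, the Nxx values are the least self-explanatory. Under the usual definition, N50 is the length L such that sequences of length L or longer together contain at least 50% of all bases; N25, N75, N90 and N95 use the corresponding percentages. The Python sketch below implements that definition. It is not the tool's Perl code, and the function name `nxx` is hypothetical.

```python
def nxx(lengths, percent):
    """Return the Nxx length for a list of sequence lengths.

    Sequences are visited from longest to shortest; the result is the
    length at which the running total first reaches `percent` percent
    of all bases. Hypothetical helper for illustration.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length


lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]   # 54 bases in total
print(nxx(lengths, 50))                  # 10 + 9 + 8 = 27 reaches 50%, so N50 is 8
```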

Downloads

NGS QC Toolkit v2.3.3 (Download)

Documentation (Download)

Sample data

Direct download: Input-Data.zip and Output-Data.zip
  • Input-Data
  • Output-Data
    • QC
      • 454QC
        • Paired-end
          • 454ReadsPE.fna_PE_summary.png (Download)
          • 454ReadsPE.fna_UP_summary.png (Download)
          • 454ReadsPE.fna_baseCompostion.png (Download)
          • 454ReadsPE.fna_filtered (Download)
          • 454ReadsPE.fna_gcDistribution.png (Download)
          • 454ReadsPE.fna_lenDistribution.png (Download)
          • 454ReadsPE.fna_qualDistribution.png (Download)
          • 454ReadsPE.fna_stat (Download)
          • 454ReadsPE.qual_filtered (Download)
          • output_454ReadsPE.fna.html (Download)
        • Single-end
      • IlluQC
        • Paired-end
          • output_pairedEnd_1.fastq_pairedEnd_2.fastq.html (Download)
          • pairedEnd_1.fastq_QualRangePerBase.png (Download)
          • pairedEnd_1.fastq_avgQual.png (Download)
          • pairedEnd_1.fastq_baseCompostion.png (Download)
          • pairedEnd_1.fastq_filtered (Download)
          • pairedEnd_1.fastq_filtered_QualRangePerBase.png (Download)
          • pairedEnd_1.fastq_gcDistribution.png (Download)
          • pairedEnd_1.fastq_pairedEnd_2.fastq_stat (Download)
          • pairedEnd_1.fastq_pairedEnd_2.fastq_summary.png (Download)
          • pairedEnd_1.fastq_pairedEnd_2.fastq_unPaired_HQReads (Download)
          • pairedEnd_1.fastq_qualDistribution.png (Download)
          • pairedEnd_2.fastq_QualRangePerBase.png (Download)
          • pairedEnd_2.fastq_avgQual.png (Download)
          • pairedEnd_2.fastq_baseCompostion.png (Download)
          • pairedEnd_2.fastq_filtered (Download)
          • pairedEnd_2.fastq_filtered_QualRangePerBase.png (Download)
          • pairedEnd_2.fastq_gcDistribution.png (Download)
          • pairedEnd_2.fastq_qualDistribution.png (Download)
        • Single-end
          • output_singleEnd.fastq.html (Download)
          • singleEnd.fastq_QualRangePerBase.png (Download)
          • singleEnd.fastq_avgQual.png (Download)
          • singleEnd.fastq_baseCompostion.png (Download)
          • singleEnd.fastq_filtered (Download)
          • singleEnd.fastq_filtered_QualRangePerBase.png (Download)
          • singleEnd.fastq_gcDistribution.png (Download)
          • singleEnd.fastq_qualDistribution.png (Download)
          • singleEnd.fastq_stat (Download)
          • singleEnd.fastq_summary.png (Download)
    • Statistics
    • Trimming
      • HomopolymerTrimming.pl
      • TrimmingReads.pl

Releases

NGSQCToolkit v2.3.3: (Date: 3rd February 2014)
  • Minor modification of the output format for compatibility with Windows 32-bit servers (QC tools)
NGSQCToolkit v2.3.2: (Date: 8th November 2013)
  • Bug fix (454QC_PRLL.pl)
NGSQCToolkit v2.3.1: (Date: 29th September 2013)
  • Compatibility with the latest version of Perl's threads module (454QC_PRLL.pl)
NGSQCToolkit v2.3: (Date: 30th August 2012)
  • Introducing a new tool for filtering and/or trimming ambiguous base content (AmbiguityFiltering.pl)
  • Added support for paired-end sequencing data (TrimmingReads.pl)
  • Compatibility with primer/adaptor sequences shorter than 20 bp in the contamination removal module (QC Tools)
  • Bug fix (minor corrections in the code to remove unnecessary error messages) (IlluQC_PRLL.pl)
NGSQCToolkit v2.2.3: (Date: 27th Feb 2012)
  • Option to provide the linker sequence used in the preparation of the paired-end library (454QC_PE tool)
NGSQCToolkit v2.2.2: (Date: 18th Feb 2012)
  • Bug fix (minor correction in the code for input data verification) (IlluQC tools)
  • Added support for Illumina pipeline 1.8+ (TrimmingReads.pl and FastqTo454.pl)
NGSQCToolkit v2.2.1: (Date: 3rd Feb 2012)
  • Compatibility with data generated using recent Illumina chemistry (Illumina pipeline 1.8+) (IlluQC tools)
  • Detection of additional input FASTQ variant (Illumina 1.8+) (IlluQC tools)
NGSQCToolkit v2.2:
  • Printing average quality score at each base position in statistics file (IlluQC tools)
  • Generating a graph showing percentage of reads falling into different quality score ranges at each base position (IlluQC tools)
  • An additional option for trimming reads based on quality score (TrimmingReads.pl)
  • Statistics for individual (A,T,G,C and N) bases and G+C and A+T counts (N50Stat.pl)
  • Addition of a new tool for QC of Roche 454 paired-end data (454QC_PE.pl)
NGSQCToolkit v2.1:
  • Reading (input)/Writing (output) compressed data files (gzip) (IlluQC and 454QC tools)
  • Option to input primer/adaptor sequences used in non-standard sequencing protocols, for QC filtering (IlluQC and 454QC tools)
  • Generating consolidated QC report in HTML format (IlluQC and 454QC tools)
  • Feature to export, as HQ filtered output, the one read of a pair that passes the filter criteria (IlluQC tools)
  • Detection of additional input FASTQ variants (Sanger, Solexa, Illumina 1.3+ and Illumina 1.5+) (IlluQC tools)
NGSQCToolkit v2.0:
  • Parallelization of QC tools to speed up processing
  • Additional feature to generate graphs for various QC statistics (GC content, average length and average quality distribution, average quality score at each base position and pie charts for summary of quality control analysis and base composition)
NGSQCToolkit v1.0:
  • Tools with basic functionalities for QC of Illumina and 454 sequencing data (high-quality filtering, primer/adaptor contamination removal, homopolymer trimming and length filtering)
  • Tools for file format conversion, sequence statistics and trimming
  • Additional feature to automatically detect the variant of the input FASTQ file (Sanger and Illumina)

Citation

Patel RK, Jain M (2012). NGS QC Toolkit: A toolkit for quality control of next generation sequencing data. PLoS ONE, 7(2): e30619.

(Toolkit downloads: 21777 (since 18th Feb 2012))
National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, India
For Questions and Suggestions, contact Mukesh Jain (mjain@nipgr.ac.in); Ravi Patel (ravi_patel_4@yahoo.co.in)