NGS C++ Library, v2


The libseq 2 software is a C++ programming library designed to analyze genomic data produced by the sequencers that follow the Next Generation Sequencing (NGS) machines. These sequencers can be held in the palm of your hand and can be attached to mobile devices. The library is under active development and should not be considered production-ready. It makes heavy use of templates to gain runtime performance through static polymorphism.

Our new version runs on x86 processors (e.g., Intel, AMD) and on the ARM processors found in almost all smartphones and tablets. Version 2 will be available for download as soon as possible.

libseq 1.0

The previous release, libseq 1.0, provides classes for the following NGS concepts:

  • DNA sequence container traits (e.g., strings)
  • Generic DNA sequences with variadic properties
  • Sequence specialization for FASTQ and FASTA properties
  • Reverse-Forward matching sequences
  • Kmer list containers (serial and concurrent)
  • Kmer generators
  • De Bruijn graph generator between diverse sequence classes
  • Specializations for de Bruijn graphs (e.g., kmers)
  • Parsers for FASTA and FASTQ

The library is fully parameterized via templates. For instance, you can create a graph that matches any given sequence list against another, with custom node and edge weights. Given the list of all reads and the list of all kmers, for example, you can build a graph whose nodes are the reads and the kmers, with weighted edges linking each read to the kmers it contains and recording the position of each kmer in that read. We are also implementing cache-oblivious data structures and out-of-core algorithms to exploit modern architectures and to process data on machines with limited RAM.

Note: The libseq 1.0 library incorporates the SeqDB library by Mark Howison, published in “High-throughput compression of FASTQ data with SeqDB”, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 10(1): 213–218, 2013. See the Bitbucket page for SeqDB for more information.