Sequence File Formats

Sequence file formats for a variety of data analysis options

Choose your preferred format for downstream analysis of sequencing data

File Formats for Illumina Sequencing

Numerous options are available for converting data to compatible sequence file formats such as FASTQ files, and for downstream analysis of sequencing data. Illumina sequencers are designed so data can be easily streamed into Illumina Connected Analytics and BaseSpace Sequence Hub for cloud-based data management, analysis, and collaboration.

Raw data files are provided in sequence file formats that are compatible with, or easily converted to, standardized data formats for streamlined aggregation and mining of large cohorts. The DRAGEN Bio-IT Platform adds the newest file format, FASTQ.ORA, a lossless compressed format that reduces file size, transfer time, and storage cost.

FASTQ Sequence File Format

FASTQ is a text-based sequencing data file format that stores both raw sequence data and quality scores. FASTQ files have become the standard format for storing NGS data from Illumina sequencing systems, and can be used as input for a wide variety of secondary data analysis solutions.
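To illustrate how a FASTQ file pairs each sequence with its quality scores, here is a minimal Python sketch of a reader. It assumes the common four-line record layout (header starting with "@", sequence, separator starting with "+", quality string) with no line wrapping, and it assumes Phred+33 quality encoding. The names FastqRecord and read_fastq are illustrative only and do not come from any Illumina tool.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str             # header line without the leading "@"
    sequence: str         # base calls
    qualities: List[int]  # one Phred score per base


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-line-per-record FASTQ stream.

    The offset of 33 (Phred+33) is an assumption; older files may use 64.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Each quality character encodes one score as ASCII code minus offset.
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


if __name__ == "__main__":
    example = io.StringIO("@read1\nGATTACA\n+\nIIIIII#\n")
    for record in read_fastq(example):
        print(record.name, record.sequence, record.qualities)
```

In the example record, the character "I" decodes to a score of 40 and "#" decodes to 2, so the last base is the least reliable call. Production pipelines would normally use an established parser rather than a hand-written one.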

The MiniSeq and MiSeq Sequencing Systems provide the option to automatically convert data from BCL to FASTQ format, so separate conversion software is not required.

FASTQ ORA Sequence File Format

FASTQ ORA is a binary compressed version of the text-based FASTQ sequencing data file format. fastq.ora files are up to 5x smaller than their corresponding fastq.gz files without compromising data integrity. All fastq.ora files can be read using free decompression software available from Illumina. Once the software is installed, a single command can pipe the decompressed output on the fly directly into a wide range of popular mapping tools such as BWA [1], STAR [2], and Bowtie [3]. DRAGEN ORA compression is available with the DRAGEN server and onboard the NextSeq 1000/2000.
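The on-the-fly piping described above can be sketched in Python with the standard subprocess module. This is a generic illustration of streaming one program's output into another, not the documented ORA workflow: the function name stream_into is hypothetical, and the decompressor and aligner command lines are deliberately left as placeholders, because the exact executable names and flags should be taken from the respective tool documentation.

```python
import subprocess
import sys
from typing import Sequence


def stream_into(producer_cmd: Sequence[str], consumer_cmd: Sequence[str]) -> bytes:
    """Run `producer | consumer` without writing an intermediate file.

    Returns the consumer's standard output as bytes.
    """
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close the parent's copy of the pipe so the producer is notified
    # if the consumer exits before reading everything.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer_status = producer.wait()
    if producer_status != 0 or consumer.returncode != 0:
        raise RuntimeError(
            f"pipeline failed: producer={producer_status}, "
            f"consumer={consumer.returncode}"
        )
    return output


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere. In practice the producer
    # would be the decompressor writing FASTQ text to standard output, and the
    # consumer would be an aligner reading FASTQ from standard input.
    emit_fastq = [sys.executable, "-c", "print('@r1\\nACGT\\n+\\nIIII')"]
    count_lines = [sys.executable, "-c", "import sys; print(sum(1 for _ in sys.stdin))"]
    print(stream_into(emit_fastq, count_lines).decode().strip())
```

Streaming this way avoids materializing the uncompressed FASTQ on disk, which is the main practical benefit of keeping data in the compressed format until the moment of analysis.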

BCL Sequence File Format

Data in the binary base call (BCL) sequence file format must be converted to FASTQ format for use with user-developed or third-party data analysis tools. The NextSeq and HiSeq Sequencing Systems and the NovaSeq 6000 generate raw data files in BCL format.

The DRAGEN Bio-IT Platform offers rapid BCL conversion to FASTQ files as part of its suite of pipelines.

Illumina also offers BCL Convert, a standalone software solution that demultiplexes data and converts BCL files to standard FASTQ file formats for downstream analysis.

Other Sequence File Formats

FASTQ files are the typical starting format for sequencing data analysis. However, BaseSpace Sequence Hub can create other file formats that are common to secondary and tertiary analysis programs.

During secondary or tertiary analysis of NGS data, software and apps in the Illumina informatics platforms often convert raw sequence files from FASTQ to other file formats (eg, .vcf, .bam) as part of the analysis workflow.

Additional Resources

Developer Portal

Access user guides, release notes, and additional technical information.

Online Training

These free online courses cover common topics in library prep, sequencing, and data analysis.

Illumina DRAGEN Bio-IT Platform Training

Learn more about the accurate, ultra-rapid secondary analysis platform and accompanying pipelines.

Enterprise-Level Protection

To meet the most stringent security requirements, the Illumina Connected Analytics Platform was built with security and compliance at its core.

References
  1. Li H. and Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009 Jul 15; 25(14): 1754–1760.
  2. Dobin A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan; 29(1): 15–21.
  3. Langmead B. et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009; 10: R25.