FastQC in Galaxy

After sequencing, the reads should be checked for their quality.

  • This tutorial demonstrates how to use the tool called FastQC to examine bacterial paired-end Illumina sequence reads.
  • The FastQC website is here.

New to Galaxy? First try the introduction, and then learn some key tasks.

Import the data

  • Log in to your Galaxy instance (for example, Galaxy Australia, usegalaxy.org.au).
  • Create a new history for this analysis.
  • In a new browser tab, go to this webpage:

DOI

  • Find the file called mutant_R1.fastq
  • Right-click on the file name and select “copy link address”
  • In Galaxy, go to Get Data and then Upload File
  • Click Paste/Fetch data
  • A box will appear: paste in the link address
  • Click Start
  • Click Close
  • The file will now appear at the top of your history panel.

The file name is quite long, so let’s shorten it:

  • Click on the pencil icon next to the file name.
  • In the centre Galaxy panel, click in the box under Name
  • Shorten the file name to mutant_R1.fastq
  • Then click Save

FASTQ is a file format for sequence reads that displays quality scores for each of the sequenced nucleotides.

  • For more information about FASTQ format see this link.
  • We will evaluate the mutant_R1.fastq reads using the FastQC tool.
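To make the format concrete, here is a minimal Python sketch of how a FASTQ record is laid out and how its quality string maps to numeric scores. It assumes each record is exactly four lines and that the qualities use the Phred+33 encoding; check the Encoding field in the FastQC report before relying on that offset. The record shown is made up for illustration and is not taken from mutant_R1.fastq.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) for each four-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or a blank line) ends the loop
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line that starts with '+'
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


def phred_scores(quality, offset=33):
    """Convert a quality string to integer scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality]


# A made-up record, for illustration only.
example = io.StringIO("@read1\nACGTN\n+\nIIII#\n")
records = list(read_fastq(example))
name, sequence, quality = records[0]
scores = phred_scores(quality)
print(name, sequence, scores)
```

Each character in the quality line corresponds to the base at the same position in the sequence line, so the two strings always have the same length.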

Run FastQC

In the Tool panel search box, search for “FastQC”; then click on the tool FastQC.

The tool interface will appear in the centre Galaxy panel.

  • For Short read data from your current history, select mutant_R1.fastq
  • Click Execute
  • In the history panel, click on the “refresh” icon to see if the analysis has finished.

Examine output files

Once the job has finished, examine the output called FastQC on data 1: Webpage (hint: click the eye icon). It has a summary at the top of the page, followed by a number of graphs.

Look at:

  • Basic Statistics

    • Sequence length: important when setting the maximum k-mer size for assembly.
    • Encoding: the quality encoding type matters for quality trimming software.
    • % GC: high-GC organisms tend not to assemble well and may have an uneven read coverage distribution.
    • Total sequences: the total number of reads, which gives you an idea of coverage.
  • Per base sequence quality: look for dips in quality near the beginning, middle or end of the reads. These help determine possible trimming/cleanup methods and parameters, and may indicate technical problems with the sequencing process or machine run. In this case, all the reads are of relatively high quality across their length (150 bp).

  • Per base N content: the presence of large numbers of Ns in reads may point to a poor-quality sequencing run. You would need to trim these reads to remove the Ns.
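To see roughly where these numbers come from, the following Python sketch computes similar summaries from a handful of reads held in memory. It is an illustration, not FastQC's own code: the function names are hypothetical, the reads are made up, and the qualities are assumed to be Phred+33 encoded.

```python
def basic_statistics(reads):
    """Summarise a list of (sequence, quality_string) pairs."""
    lengths = [len(seq) for seq, _ in reads]
    total_bases = sum(lengths)
    gc_bases = sum(
        seq.upper().count("G") + seq.upper().count("C") for seq, _ in reads
    )
    return {
        "total_sequences": len(reads),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "percent_gc": 100.0 * gc_bases / total_bases,
    }


def per_position_summary(reads, offset=33):
    """Return (mean_quality, percent_n) lists with one entry per read position."""
    max_length = max(len(seq) for seq, _ in reads)
    quality_sum = [0] * max_length
    n_count = [0] * max_length
    depth = [0] * max_length  # number of reads that reach each position
    for seq, qual in reads:
        for i, (base, q) in enumerate(zip(seq, qual)):
            depth[i] += 1
            quality_sum[i] += ord(q) - offset  # assumes Phred+33
            if base.upper() == "N":
                n_count[i] += 1
    mean_quality = [quality_sum[i] / depth[i] for i in range(max_length)]
    percent_n = [100.0 * n_count[i] / depth[i] for i in range(max_length)]
    return mean_quality, percent_n


# Made-up reads, for illustration only.
reads = [("ACGT", "IIII"), ("GGNA", "II#I"), ("ACG", "II5")]
stats = basic_statistics(reads)
mean_quality, percent_n = per_position_summary(reads)
print(stats)
print(mean_quality)
print(percent_n)
```

In this toy example the third position has both the lowest mean quality and the only N, which is the kind of pattern the per base graphs make visible across a whole read set.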

General questions you might ask about your input reads include:

  • How good is my read set?
  • Do I need to ask for a new sequencing run?
  • Is it suitable for the analysis I need to do?
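One rough way to approach these questions is to estimate coverage from the Total sequences and Sequence length values in Basic Statistics. The sketch below divides the number of sequenced bases by the genome size. The function name and all the numbers are hypothetical, and it assumes that both files of a paired-end set contain the same number of reads of the same length.

```python
def estimate_coverage(total_sequences, read_length, genome_size, paired=True):
    """Approximate mean depth: sequenced bases divided by genome size."""
    read_files = 2 if paired else 1  # paired-end data has an R1 and an R2 file
    return total_sequences * read_length * read_files / genome_size


# Hypothetical numbers, not taken from the tutorial data set.
coverage = estimate_coverage(
    total_sequences=500_000,  # reads reported by FastQC for one file
    read_length=150,          # bp
    genome_size=3_000_000,    # bp, an assumed bacterial genome size
)
print(f"Estimated coverage: {coverage:.0f}x")
```

Whether a given depth is enough depends on the analysis you plan to run, so treat the result as a starting point and not a pass/fail threshold.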

For a fuller discussion of FastQC outputs and warnings, see:

For a more general introduction to quality control, see:

What’s next?

You can find more tutorials at the Galaxy Training Network: