FastQC in Galaxy
After sequencing, the reads should be checked for their quality.
- This tutorial demonstrates how to use the tool called FastQC to examine bacterial paired-end Illumina sequence reads.
- The FastQC website is here.
Import the data
- Log in to your Galaxy instance (for example, Galaxy Australia, usegalaxy.org.au).
- Create a new history for this analysis.
- In a new browser tab, go to this webpage:
- Find the file called
- Right-click on the file name and select “Copy link address”.
- In Galaxy, go to Get Data and then Upload File.
- A box will appear: paste in the link address.
- The file will now appear at the top of your history panel.
The file name is quite long: let’s change it:
- Click on the pencil icon next to the file name.
- In the centre Galaxy panel, click in the box under
- Shorten the file name to
- Then click
FASTQ is a file format for sequence reads that displays quality scores for each of the sequenced nucleotides.
- For more information about FASTQ format see this link.
- We will evaluate the mutant_R1.fastq reads using the FastQC tool.
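To make the FASTQ format concrete, here is a minimal Python sketch that reads one record and decodes its quality string into numeric scores. It is an illustration only, not how FastQC itself works. It assumes the common Phred+33 encoding and simple four-line records; the `parse_fastq` function and the example read are hypothetical.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(text: str, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text made of simple four-line records.

    Assumption: qualities are Phred+33 encoded, so each quality
    character maps to ord(character) - 33.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        scores = [ord(char) - offset for char in qual]
        yield FastqRecord(header[1:], seq, scores)


# A made-up read: line 1 is the name, line 2 the bases,
# line 3 a separator, line 4 one quality character per base.
example = """@read1
GATTACA
+
IIIII5#
"""

for record in parse_fastq(example):
    print(record.name, record.sequence, record.qualities)
# prints: read1 GATTACA [40, 40, 40, 40, 40, 20, 2]
```

Each quality character corresponds to the base at the same position, so the last two bases in this example have much lower scores than the first five.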
In the Tool panel search box, search for “FastQC”; then click on the tool
The tool interface will appear in the centre Galaxy panel.
- For Short read data from your current history, select mutant_R1.fastq.
- In the History pane, click on the “refresh” icon to see if the analysis has finished.
Examine output files
Once finished, examine the output called
Basic Statistics
- Sequence length: important when setting the maximum k-mer size for assembly.
- Encoding: the quality encoding type is important for quality trimming software.
- % GC: high-GC organisms don’t tend to assemble well and may have an uneven read coverage distribution.
- Total sequences: the total number of reads, which gives you an idea of coverage.
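As a rough illustration of where these numbers come from, the following Python sketch computes similar summary values from a handful of made-up reads, then estimates coverage as total bases divided by genome size. The function names, the example reads and the genome size are all hypothetical; FastQC's own calculations may differ in detail.

```python
def basic_statistics(sequences):
    """Summarise a list of read sequences (plain strings of bases)."""
    lengths = [len(seq) for seq in sequences]
    total_bases = sum(lengths)
    gc_bases = sum(seq.upper().count("G") + seq.upper().count("C")
                   for seq in sequences)
    return {
        "total_sequences": len(sequences),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "total_bases": total_bases,
        "percent_gc": 100.0 * gc_bases / total_bases,
    }


def estimated_coverage(total_bases, genome_size):
    """Average depth: sequenced bases divided by genome length."""
    return total_bases / genome_size


reads = ["GATTACAG", "CCGGTTAA", "ATATATGC"]  # made-up reads
stats = basic_statistics(reads)
print(stats["total_sequences"], stats["percent_gc"])
# prints: 3 37.5

# Hypothetical run: 1,000,000 reads of 150 bp from a 3 Mbp genome.
print(estimated_coverage(1_000_000 * 150, 3_000_000))
# prints: 50.0
```

For paired-end data, remember that the second read file contributes the same number of bases again, so the coverage from both files together is roughly double.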
Per base sequence quality: Look for dips in quality near the beginning, middle or end of the reads. These help determine possible trimming/cleanup methods and parameters, and may indicate technical problems with the sequencing process or machine run. In this case, all the reads are of relatively high quality across their length (150 bp).
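The idea behind a per-base quality summary can be sketched in a few lines of Python: decode the quality strings, then average the scores at each read position. This is a simplification (FastQC reports a distribution of scores at each position, not only a mean), and it assumes Phred+33 encoding. The function name, the example quality strings and the threshold of 20 are hypothetical.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Return the mean Phred score at each read position.

    Reads may differ in length; each position is averaged over
    the reads that are long enough to reach it.
    """
    totals, counts = [], []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


quals = ["III5", "I5I+"]  # made-up quality strings for two reads
means = mean_quality_per_position(quals)
print(means)
# prints: [40.0, 30.0, 40.0, 15.0]

# Flag 1-based positions whose mean falls below an arbitrary threshold.
low_positions = [pos + 1 for pos, q in enumerate(means) if q < 20]
print(low_positions)
# prints: [4]
```

A run of low-scoring positions at the end of the reads, as in this toy example, is the kind of pattern that would suggest trimming the read ends.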
Per base N content: The presence of large numbers of Ns in reads may point to a poor-quality sequencing run. You would need to trim these reads to remove the Ns.
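As a small sketch of what a per-base N count involves, the Python below reports the percentage of reads with an N at each position. The function name and the example reads are hypothetical, and this is not FastQC's own implementation.

```python
def n_content_per_position(sequences):
    """Return the percentage of N calls at each read position."""
    n_counts, totals = [], []
    for seq in sequences:
        for pos, base in enumerate(seq.upper()):
            if pos == len(totals):
                n_counts.append(0)
                totals.append(0)
            totals[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    return [100.0 * n / total for n, total in zip(n_counts, totals)]


reads = ["GATTNCA", "GANTNCA", "GATTACA", "NATTACA"]  # made-up reads
n_percent = n_content_per_position(reads)
print(n_percent)
# prints: [25.0, 0.0, 25.0, 0.0, 50.0, 0.0, 0.0]
```

In good data these percentages should be close to zero at every position.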
General questions you might ask about your input reads include:
- How good is my read set?
- Do I need to ask for a new sequencing run?
- Is it suitable for the analysis I need to do?
For a fuller discussion of FastQC outputs and warnings, see:
- the FastQC website link, including the section on each of the output reports, and examples of “good” and “bad” Illumina data.
For a more general introduction to quality control, see:
You can find more tutorials at the Galaxy Training Network: