FastQC in Galaxy
After sequencing, the reads should be checked for their quality.
- This tutorial demonstrates how to use the tool called FastQC to examine bacterial paired-end Illumina sequence reads.
- The FastQC website is here.
Import the data
- Log in to your Galaxy instance (for example, Galaxy Australia, usegalaxy.org.au).
- Create a new history for this analysis.
- In a new browser tab, go to this webpage:
- Find the file called mutant_R1.fastq 
- Right click on file name: select “copy link address”
- In Galaxy, go to Get Data and then Upload File
- Click Paste/Fetch data 
- A box will appear: paste in link address
- Click Start 
- Click Close 
- The file will now appear in the top of your history panel.
The file name is quite long: let’s change it:
- Click on the pencil icon next to the file name.
- In the centre Galaxy panel, click in the box under Name 
- Shorten the file name to mutant_R1.fastq 
- Then click Save 

FASTQ is a file format for sequence reads that displays quality scores for each of the sequenced nucleotides.
- For more information about FASTQ format see this link.
- We will evaluate the mutant_R1.fastq reads using the FastQC tool.
Run FastQC
In the Tool panel search box, search for “FastQC”; then click on the tool 
The tool interface will appear in the centre Galaxy panel.
- for Short read data from your current history :mutant_R1.fastq 
- Click Execute 
- In the History pane, click on the “refresh” icon to see if the analysis has finished.
Examine output files
Once finished, examine the output called 
Look at:
- 
Basic Statistics - Sequence length : will be important in setting maximum k-mer size value for assembly.
- Encoding : The quality encoding type is important for quality trimming software.
- % GC : high GC organisms don’t tend to assemble well and may have an uneven read coverage distribution.
- Total sequences : Total number of reads: gives you an idea of coverage.
 
- 
Per base sequence quality : Dips in quality near the beginning, middle or end of the reads: determines possible trimming/cleanup methods and parameters and may indicate technical problems with the sequencing process/machine run. In this case, all the reads are of relatively high quality across their length (150 bp).

- Per base N content : Presence of large numbers of Ns in reads may point to a poor quality sequencing run. You would need to trim these reads to remove Ns.
General questions you might ask about your input reads include:
- How good is my read set?
- Do I need to ask for a new sequencing run?
- Is it suitable for the analysis I need to do?
For a fuller discussion of FastQC outputs and warnings, see:
- the FastQC website link, including the section on each of the output reports, and examples of “good” and “bad” Illumina data.
For a more general introduction to quality control, see:
What’s next?
To use the tutorials on this website:
- ← see the list in the left hand panel
- ↖ or, click the menu button (three horizontal bars) in the top left of the page
You can find more tutorials at the Galaxy Training Network: