Quick Start Guide
Get started with SAIGE-QTL in six steps. Follow this guide to run your first eQTL analysis.
Step 1: Choose Your Installation Method
Pick the installation method that best fits your environment:
PIXI (Recommended for most users)
Best for: Users without root access, no existing conda setup
# Install Pixi and SAIGE-QTL
curl -fsSL https://pixi.sh/install.sh | sh
export PATH="$HOME/.pixi/bin:$PATH"
# Clone repository
git clone https://github.com/weizhou0/SAIGEQTL
cd SAIGEQTL
# Install dependencies
CONDA_OVERRIDE_GLIBC=2.28 pixi install --manifest-path=$PWD/pixi.toml
CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=$PWD/pixi.toml \
R CMD INSTALL .
Your command prefix:
CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=/path/to/SAIGEQTL/pixi.toml Rscript
Docker (Best for HPC and reproducibility)
Best for: HPC clusters, reproducible environments
# Pull pre-built image
docker pull wzhou88/saigeqtl:latest
# Test installation
docker run wzhou88/saigeqtl:latest step1_fitNULLGLMM_qtl.R --help
Your command prefix:
docker run -v /your/data:/data wzhou88/saigeqtl:latest
Singularity (For HPC without Docker)
Best for: HPC clusters without Docker access
# Pull and convert Docker image
singularity pull docker://wzhou88/saigeqtl:latest
# Test installation
singularity exec saigeqtl_latest.sif step1_fitNULLGLMM_qtl.R --help
Your command prefix:
singularity exec --bind /your/data:/data saigeqtl_latest.sif
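If you switch between installation methods, it can help to keep the three command prefixes in one place. The following is a minimal sketch, not part of SAIGE-QTL: the function name `saigeqtl_prefix` and the variables `SAIGEQTL_HOME` and `DATA_DIR` are hypothetical. For Pixi, the sketch puts `env` in front of the `CONDA_OVERRIDE_GLIBC=2.28` assignment, because a bare `VAR=value` word is no longer treated as an assignment once the prefix is stored in a string and expanded.

```bash
# Hypothetical helper: print the command prefix for one installation method.
# SAIGEQTL_HOME and DATA_DIR are illustrative variable names.
SAIGEQTL_HOME=/path/to/SAIGEQTL
DATA_DIR=/your/data

saigeqtl_prefix() {
  case "$1" in
    pixi)
      # `env` keeps the variable assignment valid after word expansion.
      echo "env CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=${SAIGEQTL_HOME}/pixi.toml Rscript"
      ;;
    docker)
      echo "docker run -v ${DATA_DIR}:/data wzhou88/saigeqtl:latest"
      ;;
    singularity)
      echo "singularity exec --bind ${DATA_DIR}:/data saigeqtl_latest.sif"
      ;;
    *)
      echo "unknown installation method: $1" >&2
      return 1
      ;;
  esac
}

saigeqtl_prefix docker
```

A later command could then be written as `$(saigeqtl_prefix docker) step1_fitNULLGLMM_qtl.R --help`. This relies on word splitting, so it only works when the paths contain no spaces. With Pixi, the script still has to be given by its full path under `/path/to/SAIGEQTL/extdata/`.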
Step 2: Prepare Your Data
You'll need three types of files:
Required Files
File Type | Format | Description | Example |
---|---|---|---|
Phenotype | Tab/space-delimited | Gene expression + covariates | phenotypes.txt |
Genotype | PLINK/VCF/BGEN | Genetic variants | genotypes.bed/bim/fam |
Regions (optional) | Tab-delimited | cis-regions for genes | cis_regions.txt |
Example Phenotype File Format
individual_id cell_id gene_1 gene_2 age sex PC1 PC2
IND001 CELL_001 45 120 35 M 0.12 -0.05
IND001 CELL_002 52 98 35 M 0.12 -0.05
IND002 CELL_001 38 156 42 F -0.08 0.15
Key columns:
- Individual ID (matches genotype file)
- Cell ID (for single-cell data)
- Gene expression values (one column per gene)
- Covariates (age, sex, PCs, batch effects, etc.)
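Before running Step 1, it may be worth checking that every column you plan to pass on the command line is actually present in the phenotype header. The sketch below is one possible check. The helper name `check_pheno_columns` is hypothetical, and the example file is written to a temporary directory purely for demonstration.

```bash
# Build a tiny example phenotype file with the layout shown above.
workdir="$(mktemp -d)"
pheno="$workdir/phenotypes.txt"
printf '%s\n' \
  'individual_id cell_id gene_1 gene_2 age sex PC1 PC2' \
  'IND001 CELL_001 45 120 35 M 0.12 -0.05' \
  'IND001 CELL_002 52 98 35 M 0.12 -0.05' \
  'IND002 CELL_001 38 156 42 F -0.08 0.15' > "$pheno"

# check_pheno_columns FILE COL[,COL...]  (hypothetical helper)
# Prints each requested column that is missing from the header line and
# returns non-zero if any column is missing or the file is empty.
check_pheno_columns() {
  awk -v want="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) seen[$i] = 1
      n = split(want, cols, ",")
      for (j = 1; j <= n; j++)
        if (!(cols[j] in seen)) { print "missing column: " cols[j]; bad = 1 }
      exit
    }
    END { if (NR == 0) bad = 1; exit bad + 0 }
  ' "$1"
}

check_pheno_columns "$pheno" individual_id,cell_id,gene_1,age,sex,PC1,PC2 \
  && echo "header OK"
```

The check splits the header on any whitespace, so it works for both tab- and space-delimited files.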
Example Region File (for cis-eQTL)
1 1000000 2000000
2 500000 1500000
Format: chromosome start end (tab-delimited, no header)
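A malformed region file (a stray header, spaces instead of tabs, swapped coordinates) is easy to produce by accident. The following sketch validates the three-column layout described above. The helper name `validate_regions` is hypothetical, and the `start <= end` rule is an assumption about what a sensible region looks like rather than a documented requirement.

```bash
workdir="$(mktemp -d)"
regions="$workdir/cis_regions.txt"
printf '1\t1000000\t2000000\n2\t500000\t1500000\n' > "$regions"

# validate_regions FILE  (hypothetical helper)
# Each line must have exactly three tab-separated fields, with integer
# start and end positions and start <= end. Offending lines are printed.
validate_regions() {
  awk -F '\t' '
    NF != 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 > $3 + 0 {
      print "line " NR ": invalid region: " $0
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

validate_regions "$regions" && echo "region file OK"
```

Because a header line has non-numeric coordinates, it is reported as invalid, which matches the no-header format.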
Step 3: Fit the Null Model (Step 1)
Run Step 1 for each gene you want to analyze.
Basic Command Structure
For PIXI Users
CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=/path/to/SAIGEQTL/pixi.toml \
Rscript /path/to/SAIGEQTL/extdata/step1_fitNULLGLMM_qtl.R \
--phenoFile=phenotypes.txt \
--phenoCol=gene_1 \
--covarColList=age,sex,PC1,PC2 \
--sampleCovarColList=age,sex,PC1,PC2 \
--sampleIDColinphenoFile=individual_id \
--cellIDColinphenoFile=cell_id \
--traitType=count \
--plinkFile=genotypes \
--outputPrefix=output/gene_1
For Docker Users
docker run -v /data/myproject:/data wzhou88/saigeqtl:latest \
step1_fitNULLGLMM_qtl.R \
--phenoFile=/data/phenotypes.txt \
--phenoCol=gene_1 \
--covarColList=age,sex,PC1,PC2 \
--sampleCovarColList=age,sex,PC1,PC2 \
--sampleIDColinphenoFile=individual_id \
--cellIDColinphenoFile=cell_id \
--traitType=count \
--plinkFile=/data/genotypes \
--outputPrefix=/data/output/gene_1
For Singularity Users
singularity exec --bind /data:/data saigeqtl_latest.sif \
step1_fitNULLGLMM_qtl.R \
--phenoFile=/data/phenotypes.txt \
--phenoCol=gene_1 \
--covarColList=age,sex,PC1,PC2 \
--sampleCovarColList=age,sex,PC1,PC2 \
--sampleIDColinphenoFile=individual_id \
--cellIDColinphenoFile=cell_id \
--traitType=count \
--plinkFile=/data/genotypes \
--outputPrefix=/data/output/gene_1
Key Parameters Explained
Parameter | What to put | Example |
---|---|---|
--phenoFile | Your phenotype file path | phenotypes.txt |
--phenoCol | Gene/phenotype column name | gene_1 |
--covarColList | All covariates (comma-separated) | age,sex,PC1,PC2,batch |
--sampleCovarColList | Individual-level covariates | age,sex,PC1,PC2 |
--sampleIDColinphenoFile | Individual ID column name | individual_id |
--cellIDColinphenoFile | Cell ID column name | cell_id |
--traitType | Always use count for eQTL | count |
--plinkFile | Genotype file prefix (no extension) | genotypes |
--outputPrefix | Where to save results | output/gene_1 |
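In the examples in this guide, every individual-level covariate in `--sampleCovarColList` also appears in `--covarColList`. Assuming that subset relationship is what you want, a small check like the one below can catch typos before a long run. The helper name `not_in_list` is hypothetical.

```bash
covars="age,sex,PC1,PC2,batch"     # value intended for --covarColList
sample_covars="age,sex,PC1,PC2"    # value intended for --sampleCovarColList

# not_in_list SUBSET FULL  (hypothetical helper)
# Prints every comma-separated entry of SUBSET that is absent from FULL.
not_in_list() {
  old_ifs="$IFS"
  IFS=','
  for item in $1; do
    case ",$2," in
      *",$item,"*) ;;          # present in the full list
      *) echo "$item" ;;       # absent: report it
    esac
  done
  IFS="$old_ifs"
}

missing="$(not_in_list "$sample_covars" "$covars")"
if [ -z "$missing" ]; then
  echo "sample-level covariates are a subset of --covarColList"
else
  echo "not in --covarColList: $missing"
fi
```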
Step 1 Output
You'll get two files:
- output/gene_1.rda: Model file (needed for Step 2)
- output/gene_1.varianceRatio.txt: Variance ratio (needed for Step 2)
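Since Step 2 needs both files, a quick existence check between the two steps can save a failed job. This is a minimal sketch with a hypothetical helper name, `step1_outputs_ready`. The placeholder files are created only so that the example runs on its own.

```bash
# step1_outputs_ready PREFIX  (hypothetical helper)
# Succeeds only if PREFIX.rda and PREFIX.varianceRatio.txt exist and are non-empty.
step1_outputs_ready() {
  status=0
  for f in "$1.rda" "$1.varianceRatio.txt"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Demonstration with placeholder files in a temporary directory.
workdir="$(mktemp -d)"
mkdir -p "$workdir/output"
echo placeholder > "$workdir/output/gene_1.rda"
echo placeholder > "$workdir/output/gene_1.varianceRatio.txt"

step1_outputs_ready "$workdir/output/gene_1" && echo "ready for Step 2"
```

In a real run you would call it with your own `--outputPrefix` value, for example `step1_outputs_ready output/gene_1`.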
Step 4: Choose Your Analysis Type
Pick the analysis that fits your research question:
cis-eQTL Analysis (Most Common)
Use when: Testing variants near genes (within ~1Mb)
Advantages:
- Higher statistical power
- Easier interpretation (local regulation)
- Faster computation
Next: Continue to Step 5A - cis-eQTL Testing
Genome-wide eQTL Analysis
Use when: Testing all variants across the genome
Advantages:
- Discover trans-acting effects
- Comprehensive regulatory networks
- Identify distant regulators
Next: Continue to Step 5B - Genome-wide Testing
Step 5A: Run cis-eQTL Tests (Step 2)
Test genetic variants in the cis-region (near the gene).
Create Region File
Create a file with your cis-window (e.g., gene location ± 1Mb):
# Example: gene on chr1, position 1,500,000
# cis-window: 500,000 - 2,500,000 (±1Mb)
# printf is used because `echo -e` is not portable across shells
printf '1\t500000\t2500000\n' > gene_1_cis_region.txt
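For many genes, computing the window by hand gets tedious. The sketch below derives the region line from a chromosome, a gene position, and a flank size. The helper name `cis_window` is hypothetical. Clamping the start to 1 for genes near the beginning of a chromosome is an assumption, and the end is not clamped because chromosome lengths are not known here.

```bash
# cis_window CHROM POS [FLANK]  (hypothetical helper)
# Prints "chrom<TAB>start<TAB>end"; FLANK defaults to 1,000,000 bp.
cis_window() {
  chrom="$1"
  pos="$2"
  flank="${3:-1000000}"
  start=$((pos - flank))
  if [ "$start" -lt 1 ]; then
    start=1                      # window would run off the chromosome start
  fi
  end=$((pos + flank))
  printf '%s\t%s\t%s\n' "$chrom" "$start" "$end"
}

cis_window 1 1500000             # same window as the example above
cis_window 1 300000              # start is clamped to 1
```

Redirecting the output, for example `cis_window 1 1500000 > gene_1_cis_region.txt`, produces a one-line region file in the tab-delimited, no-header format.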
Run Step 2
For PIXI Users
CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=/path/to/SAIGEQTL/pixi.toml \
Rscript /path/to/SAIGEQTL/extdata/step2_tests_qtl.R \
--vcfFile=chr1.vcf.gz \
--vcfFileIndex=chr1.vcf.gz.csi \
--vcfField=DS \
--chrom=1 \
--rangestoIncludeFile=gene_1_cis_region.txt \
--GMMATmodelFile=output/gene_1.rda \
--varianceRatioFile=output/gene_1.varianceRatio.txt \
--SAIGEOutputFile=output/gene_1_cis_results.txt \
--minMAC=20 \
--LOCO=FALSE
For Docker Users
docker run -v /data/myproject:/data wzhou88/saigeqtl:latest \
step2_tests_qtl.R \
--vcfFile=/data/chr1.vcf.gz \
--vcfFileIndex=/data/chr1.vcf.gz.csi \
--vcfField=DS \
--chrom=1 \
--rangestoIncludeFile=/data/gene_1_cis_region.txt \
--GMMATmodelFile=/data/output/gene_1.rda \
--varianceRatioFile=/data/output/gene_1.varianceRatio.txt \
--SAIGEOutputFile=/data/output/gene_1_cis_results.txt \
--minMAC=20 \
--LOCO=FALSE
For Singularity Users
singularity exec --bind /data:/data saigeqtl_latest.sif \
step2_tests_qtl.R \
--vcfFile=/data/chr1.vcf.gz \
--vcfFileIndex=/data/chr1.vcf.gz.csi \
--vcfField=DS \
--chrom=1 \
--rangestoIncludeFile=/data/gene_1_cis_region.txt \
--GMMATmodelFile=/data/output/gene_1.rda \
--varianceRatioFile=/data/output/gene_1.varianceRatio.txt \
--SAIGEOutputFile=/data/output/gene_1_cis_results.txt \
--minMAC=20 \
--LOCO=FALSE
Next: Continue to Step 6 - Calculate Gene-Level P-value
Step 5B: Run Genome-wide Tests (Step 2)
Test all genetic variants across the genome.
Run Step 2
For PIXI Users
CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=/path/to/SAIGEQTL/pixi.toml \
Rscript /path/to/SAIGEQTL/extdata/step2_tests_qtl.R \
--vcfFile=chr1.vcf.gz \
--vcfFileIndex=chr1.vcf.gz.csi \
--vcfField=DS \
--chrom=1 \
--GMMATmodelFile=output/gene_1.rda \
--varianceRatioFile=output/gene_1.varianceRatio.txt \
--SAIGEOutputFile=output/gene_1_chr1_results.txt \
--minMAF=0.01 \
--LOCO=TRUE
For Docker Users
docker run -v /data/myproject:/data wzhou88/saigeqtl:latest \
step2_tests_qtl.R \
--vcfFile=/data/chr1.vcf.gz \
--vcfFileIndex=/data/chr1.vcf.gz.csi \
--vcfField=DS \
--chrom=1 \
--GMMATmodelFile=/data/output/gene_1.rda \
--varianceRatioFile=/data/output/gene_1.varianceRatio.txt \
--SAIGEOutputFile=/data/output/gene_1_chr1_results.txt \
--minMAF=0.01 \
--LOCO=TRUE
For Singularity Users
singularity exec --bind /data:/data saigeqtl_latest.sif \
step2_tests_qtl.R \
--vcfFile=/data/chr1.vcf.gz \
--vcfFileIndex=/data/chr1.vcf.gz.csi \
--vcfField=DS \
--chrom=1 \
--GMMATmodelFile=/data/output/gene_1.rda \
--varianceRatioFile=/data/output/gene_1.varianceRatio.txt \
--SAIGEOutputFile=/data/output/gene_1_chr1_results.txt \
--minMAF=0.01 \
--LOCO=TRUE
Tip: Repeat for each chromosome (chr1-chr22, chrX)
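One way to repeat the genome-wide test per chromosome is to generate the commands first and inspect them before running or submitting anything. The sketch below only prints commands. The helper name `step2_commands` is hypothetical, and it assumes one VCF per chromosome named like `chr1.vcf.gz` with chromosome X labelled `X`. Put the prefix for your installation method in front of each printed line.

```bash
# step2_commands GENE  (hypothetical helper)
# Prints one Step 2 command per chromosome (1-22 and X) without running it.
step2_commands() {
  gene="$1"
  for chr in $(seq 1 22) X; do
    printf '%s\n' "step2_tests_qtl.R --vcfFile=chr${chr}.vcf.gz --vcfFileIndex=chr${chr}.vcf.gz.csi --vcfField=DS --chrom=${chr} --GMMATmodelFile=output/${gene}.rda --varianceRatioFile=output/${gene}.varianceRatio.txt --SAIGEOutputFile=output/${gene}_chr${chr}_results.txt --minMAF=0.01 --LOCO=TRUE"
  done
}

# Show the last generated command (chromosome X) as a spot check.
step2_commands gene_1 | tail -n 1
```

The printed lines can be saved to a file and handed to whatever job scheduler or parallel runner your cluster provides.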
Step 6: Calculate Gene-Level P-value (Step 3)
Combine results from all variants into a single gene-level p-value using ACAT.
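To give a feel for what ACAT (the aggregated Cauchy association test) does, here is a minimal equal-weight sketch in awk. It computes T as the mean of tan((0.5 - p_i) * pi) over the variant p-values and reports 0.5 - arctan(T) / pi as the combined p-value. This only illustrates the published formula and is not the code in step3_gene_pvalue_qtl.R, which may use different weights and special handling for extremely small p-values. The function name `acat_equal_weights` is hypothetical.

```bash
# acat_equal_weights  (hypothetical helper)
# Reads one p-value per line on stdin and prints the equal-weight ACAT p-value.
# Values outside the open interval (0, 1) and non-numeric lines are skipped.
acat_equal_weights() {
  awk '
    BEGIN { pi = atan2(0, -1) }
    $1 + 0 > 0 && $1 + 0 < 1 {
      x = (0.5 - $1) * pi
      t += sin(x) / cos(x)       # tan(x); awk has no built-in tan()
      n++
    }
    END {
      if (n == 0) exit 1         # nothing usable to combine
      printf "%.6g\n", 0.5 - atan2(t / n, 1) / pi
    }
  '
}

printf '0.5\n0.5\n' | acat_equal_weights
```

With a single p-value the formula returns that p-value unchanged, which makes for a convenient sanity check.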
Run Step 3
For PIXI Users
CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=/path/to/SAIGEQTL/pixi.toml \
Rscript /path/to/SAIGEQTL/extdata/step3_gene_pvalue_qtl.R \
--assocFile=output/gene_1_cis_results.txt \
--geneName=gene_1 \
--genePval_outputFile=output/gene_1_gene_pvalue.txt
For Docker Users
docker run -v /data/myproject:/data wzhou88/saigeqtl:latest \
step3_gene_pvalue_qtl.R \
--assocFile=/data/output/gene_1_cis_results.txt \
--geneName=gene_1 \
--genePval_outputFile=/data/output/gene_1_gene_pvalue.txt
For Singularity Users
singularity exec --bind /data:/data saigeqtl_latest.sif \
step3_gene_pvalue_qtl.R \
--assocFile=/data/output/gene_1_cis_results.txt \
--geneName=gene_1 \
--genePval_outputFile=/data/output/gene_1_gene_pvalue.txt
Final Output
Your final results file (gene_1_gene_pvalue.txt) contains:
- Gene name
- ACAT p-value (combined evidence from all variants)
- Top variant ID
- Top variant p-value
Congratulations!
You've completed your first SAIGE-QTL analysis!
What's Next?
For more genes:
- Repeat the null-model fit (SAIGE-QTL Step 1, Step 3 of this guide) for each gene
- Run SAIGE-QTL Step 2 and Step 3 (Steps 5 and 6 of this guide) for each gene
- Consider parallelizing across genes for efficiency
To learn more:
- Complete cis-eQTL Tutorial - Detailed workflow with examples
- Genome-wide eQTL Tutorial - Large-scale analysis
- Parameters & Options - Complete parameter reference
- FAQ - Common questions and troubleshooting
Common Next Steps
Analyze multiple genes:
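# Create a loop for multiple genes
for gene in gene_1 gene_2 gene_3; do
  # A loop body needs at least one command; replace the echo with real calls
  echo "Processing ${gene}"
  # Run Step 1 for each gene (--phenoCol=${gene}, --outputPrefix=output/${gene})
  # Run Step 2 for each gene (--GMMATmodelFile=output/${gene}.rda)
  # Run Step 3 for each gene (--geneName=${gene})
done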
Test rare variants: See Set-based Tests Tutorial for testing rare variants using SKAT-O
Visualize results:
- Plot Manhattan plots of p-values
- Create QQ plots to check for inflation
- Visualize top eQTLs with effect sizes
Quick Tips
For Single-Cell Data
- Include cell-level covariates (batch, UMI counts)
- Include individual-level covariates (age, sex, PCs)
- Use --traitType=count for read counts
- Consider using --offsetCol for log(total UMI)
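If your phenotype file holds total UMI counts per cell but not their logarithm, the offset column can be added with awk. The sketch below assumes a column named `total_umi`, which is a hypothetical name, and appends the natural log as `log_total_umi`. The helper name `add_log_offset` is also hypothetical. Cells with a total of zero would produce an invalid log and should be removed first.

```bash
workdir="$(mktemp -d)"
printf '%s\n' \
  'individual_id cell_id gene_1 total_umi' \
  'IND001 CELL_001 45 10000' \
  'IND001 CELL_002 52 20000' > "$workdir/pheno.txt"

# add_log_offset FILE COLUMN  (hypothetical helper)
# Writes FILE to stdout as tab-separated text with an extra column
# log_<COLUMN> holding the natural log of COLUMN.
add_log_offset() {
  awk -v col="$2" '
    BEGIN { OFS = "\t" }
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      $1 = $1                    # rebuild the record with tab separators
      print $0, "log_" col
      next
    }
    { $1 = $1; print $0, log($idx) }
  ' "$1"
}

add_log_offset "$workdir/pheno.txt" total_umi
```

The new column name could then be supplied to `--offsetCol`. Check the parameter reference for how the offset is expected to be specified.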
For Computational Efficiency
- Run Step 1 in parallel (one job per gene)
- Use --LOCO=FALSE for cis-eQTL to save time
- Use --LOCO=TRUE for genome-wide to avoid bias
- Filter variants: --minMAC=20 for common variants
Data Quality
- Remove low-quality variants (high missing rate)
- Filter genes with low expression
- Include appropriate covariates for your study design
- Check for population stratification
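As one illustration of the low-expression filter, the sketch below reports the fraction of cells with a non-zero count for a given gene column. The helper name `expressed_fraction` is hypothetical. What counts as "low" is study-specific, so no cutoff is hard-coded.

```bash
workdir="$(mktemp -d)"
printf '%s\n' \
  'individual_id cell_id gene_1 gene_2' \
  'IND001 CELL_001 45 0' \
  'IND001 CELL_002 52 0' \
  'IND002 CELL_001 38 3' \
  'IND002 CELL_002 0 0' > "$workdir/pheno.txt"

# expressed_fraction FILE GENE  (hypothetical helper)
# Prints the fraction of cells (rows) whose count for GENE is above zero.
expressed_fraction() {
  awk -v gene="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == gene) idx = i
      if (!idx) exit 1           # gene column not present
      next
    }
    { n++; if ($idx + 0 > 0) nz++ }
    END {
      if (!idx || n == 0) exit 1
      printf "%.2f\n", nz / n
    }
  ' "$1"
}

expressed_fraction "$workdir/pheno.txt" gene_1
expressed_fraction "$workdir/pheno.txt" gene_2
```

In this toy file gene_1 is expressed in three of four cells and gene_2 in one of four, so the second gene is the likelier candidate for filtering.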
Need Help?
Common issues:
- File path errors? In Docker/Singularity, use absolute container-side paths (for example /data/phenotypes.txt) that match the directory you mounted with -v or --bind
- Convergence errors? Check your covariates and sample size
- Memory errors? Reduce the --markers_per_chunk parameter
Get support:
- Check the FAQ for common problems
- Email: wzhou@broadinstitute.org
- Report bugs: GitHub Issues