Quick Start Guide
Get started with SAIGE-QTL in 6 simple steps. Follow this guide to run your first eQTL analysis.
Step 1: Choose Your Installation Method
Pick the installation method that best fits your environment:
PIXI (Recommended for most users)
Best for: Users without root access, no existing conda setup
Installation method 1 (Recommended): Pixi binary installation (using prebuilt binaries)
# Install pixi; if the pixi command is not found afterwards, restart your shell or reload your environment
curl -fsSL https://pixi.sh/install.sh | bash
export PATH="$HOME/.pixi/bin:$PATH"
# Download SAIGE-QTL repository
git clone https://github.com/weizhou0/SAIGEQTL.git
cd SAIGEQTL
# Locate the pre-built binary
Select the binary file that matches your system. Run only one of the following commands:
## For Linux users (Linux x86_64):
BINARY_FILE=$(ls binaries/SAIGEQTL_*_linux-x86_64.tgz | head -n1)
## For macOS users (arm64):
BINARY_FILE=$(ls binaries/SAIGEQTL_*_macos.tgz | head -n1)
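The choice between the two commands above can also be automated. The following is a minimal sketch, assuming `uname -s` reports `Linux` or `Darwin` on your machine. The helper name `binary_pattern_for` is hypothetical, the glob patterns are the ones used above, and the sketch does not check the CPU architecture.

```shell
# Hypothetical helper: map an OS name (as printed by `uname -s`)
# to the binary glob used in the commands above.
binary_pattern_for() {
    case "$1" in
        Linux)  echo 'binaries/SAIGEQTL_*_linux-x86_64.tgz' ;;
        Darwin) echo 'binaries/SAIGEQTL_*_macos.tgz' ;;
        *)      echo "unsupported system: $1" >&2; return 1 ;;
    esac
}

# Resolve the pattern for the current machine, then take the first match.
if pattern=$(binary_pattern_for "$(uname -s)"); then
    # $pattern is left unquoted on purpose so the shell expands the glob.
    BINARY_FILE=$(ls $pattern 2>/dev/null | head -n1) || true
    echo "Selected binary: ${BINARY_FILE:-none found}"
fi
```

Run this from inside the cloned SAIGEQTL directory so that the relative `binaries/` path resolves.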
Install from the binary you've chosen for your system
# Install from pre-built binary
echo "Installing: $BINARY_FILE"
CONDA_OVERRIDE_GLIBC=2.28 pixi run R -e "
install.packages('${BINARY_FILE}', repos = NULL, type = 'source')
library(SAIGEQTL)
cat('SAIGEQTL', as.character(packageVersion('SAIGEQTL')), 'installed successfully\n')
"
Installation method 2: Pixi source installation (using pixi to install from source code with all dependencies)
# Install Pixi and SAIGE-QTL
curl -fsSL https://pixi.sh/install.sh | sh
export PATH="$HOME/.pixi/bin:$PATH"
# Clone repository
git clone https://github.com/weizhou0/SAIGEQTL
cd SAIGEQTL
# Install dependencies
CONDA_OVERRIDE_GLIBC=2.28 pixi install --manifest-path=$PWD/pixi.toml
CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=$PWD/pixi.toml \
R CMD INSTALL .
Your command prefix:
CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=/path/to/SAIGEQTL/pixi.toml Rscript
Docker (Best for HPC and reproducibility)
Best for: HPC clusters, reproducible environments
# Pull pre-built image
docker pull --platform linux/amd64 wzhou88/saigeqtl:latest
# Test installation
docker run wzhou88/saigeqtl:latest step1_fitNULLGLMM_qtl.R --help
Your command prefix:
docker run -v /your/data:/data wzhou88/saigeqtl:latest
Singularity (For HPC without Docker)
Best for: HPC clusters without Docker access
# Pull and convert Docker image
singularity pull docker://wzhou88/saigeqtl:latest
# Test installation
singularity exec saigeqtl_latest.sif step1_fitNULLGLMM_qtl.R --help
Your command prefix:
singularity exec --bind /your/data:/data saigeqtl_latest.sif
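If you switch between installation methods, it can help to keep the three command prefixes above in one place. The sketch below is only an illustration: `saigeqtl_prefix`, `SAIGEQTL_DIR` and `DATA_DIR` are hypothetical names, not part of SAIGE-QTL. It assumes paths without spaces, and it uses `env` for the Pixi prefix because a variable assignment stored in a string is not treated as an assignment when the string is expanded.

```shell
# Placeholder locations; adjust to your own setup.
SAIGEQTL_DIR=${SAIGEQTL_DIR:-/path/to/SAIGEQTL}
DATA_DIR=${DATA_DIR:-/your/data}

# Hypothetical helper: print the command prefix for an installation method.
saigeqtl_prefix() {
    case "$1" in
        pixi)
            echo "env CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=${SAIGEQTL_DIR}/pixi.toml Rscript" ;;
        docker)
            echo "docker run -v ${DATA_DIR}:/data wzhou88/saigeqtl:latest" ;;
        singularity)
            echo "singularity exec --bind ${DATA_DIR}:/data saigeqtl_latest.sif" ;;
        *)
            echo "unknown method: $1" >&2; return 1 ;;
    esac
}

# Print (without running) the command that tests a Docker installation.
echo "$(saigeqtl_prefix docker) step1_fitNULLGLMM_qtl.R --help"
```

To actually run a command, expand the prefix unquoted, for example `$(saigeqtl_prefix docker) step1_fitNULLGLMM_qtl.R --help`.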
Step 2: Prepare Your Data
You'll need up to three types of files (the region file is optional):
Required Files
| File Type | Format | Description | Example |
|---|---|---|---|
| Phenotype | Tab/space-delimited | Gene expression + covariates | phenotypes.txt |
| Genotype | PLINK/VCF/BGEN | Genetic variants | genotypes.bed/bim/fam |
| Regions (optional) | Tab-delimited | cis-regions for genes | cis_regions.txt |
Example Phenotype File Format
individual_id cell_id gene_1 gene_2 age sex PC1 PC2
IND001 CELL_001 45 120 35 M 0.12 -0.05
IND001 CELL_002 52 98 35 M 0.12 -0.05
IND002 CELL_001 38 156 42 F -0.08 0.15
Key columns:
- Individual ID (matches genotype file)
- Cell ID (for single-cell data)
- Gene expression values (one column per gene)
- Covariates (age, sex, PCs, batch effects, etc.)
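Before running Step 1, a quick sanity check of the phenotype file can catch missing columns or ragged rows. The sketch below writes the example rows from this guide to a file and checks them with `awk`. The helper name `check_pheno` and the file name `phenotypes_example.txt` are hypothetical, and the check is only an illustration, not something SAIGE-QTL itself performs.

```shell
# Write the example rows from this guide to a tab-delimited file.
{
    printf 'individual_id\tcell_id\tgene_1\tgene_2\tage\tsex\tPC1\tPC2\n'
    printf 'IND001\tCELL_001\t45\t120\t35\tM\t0.12\t-0.05\n'
    printf 'IND001\tCELL_002\t52\t98\t35\tM\t0.12\t-0.05\n'
    printf 'IND002\tCELL_001\t38\t156\t42\tF\t-0.08\t0.15\n'
} > phenotypes_example.txt

# Hypothetical helper: check_pheno FILE COLUMN...
# Fails if a named column is missing from the header or if any row
# has a different number of fields than the header.
check_pheno() {
    file=$1; shift
    awk -v required="$*" '
        BEGIN { bad = 0 }
        NR == 1 {
            n = NF
            for (i = 1; i <= NF; i++) seen[$i] = 1
            m = split(required, req, " ")
            for (j = 1; j <= m; j++)
                if (!(req[j] in seen)) { print "missing column: " req[j]; bad = 1 }
            next
        }
        NF != n { print "line " NR ": expected " n " fields, found " NF; bad = 1 }
        END { exit bad }
    ' "$file"
}

if check_pheno phenotypes_example.txt individual_id cell_id gene_1 age sex; then
    echo "phenotype file looks consistent"
fi
```

The column names passed to `check_pheno` should be the ones you plan to give to `--phenoCol`, `--covarColList` and `--sampleIDColinphenoFile`.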
Example Region File (for cis-eQTL)
1 1000000 2000000
Format: chromosome start end (no header)
Step 3: Fit the Null Model (Step 1)
Run SAIGE-QTL Step 1 (step1_fitNULLGLMM_qtl.R) once for each gene you want to analyze.
Basic Command Structure
Note: the input paths below refer to the example data; replace them with your own paths as needed.
For PIXI Users
cd SAIGEQTL/extdata
CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=../pixi.toml Rscript step1_fitNULLGLMM_qtl.R \
--useSparseGRMtoFitNULL=FALSE \
--useGRMtoFitNULL=FALSE \
--phenoFile=./input/seed_1_100_nindep_100_ncell_100_lambda_2_tauIntraSample_0.5_Poisson.txt \
--phenoCol=gene_1 \
--covarColList=X1,X2,pf1,pf2 \
--sampleCovarColList=X1,X2 \
--sampleIDColinphenoFile=IND_ID \
--traitType=count \
--outputPrefix=./output/nindep_100_ncell_100_lambda_2_tauIntraSample_0.5_gene_1 \
--skipVarianceRatioEstimation=FALSE \
--isRemoveZerosinPheno=FALSE \
--isCovariateOffset=FALSE \
--isCovariateTransform=TRUE \
--skipModelFitting=FALSE \
--tol=0.00001 \
--plinkFile=./input/n.indep_100_n.cell_1_01.step1 \
--IsOverwriteVarianceRatioFile=TRUE
For Docker Users
WKDIR=/data/wzhougroup/ # Modify: path to your working directory
# Mount the working directory so output written there persists on the host
docker run -v ${WKDIR}:${WKDIR} -w ${WKDIR} wzhou88/saigeqtl:latest \
step1_fitNULLGLMM_qtl.R \
--useSparseGRMtoFitNULL=FALSE \
--useGRMtoFitNULL=FALSE \
--phenoFile=/usr/local/bin/input/seed_1_100_nindep_100_ncell_100_lambda_2_tauIntraSample_0.5_Poisson.txt \
--phenoCol=gene_1 \
--covarColList=X1,X2,pf1,pf2 \
--sampleCovarColList=X1,X2 \
--sampleIDColinphenoFile=IND_ID \
--traitType=count \
--outputPrefix=${WKDIR}nindep_100_ncell_100_lambda_2_tauIntraSample_0.5_gene_1 \
--skipVarianceRatioEstimation=FALSE \
--isRemoveZerosinPheno=FALSE \
--isCovariateOffset=FALSE \
--isCovariateTransform=TRUE \
--skipModelFitting=FALSE \
--tol=0.00001 \
--plinkFile=/usr/local/bin/input/n.indep_100_n.cell_1_01.step1 \
--IsOverwriteVarianceRatioFile=TRUE
For Singularity Users
WKDIR=/data/wzhougroup/ # Modify: path to your working directory
PATHTOSIF=/data/wzhougroup/saigeqtl_latest.sif # Modify: path to your .sif file
singularity exec \
--bind ${WKDIR}:${WKDIR} \
--cleanenv ${PATHTOSIF} \
step1_fitNULLGLMM_qtl.R \
--useSparseGRMtoFitNULL=FALSE \
--useGRMtoFitNULL=FALSE \
--phenoFile=/usr/local/bin/input/seed_1_100_nindep_100_ncell_100_lambda_2_tauIntraSample_0.5_Poisson.txt \
--phenoCol=gene_1 \
--covarColList=X1,X2,pf1,pf2 \
--sampleCovarColList=X1,X2 \
--sampleIDColinphenoFile=IND_ID \
--traitType=count \
--outputPrefix=${WKDIR}nindep_100_ncell_100_lambda_2_tauIntraSample_0.5_gene_1 \
--skipVarianceRatioEstimation=FALSE \
--isRemoveZerosinPheno=FALSE \
--isCovariateOffset=FALSE \
--isCovariateTransform=TRUE \
--skipModelFitting=FALSE \
--tol=0.00001 \
--plinkFile=/usr/local/bin/input/n.indep_100_n.cell_1_01.step1 \
--IsOverwriteVarianceRatioFile=TRUE
Key Parameters Explained
| Parameter | What to put | Example |
|---|---|---|
| --phenoFile | Your phenotype file path | phenotypes.txt |
| --phenoCol | Gene/phenotype column name | gene_1 |
| --covarColList | All covariates (comma-separated) | age,sex,PC1,PC2,batch |
| --sampleCovarColList | Individual-level covariates | age,sex,PC1,PC2 |
| --sampleIDColinphenoFile | Individual ID column name | individual_id |
| --cellIDColinphenoFile | Cell ID column name | cell_id |
| --traitType | Always use count for eQTL | count |
| --plinkFile | Genotype file prefix (no extension) | genotypes |
| --outputPrefix | Where to save results | output/gene_1 |
| --library | Custom library path (if needed) | /path/to/custom/library |
Note on the --library parameter: If you installed SAIGEQTL to a custom library location (e.g., using R CMD INSTALL --library=custom/path), use this parameter to specify the path. This avoids manually editing wrapper scripts. It is not needed for standard installations.
Step 1 Output
You'll get three files:
- output/gene_1.rda - Model file (needed for Step 2)
- output/gene_1.varianceRatio.txt - Variance ratio (needed for Step 2)
- output/gene_1.status.txt - Status file for your reference, indicating whether the null model fitting succeeded or failed to converge
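Before moving on to Step 2, it can be worth confirming that the two files it needs were actually written. The following sketch uses a hypothetical helper, `check_step1_outputs`, that only tests whether the files exist and are non-empty. It makes no assumption about the contents of the status file.

```shell
# Hypothetical helper: check_step1_outputs PREFIX
# Succeeds only if PREFIX.rda and PREFIX.varianceRatio.txt exist and are non-empty.
check_step1_outputs() {
    prefix=$1
    missing=0
    for suffix in rda varianceRatio.txt; do
        if [ ! -s "${prefix}.${suffix}" ]; then
            echo "missing or empty: ${prefix}.${suffix}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the output prefix used in the table above.
if check_step1_outputs output/gene_1; then
    echo "Step 1 outputs found; ready for Step 2"
else
    echo "Step 1 outputs incomplete; inspect output/gene_1.status.txt"
fi
```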
Step 4: Choose Your Analysis Type
Pick the analysis that fits your research question:
cis-eQTL Analysis (Most Common)
Use when: Testing variants near genes (within ~1 Mb)
Advantages:
- Higher statistical power
- Easier interpretation (local regulation)
- Faster computation
Next: Continue to Step 5A - cis-eQTL Testing
Genome-wide eQTL Analysis
Use when: Testing all variants across the genome
Advantages:
- Discover trans-acting effects
- Comprehensive regulatory networks
- Identify distant regulators
Next: Continue to Step 5B - Genome-wide Testing
Step 5A: Run cis-eQTL Tests (Step 2)
Test genetic variants in the cis-region (near the gene).
Create Region File
Create a file with your cis-window (e.g., gene location ± 1 Mb):
# Example: gene on chr1, position 1,500,000
# cis-window: 500,000 - 2,500,000 (±1 Mb)
echo -e "1\t500000\t2500000" > gene_1_cis_region.txt
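The window arithmetic can be wrapped in a small function so that it is not done by hand for every gene. This is a sketch under stated assumptions: `make_cis_region` is a hypothetical helper, the default window of 1,000,000 bp mirrors the example above, and clamping the start at 1 for genes close to the chromosome start is an illustrative choice rather than a documented SAIGE-QTL requirement.

```shell
# Hypothetical helper: make_cis_region CHROM POSITION [WINDOW]
# Prints "chromosome<TAB>start<TAB>end" with no header.
make_cis_region() {
    chrom=$1
    pos=$2
    window=${3:-1000000}
    start=$(( pos - window ))
    # Assumption: never emit a start coordinate below 1.
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    end=$(( pos + window ))
    printf '%s\t%s\t%s\n' "$chrom" "$start" "$end"
}

# Same example as above: gene on chr1 at position 1,500,000.
make_cis_region 1 1500000 > gene_1_cis_region.txt
cat gene_1_cis_region.txt
```

Using `printf` instead of `echo -e` keeps the tab handling consistent across shells.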
Run Step 2
Note: replace the input paths below with your own paths.
For PIXI Users
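cd SAIGEQTL/extdata
# Prefixes: the Step 1 output from above, and the Step 2 output that Step 3 reads
step1prefix=./output/nindep_100_ncell_100_lambda_2_tauIntraSample_0.5_gene_1
step2prefix=${step1prefix}_cis
# Region file for this gene; its chromosome must match --chrom
regionFile=/path/to/gene_1_cis_region.txt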
CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=../pixi.toml Rscript step2_tests_qtl.R \
--bedFile=./input/n.indep_100_n.cell_1_full.bed \
--bimFile=./input/n.indep_100_n.cell_1_full.bim \
--famFile=./input/n.indep_100_n.cell_1_full.fam \
--SAIGEOutputFile=${step2prefix} \
--chrom=2 \
--minMAF=0 \
--minMAC=20 \
--LOCO=FALSE \
--GMMATmodelFile=${step1prefix}.rda \
--SPAcutoff=2 \
--varianceRatioFile=${step1prefix}.varianceRatio.txt \
--rangestoIncludeFile=${regionFile} \
--markers_per_chunk=10000
For Docker Users
# Use the same flags as in the PIXI command above
docker run -v ${WKDIR}:/data wzhou88/saigeqtl:latest step2_tests_qtl.R ...
For Singularity Users
# Use the same flags as in the PIXI command above
singularity exec --bind ${WKDIR}:/data saigeqtl_latest.sif step2_tests_qtl.R ...
Continue to Step 6 - Calculate Gene P-value
Step 5B: Run Genome-wide Analyses (requires refitting Step 1 with different settings, as described below)
Test all genetic variants across the genome.
Run Analyses
Run all steps as described on this page: All Steps for Genome-wide Analysis using Batch Running.
Docker and Singularity users can use the same option flags as the commands on that page; only the command prefix changes (docker run -v ${WKDIR}:/data wzhou88/saigeqtl:latest or singularity exec --bind ${WKDIR}:/data saigeqtl_latest.sif).
Tip: Repeat for each chromosome (chr1-chr22, chrX)
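Repeating the test per chromosome is easiest with a loop. The sketch below only prints one placeholder command per chromosome. `list_chroms` is a hypothetical helper, the label `X` is an assumption (use whatever label your genotype files use), and `[other_options]` stands for the flags given on the linked page.

```shell
# Hypothetical helper: print the chromosome labels to loop over.
list_chroms() {
    seq 1 22
    echo X
}

for chr in $(list_chroms); do
    # Dry run: print the per-chromosome command instead of running Step 2.
    echo "step2_tests_qtl.R --chrom=${chr} [other_options]"
done
```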
Step 6: Calculate Gene-Level P-value (Step 3)
Combine results from all variants into a single gene-level p-value using ACAT.
Run Step 3
Note: replace the input paths below with your own paths.
For PIXI Users
CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=../pixi.toml Rscript step3_gene_pvalue_qtl.R \
--assocFile=./output/nindep_100_ncell_100_lambda_2_tauIntraSample_0.5_gene_1_cis \
--geneName=gene_1 \
--genePval_outputFile=./output/nindep_100_ncell_100_lambda_2_tauIntraSample_0.5_gene_1_cis_genePval
For Docker Users
# Use the same flags as in the PIXI command above
docker run -v ${WKDIR}:/data wzhou88/saigeqtl:latest step3_gene_pvalue_qtl.R ...
For Singularity Users
# Use the same flags as in the PIXI command above
singularity exec --bind ${WKDIR}:/data saigeqtl_latest.sif step3_gene_pvalue_qtl.R ...
Using Custom Library Locations
For all wrapper scripts (step1_fitNULLGLMM_qtl.R, step2_tests_qtl.R, step3_gene_pvalue_qtl.R), you can now specify a custom library path using the --library parameter:
# Example with custom library location
step1_fitNULLGLMM_qtl.R --library=/path/to/custom/library [other_options]
step2_tests_qtl.R --library=/path/to/custom/library [other_options]
step3_gene_pvalue_qtl.R --library=/path/to/custom/library [other_options]
This is especially useful when you've installed SAIGEQTL to a custom location, and it eliminates the need to manually edit wrapper scripts with lib.loc specifications.
Final Output (cis-eQTL)
Your final results file (./output/nindep_100_ncell_100_lambda_2_tauIntraSample_0.5_gene_1_cis_genePval) contains:
- Gene name
- ACAT p-value (combined evidence from all variants)
- Top variant ID
- Top variant p-value
Congratulations!
You've completed your first SAIGE-QTL analysis!
What's Next?
For more genes:
- Repeat Step 3 of this guide (fit the null model) for each gene
- Run Step 5A (association tests) and Step 6 (gene-level p-value) for each gene
- Consider parallelizing across genes for efficiency
To learn more:
- Complete cis-eQTL Tutorial - Detailed workflow with examples
- Complete Genome-wide eQTL Tutorial - Large-scale analysis
- Parameters & Options - Complete parameter reference
- FAQ - Common questions and troubleshooting
Common Next Steps
Analyze multiple genes:
# Create a loop for multiple genes
for gene in gene_1 gene_2 gene_3; do
    echo "Processing ${gene}"
    # Run Step 1 for each gene (set --phenoCol=${gene})
    # Run Step 2 for each gene
    # Run Step 3 for each gene (set --geneName=${gene})
done
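Instead of typing the gene names, the list can be read from the phenotype file header. The sketch below assumes gene columns are named with a `gene_` prefix, as in the example phenotype file. `list_gene_columns` and `pheno_header_example.txt` are hypothetical names, and the loop prints placeholder commands rather than running them.

```shell
# Hypothetical helper: list header columns whose names start with "gene_".
list_gene_columns() {
    head -n 1 "$1" | tr '\t ' '\n\n' | grep '^gene_'
}

# Small stand-in for a phenotype file header.
printf 'individual_id\tcell_id\tgene_1\tgene_2\tage\n' > pheno_header_example.txt

for gene in $(list_gene_columns pheno_header_example.txt); do
    # Dry run: print the per-gene commands instead of executing them.
    echo "step1_fitNULLGLMM_qtl.R --phenoCol=${gene} [other_options]"
    echo "step2_tests_qtl.R [other_options]"
    echo "step3_gene_pvalue_qtl.R --geneName=${gene} [other_options]"
done
```

Each gene also needs its own output prefix and region file, which are left inside `[other_options]` here.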
Test rare variants: See Set-based Tests Tutorial for testing rare variants using SKAT-O
Visualize results:
- Plot Manhattan plots of p-values
- Create QQ plots to check for inflation
- Visualize top eQTLs with effect sizes
Quick Tips
For Single-Cell Data
- Include cell-level covariates (batch, UMI counts)
- Include individual-level covariates (age, sex, PCs)
- Use --traitType=count for read counts
- Consider using --offsetCol for log(total UMI)
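If your phenotype file has a per-cell total UMI column, a log-scale offset column can be appended with `awk`. This is a sketch under several assumptions: the column name `total_umi`, the output column name `log_total_umi`, the file names, and the helper `add_log_offset` are all hypothetical. It uses the natural logarithm and assumes every total is greater than zero.

```shell
# Small stand-in for a phenotype file with a total UMI column.
printf 'individual_id\tcell_id\tgene_1\ttotal_umi\n' > umi_example.txt
printf 'IND001\tCELL_001\t45\t1000\n' >> umi_example.txt
printf 'IND001\tCELL_002\t52\t2500\n' >> umi_example.txt

# Hypothetical helper: add_log_offset FILE COLUMN
# Appends a column "log_COLUMN" holding the natural log of COLUMN.
add_log_offset() {
    awk -v col="$2" 'BEGIN { FS = OFS = "\t" }
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            print $0, "log_" col
            next
        }
        { printf "%s%s%.6f\n", $0, OFS, log($idx) }
    ' "$1"
}

add_log_offset umi_example.txt total_umi > umi_with_offset.txt
cat umi_with_offset.txt
```

The new column name would then be the value passed to `--offsetCol`.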
For Computational Efficiency
- Run Step 1 in parallel (one job per gene)
- Use --LOCO=FALSE for cis-eQTL to save time
- Use --LOCO=TRUE for genome-wide to avoid bias
- Filter variants: --minMAC=20 for common variants
Data Quality
- Remove low-quality variants (high missing rate)
- Filter genes with low expression
- Include appropriate covariates for your study design
- Check for population stratification
Need Help?
Common issues:
- File path errors? Make sure to use absolute paths in Docker/Singularity
- Convergence errors? Check your covariates and sample size
- Memory errors? Reduce the --markers_per_chunk parameter
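For the file path issue, a relative path can be turned into an absolute one before it is passed to Docker or Singularity. The sketch below uses a hypothetical helper, `abs_path`, which requires the parent directory to exist but not the file itself.

```shell
# Hypothetical helper: abs_path PATH
# Resolves the directory part with cd/pwd, then re-attaches the file name.
abs_path() {
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    echo "${dir}/$(basename "$1")"
}

# Example: absolute form of a file in the current directory.
echo "phenotypes.txt -> $(abs_path phenotypes.txt)"
```

The `cd` runs inside a command substitution, so the working directory of your shell does not change.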
Get support:
- Check the FAQ for common problems
- Email: wzhou@broadinstitute.org, lhu@broadinstitute.org
- Report bugs: GitHub Issues