Quick Start Guide

Get started with SAIGE-QTL in six steps. Follow this guide to run your first eQTL analysis.


Step 1: Choose Your Installation Method

Pick the installation method that best fits your environment:

🐍 PIXI (Recommended for most users)

Best for: Users without root access, no existing conda setup

# Install Pixi
curl -fsSL https://pixi.sh/install.sh | sh
export PATH="$HOME/.pixi/bin:$PATH"

# Clone repository
git clone https://github.com/weizhou0/SAIGEQTL
cd SAIGEQTL

# Install dependencies
CONDA_OVERRIDE_GLIBC=2.28 pixi install --manifest-path=$PWD/pixi.toml
CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=$PWD/pixi.toml \
    R CMD INSTALL .

Your command prefix:

CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=/path/to/SAIGEQTL/pixi.toml Rscript

📖 Detailed PIXI Installation Guide

🐳 Docker (Best for HPC and reproducibility)

Best for: HPC clusters, reproducible environments

# Pull pre-built image
docker pull wzhou88/saigeqtl:latest

# Test installation
docker run wzhou88/saigeqtl:latest step1_fitNULLGLMM_qtl.R --help

Your command prefix:

docker run -v /your/data:/data wzhou88/saigeqtl:latest

📖 Detailed Docker Installation Guide

🔬 Singularity (For HPC without Docker)

Best for: HPC clusters without Docker access

# Pull and convert Docker image
singularity pull docker://wzhou88/saigeqtl:latest

# Test installation
singularity exec saigeqtl_latest.sif step1_fitNULLGLMM_qtl.R --help

Your command prefix:

singularity exec --bind /your/data:/data saigeqtl_latest.sif

📖 Detailed Singularity Installation Guide


Step 2: Prepare Your Data

You’ll need three types of files:

Required Files

File Type           Format               Description                   Example
Phenotype           Tab/space-delimited  Gene expression + covariates  phenotypes.txt
Genotype            PLINK/VCF/BGEN       Genetic variants              genotypes.bed/bim/fam
Regions (optional)  Tab-delimited        cis-regions for genes         cis_regions.txt

Example Phenotype File Format

individual_id  cell_id         gene_1  gene_2  age  sex  PC1    PC2
IND001        CELL_001        45      120     35   M    0.12   -0.05
IND001        CELL_002        52      98      35   M    0.12   -0.05
IND002        CELL_001        38      156     42   F    -0.08  0.15

Key columns:

  • Individual ID (matches genotype file)
  • Cell ID (for single-cell data)
  • Gene expression values (one column per gene)
  • Covariates (age, sex, PCs, batch effects, etc.)
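A mismatch between the column names passed to Step 1 and the phenotype header is easy to introduce. The following is a minimal sketch of a pre-flight check, assuming the first line of the phenotype file is a whitespace-delimited header. The helper name check_pheno_columns is hypothetical and not part of SAIGE-QTL.

```shell
# Hypothetical helper (not part of SAIGE-QTL): verify that each named column
# appears in the header (first line) of a whitespace-delimited phenotype file.
# Returns 0 if all columns are present, 1 otherwise.
check_pheno_columns() {
    pheno_file=$1
    shift
    missing=0
    header=$(head -n 1 "$pheno_file")
    for col in "$@"; do
        found=0
        # Unquoted $header is split on spaces and tabs into column names.
        for name in $header; do
            if [ "$name" = "$col" ]; then
                found=1
                break
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "Missing column: $col" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage with the column names from the example file above:
# check_pheno_columns phenotypes.txt individual_id cell_id gene_1 age sex PC1 PC2
```

The function only compares names; it does not validate the values in each column.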

Example Region File (for cis-eQTL)

1    1000000    2000000
2     500000    1500000

Format: chromosome start end (no header)


Step 3: Fit the Null Model (Step 1)

Run Step 1 for each gene you want to analyze.

📝 Basic Command Structure

For PIXI Users

CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=/path/to/SAIGEQTL/pixi.toml \
    Rscript /path/to/SAIGEQTL/extdata/step1_fitNULLGLMM_qtl.R \
    --phenoFile=phenotypes.txt \
    --phenoCol=gene_1 \
    --covarColList=age,sex,PC1,PC2 \
    --sampleCovarColList=age,sex,PC1,PC2 \
    --sampleIDColinphenoFile=individual_id \
    --cellIDColinphenoFile=cell_id \
    --traitType=count \
    --plinkFile=genotypes \
    --outputPrefix=output/gene_1

For Docker Users

docker run -v /data/myproject:/data wzhou88/saigeqtl:latest \
    step1_fitNULLGLMM_qtl.R \
    --phenoFile=/data/phenotypes.txt \
    --phenoCol=gene_1 \
    --covarColList=age,sex,PC1,PC2 \
    --sampleCovarColList=age,sex,PC1,PC2 \
    --sampleIDColinphenoFile=individual_id \
    --cellIDColinphenoFile=cell_id \
    --traitType=count \
    --plinkFile=/data/genotypes \
    --outputPrefix=/data/output/gene_1

For Singularity Users

singularity exec --bind /data:/data saigeqtl_latest.sif \
    step1_fitNULLGLMM_qtl.R \
    --phenoFile=/data/phenotypes.txt \
    --phenoCol=gene_1 \
    --covarColList=age,sex,PC1,PC2 \
    --sampleCovarColList=age,sex,PC1,PC2 \
    --sampleIDColinphenoFile=individual_id \
    --cellIDColinphenoFile=cell_id \
    --traitType=count \
    --plinkFile=/data/genotypes \
    --outputPrefix=/data/output/gene_1

🔑 Key Parameters Explained

Parameter                 What to put                          Example
--phenoFile               Your phenotype file path             phenotypes.txt
--phenoCol                Gene/phenotype column name           gene_1
--covarColList            All covariates (comma-separated)     age,sex,PC1,PC2,batch
--sampleCovarColList      Individual-level covariates          age,sex,PC1,PC2
--sampleIDColinphenoFile  Individual ID column name            individual_id
--cellIDColinphenoFile    Cell ID column name                  cell_id
--traitType               Always use count for eQTL            count
--plinkFile               Genotype file prefix (no extension)  genotypes
--outputPrefix            Where to save results                output/gene_1

✅ Step 1 Output

You’ll get two files:

  • output/gene_1.rda - Model file (needed for Step 2)
  • output/gene_1.varianceRatio.txt - Variance ratio (needed for Step 2)
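Since both files are needed later, it may be worth confirming they exist before launching the association tests. This is a small sketch under the assumption that the two file names follow the pattern listed above; step1_outputs_ready is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if both Step 1 outputs for a given
# output prefix exist and are non-empty.
step1_outputs_ready() {
    prefix=$1
    status=0
    for f in "${prefix}.rda" "${prefix}.varianceRatio.txt"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty Step 1 output: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example usage:
# step1_outputs_ready output/gene_1 && echo "Ready for Step 2"
```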

Step 4: Choose Your Analysis Type

Pick the analysis that fits your research question:

🎯 cis-eQTL Analysis (Most Common)

Use when: Testing variants near genes (within ~1Mb)

Advantages:

  • ✅ Higher statistical power
  • ✅ Easier interpretation (local regulation)
  • ✅ Faster computation

Next: Continue to Step 5A - cis-eQTL Testing

🌐 Genome-wide eQTL Analysis

Use when: Testing all variants across the genome

Advantages:

  • ✅ Discover trans-acting effects
  • ✅ Comprehensive regulatory networks
  • ✅ Identify distant regulators

Next: Continue to Step 5B - Genome-wide Testing


Step 5A: Run cis-eQTL Tests (Step 2)

Test genetic variants in the cis-region (near the gene).

Create Region File

Create a file with your cis-window (e.g., gene location ± 1Mb):

# Example: gene on chr1, position 1,500,000
# cis-window: 500,000 - 2,500,000 (±1Mb)
echo -e "1\t500000\t2500000" > gene_1_cis_region.txt
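For genes close to the start of a chromosome, subtracting the window from the gene position can give a zero or negative start. The sketch below computes the window and clamps the start at 1, which assumes 1-based coordinates. It does not cap the end at the chromosome length. make_cis_region is a hypothetical helper name.

```shell
# Hypothetical helper: print a tab-delimited cis-region line
# (chromosome, start, end) for a window of +/- $3 bp around position $2.
# The window defaults to 1,000,000 bp; the start is clamped at 1.
make_cis_region() {
    chrom=$1
    pos=$2
    window=${3:-1000000}
    start=$((pos - window))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    end=$((pos + window))
    printf '%s\t%s\t%s\n' "$chrom" "$start" "$end"
}

# Example usage (same gene as above):
# make_cis_region 1 1500000 > gene_1_cis_region.txt
```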

Run Step 2

For PIXI Users

CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=/path/to/SAIGEQTL/pixi.toml \
    Rscript /path/to/SAIGEQTL/extdata/step2_tests_qtl.R \
    --vcfFile=chr1.vcf.gz \
    --vcfFileIndex=chr1.vcf.gz.csi \
    --vcfField=DS \
    --chrom=1 \
    --rangestoIncludeFile=gene_1_cis_region.txt \
    --GMMATmodelFile=output/gene_1.rda \
    --varianceRatioFile=output/gene_1.varianceRatio.txt \
    --SAIGEOutputFile=output/gene_1_cis_results.txt \
    --minMAC=20 \
    --LOCO=FALSE

For Docker Users

docker run -v /data/myproject:/data wzhou88/saigeqtl:latest \
    step2_tests_qtl.R \
    --vcfFile=/data/chr1.vcf.gz \
    --vcfFileIndex=/data/chr1.vcf.gz.csi \
    --vcfField=DS \
    --chrom=1 \
    --rangestoIncludeFile=/data/gene_1_cis_region.txt \
    --GMMATmodelFile=/data/output/gene_1.rda \
    --varianceRatioFile=/data/output/gene_1.varianceRatio.txt \
    --SAIGEOutputFile=/data/output/gene_1_cis_results.txt \
    --minMAC=20 \
    --LOCO=FALSE

For Singularity Users

singularity exec --bind /data:/data saigeqtl_latest.sif \
    step2_tests_qtl.R \
    --vcfFile=/data/chr1.vcf.gz \
    --vcfFileIndex=/data/chr1.vcf.gz.csi \
    --vcfField=DS \
    --chrom=1 \
    --rangestoIncludeFile=/data/gene_1_cis_region.txt \
    --GMMATmodelFile=/data/output/gene_1.rda \
    --varianceRatioFile=/data/output/gene_1.varianceRatio.txt \
    --SAIGEOutputFile=/data/output/gene_1_cis_results.txt \
    --minMAC=20 \
    --LOCO=FALSE

📊 Continue to Step 6 - Calculate Gene P-value


Step 5B: Run Genome-wide Tests (Step 2)

Test all genetic variants across the genome.

Run Step 2

For PIXI Users

CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=/path/to/SAIGEQTL/pixi.toml \
    Rscript /path/to/SAIGEQTL/extdata/step2_tests_qtl.R \
    --vcfFile=chr1.vcf.gz \
    --vcfFileIndex=chr1.vcf.gz.csi \
    --vcfField=DS \
    --chrom=1 \
    --GMMATmodelFile=output/gene_1.rda \
    --varianceRatioFile=output/gene_1.varianceRatio.txt \
    --SAIGEOutputFile=output/gene_1_chr1_results.txt \
    --minMAF=0.01 \
    --LOCO=TRUE

For Docker Users

docker run -v /data/myproject:/data wzhou88/saigeqtl:latest \
    step2_tests_qtl.R \
    --vcfFile=/data/chr1.vcf.gz \
    --vcfFileIndex=/data/chr1.vcf.gz.csi \
    --vcfField=DS \
    --chrom=1 \
    --GMMATmodelFile=/data/output/gene_1.rda \
    --varianceRatioFile=/data/output/gene_1.varianceRatio.txt \
    --SAIGEOutputFile=/data/output/gene_1_chr1_results.txt \
    --minMAF=0.01 \
    --LOCO=TRUE

For Singularity Users

singularity exec --bind /data:/data saigeqtl_latest.sif \
    step2_tests_qtl.R \
    --vcfFile=/data/chr1.vcf.gz \
    --vcfFileIndex=/data/chr1.vcf.gz.csi \
    --vcfField=DS \
    --chrom=1 \
    --GMMATmodelFile=/data/output/gene_1.rda \
    --varianceRatioFile=/data/output/gene_1.varianceRatio.txt \
    --SAIGEOutputFile=/data/output/gene_1_chr1_results.txt \
    --minMAF=0.01 \
    --LOCO=TRUE

💡 Tip: Repeat for each chromosome (chr1-chr22, chrX)
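One way to repeat the command for every chromosome is to generate the command lines first and review them before running anything. The sketch below only prints commands. It assumes VCFs named chr<N>.vcf.gz with .csi indexes and an X chromosome coded as "X", neither of which is confirmed by this guide. print_step2_commands is a hypothetical helper name.

```shell
# Sketch: print one genome-wide Step 2 command per chromosome for one gene.
# Assumptions: VCFs are named chr<N>.vcf.gz with .csi indexes under $3,
# Step 1 outputs live under $3/output, and chromosome X is coded as "X".
print_step2_commands() {
    runner=$1        # your command prefix plus the path to step2_tests_qtl.R
    gene=$2
    dir=${3:-.}      # data directory (e.g. /data inside a container)
    for chr in $(seq 1 22) X; do
        args="--vcfFile=${dir}/chr${chr}.vcf.gz"
        args="$args --vcfFileIndex=${dir}/chr${chr}.vcf.gz.csi"
        args="$args --vcfField=DS --chrom=${chr}"
        args="$args --GMMATmodelFile=${dir}/output/${gene}.rda"
        args="$args --varianceRatioFile=${dir}/output/${gene}.varianceRatio.txt"
        args="$args --SAIGEOutputFile=${dir}/output/${gene}_chr${chr}_results.txt"
        args="$args --minMAF=0.01 --LOCO=TRUE"
        echo "$runner $args"
    done
}

# Example usage: inspect the output, then pipe it to sh or a job scheduler.
# print_step2_commands \
#     "docker run -v /data/myproject:/data wzhou88/saigeqtl:latest step2_tests_qtl.R" \
#     gene_1 /data
```

Printing instead of executing keeps the sketch safe to try and makes it easy to hand the lines to whatever scheduler your cluster uses.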


Step 6: Calculate Gene-Level P-value (Step 3)

Combine results from all variants into a single gene-level p-value using ACAT.

Run Step 3

For PIXI Users

CONDA_OVERRIDE_GLIBC=2.28 pixi run --manifest-path=/path/to/SAIGEQTL/pixi.toml \
    Rscript /path/to/SAIGEQTL/extdata/step3_gene_pvalue_qtl.R \
    --assocFile=output/gene_1_cis_results.txt \
    --geneName=gene_1 \
    --genePval_outputFile=output/gene_1_gene_pvalue.txt

For Docker Users

docker run -v /data/myproject:/data wzhou88/saigeqtl:latest \
    step3_gene_pvalue_qtl.R \
    --assocFile=/data/output/gene_1_cis_results.txt \
    --geneName=gene_1 \
    --genePval_outputFile=/data/output/gene_1_gene_pvalue.txt

For Singularity Users

singularity exec --bind /data:/data saigeqtl_latest.sif \
    step3_gene_pvalue_qtl.R \
    --assocFile=/data/output/gene_1_cis_results.txt \
    --geneName=gene_1 \
    --genePval_outputFile=/data/output/gene_1_gene_pvalue.txt

✅ Final Output

Your final results file (gene_1_gene_pvalue.txt) contains:

  • Gene name
  • ACAT p-value (combined evidence from all variants)
  • Top variant ID
  • Top variant p-value

🎉 Congratulations!

You’ve completed your first SAIGE-QTL analysis!

What’s Next?

For more genes:

  1. Repeat Step 3 (fit the null model) for each gene
  2. Run the association tests (Step 5A or 5B) and the gene-level p-value calculation (Step 6) for each gene
  3. Consider parallelizing across genes for efficiency

Common Next Steps

Analyze multiple genes:

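# Create a loop for multiple genes
for gene in gene_1 gene_2 gene_3; do
    echo "Processing ${gene}"
    # Run Step 1 (fit null model) with --phenoCol=${gene} --outputPrefix=output/${gene}
    # Run Step 2 (association tests) with --GMMATmodelFile=output/${gene}.rda
    # Run Step 3 (gene-level p-value) with --assocFile=output/${gene}_cis_results.txt
done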
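The loop above processes genes one after another. If you wrap the three steps for a single gene in a script, several genes can run at once. The sketch below uses xargs -P (available in GNU and BSD xargs) and assumes a hypothetical script run_gene.sh that takes the gene name as its first argument. Neither the script nor run_genes_parallel is part of SAIGE-QTL.

```shell
# Sketch: run a per-gene pipeline script for several genes concurrently.
# run_gene.sh is a hypothetical script that runs Steps 1-3 for the gene in $1.
run_genes_parallel() {
    gene_list=$1                  # file with one gene name per line
    jobs=${2:-4}                  # number of genes to process at the same time
    script=${3:-./run_gene.sh}    # per-gene pipeline script
    # -n 1 passes one gene name per invocation; -P sets the concurrency.
    xargs -P "$jobs" -n 1 sh "$script" < "$gene_list"
}

# Example usage:
# printf 'gene_1\ngene_2\ngene_3\n' > genes.txt
# run_genes_parallel genes.txt 4 ./run_gene.sh
```

On a cluster, a scheduler array job is usually a better fit than xargs; this form is mainly useful on a single multi-core machine.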

Test rare variants: See Set-based Tests Tutorial for testing rare variants using SKAT-O

Visualize results:

  • Plot Manhattan plots of p-values
  • Create QQ plots to check for inflation
  • Visualize top eQTLs with effect sizes

💡 Quick Tips

For Single-Cell Data

  • ✅ Include cell-level covariates (batch, UMI counts)
  • ✅ Include individual-level covariates (age, sex, PCs)
  • ✅ Use --traitType=count for read counts
  • ✅ Consider using --offsetCol for log(total UMI)

For Computational Efficiency

  • ✅ Run Step 1 in parallel (one job per gene)
  • ✅ Use --LOCO=FALSE for cis-eQTL to save time
  • ✅ Use --LOCO=TRUE for genome-wide to avoid bias
  • ✅ Filter variants: --minMAC=20 for common variants

Data Quality

  • ✅ Remove low-quality variants (high missing rate)
  • ✅ Filter genes with low expression
  • ✅ Include appropriate covariates for your study design
  • ✅ Check for population stratification
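As one possible way to spot genes with low expression, the sketch below reports the fraction of cells with a non-zero count for a given gene column. It assumes a whitespace-delimited phenotype file with a header line, as in the example format earlier. The helper name nonzero_fraction is hypothetical, and the cutoff you apply to the result is a study-specific choice that this guide does not prescribe.

```shell
# Hypothetical helper: print the fraction of rows (cells) with a non-zero
# value in a named column of a whitespace-delimited file with a header line.
# Exits with status 1 if the column is not found.
nonzero_fraction() {
    awk -v col="$2" '
        NR == 1 {
            # Locate the requested column in the header.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        { n++; if ($idx + 0 != 0) nz++ }
        END { if (idx && n > 0) printf "%.4f\n", nz / n }
    ' "$1"
}

# Example usage:
# nonzero_fraction phenotypes.txt gene_1
```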

❓ Need Help?

Common issues:

  • File path errors? Make sure to use absolute paths in Docker/Singularity
  • Convergence errors? Check your covariates and sample size
  • Memory errors? Reduce --markers_per_chunk parameter

Get support:

  • 📖 Check the FAQ for common problems
  • 📧 Email: wzhou@broadinstitute.org
  • 🐛 Report bugs: GitHub Issues