SAIGE-QTL Analysis Overview

SAIGE-QTL follows a multi-step analysis pipeline designed for efficient and accurate single-cell eQTL mapping. The workflow is optimized to handle the unique challenges of single-cell data, including complex correlation structures and discrete count distributions.

Analysis Workflow

SAIGE-QTL workflow overview showing the multi-step analysis pipeline

Analysis Types

SAIGE-QTL supports two main types of eQTL analysis:

1. cis-eQTL Analysis

  • Tests genetic variants near genes (typically within 1Mb)
  • Uses both single-variant tests (common variants) and set-based tests (rare variants)
  • Steps: Step 1 → Step 2 → Step 3 (optional gene-level p-values)

2. Genome-wide eQTL Analysis

  • Tests all genetic variants across the genome for each gene
  • Optimized for computational efficiency when analyzing multiple genes
  • Steps: Step 1 → Step 2 (batch processing multiple genes)

Key Components

Step 1: Null Model Fitting

  • Fits a Poisson mixed model for each gene
  • Accounts for:
    • Cell-cell correlation
    • Cell-level and individual-level covariates
    • Total read count normalization
  • Output: Model parameters and variance components for Step 2

Step 2: Association Testing

  • Performs genetic association tests using the null model from Step 1
  • Supports:
    • Single-variant tests for common variants
    • Set-based tests for rare/low-frequency variants
  • Output: Association statistics and p-values

Step 3: Gene-level Analysis (Optional)

  • Combines variant-level results into gene-level statistics
  • Useful for rare variant analysis and pathway studies
  • Output: Gene-level p-values and effect estimates

Data Requirements

Input Files

  • Phenotype file: Gene expression counts with covariates
  • Genotype files: PLINK, BGEN, VCF, BCF, or SAV format
  • Group files: For set-based rare variant tests (optional)

Computational Considerations

Scalability Features

  • Efficient memory usage for large datasets
  • Parallel processing capabilities
  • Optimized for:
    • 20,000+ genes
    • Millions of cells
    • Millions of genetic variants
    • Hundreds of cell types

Performance Tips

  • Step 1 can be run independently for each gene (parallelizable)
  • Step 2 benefits from batch processing multiple genes
  • Use appropriate compute resources based on dataset size

Next Steps

  1. Install SAIGE-QTL - Set up your analysis environment
  2. Step 1 Guide - Fit null models for your genes
  3. Calling SAIGE-QTL - Execute the analysis pipeline
  4. cis-eQTL or Genome-wide eQTL - Choose your analysis type