Overview
Large binary files don’t belong in your content repository, but Quarto’s figure generation puts them there by default. The result? Repositories that grow without bound and team members who avoid fresh checkouts.
This post shows you how to redirect Quarto’s figure output to a git submodule, keeping your main repository focused on content while maintaining full version control over your generated assets. It’s a quick configuration change that solves a scalability problem.
This technique only works when rendering with engine: knitr, not with the julia or jupyter engines.
Background
This approach came up during a discussion on rOpenSci’s Slack about managing separate repositories or aiming for a mono repository containing assets under version control. Rather than making my own repository public to demonstrate the technique, I created this post and example repositories to show a minimally viable way of implementing this approach. For my own purposes, I’m using this to redirect to a separate repository being served on an asset subdomain.
You might wonder why we use a git submodule rather than git subtrees or simply organizing figures into a figures/ directory tree within the same repository. While directory organization is simpler and git subtrees avoid some submodule complexity, neither solves the core problem: repository size.
With directory organization or git subtrees, all figure files still contribute to the main repository’s size and clone time. Git submodules, however, are entirely separate repositories that are linked by reference. This means:
- The main repository only stores a pointer to a specific commit in the figures repository
- Collaborators can choose whether to download figures or work with content only
- Different access permissions can be set for each repository
- The figures repository can be shared across multiple projects
- The main repository stays lightweight regardless of how many figures you generate
The trade-off is slightly more complex git workflows, but the benefits usually outweigh this cost for content-heavy projects.
Repository Structure
The core concept involves splitting your content across two repositories:
- a main website repository containing your Quarto content; and,
- a separate figures repository that holds all generated plots and images.
The figures repository is then included in your main repository as a git submodule, creating a clean separation between content and assets. This looks like:
main-website/
├── _quarto.yml
├── index.qmd
├── posts/
│   └── welcome/
│       └── index.qmd
└── figures/              # Git submodule pointing to figures-repo
    └── post-figures/
        ├── plot1.png
        └── plot2.png

figures-repo/             # Separate repository
├── post-figures/
│   ├── plot1.png
│   └── plot2.png
└── README.md
Visual Comparison
Example Repositories
To see this approach in action, check out these demonstration repositories:
- Main Website: demo-figure-quarto-website
  - Shows how a Quarto website is structured with the figures submodule
- Figures Repository: demo-figure-quarto-generated-repo
  - Contains the generated figures stored separately
These repositories demonstrate the complete workflow and show how the .gitmodules file is configured, how figure paths are set up in Quarto documents, and how the submodule appears in the main repository.
Setting Up the Submodule
To implement this structure:
1. Create the figures repository: Set up a separate repository specifically for your figures.
2. Add as submodule: In your main repository, run:
   git submodule add https://github.com/username/figures-repo.git figures
3. Initialize for collaborators: Others working on your project will need to run:
   git submodule update --init --recursive
Replace username and figures-repo.git with your actual GitHub username and the name of your figures repository!
How Submodules Appear in Git
When you commit the submodule to your main repository, Git doesn’t store the actual files from the figures repository. Instead, it stores a pointer to a specific commit hash in the external repository. In your git log and file listings, the submodule appears as:
figures @ a1b2c3d
You’ll also notice a .gitmodules file is created in your repository root, which contains the submodule configuration:
[submodule "figures"]
    path = figures
    url = https://github.com/username/figures-repo.git
This means your main repository only tracks which version of the figures repository to use, keeping the size overhead minimal.
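To make the pointer idea concrete, here is a minimal sketch that builds two throwaway local repositories and inspects what the main repository actually records. The directory names, the placeholder file, and the demo identity are illustrative assumptions; the `protocol.file.allow` override is only needed because the "remote" in this sketch is a local path rather than a GitHub URL.

```shell
#!/bin/sh
# Sketch: build two throwaway local repositories and inspect how the main
# repository records the submodule. All names and paths are illustrative.
set -eu
work=$(mktemp -d)
cd "$work"

# A stand-in figures repository with a single commit.
git init -q figures-repo
cd figures-repo
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir post-figures
echo "placeholder" > post-figures/plot1.png
git add post-figures
git commit -q -m "Add first figure"
figures_commit=$(git rev-parse HEAD)
cd ..

# A stand-in main repository that links to it as a submodule.
# protocol.file.allow is only needed because the "remote" is a local path.
git init -q main-website
cd main-website
git config user.name "Demo"
git config user.email "demo@example.com"
git -c protocol.file.allow=always submodule add "$work/figures-repo" figures >/dev/null 2>&1
git commit -q -m "Add figures submodule"

# Mode 160000 marks a gitlink: a commit reference rather than file contents.
entry=$(git ls-tree HEAD figures)
echo "$entry"
```

The printed entry has mode 160000 and type commit, followed by the hash of the figures repository's commit. No image data is stored in the main repository's tree.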
Setting Figure Output Directory
The key to this approach is using fig.path, which is preferred because it creates relative paths that work consistently across different environments. Since our post files typically live two directories below the repository root (for example, posts/welcome/index.qmd), we’ll need to use ../../ to navigate up two levels to reach our figures directory at the repository root.
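If you are unsure how many `../` segments a given document needs, the count is simply the number of directories between the document and the repository root. The following is a small sketch of that rule; `relative_prefix` is a hypothetical helper name, not something Quarto or knitr provides.

```shell
#!/bin/sh
# Sketch: derive the ../ prefix needed to reach the repository root from a
# document path given relative to that root.
set -eu

# relative_prefix is a hypothetical helper, not part of Quarto or knitr.
relative_prefix() {
  dir=$(dirname "$1")
  prefix=""
  # Add one "../" for every directory between the document and the root.
  while [ "$dir" != "." ] && [ "$dir" != "/" ]; do
    prefix="../$prefix"
    dir=$(dirname "$dir")
  done
  printf '%s' "$prefix"
}

# A post in its own directory sits two levels below the root.
echo "$(relative_prefix posts/welcome/index.qmd)figures/post-figures/"
# A post stored directly in posts/ sits one level below the root.
echo "$(relative_prefix posts/my-post.qmd)figures/post-figures/"
```

A post at posts/welcome/index.qmd needs `../../`, while a post stored directly as posts/my-post.qmd needs only `../`.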
Method 1: YAML Header
Configure the figure path directly in your document’s YAML header:
---
title: "My Post"
knitr:
  opts_chunk:
    fig.path: ../../figures/post-figures/
---
Method 2: Code Chunk
Set the figure path in your first R code chunk:
```{r}
#| label: setup
knitr::opts_chunk$set(
  fig.path = "../../figures/post-figures/"
)
```
Tip: Adding Prefixes
You can also add prefixes to figure filenames for better organization:
knitr:
  opts_chunk:
    fig.path: ../../figures/post-figures/mypost-
This will generate files like mypost-plot1.png, mypost-plot2.png, etc.
Tip: Dynamic Figure Organization
You can further organize your figures by automatically using post directory or filename information in your figure paths. This creates a more structured organization that keeps figures for each post neatly separated, making it easier to manage and locate them later.
Using Directory Names
For posts organized in subdirectories:
posts/
└── welcome/
    └── index.qmd
Extract the directory name to organize figures:
```{r}
# Get the current working directory (the directory containing the post)
current_directory <- getwd()

# Extract the last directory name
last_directory <- basename(current_directory) # "welcome" in this case

# Use it in the figure path; posts/welcome/ is two levels below the root
knitr::opts_chunk$set(
  fig.path = paste0("../../figures/post-figures/", last_directory, "/")
)
```
This automatically creates ../../figures/post-figures/welcome/
for your figures.
Using Filenames
For posts organized as individual files:
posts/
└── my-post.qmd
Extract the filename to organize figures:
```{r}
# Get the filename without the extension
file_name_without_extension <- tools::file_path_sans_ext(knitr::current_input()) # "my-post" in this case

# Use it in the figure path
knitr::opts_chunk$set(
  fig.path = paste0("../figures/post-figures/", file_name_without_extension, "/")
)
```
This automatically creates ../figures/post-figures/my-post/ for your figures (only one level up, because the post sits directly in posts/), keeping each post’s visualizations neatly separated.
Workflow
Once your submodule is set up, your typical workflow becomes:
- Configure figure paths: Use either YAML or code chunk method above
- Render your content: Quarto will generate figures in the submodule directory
- Commit figures: Navigate to the figures directory and commit new/updated figures
- Update main repository: Commit the submodule pointer update in your main repository
This workflow maintains version control over both your content and generated figures while keeping repositories focused and manageable.
Committing Figures to the Submodule
When you render your Quarto document and generate new figures, you’ll need to commit them to the submodule and then update the main repository. Here’s the two-step process required to keep everything in sync.
1. Commit Changes to the Submodule
First, navigate to your submodule directory and commit the new figures:
# Navigate to the submodule
cd figures
# Check what files were generated
git status
# Add new/updated figures
git add post-figures/
git commit -m "Add figures for new blog post"
# Push to the figures repository
git push origin main
2. Update the Main Repository
After committing to the submodule, you need to update the main repository to point to the new commit:
# Navigate back to main repository root
cd ..
# Check the submodule status
git status
# You should see: modified: figures (new commits)
# Add the submodule pointer update
git add figures
git commit -m "Update figures submodule with new blog post figures"
# Push the main repository
git push origin main
The main repository now points to the latest version of your figures, but still maintains its lightweight size since it only stores the commit reference, not the actual image files.
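If you repeat these two steps often, they can be wrapped in a small helper. The sketch below is one possible shape, not a prescribed workflow: `sync_figures` and the `SYNC_PUSH` switch are hypothetical names, and the function assumes it is run from the main repository root with both repositories on a branch named main.

```shell
#!/bin/sh
# Sketch: wrap the two-step figure commit in one hypothetical helper.
set -eu

# sync_figures MESSAGE: run from the main repository root.
# SYNC_PUSH is an assumed switch; set SYNC_PUSH=0 to commit without pushing.
sync_figures() {
  message=$1

  # Step 1: commit new or updated figures inside the submodule.
  git -C figures add post-figures/
  if git -C figures diff --cached --quiet; then
    echo "No figure changes to commit"
    return 0
  fi
  git -C figures commit -q -m "$message"
  if [ "${SYNC_PUSH:-1}" = "1" ]; then
    git -C figures push origin main
  fi

  # Step 2: record the new submodule commit in the main repository.
  git add figures
  git commit -q -m "Update figures submodule: $message"
  if [ "${SYNC_PUSH:-1}" = "1" ]; then
    git push origin main
  fi
}
```

A call such as `sync_figures "Add figures for new blog post"` then performs both commits in order. One caveat: after `git submodule update`, the submodule is usually on a detached HEAD, so check out main inside figures before committing there, or the push will not include your new commit.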
GitHub Actions Considerations
When using GitHub Actions to build and deploy your Quarto site, you’ll need to configure the checkout action to properly handle submodules.
For Any Repository with Submodules
Always include submodules: recursive in your checkout action:
- name: Checkout
  uses: actions/checkout@v5
  with:
    submodules: recursive
For Private Submodule Repositories
If your figures repository is private, you’ll need to provide a Personal Access Token (PAT) because ${{ github.token }} is scoped only to the current repository:
- name: Checkout
  uses: actions/checkout@v5
  with:
    submodules: recursive
    token: ${{ secrets.GH_PAT }} # `GH_PAT` is a secret that contains your PAT
After creating a Personal Access Token, you can store it in your repository settings as a secret under “Secrets and variables” > “Actions”. This allows GitHub Actions to access the private submodule repository without exposing sensitive information in your workflow files.
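A missing submodule tends to surface late in a build, as broken image links rather than an obvious error. As a hedged sketch, a step like the following could run right after checkout to fail fast. `check_submodules` is a hypothetical helper name; it relies on the documented behavior that `git submodule status` prefixes uninitialized submodules with `-`.

```shell
#!/bin/sh
# Sketch: fail early in CI when a submodule was not checked out.
set -eu

# check_submodules is a hypothetical helper; run it from the repository root.
check_submodules() {
  # `git submodule status` prefixes uninitialized submodules with "-".
  missing=$(git submodule status | grep '^-' || true)
  if [ -n "$missing" ]; then
    echo "Uninitialized submodules:" >&2
    echo "$missing" >&2
    return 1
  fi
  echo "All submodules are checked out"
}
```

Calling `check_submodules` in a workflow step before rendering stops the job with a clear message if the checkout was configured without `submodules: recursive` or the token could not read the figures repository.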
Advanced Configuration
Knitr provides several options for figure management, documented in Yihui’s knitr options guide. The main customization options relevant to this submodule approach include:
- fig.path: A prefix concatenated with chunk labels to generate full file paths (preferred for relative paths)
- base.dir: Sets an absolute directory under which plots are generated
- base.url: Defines the base URL of images on HTML pages (useful for subdomain hosting)
These options are discussed at length, with additional context, in the plots and package-options sections of the knitr options guide.
Fin
Git submodules solve a real problem: keeping your content repository focused while managing the inevitable growth of generated assets. Instead of watching your repository balloon with every new visualization, you get clean separation between content and figures. This architectural choice pays dividends as your project scales and your team grows.
The initial setup takes maybe 10 minutes of configuration. Your daily workflow adds just two extra steps: committing figures in the submodule and updating the pointer in the main repository. In return, you get repositories that clone in seconds, collaborators who can work without downloading gigabytes of images, and complete control over how you organize your assets.
If you’re wrestling with large repositories or planning a content-heavy project, this approach is worth implementing. The benefits compound over time as your figure collection grows and more people interact with your work. Your future self (and anyone who clones your repository) will appreciate the foresight.