TL;DR Today I learned that you can collapse irrelevant changes in GitHub PR diffs. The .gitattributes syntax can be tricky, though.

In my current project, it’s sometimes necessary to regenerate a bunch of files and commit them to the repository. This can happen when you keep infrastructure-as-code (IaC) definitions or test data in the repository.

Sample repository

To demonstrate the problem, here’s an example repository: https://github.com/majk-p/nice-pr-diffs that features two PRs:

The first one, https://github.com/majk-p/nice-pr-diffs/pull/1/files, is hard to read because the diff is a wall of text: the newly added test data.

In the second one, https://github.com/majk-p/nice-pr-diffs/pull/2/files, the change is the same, but I’ve added a .gitattributes file with the following content:

test-data/** linguist-generated=true

Thanks to that, the content of those files is collapsed by default in the PR, with an option to load the diff on demand.

Tricky .gitattributes syntax

The GitHub documentation states that

A .gitattributes file uses the same rules for matching as .gitignore files

However, adding something like test-data linguist-generated=true does not collapse the large files inside that directory. Unlike in .gitignore, a .gitattributes pattern that matches a directory does not apply recursively to the paths inside it. Referencing the directory is not enough: you need to state explicitly that the rule applies to all files under the directory, as in test-data/** linguist-generated=true.
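
One way to see the difference locally, before opening a PR, is git check-attr, which reports the attribute value Git resolves for a given path. The sketch below assumes Git is installed; the throwaway repository and the sample.json file name are made up for illustration. It only shows how Git itself matches the patterns, so the actual rendering still has to be confirmed on GitHub.

```shell
# Create a throwaway repository (directory and file names are illustrative).
repo="$(mktemp -d)"
cd "$repo"
git init -q .
mkdir test-data
echo '{}' > test-data/sample.json

# Directory-only pattern: matches the directory itself, not the files inside it.
echo 'test-data linguist-generated=true' > .gitattributes
git check-attr linguist-generated -- test-data/sample.json
# prints: test-data/sample.json: linguist-generated: unspecified

# Recursive pattern: matches every file under the directory.
echo 'test-data/** linguist-generated=true' > .gitattributes
git check-attr linguist-generated -- test-data/sample.json
# prints: test-data/sample.json: linguist-generated: true
```

Running the same git check-attr command against a file in your own repository is a quick sanity check that the pattern reaches the files you meant it to.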

Summary

Next time you want to make your diff look nice and clean, just create a .gitattributes file with the following content, replacing my-unreadable-data with the directory that holds your large files.

my-unreadable-data/** linguist-generated=true

If you’re using GitLab, it seems a similar thing is achievable with -diff instead of linguist-generated=true, according to this SO answer, but I haven’t tested that.