github-linguist/linguist
{ "createdAt": "2011-05-09T22:53:13Z", "defaultBranch": "main", "description": "Language Savant. If your repository's language is being reported incorrectly, send us a pull request!", "fullName": "github-linguist/linguist", "homepage": "", "language": "Ruby", "name": "linguist", "pushedAt": "2026-03-18T15:06:01Z", "stargazersCount": 13378, "topics": [ "language-grammars", "language-statistics", "linguistic", "syntax-highlighting" ], "updatedAt": "2026-03-21T18:12:51Z", "url": "https://github.com/github-linguist/linguist"}
Linguist
This library is used on GitHub.com to detect blob languages, ignore binary or vendored files, suppress generated files in diffs, and generate language breakdown graphs.
Documentation
- [How Linguist works](/docs/how-linguist-works.md)
- [Change Linguist’s behaviour with overrides](/docs/overrides.md)
- [Troubleshooting](/docs/troubleshooting.md)
- [Contributing guidelines](CONTRIBUTING.md)
Installation
Install the gem:

    gem install github-linguist

Dependencies
Linguist is a Ruby library, so you will need a recent version of Ruby installed.
The version of Ruby supplied with macOS/Xcode is known to cause problems when installing some of the dependencies.
Accordingly, we highly recommend installing Ruby with Homebrew, rbenv, rvm, ruby-build, asdf, or another packaging system before attempting to install Linguist and its dependencies.
Linguist uses charlock_holmes for character encoding and rugged for libgit2 bindings for Ruby.
These components have their own dependencies.
You may need to install missing dependencies before you can install Linguist. For example, on macOS with Homebrew:

    brew install cmake pkg-config icu4c

On Ubuntu:

    sudo apt-get install build-essential cmake pkg-config libicu-dev zlib1g-dev libcurl4-openssl-dev libssl-dev ruby-dev

Application usage
Linguist can be used in your application as follows:

    require 'rugged'
    require 'linguist'

    repo = Rugged::Repository.new('.')
    project = Linguist::Repository.new(repo, repo.head.target_id)
    project.language       #=> "Ruby"
    project.languages      #=> { "Ruby" => 119387 }

Command line usage
The github-linguist executable operates in two distinct modes:
- [Git Repository mode](#git-repository) - Analyzes an entire Git repository (when given a directory path or no path)
- [Single file mode](#single-file) - Analyzes a specific file (when given a file path)
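The rule in the list above can be sketched in a few lines of Ruby. This is only an illustration of the documented behaviour, not the executable's actual source: `linguist_mode` is a hypothetical helper name, and returning `:unknown` for a path that is neither a file nor a directory is an assumption.

```ruby
# Hypothetical helper that mirrors the documented rule for choosing a mode.
# It is not part of Linguist's API.
def linguist_mode(path = nil)
  # No path, or a directory path: analyze the whole Git repository.
  return :git_repository if path.nil? || File.directory?(path)
  # A path to a regular file: analyze just that file.
  return :single_file if File.file?(path)

  # Assumption: anything else (for example a missing path) is not handled here.
  :unknown
end

puts linguist_mode          # no path given
puts linguist_mode(Dir.pwd) # a directory path
```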
Git Repository
A repository’s language statistics can be assessed from the command line using the github-linguist executable.
Without any options, github-linguist will output the language breakdown by percentage and file size.

    cd /path-to-repository
    github-linguist

You can try running github-linguist on the root directory in this repository itself:

    $ github-linguist
    66.84%  264519     Ruby
    24.68%  97685      C
    6.57%   25999      Go
    1.29%   5098       Lex
    0.32%   1257       Shell
    0.31%   1212       Dockerfile

Additional options
--rev REV
The --rev REV flag will change the git revision being analyzed to any gitrevisions(1) compatible revision you specify.
This is useful to analyze the makeup of a repo as of a certain tag, or in a certain branch.
For example, here is the popular Jekyll open source project.

    $ github-linguist jekyll
    70.64%  709959     Ruby
    23.04%  231555     Gherkin
    3.80%   38178      JavaScript
    1.19%   11943      HTML
    0.79%   7900       Shell
    0.23%   2279       Dockerfile
    0.13%   1344       Earthly
    0.10%   1019       CSS
    0.06%   606        SCSS
    0.02%   234        CoffeeScript
    0.01%   90         Hack

And here is Jekyll’s published website, from the gh-pages branch inside their repository.

    $ github-linguist jekyll --rev origin/gh-pages
    100.00% 2568354    HTML

--breakdown
The --breakdown or -b flag will additionally show the breakdown of files by language.
You can try running github-linguist on the root directory in this repository itself:

    $ github-linguist --breakdown
    66.84%  264519     Ruby
    24.68%  97685      C
    6.57%   25999      Go
    1.29%   5098       Lex
    0.32%   1257       Shell
    0.31%   1212       Dockerfile

    Ruby:
    Gemfile
    Rakefile
    bin/git-linguist
    bin/github-linguist
    ext/linguist/extconf.rb
    github-linguist.gemspec
    lib/linguist.rb
    …

--strategies
The --strategies or -s flag will show the language detection strategy used for each file. This is useful for understanding how Linguist determined the language of specific files. Note that unless the --json flag is specified, this flag will set the --breakdown flag implicitly.
You can try running github-linguist on the root directory in this repository itself with the strategies flag:

    $ github-linguist --breakdown --strategies
    66.84%  264519     Ruby
    24.68%  97685      C
    6.57%   25999      Go
    1.29%   5098       Lex
    0.32%   1257       Shell
    0.31%   1212       Dockerfile

    Ruby:
      Gemfile [Filename]
      Rakefile [Filename]
      bin/git-linguist [Extension]
      bin/github-linguist [Extension]
      lib/linguist.rb [Extension]
      …

If a file’s language is affected by .gitattributes, the strategy will show the original detection method along with a note indicating whether the gitattributes setting changed the result or confirmed it.
For instance, if you had the following .gitattributes overrides in your repo:

    *.ts linguist-language=JavaScript
    *.js linguist-language=JavaScript

the output of Linguist would be something like this:

    100.00% 217        JavaScript

    JavaScript:
      demo.ts [Heuristics (overridden by .gitattributes)]
      demo.js [Extension (confirmed by .gitattributes)]

--json
The --json or -j flag outputs the data in JSON format.

    $ github-linguist --json
    {"Dockerfile":{"size":1212,"percentage":"0.31"},"Ruby":{"size":264519,"percentage":"66.84"},"C":{"size":97685,"percentage":"24.68"},"Lex":{"size":5098,"percentage":"1.29"},"Shell":{"size":1257,"percentage":"0.32"},"Go":{"size":25999,"percentage":"6.57"}}

This option can be used in conjunction with --breakdown to get a full list of files along with the size and percentage data.

    $ github-linguist --breakdown --json
    {"Dockerfile":{"size":1212,"percentage":"0.31","files":["Dockerfile","tools/grammars/Dockerfile"]},"Ruby":{"size":264519,"percentage":"66.84","files":["Gemfile","Rakefile","bin/git-linguist","bin/github-linguist","ext/linguist/extconf.rb","github-linguist.gemspec","lib/linguist.rb",...]}}

NB. The --strategies flag has no effect when the --json flag is present.
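Because the JSON output is a single object keyed by language name, it is easy to post-process. The sketch below uses only Ruby's standard `json` library and the sample `--json` output shown above, pasted in as a string rather than captured from a live run; the variable names are illustrative. It ranks the languages by size and recomputes each percentage from the byte counts, on the assumption that a percentage is the language's share of the total size rounded to two decimal places, which is consistent with the sample figures. Note that the `percentage` values are strings in the JSON, not numbers.

```ruby
require 'json'

# Sample output of `github-linguist --json`, copied from the example above.
output = '{"Dockerfile":{"size":1212,"percentage":"0.31"},"Ruby":{"size":264519,"percentage":"66.84"},"C":{"size":97685,"percentage":"24.68"},"Lex":{"size":5098,"percentage":"1.29"},"Shell":{"size":1257,"percentage":"0.32"},"Go":{"size":25999,"percentage":"6.57"}}'

stats = JSON.parse(output)

# Total number of bytes across all detected languages.
total = stats.values.sum { |entry| entry['size'] }

# Largest language first.
ranked = stats.sort_by { |_language, entry| -entry['size'] }

ranked.each do |language, entry|
  # Assumption: percentage = size / total * 100, rounded to two decimals.
  recomputed = format('%.2f', entry['size'] * 100.0 / total)
  puts "#{language}: #{entry['size']} bytes, reported #{entry['percentage']}%, recomputed #{recomputed}%"
end
```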
Single file
Alternatively you can find stats for a single file using the github-linguist executable.
You can try running github-linguist on files in this repository itself:

    $ github-linguist grammars.yml
    grammars.yml: 884 lines (884 sloc)
      type:      Text
      mime type: text/x-yaml
      language:  YAML

Additional options
--breakdown
This flag has no effect in Single file mode.
--strategies
When using the --strategies or -s flag with a single file, you can see which detection method was used:

    $ github-linguist --strategies lib/linguist.rb
    lib/linguist.rb: 105 lines (96 sloc)
      type:      Text
      mime type: application/x-ruby
      language:  Ruby
      strategy:  Extension

If a file’s language is affected by .gitattributes, the strategy will show whether the gitattributes setting changed the result or confirmed it.
In this fictitious example, it says “confirmed by .gitattributes” since the detection process (using the Filename strategy) would have given the same output as the override:

    .devcontainer/devcontainer.json: 27 lines (27 sloc)
      type:      Text
      mime type: application/json
      language:  JSON with Comments
      strategy:  Filename (confirmed by .gitattributes)

In this other fictitious example, it says “overridden by .gitattributes” since the gitattributes setting changes the detected language to something different:

    test.rb: 13 lines (11 sloc)
      type:      Text
      mime type: application/x-ruby
      language:  Java
      strategy:  Extension (overridden by .gitattributes)

Here, the .rb file would normally be detected as Ruby by the Extension strategy, but .gitattributes overrides it to be detected as Java instead.
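When scripting around the text output, the `strategy:` line can be split into the strategy name and the optional `.gitattributes` note. The following is a minimal sketch that assumes the line format shown in the examples above; `parse_strategy` and `STRATEGY_LINE` are hypothetical names and are not part of Linguist.

```ruby
# Hypothetical helper for the "strategy:" line of the single file output.
# Assumes the format shown in the examples: "strategy: NAME", optionally
# followed by "(confirmed by .gitattributes)" or "(overridden by .gitattributes)".
STRATEGY_LINE = /strategy:\s*(?<name>.+?)(?:\s+\((?<note>confirmed|overridden) by \.gitattributes\))?\s*\z/

def parse_strategy(line)
  match = STRATEGY_LINE.match(line)
  return nil unless match

  # :gitattributes is nil when the line carries no .gitattributes note.
  { name: match[:name], gitattributes: match[:note]&.to_sym }
end

p parse_strategy('strategy: Extension')
p parse_strategy('strategy: Filename (confirmed by .gitattributes)')
p parse_strategy('strategy: Extension (overridden by .gitattributes)')
```

Anchoring the pattern at the end of the line keeps the note out of the strategy name, so a caller can compare `name` directly against values such as `Extension` or `Filename`.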
--json
Using the --json flag will give you the output for a single file in JSON format:

    $ github-linguist --strategies --json lib/linguist.rb
    {"lib/linguist.rb":{"lines":105,"sloc":96,"type":"Text","mime_type":"application/x-ruby","language":"Ruby","large":false,"generated":false,"vendored":false}}

NB. The --strategies flag has no effect when the --json flag is present.
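The single file JSON is keyed by the file path, so a script can look the file up and read its fields directly. Here is a small sketch using Ruby's standard `json` library, with the sample output above pasted in as a literal string; the summary format it prints is only an illustration.

```ruby
require 'json'

# Sample single file JSON output, copied from the example above.
output = '{"lib/linguist.rb":{"lines":105,"sloc":96,"type":"Text","mime_type":"application/x-ruby","language":"Ruby","large":false,"generated":false,"vendored":false}}'

# The top-level object has one entry: file path => details.
path, info = JSON.parse(output).first

# Collect the boolean flags that are set for this file.
flags = %w[large generated vendored].select { |flag| info[flag] }

puts "#{path}: #{info['language']} (#{info['lines']} lines, #{info['sloc']} sloc)"
puts flags.empty? ? 'no flags set' : "flags: #{flags.join(', ')}"
```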
Docker
If you have Docker installed you can either build or use our pre-built images and run Linguist within a container:

    $ docker run --rm -v $(pwd):$(pwd):Z -w $(pwd) -t ghcr.io/github-linguist/linguist:latest
    66.84%  264519     Ruby
    24.68%  97685      C
    6.57%   25999      Go
    1.29%   5098       Lex
    0.32%   1257       Shell
    0.31%   1212       Dockerfile

Building the image

    $ docker build -t linguist .
    $ docker run --rm -v $(pwd):$(pwd):Z -w $(pwd) -t linguist
    66.84%  264519     Ruby
    24.68%  97685      C
    6.57%   25999      Go
    1.29%   5098       Lex
    0.32%   1257       Shell
    0.31%   1212       Dockerfile
    $ docker run --rm -v $(pwd):$(pwd) -w $(pwd) -t linguist github-linguist --breakdown
    66.84%  264519     Ruby
    24.68%  97685      C
    6.57%   25999      Go
    1.29%   5098       Lex
    0.32%   1257       Shell
    0.31%   1212       Dockerfile

    Ruby:
    Gemfile
    Rakefile
    bin/git-linguist
    bin/github-linguist
    ext/linguist/extconf.rb
    github-linguist.gemspec
    lib/linguist.rb
    …

Contributing
Please check out our [contributing guidelines](CONTRIBUTING.md).
License
The language grammars included in this gem are covered by their repositories’ respective licenses.
[vendor/README.md](/vendor/README.md) lists the repository for each grammar.
All other files are covered by the MIT license, see [LICENSE](./LICENSE).