lean-dojo/LeanCopilot
{ "createdAt": "2023-09-09T01:52:22Z", "defaultBranch": "main", "description": "LLMs as Copilots for Theorem Proving in Lean", "fullName": "lean-dojo/LeanCopilot", "homepage": "https://leandojo.org/leancopilot.html", "language": "C++", "name": "LeanCopilot", "pushedAt": "2025-11-18T00:27:20Z", "stargazersCount": 1186, "topics": [ "formal-mathematics", "lean", "lean4", "llm", "llm-inference", "machine-learning", "theorem-proving" ], "updatedAt": "2025-11-19T11:37:58Z", "url": "https://github.com/lean-dojo/LeanCopilot"}Lean Copilot: LLMs as Copilots for Theorem Proving in Lean
Section titled “Lean Copilot: LLMs as Copilots for Theorem Proving in Lean”Lean Copilot allows large language models (LLMs) to be used natively in Lean for proof automation, e.g., suggesting tactics/premises and searching for proofs. You can use our built-in models from LeanDojo or bring your own models that run either locally (w/ or w/o GPUs) or on the cloud.
https://github.com/lean-dojo/LeanCopilot/assets/114432581/ee0f56f8-849e-4099-9284-d8092cbd22a3
Table of Contents
Section titled “Table of Contents”- [Requirements]!(#requirements)
- [Using Lean Copilot in Your Project]!(#using-lean-copilot-in-your-project)
- [Adding Lean Copilot as a Dependency]!(#adding-lean-copilot-as-a-dependency)
- [Getting Started with Lean Copilot]!(#getting-started-with-lean-copilot)
- [Tactic Suggestion]!(#tactic-suggestion)
- [Proof Search]!(#proof-search)
- [Premise Selection]!(#premise-selection)
- [Advanced Usage]!(#advanced-usage)
- [Tactic APIs]!(#tactic-apis)
- [Model APIs]!(#model-apis)
- [Bring Your Own Model]!(#bring-your-own-model)
- [Caveats]!(#caveats)
- [Getting in Touch]!(#getting-in-touch)
- [Acknowledgements]!(#acknowledgements)
- [Citation]!(#citation)
Requirements
Section titled “Requirements”- Supported platforms: Linux, macOS, Windows and Windows WSL.
- Git LFS.
- Optional (recommended if you have a CUDA-enabled GPU): CUDA and cuDNN.
- Required for building Lean Copilot itself (rather than a downstream package): CMake >= 3.7 and a C++17 compatible compiler.
Using Lean Copilot in Your Project
Section titled “Using Lean Copilot in Your Project”:warning: Your project must use a Lean version of at least lean4:v4.3.0-rc2.
Adding Lean Copilot as a Dependency
Section titled “Adding Lean Copilot as a Dependency”- Add the package configuration option
moreLinkArgs := #["-L./.lake/packages/LeanCopilot/.lake/build/lib", "-lctranslate2"]to lakefile.lean. For example,
package «my-package» { moreLinkArgs := #[ "-L./.lake/packages/LeanCopilot/.lake/build/lib", "-lctranslate2" ]}Alternatively, if your project uses lakefile.toml, it should include:
moreLinkArgs = ["-L./.lake/packages/LeanCopilot/.lake/build/lib", "-lctranslate2"]- Add the following line to lakefile.lean, including the quotation marks:
require LeanCopilot from git "https://github.com/lean-dojo/LeanCopilot.git" @ "LEAN_COPILOT_VERSION"For stable Lean versions (e.g., v4.25.0), set LEAN_COPILOT_VERSION to be that version. For the latest unstable Lean versions (e.g., v4.26.0-rc1), set LEAN_COPILOT_VERSION to main. In either case, make sure the version is compatible with other dependencies such as mathlib. If your project uses lakefile.toml instead of lakefile.lean, it should include:
[[require]]name = "LeanCopilot"git = "https://github.com/lean-dojo/LeanCopilot.git"rev = "LEAN_COPILOT_VERSION"-
If you are using native Windows, add
<path_to_your_project>/.lake/packages/LeanCopilot/.lake/build/libto yourPathvariable in Advanced System Settings > Environment Variables… > System variables. -
Run
lake update LeanCopilot. -
Run
lake exe LeanCopilot/downloadto download the built-in models from Hugging Face to~/.cache/lean_copilot/. Alternatively, you can download the models from Hugging Face manually from
- ct2-leandojo-lean4-tacgen-byt5-small
- ct2-leandojo-lean4-retriever-byt5-small
- premise-embeddings-leandojo-lean4-retriever-byt5-small
- ct2-byt5-small
- Run
lake build.
Here is an example of a Lean package depending on Lean Copilot. If you have problems building the project, our [Dockerfile]!(./Dockerfile), [build.sh]!(scripts/build.sh) or [build_example.sh]!(scripts/build_example.sh) may be helpful.
Getting Started with Lean Copilot
Section titled “Getting Started with Lean Copilot”Tactic Suggestion
Section titled “Tactic Suggestion”After import LeanCopilot, you can use the tactic suggest_tactics to generate tactic suggestions. You can click on any of the suggested tactics to use it in the proof.
You can provide a prefix (e.g., simp) to constrain the generated tactics:
Proof Search
Section titled “Proof Search”The tactic search_proof combines LLM-generated tactics with aesop to search for multi-tactic proofs. When a proof is found, you can click on it to insert it into the editor.
Premise Selection
Section titled “Premise Selection”The select_premises tactic retrieves a list of potentially useful premises. Currently, it uses the retriever in LeanDojo to select premises from a fixed snapshot of Lean and mathlib4.
Running LLMs
Section titled “Running LLMs”You can also run the inference of any LLMs in Lean, which can be used to build customized proof automation or other LLM-based applications (not limited to theorem proving). It’s possible to run arbitrary models either locally or remotely (see [Bring Your Own Model]!(#bring-your-own-model)).
Advanced Usage
Section titled “Advanced Usage”This section is only for advanced users who would like to change the default behavior of suggest_tactics, search_proof, or select_premises, e.g., to use different models or hyperparameters.
Tactic APIs
Section titled “Tactic APIs”- Examples in [TacticSuggestion.lean]!(LeanCopilotTests/TacticSuggestion.lean) showcase how to configure
suggest_tactics, e.g., to use different models or generate different numbers of tactics. - Examples in [ProofSearch.lean]!(LeanCopilotTests/ProofSearch.lean) showcase how to configure
search_proofusing options provided by aesop. - Examples in [PremiseSelection.lean]!(LeanCopilotTests/PremiseSelection.lean) showcase how to set the number of retrieved premises for
select_premises.
Model APIs
Section titled “Model APIs”Examples in [ModelAPIs.lean]!(LeanCopilotTests/ModelAPIs.lean) showcase how to run the inference of different models and configure their parameters (temperature, beam size, etc.).
Lean Copilot supports two kinds of models: generators and encoders. Generators must implement the TextToText interface:
class TextToText (τ : Type) where generate (model : τ) (input : String) (targetPrefix : String) : IO $ Array (String × Float)inputis the input stringtargetPrefixis used to constrain the generator’s output.""means no constraint.generateshould return an array ofString × Float. EachStringis an output from the model, andFloatis the corresponding score.
We provide three types of Generators:
- [
NativeGenerator]!(LeanCopilot/Models/Native.lean) runs locally powered by CTranslate2 and is linked to Lean using Foreign Function Interface (FFI). - [
ExternalGenerator]!(LeanCopilot/Models/External.lean) is hosted either locally or remotely. See [Bring Your Own Model]!(#bring-your-own-model) for details. - [
GenericGenerator]!(LeanCopilot/Models/Generic.lean) can be anything that implements thegeneratefunction in theTextToTexttypeclass.
Encoders must implement TextToVec:
class TextToVec (τ : Type) where encode : τ → String → IO FloatArrayinputis the input stringencodeshould return a vector embedding produced by the model.
Similar to generators, we have NativeEncoder, ExternalEncoder, and GenericEncoder.
Bring Your Own Model
Section titled “Bring Your Own Model”In principle, it is possible to run any model using Lean Copilot through ExternalGenerator or ExternalEncoder (examples in [ModelAPIs.lean]!(LeanCopilotTests/ModelAPIs.lean)). To use a model, you need to wrap it properly to expose the APIs in [external_model_api.yaml]!(./external_model_api.yaml). As an example, we provide a [Python API server]!(./python) and use it to run a few models.
Caveats
Section titled “Caveats”select_premisesalways retrieves the original form of a premise. For example,Nat.add_left_commis a result of the theorem below. In this case,select_premisesretrievesNat.mul_left_comminstead ofNat.add_left_comm.
@[to_additive]theorem mul_left_comm : ∀ a b c : G, a * (b * c) = b * (a * c)- In some cases,
search_proofproduces an erroneous proof with error messages likefail to show termination for .... A temporary workaround is changing the theorem’s name before applyingsearch_proof. You can change it back aftersearch_proofcompletes.
Getting in Touch
Section titled “Getting in Touch”- For general questions and discussions, please use GitHub Discussions.
- To report a potential bug, please open an issue. In the issue, please include your OS information, the exact steps to reproduce the error on the latest stable version of Lean Copilot, and complete logs preferrably in debug mode. Important: If your issue cannot be reproduced easily, it will be unlikely to receive help.
- Feature requests and contributions are extremely welcome. Please feel free to start a discussion or open a pull request.
Acknowledgements
Section titled “Acknowledgements”- We thank Scott Morrison for suggestions on simplifying Lean Copilot’s installation and Mac Malone for helping implement it. Both Scott and Mac work for the Lean FRO.
- We thank Jannis Limperg for supporting our LLM-generated tactics in Aesop (https://github.com/leanprover-community/aesop/pull/70).
Citation
Section titled “Citation”If you find our work useful, please consider citing our paper:
@article{song2024lean, title={Lean copilot: Large language models as copilots for theorem proving in lean}, author={Song, Peiyang and Yang, Kaiyu and Anandkumar, Anima}, journal={arXiv preprint arXiv:2404.12534}, year={2024}}