rspivak/tinypie
{ "createdAt": "2011-01-08T04:00:03Z", "defaultBranch": "develop", "description": "Tree-Based Interpreter, compiler, and VM for TinyPie language", "fullName": "rspivak/tinypie", "homepage": "", "language": "Python", "name": "tinypie", "pushedAt": "2011-11-30T08:35:38Z", "stargazersCount": 72, "topics": [], "updatedAt": "2025-06-16T21:10:22Z", "url": "https://github.com/rspivak/tinypie"}TinyPie - Tree-Based Interpreter, Compiler, and VM for TinyPie language
Section titled “TinyPie - Tree-Based Interpreter, Compiler, and VM for TinyPie language”Overview
Section titled “Overview”TinyPie is a tree-based interpreter for a simple programming language with a Python-like syntax.
It’s based on Pie language from Language Implementation Patterns Ch.9
Quote from the book: “A tree-based interpreter is like a compiler front end with an interpreter grafted onto the end instead of a code generator”
TinyPie also includes Bytecode Assembler and Register-Based Virtual Machine.
Goals of the project
Section titled “Goals of the project”-
Self-education
-
To serve as an example for people interested in crafting their own interpreter in Python for a simple programming language or DSL
Installation
Section titled “Installation”-
Using
buildout$ git clone git://github.com/rspivak/tinypie.git$ cd tinypie$ python bootstrap.py$ bin/buildoutRun the interpreter$ bin/tinypie factorial.tp -
Using
piporeasy_install(no need for sudo if usingvirtualenv)$ sudo pip install tinypie(or run alternatively easy_install: $ sudo easy_install tinypie)Run the interpreter$ tinypie factorial.tp
Main interpreter features
Section titled “Main interpreter features”-
Implemented in Python
-
Regexp-based lexer
-
LL(k) recursive-descent parser
-
Parser constructs homogeneous Abstract Syntax Tree (AST)
-
Static / lexical scope support. Interpreter builds complete scope tree during AST construction.
-
Interpeter manages global memory space and function space stack
-
Interpreter implements external AST visitor
-
Forward references support
High-level overview:
| +------------------------------------+ | source code | Symbol Table / Scope Tree | | | | \|/ | +-----------------------+ |+--------------X---------------+ | | GlobalScope | || | | +----/-------------\----+ || Regexp-based lexer | | / \ || | |+--------/-----+ +-------\-------+|+--------------+---------------+ ||FunctionSymbol| |FunctionSymbol || | |+------X-------+ +-------X-------+| | token stream | /|\ /|\ | \|/ | | | |+--------------X---------------+ |+------+-------+ +-------+-------+|| | || LocalScope | | LocalScope |||LL(k) recursive-descent parser| |+--------------+ +---------------+|| | +------------------------------------++--------------+---------------+ | +------------------------------------+ | AST | Memory Space | \|/ | |+--------------X-------------+ |Function Space Global Memory || Interpreter | |Stack Space || with external tree visitor | |+-------------+ +----------------+|| | ||FunctionSpace| | MemorySpace ||+--------------+-------------+ |+-------------+ +----------------+| | |+-------------+ | | output ||FunctionSpace| | \|/ |+-------------+ | X +------------------------------------+This Tree-Based Interpreter is similar to Syntax-Directed Interpreter but instead of directly executing source code during parsing the interpreter executes the source code by constructing AST and then walking the tree.
Advantages of tree-based interperter over syntax-directed one:
-
Symbol definition and symbol resolution happen at different stages:
- symbol definition is performed during parsing
- symbol resolution is performed during interpretation
This allows forward references and simplifies implementation - parser deals with scope tree only and interpreter deals with memory spaces.
-
More flexible because of AST as an intermediate representation.
Language grammar
Section titled “Language grammar”program -> (function_definition | statement)+ EOFfunction_definition -> 'def' ID '(' (ID (',' ID)*)? ')' slistslist -> ':' NL statement+ '.' NL | statementstatement -> 'print' expr NL | 'return' expr NL | call NL | assign NL | 'if' expr slist ('else' slist)? | 'while' expr slist | NLassign -> ID '=' exprexpr -> add_expr (('<' | '==') add_expr)?add_expr -> mult_expr (('+' | '-') mult_expr)*mult_expr -> atom ('*' atom)*atom -> ID | INT | STRING | call | '(' expr ')'call -> ID '(' (expr (',' expr)*)? ')'Short language reference
Section titled “Short language reference”- statements:
print,return,if,while, and function call - literals: integers and strings
- operators: <, ==, +, -, *
Expressions may contain identifiers.
Code examples
Section titled “Code examples”-
Factorial
print factorial(6) # forward referencedef factorial(x): # function definitionif x < 2 return 1 # if statementreturn x * factorial(x - 1) # return statement. # DOT marks the block end -
WHILE loop
index = 0while index < 10:print indexindex = index + 1. # DOT marks the block end -
IF ELSE
x = 10if x == 10 print 'Yes'else print 'No' -
Variable lookup in different scopes
x = 1 # global variabledef foo(x): # 'foo' is defined in global memory spaceprint x # prints 3 (parameter value). # DOT marks the block enddef bar(): # 'bar' is defined in global memory spacex = 7 # modify global variable. # DOT marks the block endfoo(3) # prints 3bar()print x # prints 7 ('bar' modified global variable)
AST Examples
Section titled “AST Examples”Trees in these examples are represented as S-expressions for tersness.
Function call:
foo(5, 7)
(CALL ID INT INT) means CALL is the root with children ID(foo), INT(5), and INT(7)Function definition:
def foo(x, y): z = x + y return z.
(FUNC_DEF ID ID ID (BLOCK (ASSIGN ID (SUB ID ID)) (RETURN ID)))Development
Section titled “Development”Install ‘enscript’ utility (optional). If you are on Ubuntu:
$ sudo apt-get install enscriptBoostrap the buildout and run it:
$ cd tinypie$ python bootstrap.py$ bin/buildoutRun tests, test coverage and produce coverage reports:
$ bin/test$ bin/coverage-test$ bin/coveragereport
Check ./var/report/tinypie.html out for coverage report.Run pep8 and pylint to check code style and search for potential bugs:
$ bin/pep8$ bin/pylintAST visualizer
Section titled “AST visualizer”To see how TinyPie AST for different language constructs looks
like you can use gendot command line utility that is generated
by buildout. This utility generates DOT file than can be further
processed by dot program to draw nice graphs.
$ echo -n 'foo(3)' | bin/gendotdigraph astgraph { node [shape=plaintext, fontsize=12, fontname="Courier", height=.1]; ranksep=.3; edge [arrowsize=.5]
node1 [label="BLOCK"]; node2 [label="CALL"]; node3 [label="ID (foo)"]; node4 [label="INT (3)"];
node2 -> node3 node2 -> node4 node1 -> node2}
To draw graph and save it as a PNG image:
$ echo -n 'foo(3)' | bin/gendot > funcall.dot$ dot -Tpng -o funcall.png funcall.dotBytecode Assembler
Section titled “Bytecode Assembler”Converts assembly program into binary bytecodes. The bytecode is further interpreted by the TinyPie Register-Based Bytecode Interpreter / Virtual Machine.
TinyPie Assembly language grammar:
program -> globals? (label | function_definition | instruction | NL)+globals -> '.globals' INT NLlabel -> ID ':' NLfunction_definition -> '.def' ID ':' 'args' '=' INT ',' 'locals' '=' INT NLinstruction -> ID NL | ID operand NL | ID operand ',' operand NL | ID operand ',' operand NL ',' operand NL | 'call' ID ',' operand NL | 'loadk' REG ',' (INT | STRING) NLoperand -> REG | ID | STRING | INTAssembler yields the following components:
-
Code memory: This is a
bytearraycontaining bytecode instructions and their operands derived from the assembly source code. -
Global data memory size: The number of slots allocated in global memory for use with GSTORE and GLOAD assembly commands.
-
Program entry point: An address of main function
.def main: ... -
Constant pool: A list of objects (integers, strings, function symbols) that are not part of the code memory. Bytecode instructions refer to those objects via integer index.
Here is a factorial function in TinyPie language:
def factorial(x): if x < 2 return 1 return x * factorial(x - 1).
print factorial(5)Here is an equivalent TinyPie assembly code:
.def factorial: args=1, locals=3 # r1 holds argument 'n' loadk r2, 2 lt r3, r1, r2 # n < 2 ? brf r3, cont # if n >= 2 jump to 'cont' loadk r0, 1 # else return 1 retcont: loadk r2, 1 # r2 = 1 move r3, r1 # r3 = n sub r1, r1, r2 # r1 = n - 1 call factorial, r1 # factorial(n - 1) mul r0, r3, r0 # n = n * result of factorial(n - 1) ret
.def main: args=0, locals=1 loadk r1, 5 call factorial, r1 # factorial(5) print r0 # 120 haltHere are the resulting elements produced by the Bytecode Assembler by translating the above assembly code:
Constant pool:0000: <FunctionSymbol: name='factorial', address=0, args=1, locals=3>0001: 20002: 10003: <FunctionSymbol: name='main', address=95, args=0, locals=1>0004: 3
Code memory:0000: 6 0 0 0 2 0 0 00008: 1 4 0 0 0 3 0 00016: 0 1 0 0 0 2 13 00024: 0 0 3 0 0 0 41 60032: 0 0 0 0 0 0 0 20040: 9 6 0 0 0 2 0 00048: 0 2 14 0 0 0 3 00056: 0 0 1 2 0 0 0 10064: 0 0 0 1 0 0 0 20072: 16 0 0 0 0 0 0 00080: 1 3 0 0 0 0 0 00088: 0 3 0 0 0 0 9 60096: 0 0 0 1 0 0 0 40104: 16 0 0 0 0 0 0 00112: 1 15 0 0 0 0 10Bytecode instructions are described in the following section.
Register-Based Bytecode Interpreter / Virtual Machine
Section titled “Register-Based Bytecode Interpreter / Virtual Machine”Virtual Machine architecture:
-
IP: Instruction pointer register that points into the code memory at the next instruction to execute.
-
CPU: Instruction dispatcher that simulates fetch-decode-execute cycle with a switch (if elif) statement in a loop - reads bytecode at IP, decodes its operands and executes corresponding operation.
-
Global data memory: a list of global objects. Contents is accessed via integer index.
-
Code memory: Holds bytecode instructions and their operands.
-
Call stack: Holds StackFrame objects with function return address, parameters, and local variables.
-
Stack frame: A StackFrame object that holds all required information to invoke a function:
- function symbol
- function return address
- registers hold return value, arguments, locals, and temporary values
-
FP: Frame pointer - a special-purpose register that points to the top of the function
call stack -
Constant pool: Integers, strings, and function symbols all go into constant pool. Instructions refer to constant pool values via an integer index.
High-level overview:
| TinyPie | assembly | V +-----------------------+ | | | Bytecode Assembler | | | +----------+------------+ | code memory(bytecode) | constant pool V+--------------------------------------------------------------------------+| || Register-Based Bytecode Interpreter / Virtual Machine || || +------------------------+ +------------------------------------+ || | | | Function Call Stack | || | Constant Pool +----+ | | || | (integer, string, | | | | || | function symbol) | | | +--------------------------------+ | || +------------------------+ | | | Stack Frame | | || | | | | | || | | |function symbol: FS('main') | | || | | |return address: 0 | | || +------------------------+ | | | ret| args|locals | | || | | | | |registers: r0|r1 r2|r3 r4 r5 | | || | Global data memory | | | +--------------------------------+ | || | +-| | | | || | | | | | | || +------------------------+ | | | +--------------------------------+ | || | | | | Stack Frame | | || | | | | | | || | | | |function symbol: FS('fact') | | || +------------------------+ | | | |return address: 17 | | || | | | | | | ret| args |locals | | || | Code memory | | | | |registers: r0|r1 r2 r3|r4 | | || | (bytecode) | | | | +--------------------------------+ | || | | | | | | || +----------+-------------+ | | | | || | | | +------------------------------------+ || V | | |+ +------------------------+ | | || | | | | || | CPU <-+ | || | (fetch-decode-execute) | | || | <----+ || +------------------------+ || |+--------------------------------------------------------------------------+Bytecode instructions for TinyPie VM:
# Index serves as an opcodeINSTRUCTIONS = [ None, Instruction('add', REG, REG, REG), # A B C R(A) = R(B) + R(C) Instruction('sub', REG, REG, REG), # A B C R(A) = R(B) - R(C) Instruction('mul', REG, REG, REG), # A B C R(A) = R(B) * R(C) Instruction('lt', REG, REG, REG), # A B C R(A) = R(B) < R(C) Instruction('eq', REG, REG, REG), # A B C R(A) = R(B) == R(C) Instruction('loadk', REG, POOL), # A B R(A) = CONST_POOL[B] Instruction('gload', REG, POOL), # A B R(A) = GLOBALS[B] Instruction('gstore', POOL, REG), # A B GLOBALS[A] = R(B) Instruction('ret'), Instruction('halt'), Instruction('br', INT), # A branch to A Instruction('brt', REG, INT), # A B R(A) is True -> branch to B Instruction('brf', REG, INT), # A B R(A) is False -> branch to B Instruction('move', REG, REG), # A B R(A) = R(B) Instruction('print', REG), # A print R(A) Instruction('call', FUNC, REG), # A B call A, R(B) ]TinyPie VM comes with a tpvm command line utility:
$ bin/tpvm -hUsage: tpvm [input file]
If no input file is provided STDIN is used by default.
Options: -h, --help show this help message and exit -i FILE, --input=FILE Input file. Defaults to standard input. -c, --coredump Print coredump to standard output. -d, --disasm Print disassembled code to standard output. -t, --trace Print execution trace.Example output:
fact.tps.def fact: args=1, locals=3 # r1 holds argument 'n' loadk r2, 2 lt r3, r1, r2 # n < 2 ? brf r3, cont # if n >= 2 jump to 'cont' loadk r0, 1 # else return 1 retcont: loadk r2, 1 # r2 = 1 move r3, r1 # r3 = n sub r1, r1, r2 # r1 = n - 1 call fact, r1 # fact(n - 1) mul r0, r3, r0 # n = n * result of fact(n - 1) ret
.def main: args=0, locals=1 loadk r1, 3 call fact, r1 # fact(3) print r0 # 6 halt
$ bin/tpvm --coredump --disasm --trace -i /tmp/fact.tps0095: LOADK r1, #4:3 main.registers=[? | ?] calls=[main]0104: CALL #0:fact@0, r1 main.registers=[? | 3] calls=[main]0000: LOADK r2, #1:2 fact.registers=[? | 3 | ? ? ?] calls=[main fact]0009: LT r3, r1, r2 fact.registers=[? | 3 | 2 ? ?] calls=[main fact]0022: BRF r3, 41 fact.registers=[? | 3 | 2 0 ?] calls=[main fact]0041: LOADK r2, #2:1 fact.registers=[? | 3 | 2 0 ?] calls=[main fact]0050: MOVE r3, r1 fact.registers=[? | 3 | 1 0 ?] calls=[main fact]0059: SUB r1, r1, r2 fact.registers=[? | 3 | 1 3 ?] calls=[main fact]0072: CALL #0:fact@0, r1 fact.registers=[? | 2 | 1 3 ?] calls=[main fact]0000: LOADK r2, #1:2 fact.registers=[? | 2 | ? ? ?] calls=[main fact fact]0009: LT r3, r1, r2 fact.registers=[? | 2 | 2 ? ?] calls=[main fact fact]0022: BRF r3, 41 fact.registers=[? | 2 | 2 0 ?] calls=[main fact fact]0041: LOADK r2, #2:1 fact.registers=[? | 2 | 2 0 ?] calls=[main fact fact]0050: MOVE r3, r1 fact.registers=[? | 2 | 1 0 ?] calls=[main fact fact]0059: SUB r1, r1, r2 fact.registers=[? | 2 | 1 2 ?] calls=[main fact fact]0072: CALL #0:fact@0, r1 fact.registers=[? | 1 | 1 2 ?] calls=[main fact fact]0000: LOADK r2, #1:2 fact.registers=[? | 1 | ? ? ?] calls=[main fact fact fact]0009: LT r3, r1, r2 fact.registers=[? | 1 | 2 ? ?] calls=[main fact fact fact]0022: BRF r3, 41 fact.registers=[? | 1 | 2 1 ?] calls=[main fact fact fact]0031: LOADK r0, #2:1 fact.registers=[? | 1 | 2 1 ?] calls=[main fact fact fact]0040: RET fact.registers=[1 | 1 | 2 1 ?] calls=[main fact fact fact]0081: MUL r0, r3, r0 fact.registers=[1 | 1 | 1 2 ?] calls=[main fact fact]0094: RET fact.registers=[2 | 1 | 1 2 ?] calls=[main fact fact]0081: MUL r0, r3, r0 fact.registers=[2 | 2 | 1 3 ?] calls=[main fact]0094: RET fact.registers=[6 | 2 | 1 3 ?] calls=[main fact]0113: PRINT r0 main.registers=[6 | 3] calls=[main]6Constant pool:0000: <FunctionSymbol: name='fact', address=0, args=1, locals=3>0001: 20002: 10003: <FunctionSymbol: name='main', address=95, args=0, locals=1>0004: 3
Code memory:0000: 6 0 0 0 2 0 0 00008: 1 4 0 0 0 3 0 00016: 0 1 0 0 0 2 13 00024: 0 0 3 0 0 0 41 60032: 0 0 0 0 0 0 0 20040: 9 6 0 0 0 2 0 00048: 0 2 14 0 0 0 3 00056: 0 0 1 2 0 0 0 10064: 0 0 0 1 0 0 0 20072: 16 0 0 0 0 0 0 00080: 1 3 0 0 0 0 0 00088: 0 3 0 0 0 0 9 60096: 0 0 0 1 0 0 0 40104: 16 0 0 0 0 0 0 00112: 1 15 0 0 0 0 10
Disassembly:0000: LOADK r2, #1:20009: LT r3, r1, r20022: BRF r3, 410031: LOADK r0, #2:10040: RET0041: LOADK r2, #2:10050: MOVE r3, r10059: SUB r1, r1, r20072: CALL #0:fact@0, r10081: MUL r0, r3, r00094: RET0095: LOADK r1, #4:30104: CALL #0:fact@0, r10113: PRINT r00118: HALTRoadmap
Section titled “Roadmap”- AST -> TAC (Three-Address Code)
- TAC -> Bytecode Assembly
- VM in C/C++ (here we go, the joy of manual memory management)