spartanz/parserz
{ "createdAt": "2018-07-08T05:49:34Z", "defaultBranch": "master", "description": "A purely-functional library for creating both parsers, pretty-printers, and grammar definitions from a single, type-safe specification of a grammar", "fullName": "spartanz/parserz", "homepage": "", "language": "Scala", "name": "parserz", "pushedAt": "2020-04-04T16:12:07Z", "stargazersCount": 70, "topics": [ "functional-programming", "invertible", "parser-combinators", "scala" ], "updatedAt": "2025-10-19T15:18:17Z", "url": "https://github.com/spartanz/parserz"}Welcome to Parserz
Section titled “Welcome to Parserz”zero dependency invertible parser combinators library for Scala
Section titled “zero dependency invertible parser combinators library for Scala”Summary
Section titled “Summary”Parserz is a purely-functional library for creating parsers, pretty-printers, and grammar definitions from a single, type-safe specification of a grammar.
The main idea of the library is to facilitate conversions between types A and B
in both directions via single coherent implementation,
as in contrast of having two separate implementations for A => B and B => A.
The theory behind this idea is outlined in the paper Invertible Syntax Descriptions but the library does not exactly follow the implementation proposed in paper.
Overview and highlights
Section titled “Overview and highlights”The following are the abstractions provided by library
trait ParsersModule { type Input sealed abstract class Grammar[-SI, +SO, +E, A] {}}Grammar is a data type for modelling user-defined grammars and
ParsersModule is a module that contains interpreters of Grammar into runnable functions, i.e. parsers and printers, etc.
Internally library follows the “free encoding” design where Grammar is defined as a generalized algebraic data type (GADT),
and all interpreters are implemented as traversals of Grammar.
Types:
Input- the input type; values of this type are consumed by parser and produced by printerA- the output type; values of this type are produced by parser and consumed by printerE- the error type; values of this type are produced by both parser and printer if they cannot proceed normallySI- the input state type andSO- the output state type; values of these types represent state that is threaded through the execution of parser and printer
Note: for the printer or other interpreters, meaning of Input and A may be completely reversed.
Unfortunately, Scala does not allow redefining type names based on one’s interpretation,
so we have chosen the parser direction to be the titular one simply because library name is parserz.
Getting the library
Section titled “Getting the library”Simply add
libraryDependencies += "org.spartanz" %% "parserz" % "0.2.0"to the SBT build.
Using the library
Section titled “Using the library”Instantiate the module
import org.spartanz.parserz.ParsersModule
object MyParser extends ParsersModule { override type Input = List[Char]}which also requires defining the input type.
Import combinators
import MyParser.Grammar._import MyParser._It is now possible to create grammars!
Let’s create building blocks to parse the following program
"a(bx(),cy(z()))"into description on how it can be invoked
case class Call(name: ::[Char], args: List[Call])Full working example is here
Consuming input
Section titled “Consuming input”consume combinator and its variants are used to take a portion of input
and return remainder of the input and the consumed value. This is how it’s defined:
final def consume[E, A]!(to: Input => E \/ (Input, A), from: ((Input, A)) => E \/ Input): Grammar[Any, Nothing, E, A]It requires two functions to be provided both returning either a value or an error, for example
val char: G[Char] = consume({ case c :: cs => Right(cs -> c) case Nil => Left("eoi")}, { case (cs, c) => Right(c :: cs)})The G[A] is simply a type alias for Grammar[Any, Nothing, String, A], which means
- state is not used (accepts any state and returns nothing)
- error type is set to
String - the value taken by this grammar is of type
List[Char](as defined above) and produced value is of typeA
Asserting
Section titled “Asserting”There are combinators that allow to assert the value (and return error if assertion failed),
for example filter and its variants, which is defined as
final def filter[E1 >: E]!(e: E1)(f: A => Boolean): Grammar[SI, SO, E1, A]It accepts a predicate function which is applied to the result of existing grammar. For example to test that consumed character is something specific we can do
val alpha: G[Char] = char.filter("expected: alphabetical")(_.isLetter)val comma: G[Char] = char.filter("expected: comma")(_ == ',')Resulting grammars are new bigger building blocks.
Sequencing, alterating and repeating
Section titled “Sequencing, alterating and repeating”~orzipcombines two grammars that produceAandBinto a grammar that producesTuple2[A, B]|oraltcombines two grammars that produceAandBinto a grammar that producesEither[A, B]reprepeats the grammar zero or more times while it can andrep1repeats the grammar one or more times while it can
In the program "a(bx(),cy(z()))", "bx(),cy(z())" is the list of arguments of function a.
Here is how to parse them:
it’s a call to another function followed by zero or more calls to other functions prefixed by comma, or no calls at all:
val args: G[List[Call]] = ((call ~ (comma ~ call).rep) | succeed(Nil)).map({ case Left((e1, en)) => e1 :: en.map(_._2) case Right(_) => Nil}, { case Nil => Right(Nil) case e1 :: en => Left((e1, en.map((',', _))))})Recursive grammars
Section titled “Recursive grammars”In the previous example there is a call to call grammar that produces a value of Call.
Call is also a recursive data structure, as it references itself:
case class Call(name: ::[Char], args: List[Call])To allow recursion in grammar definitions, current version of library has the delay combinator. Here is how
grammar for Call is defined:
lazy val call: G[Call] = delay { (alpha.rep1 ~ paren1 ~ args ~ paren2).map( { case (((name, _), exp), _) => Call(name, exp) }, { case Call(name, exp) => (((name, '('), exp), ')') } )}Notice the lazy modifier as well which allows this grammar to be forward-referenced.
Interpreters
Section titled “Interpreters”All grammars are now constructed and can be passed to interpreters.
Parsing
Section titled “Parsing”The parser is simply a function constructed by the library from the given grammar:
val parser: (Unit, List[Char]) => (Unit, String \/ (List[Char], Call)) = MyParser.parser(call)Passing program "a(bx(),cy(z()))" into this function (along with state of Unit) yields
Call(::('a', Nil), List( Call(::('b', List('x')), Nil), Call(::('c', List('y')), List( Call(::('z', Nil), Nil) ))))which is indeed the structure representing function calls in the program.
Printing
Section titled “Printing”The printer is also a function constructed by the library from the given grammar:
val printer: (Unit, (List[Char], Call)) => (Unit, String \/ List[Char]) = MyParser.printer(call)Printing the value just received from the parser yields the program text back:
printer((), (Nil, value))._2.map(_.reverse.mkString)Here Nil is initial output, which is, of course, empty,
and there is a reverse done on the output because of the consume implementation above which prepends to list.
Grammar definition
Section titled “Grammar definition”There is an interpreter that produces a description of the grammar in the Backus–Naur form (BNF).
val description: List[String] = MyParser.bnf(call)For it to work, grammars can be described with the tag combinator (or its symbolic equivalent @@):
val char: G[Char] = "char" @@ consume(...)val alpha: G[Char] = char.filter("expected: alphabetical")(_.isLetter).tag("alpha")It interprets directly into a value, which is a printout of all tagged grammars. This is how it looks like for the example above
<alpha> ::= <char><open paren> ::= <char><comma> ::= <char><args> ::= (<call> List(<comma> <call>) | )<close paren> ::= <char><call> ::= NEL(<alpha>) <open paren> <args> <close paren>