# Understanding parser combinators

• Published on
18-Mar-2018

• View
1.156

0

Transcript

• Understanding Parser

Combinators

@ScottWlaschin

fsharpforfunandprofit.com/parser

• let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit"

let point = pchar '.'

let e = pchar 'e' pchar 'E'

let optPlusMinus = opt (pchar '-' pchar '+')

let nonZeroInt =

digitOneNine .>>. manyChars digit

|>> fun (first,rest) -> string first + rest

let intPart = zero nonZeroInt

let fractionPart = point >>. manyChars1 digit

let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit

Typical code using parser combinators

• let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit"

let point = pchar '.'

let e = pchar 'e' pchar 'E'

let optPlusMinus = opt (pchar '-' pchar '+')

let nonZeroInt =

digitOneNine .>>. manyChars digit

|>> fun (first,rest) -> string first + rest

let intPart = zero nonZeroInt

let fractionPart = point >>. manyChars1 digit

let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit

• Overview

1. What is a parser combinator library?

2. The foundation: a simple parser

3. Three basic parser combinators

4. Building combinators from other combinators

5. Improving the error messages

6. Building a JSON parser

• Part 1

What is a parser combinator

library?

• Something to match

Parser Create step in parsing recipe

Creating a parsing recipe

A “Parser-making" function

This is a recipe to make something, not the thing

itself

• Parser

Combining parsing recipes

A recipe to make a more complicated thing

Parser Parser combined

with

A "combinator"

• Parser Run

Running a parsing recipe

input

Success or

Failure

• Why parser combinators?

• Written in your favorite programming language

• No preprocessing needed

– Lexing, parsing, AST transform all in one.

– REPL-friendly

• Easy to create little DSLs

• Fun way of understanding functional composition

• Part 2:

A simple parser

• Version 1 – parse the character 'A'

input

pcharA remaining input

true/false

• Version 1 – parse the character 'A'

input

pcharA remaining input

true/false

• let pcharA input =

if String.IsNullOrEmpty(input) then

(false,"")

else if input.[0] = 'A' then

let remaining = input.[1..]

(true,remaining)

else

(false,input)

• Version 2 – parse any character

matched char input

pchar remaining input

charToMatch failure message

• let pchar (charToMatch,input) =

if String.IsNullOrEmpty(input) then

"No more input"

else

let first = input.[0]

if first = charToMatch then

let remaining = input.[1..]

(charToMatch,remaining)

else

sprintf "Expecting '%c'. Got '%c'" charToMatch first

• Fix – create a choice type to capture either case

Success: matched char input

pchar Success: remaining input

charToMatch Failure: message

type Result

• Fix – create a choice type to capture either case

Success: matched char input

pchar Success: remaining input

charToMatch Failure: message

type Result

• Fix – create a choice type to capture either case

Success: matched char input

pchar Success: remaining input

charToMatch Failure: message

type Result

• let pchar (charToMatch,input) =

if String.IsNullOrEmpty(input) then

Failure "No more input"

else

let first = input.[0]

if first = charToMatch then

let remaining = input.[1..]

Success (charToMatch,remaining)

else

let msg = sprintf "Expecting '%c'. Got '%c'" charToMatch first

Failure msg

• Version 3 – returning a function

Success: matched char input

pchar Success: remaining input

charToMatch Failure: message

• Version 3 – returning a function

Success: matched char input

pchar Success: remaining input

charToMatch Failure: message

• Version 3 – returning a function

input pchar

charToMatch

• Version 3 – returning a function

charToMatch pchar

• Version 3 – returning a function

charToMatch pchar

• Version 4 – wrapping the function in a type

charToMatch pchar

Parser

• Version 4 – wrapping the function in a type

charToMatch pchar

Parser

type Parser)

A function that takes a string and returns a Result

• Version 4 – wrapping the function in a type

charToMatch pchar

Parser

type Parser)

Wrapper

• Creating parsing recipes

• charToMatch input

Parser

A parsing recipe for a char

• Parser Run

Running a parsing recipe

input

Success, or

Failure

• Running a parsing recipe

input

Parser

Parser Run

input

Success, or

Failure

• let run parser input =

// unwrap parser to get inner function

let (Parser innerFn) = parser

// call inner function with input

innerFn input

• Enough talk,

show me some code

• Part 3:

Three basic combinators

• What is a combinator?

• A “combinator” library is a library designed around

combining things to get more complex values of

the same type.

• integer + integer = integer

• list @ list = list // @ is list concat

• Parser ?? Parser = Parser

• Basic parser combinators

• Parser andThen Parser => Parser

• Parser orElse Parser => Parser

• Parser map (transformer) => Parser

• AndThen parser combinator

• Run the first parser.

– If there is a failure, return.

• Otherwise, run the second parser with the

remaining input.

– If there is a failure, return.

• If both parsers succeed, return a pair (tuple)

that contains both parsed values.

• let andThen parser1 parser2 =

let innerFn input =

// run parser1 with the input

let result1 = run parser1 input

// test the 1st parse result for Failure/Success

match result1 with

| Failure err ->

Failure err // return error from parser1

| Success (value1,remaining1) ->

// run parser2 with the remaining input

(continued on next slide..)

• let andThen parser1 parser2 =

[...snip...]

let result2 = run parser2 remaining1

// test the 2nd parse result for Failure/Success

match result2 with

| Failure err ->

Failure err // return error from parser2

| Success (value2,remaining2) ->

let combinedValue = (value1,value2)

Success (combinedValue,remaining2)

// return the inner function

Parser innerFn

• OrElse parser combinator

• Run the first parser.

• On success, return the parsed value, along

with the remaining input.

• Otherwise, on failure, run the second parser

with the original input...

• ...and in this case, return the result (success or

failure) from the second parser.

• let orElse parser1 parser2 =

let innerFn input =

// run parser1 with the input

let result1 = run parser1 input

// test the result for Failure/Success

match result1 with

| Success result ->

// if success, return the original result

result1

| Failure err ->

// if failed, run parser2 with the input

(continued on next slide..)

• let orElse parser1 parser2 =

[...snip...]

| Failure err ->

// if failed, run parser2 with the input

let result2 = run parser2 input

// return parser2's result

result2

// return the inner function

Parser innerFn

• Map parser combinator

• Run the parser.

• On success, transform the parsed value using

the provided function.

• Otherwise, return the failure

• let mapP f parser =

let innerFn input =

// run parser with the input

let result = run parser input

// test the result for Failure/Success

match result with

| Success (value,remaining) ->

// if success, return the value transformed by f

let newValue = f value

Success (newValue, remaining)

(continued on next slide..)

• let mapP f parser =

[...snip...]

| Failure err ->

// if failed, return the error

Failure err

// return the inner function

Parser innerFn

• Parser combinator operators

pcharA .>>. pcharB // 'A' andThen 'B'

pcharA pcharB // 'A' orElse 'B'

pcharA |>> (...) // map ch to something

• Demo

• Part 4:

Building complex combinators from

these basic ones

• [ 1; 2; 3] |> List.reduce (+)

// 1 + 2 + 3

[ pcharA; pcharB; pcharC] |> List.reduce ( .>>. )

// pcharA .>>. pcharB .>>. pcharC

[ pcharA; pcharB; pcharC] |> List.reduce ( )

// pcharA pcharB pcharC

Using reduce to combine parsers

• let choice listOfParsers =

listOfParsers |> List.reduce ( )

let anyOf listOfChars =

listOfChars

|> List.map pchar // convert char into Parser

|> choice // combine them all

let parseLowercase = anyOf ['a'..'z']

let parseDigit = anyOf ['0'..'9']

Using reduce to combine parsers

• /// Convert a list of parsers into a Parser of list

let sequence listOfParsers =

let concatResults p1 p2 = // helper

p1 .>>. p2

|>> (fun (list1,list2) -> list1 @ list2)

listOfParsers

// map each parser result to a list

|> Seq.map (fun parser -> parser |>> List.singleton)

// reduce by concatting the results of AndThen

|> Seq.reduce concatResults

Using reduce to combine parsers

• /// match a specific string

let pstring str =

str

// map each char to a pchar

|> Seq.map pchar

// convert to Parser

|> sequence

// convert Parser to Parser

|>> List.toArray

// convert Parser to Parser

|>> String

Using reduce to combine parsers

• Demo

• Yet more combinators

• “More than one” combinators

let many p = ... // zero or more

let many1 p = ... // one or more

let opt p = ... // zero or one

// example

let whitespaceChar = anyOf [' '; '\t'; '\n']

let whitespace = many1 whitespaceChar

• “Throwing away” combinators p1 .>> p2 // throw away right side

p1 >>. p2 // throw away left side

// keep only the inside value

let between p1 p2 p3 = p1 >>. p2 .>> p3

// example

let pdoublequote = pchar '"'

let quotedInt = between pdoublequote pint pdoublequote

• “Separator” combinators

let sepBy1 p sep = ... /// one or more p separated by sep

let sepBy p sep = ... /// zero or more p separated by sep

// example

let comma = pchar ','

let digit = anyOf ['0'..'9']

let oneOrMoreDigitList = sepBy1 digit comma

• Demo

• Part 5:

Improving the error messages

• input

Parser

Named parsers

Name: “Digit”

Parsing Function:

• Named parsers

let ( ) = setLabel // infix version

run parseDigit "ABC" // without the label

// Error parsing "9" : Unexpected 'A'

let parseDigit_WithLabel = anyOf ['0'..'9'] "digit"

run parseDigit_WithLabel "ABC" // with the label

// Error parsing "digit" : Unexpected 'A'

• input

Parser

Extra input context

Input: * Stream of characters * Line, Column

• Extra input context

run pint "-Z123"

// Line:0 Col:1 Error parsing integer

// -Z123

// ^Unexpected 'Z'

run pfloat "-123Z45"

// Line:0 Col:4 Error parsing float

// -123Z45

// ^Unexpected 'Z'

• Part 6:

Building a JSON Parser

• // A type that represents the previous diagram

type JValue =

| JString of string

| JNumber of float

| JObject of Map

| JArray of JValue list

| JBool of bool

| JNull

• Parsing JSON Null

• // new helper operator.

let (>>%) p x =

p |>> (fun _ -> x) // runs parser p, but ignores the result

// Parse a "null"

let jNull =

pstring "null"

>>% JNull // map to JNull

"null" // give it a label

• Parsing JSON Bool

• // Parse a boolean

let jBool =

let jtrue = pstring "true"

>>% JBool true // map to JBool

let jfalse = pstring "false"

>>% JBool false // map to JBool

// choose between true and false

jtrue jfalse

"bool" // give it a label

• Parsing a JSON String

• Call this "unescaped char"

• /// Parse an unescaped char

let jUnescapedChar =

let label = "char"

satisfy (fun ch -> (ch '\\') && (ch '\"') ) label

• Call this "escaped char"

• let jEscapedChar =

[ // each item is (stringToMatch, resultChar)

("\\\"",'\"') // quote

("\\\\",'\\') // reverse solidus

("\\/",'/') // solidus

("\\b",'\b') // backspace

("\\f",'\f') // formfeed

("\\n",'\n') // newline

("\\r",'\r') // cr

("\\t",'\t') // tab

]

// convert each pair into a parser

|> List.map (fun (toMatch,result) -> pstring toMatch >>% result)

// and combine them into one

|> choice

"escaped char" // set label

• Call this "unicode char"

• "unescaped char" or

"escaped char" or

"unicode char"

• let quotedString =

let quote = pchar '\"' "quote"

let jchar =

jUnescapedChar jEscapedChar jUnicodeChar

// set up the main parser

quote >>. manyChars jchar .>> quote

let jString =

// wrap the string in a JString

quotedString

|>> JString // convert to JString

• Parsing a JSON Number

• "int part"

"sign part"

• let optSign = opt (pchar '-')

let zero = pstring "0"

let digitOneNine =

satisfy (fun ch -> Char.IsDigit ch && ch '0') "1-9"

let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit"

let nonZeroInt =

digitOneNine .>>. manyChars digit

|>> fun (first,rest) -> string first + rest

// set up the integer part

let intPart = zero nonZeroInt

• "fraction part"

• // set up the fraction part

let point = pchar '.'

let fractionPart =

point >>. manyChars1 digit

• "exponent part"

• // set up the exponent part

let e = pchar 'e' pchar 'E'

let optPlusMinus = opt (pchar '-' pchar '+')

let exponentPart =

e >>. optPlusMinus .>>. manyChars1 digit

• "exponent part"

"int part"

"fraction part" "sign part"

• // set up the main JNumber parser

optSign

.>>. intPart

.>>. opt fractionPart

.>>. opt exponentPart

|>> convertToJNumber // not shown

• Parsing JSON Arrays and Objects

• Completing the JSON Parser

• // the final parser combines the others together

let jValue = choice

[

jNull

jBool

jNumber

jString

jArray

jObject

]

• Demo: the JSON parser in action

• Summary • Treating a function like an object

– Returning a function from a function

– Wrapping a function in a type

• Working with a "recipe" (aka "effect")

– Combining recipes before running them.

• The power of combinators

– A few basic combinators: "andThen", "orElse", etc.

– Complex parsers are built from smaller components.

• Combinator libraries are small but powerful

– Less than 500 lines for combinator library

– Less than 300 lines for JSON parser itself

• Want more? • For a production-ready library for F#,

search for "fparsec"

• There are similar libraries for other languages

http://www.quanttec.com/fparsec/
• Thanks!

@ScottWlaschin

fsharpforfunandprofit.com/parser

Contact me

Slides and video here

Let us know if you need help with F#

http://fsharpforfunandprofit.com/parser/ http://fsharpforfunandprofit.com/parser/