Understanding parser combinators

  • Published on
    18-Mar-2018

  • View
    1.156

  • Download
    0

Transcript

  • Understanding Parser

    Combinators

    @ScottWlaschin

    fsharpforfunandprofit.com/parser

  • let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit"

    let point = pchar '.'

    let e = pchar 'e' pchar 'E'

    let optPlusMinus = opt (pchar '-' pchar '+')

    let nonZeroInt =

    digitOneNine .>>. manyChars digit

    |>> fun (first,rest) -> string first + rest

    let intPart = zero nonZeroInt

    let fractionPart = point >>. manyChars1 digit

    let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit

    Typical code using parser combinators

  • let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit"

    let point = pchar '.'

    let e = pchar 'e' pchar 'E'

    let optPlusMinus = opt (pchar '-' pchar '+')

    let nonZeroInt =

    digitOneNine .>>. manyChars digit

    |>> fun (first,rest) -> string first + rest

    let intPart = zero nonZeroInt

    let fractionPart = point >>. manyChars1 digit

    let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit

  • Overview

    1. What is a parser combinator library?

    2. The foundation: a simple parser

    3. Three basic parser combinators

    4. Building combinators from other combinators

    5. Improving the error messages

    6. Building a JSON parser

  • Part 1

    What is a parser combinator

    library?

  • Something to match

    Parser Create step in parsing recipe

    Creating a parsing recipe

    A “Parser-making" function

    This is a recipe to make something, not the thing

    itself

  • Parser

    Combining parsing recipes

    A recipe to make a more complicated thing

    Parser Parser combined

    with

    A "combinator"

  • Parser Run

    Running a parsing recipe

    input

    Success or

    Failure

  • Why parser combinators?

    • Written in your favorite programming language

    • No preprocessing needed

    – Lexing, parsing, AST transform all in one.

    – REPL-friendly

    • Easy to create little DSLs

    – Google "fogcreek fparsec"

    • Fun way of understanding functional composition

  • Part 2:

    A simple parser

  • Version 1 – parse the character 'A'

    input

    pcharA remaining input

    true/false

  • Version 1 – parse the character 'A'

    input

    pcharA remaining input

    true/false

  • let pcharA input =

    if String.IsNullOrEmpty(input) then

    (false,"")

    else if input.[0] = 'A' then

    let remaining = input.[1..]

    (true,remaining)

    else

    (false,input)

  • Version 2 – parse any character

    matched char input

    pchar remaining input

    charToMatch failure message

  • let pchar (charToMatch,input) =

    if String.IsNullOrEmpty(input) then

    "No more input"

    else

    let first = input.[0]

    if first = charToMatch then

    let remaining = input.[1..]

    (charToMatch,remaining)

    else

    sprintf "Expecting '%c'. Got '%c'" charToMatch first

  • Fix – create a choice type to capture either case

    Success: matched char input

    pchar Success: remaining input

    charToMatch Failure: message

    type Result

  • Fix – create a choice type to capture either case

    Success: matched char input

    pchar Success: remaining input

    charToMatch Failure: message

    type Result

  • Fix – create a choice type to capture either case

    Success: matched char input

    pchar Success: remaining input

    charToMatch Failure: message

    type Result

  • let pchar (charToMatch,input) =

    if String.IsNullOrEmpty(input) then

    Failure "No more input"

    else

    let first = input.[0]

    if first = charToMatch then

    let remaining = input.[1..]

    Success (charToMatch,remaining)

    else

    let msg = sprintf "Expecting '%c'. Got '%c'" charToMatch first

    Failure msg

  • Version 3 – returning a function

    Success: matched char input

    pchar Success: remaining input

    charToMatch Failure: message

  • Version 3 – returning a function

    Success: matched char input

    pchar Success: remaining input

    charToMatch Failure: message

  • Version 3 – returning a function

    input pchar

    charToMatch

  • Version 3 – returning a function

    charToMatch pchar

  • Version 3 – returning a function

    charToMatch pchar

  • Version 4 – wrapping the function in a type

    charToMatch pchar

    Parser

  • Version 4 – wrapping the function in a type

    charToMatch pchar

    Parser

    type Parser)

    A function that takes a string and returns a Result

  • Version 4 – wrapping the function in a type

    charToMatch pchar

    Parser

    type Parser)

    Wrapper

  • Creating parsing recipes

  • charToMatch input

    Parser

    A parsing recipe for a char

  • Parser Run

    Running a parsing recipe

    input

    Success, or

    Failure

  • Running a parsing recipe

    input

    Parser

    Parser Run

    input

    Success, or

    Failure

  • let run parser input =

    // unwrap parser to get inner function

    let (Parser innerFn) = parser

    // call inner function with input

    innerFn input

  • Enough talk,

    show me some code

  • Part 3:

    Three basic combinators

  • What is a combinator?

    • A “combinator” library is a library designed around

    combining things to get more complex values of

    the same type.

    • integer + integer = integer

    • list @ list = list // @ is list concat

    • Parser ?? Parser = Parser

  • Basic parser combinators

    • Parser andThen Parser => Parser

    • Parser orElse Parser => Parser

    • Parser map (transformer) => Parser

  • AndThen parser combinator

    • Run the first parser.

    – If there is a failure, return.

    • Otherwise, run the second parser with the

    remaining input.

    – If there is a failure, return.

    • If both parsers succeed, return a pair (tuple)

    that contains both parsed values.

  • let andThen parser1 parser2 =

    let innerFn input =

    // run parser1 with the input

    let result1 = run parser1 input

    // test the 1st parse result for Failure/Success

    match result1 with

    | Failure err ->

    Failure err // return error from parser1

    | Success (value1,remaining1) ->

    // run parser2 with the remaining input

    (continued on next slide..)

  • let andThen parser1 parser2 =

    [...snip...]

    let result2 = run parser2 remaining1

    // test the 2nd parse result for Failure/Success

    match result2 with

    | Failure err ->

    Failure err // return error from parser2

    | Success (value2,remaining2) ->

    let combinedValue = (value1,value2)

    Success (combinedValue,remaining2)

    // return the inner function

    Parser innerFn

  • OrElse parser combinator

    • Run the first parser.

    • On success, return the parsed value, along

    with the remaining input.

    • Otherwise, on failure, run the second parser

    with the original input...

    • ...and in this case, return the result (success or

    failure) from the second parser.

  • let orElse parser1 parser2 =

    let innerFn input =

    // run parser1 with the input

    let result1 = run parser1 input

    // test the result for Failure/Success

    match result1 with

    | Success result ->

    // if success, return the original result

    result1

    | Failure err ->

    // if failed, run parser2 with the input

    (continued on next slide..)

  • let orElse parser1 parser2 =

    [...snip...]

    | Failure err ->

    // if failed, run parser2 with the input

    let result2 = run parser2 input

    // return parser2's result

    result2

    // return the inner function

    Parser innerFn

  • Map parser combinator

    • Run the parser.

    • On success, transform the parsed value using

    the provided function.

    • Otherwise, return the failure

  • let mapP f parser =

    let innerFn input =

    // run parser with the input

    let result = run parser input

    // test the result for Failure/Success

    match result with

    | Success (value,remaining) ->

    // if success, return the value transformed by f

    let newValue = f value

    Success (newValue, remaining)

    (continued on next slide..)

  • let mapP f parser =

    [...snip...]

    | Failure err ->

    // if failed, return the error

    Failure err

    // return the inner function

    Parser innerFn

  • Parser combinator operators

    pcharA .>>. pcharB // 'A' andThen 'B'

    pcharA pcharB // 'A' orElse 'B'

    pcharA |>> (...) // map ch to something

  • Demo

  • Part 4:

    Building complex combinators from

    these basic ones

  • [ 1; 2; 3] |> List.reduce (+)

    // 1 + 2 + 3

    [ pcharA; pcharB; pcharC] |> List.reduce ( .>>. )

    // pcharA .>>. pcharB .>>. pcharC

    [ pcharA; pcharB; pcharC] |> List.reduce ( )

    // pcharA pcharB pcharC

    Using reduce to combine parsers

  • let choice listOfParsers =

    listOfParsers |> List.reduce ( )

    let anyOf listOfChars =

    listOfChars

    |> List.map pchar // convert char into Parser

    |> choice // combine them all

    let parseLowercase = anyOf ['a'..'z']

    let parseDigit = anyOf ['0'..'9']

    Using reduce to combine parsers

  • /// Convert a list of parsers into a Parser of list

    let sequence listOfParsers =

    let concatResults p1 p2 = // helper

    p1 .>>. p2

    |>> (fun (list1,list2) -> list1 @ list2)

    listOfParsers

    // map each parser result to a list

    |> Seq.map (fun parser -> parser |>> List.singleton)

    // reduce by concatting the results of AndThen

    |> Seq.reduce concatResults

    Using reduce to combine parsers

  • /// match a specific string

    let pstring str =

    str

    // map each char to a pchar

    |> Seq.map pchar

    // convert to Parser

    |> sequence

    // convert Parser to Parser

    |>> List.toArray

    // convert Parser to Parser

    |>> String

    Using reduce to combine parsers

  • Demo

  • Yet more combinators

  • “More than one” combinators

    let many p = ... // zero or more

    let many1 p = ... // one or more

    let opt p = ... // zero or one

    // example

    let whitespaceChar = anyOf [' '; '\t'; '\n']

    let whitespace = many1 whitespaceChar

  • “Throwing away” combinators p1 .>> p2 // throw away right side

    p1 >>. p2 // throw away left side

    // keep only the inside value

    let between p1 p2 p3 = p1 >>. p2 .>> p3

    // example

    let pdoublequote = pchar '"'

    let quotedInt = between pdoublequote pint pdoublequote

  • “Separator” combinators

    let sepBy1 p sep = ... /// one or more p separated by sep

    let sepBy p sep = ... /// zero or more p separated by sep

    // example

    let comma = pchar ','

    let digit = anyOf ['0'..'9']

    let oneOrMoreDigitList = sepBy1 digit comma

  • Demo

  • Part 5:

    Improving the error messages

  • input

    Parser

    Named parsers

    Name: “Digit”

    Parsing Function:

  • Named parsers

    let ( ) = setLabel // infix version

    run parseDigit "ABC" // without the label

    // Error parsing "9" : Unexpected 'A'

    let parseDigit_WithLabel = anyOf ['0'..'9'] "digit"

    run parseDigit_WithLabel "ABC" // with the label

    // Error parsing "digit" : Unexpected 'A'

  • input

    Parser

    Extra input context

    Input: * Stream of characters * Line, Column

  • Extra input context

    run pint "-Z123"

    // Line:0 Col:1 Error parsing integer

    // -Z123

    // ^Unexpected 'Z'

    run pfloat "-123Z45"

    // Line:0 Col:4 Error parsing float

    // -123Z45

    // ^Unexpected 'Z'

  • Part 6:

    Building a JSON Parser

  • // A type that represents the previous diagram

    type JValue =

    | JString of string

    | JNumber of float

    | JObject of Map

    | JArray of JValue list

    | JBool of bool

    | JNull

  • Parsing JSON Null

  • // new helper operator.

    let (>>%) p x =

    p |>> (fun _ -> x) // runs parser p, but ignores the result

    // Parse a "null"

    let jNull =

    pstring "null"

    >>% JNull // map to JNull

    "null" // give it a label

  • Parsing JSON Bool

  • // Parse a boolean

    let jBool =

    let jtrue = pstring "true"

    >>% JBool true // map to JBool

    let jfalse = pstring "false"

    >>% JBool false // map to JBool

    // choose between true and false

    jtrue jfalse

    "bool" // give it a label

  • Parsing a JSON String

  • Call this "unescaped char"

  • /// Parse an unescaped char

    let jUnescapedChar =

    let label = "char"

    satisfy (fun ch -> (ch '\\') && (ch '\"') ) label

  • Call this "escaped char"

  • let jEscapedChar =

    [ // each item is (stringToMatch, resultChar)

    ("\\\"",'\"') // quote

    ("\\\\",'\\') // reverse solidus

    ("\\/",'/') // solidus

    ("\\b",'\b') // backspace

    ("\\f",'\f') // formfeed

    ("\\n",'\n') // newline

    ("\\r",'\r') // cr

    ("\\t",'\t') // tab

    ]

    // convert each pair into a parser

    |> List.map (fun (toMatch,result) -> pstring toMatch >>% result)

    // and combine them into one

    |> choice

    "escaped char" // set label

  • Call this "unicode char"

  • "unescaped char" or

    "escaped char" or

    "unicode char"

  • let quotedString =

    let quote = pchar '\"' "quote"

    let jchar =

    jUnescapedChar jEscapedChar jUnicodeChar

    // set up the main parser

    quote >>. manyChars jchar .>> quote

    let jString =

    // wrap the string in a JString

    quotedString

    |>> JString // convert to JString

    "quoted string" // add label

  • Parsing a JSON Number

  • "int part"

    "sign part"

  • let optSign = opt (pchar '-')

    let zero = pstring "0"

    let digitOneNine =

    satisfy (fun ch -> Char.IsDigit ch && ch '0') "1-9"

    let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit"

    let nonZeroInt =

    digitOneNine .>>. manyChars digit

    |>> fun (first,rest) -> string first + rest

    // set up the integer part

    let intPart = zero nonZeroInt

  • "fraction part"

  • // set up the fraction part

    let point = pchar '.'

    let fractionPart =

    point >>. manyChars1 digit

  • "exponent part"

  • // set up the exponent part

    let e = pchar 'e' pchar 'E'

    let optPlusMinus = opt (pchar '-' pchar '+')

    let exponentPart =

    e >>. optPlusMinus .>>. manyChars1 digit

  • "exponent part"

    "int part"

    "fraction part" "sign part"

  • // set up the main JNumber parser

    optSign

    .>>. intPart

    .>>. opt fractionPart

    .>>. opt exponentPart

    |>> convertToJNumber // not shown

    "number" // add label

  • Parsing JSON Arrays and Objects

  • Completing the JSON Parser

  • // the final parser combines the others together

    let jValue = choice

    [

    jNull

    jBool

    jNumber

    jString

    jArray

    jObject

    ]

  • Demo: the JSON parser in action

  • Summary • Treating a function like an object

    – Returning a function from a function

    – Wrapping a function in a type

    • Working with a "recipe" (aka "effect")

    – Combining recipes before running them.

    • The power of combinators

    – A few basic combinators: "andThen", "orElse", etc.

    – Complex parsers are built from smaller components.

    • Combinator libraries are small but powerful

    – Less than 500 lines for combinator library

    – Less than 300 lines for JSON parser itself

  • Want more? • For a production-ready library for F#,

    search for "fparsec"

    • There are similar libraries for other languages

    http://www.quanttec.com/fparsec/
  • Thanks!

    @ScottWlaschin

    fsharpforfunandprofit.com/parser

    Contact me

    Slides and video here

    Let us know if you need help with F#

    http://fsharpforfunandprofit.com/parser/ http://fsharpforfunandprofit.com/parser/