5 Replies - 746 Views - Last Post: 10 October 2016 - 03:12 PM

#1 NoYouDidnt  Icon User is offline

  • D.I.C Head

Reputation: 5
  • View blog
  • Posts: 55
  • Joined: 04-July 15

Stuck writing a parser in Haskell

Posted 05 October 2016 - 06:34 AM

Im Haskell newb stuck at trying to build an AST from an input string by first converting it to a list of tokens, and then parsing it recursively.
The point that im stuck at is making the submethods of parse :: [String] -> Ast work together.

This is the grammar that im using, in BNF

    <Block> ::= <Expr>;
    <Expr> ::= <Number> | <App>
    <App> ::= (<Expr>, <Expr>) <Func>
    <Func> ::= + | - | * | \
    <Number> ::= <Digit> | <Digit><Number>
    <Digit> ::= 0 | 1 | .... | 9



I have this type(s):

data Ast = Number Integer | Func String | App Ast [Ast] | Block [Ast] 


I wrote a tokenize method (that is not completely finished, but gives the right idea)

    tokenize :: String -> [String]
    tokenize [] = []
    tokenize xs @ (x : xs')
        | x `elem` t = [x] : tokenize xs'
        | isDigit x = [y | y <- takeWhile isDigit xs] : (tokenize (dropWhile isDigit xs))
        | otherwise = tokenize xs'
            where t = ['+', '-', '*', '/', '(', ')', ';']



To start with, I have the method
    parse :: String -> Ast
    parse xs = parseBlock (tokenize xs)



Then I have these methods that im stuck at writing, trying to make them work together.
    parseBlock :: [String] -> (Ast, [String])

    parseExpr :: [String] -> (Ast, [String])

    parseApp :: [String] -> (Ast, [String])

    parseFunc :: [String] -> (Ast, [String])

    parseNumber :: [String] -> (Ast, [String])



If the input string is e.g. "(14, 27)+;"

it should return (Block [ App (Func "+") [Number 14, Number 12]], []), where the second element should contain the trash that is not parsed.

Now I realise I should make the method parseBlock call the parseExpr, which decides whether to call parseApp or parseNumber etc. But the examples I have found is either different or a bit complicated. Wonder if someone could help me in the right direction I would be very thankful! Thank you.

Is This A Good Question/Topic? 0
  • +

Replies To: Stuck writing a parser in Haskell

#2 NoYouDidnt  Icon User is offline

  • D.I.C Head

Reputation: 5
  • View blog
  • Posts: 55
  • Joined: 04-July 15

Re: Stuck writing a parser in Haskell

Posted 06 October 2016 - 03:52 AM

I realise the question is a bit wide. I have gotten a bit further, but stuck on the parseApp method.


This is what I have

data Ast = Number Integer | Func String | App Ast [Ast] | Block [Ast]

parse :: String -> Ast
parse xs = fst (parseBlock (tokenize xs))

parseBlock :: [String] -> (Ast, [String]) 
parseBlock xss = let (a, b ) = parseExpr (init xss) in (Block [a], b ) 

parseExpr :: [String] -> (Ast, [String]) 
parseExpr xss @ (xs : xss') 
    | all isDigit xs = parseNumber xss
    | otherwise = parseApp xss


{- 
somehow make a string like "((10, 2)-, (0, 2)-)+" become App (Name "+") [ App (Name "-") [Number 10,  Number 2], App (Name "-")  [Number 0, Number 2]] 
-}

parseApp :: [String] -> (Ast, [String]) 
parseApp xss = let (a, b ) = parseExpr xss in
    if 
    else if
    else 

    

parseName :: [String] -> (Ast, [String]) -- + | - | * | /
parseName (xs : xss) = (Name xs, xss)

parseNumber :: [String] -> (Ast, [String]) 
parseNumber (xs : xss) = (Number (read xs :: Integer), xss)

tokenize :: String -> [String]
tokenize [] = []
tokenize xs @ (x : xs')
    | x `elem` t = [x] : tokenize xs'
    | isDigit x = [y | y <- takeWhile isDigit xs] : (tokenize (dropWhile isDigit xs))
    | otherwise = tokenize xs'
        where t = ['+', '-', '*', '/', '(', ')', ';']



This post has been edited by NoYouDidnt: 06 October 2016 - 03:53 AM

Was This Post Helpful? 0
  • +
  • -

#3 sepp2k  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 2506
  • View blog
  • Posts: 3,959
  • Joined: 21-June 11

Re: Stuck writing a parser in Haskell

Posted 06 October 2016 - 10:32 AM

I feel like checking whether the current token is a number outside of the parseNumber function goes counter to the separation of concerns. In my mind, the parseNumber function should check whether the current token is a number and fail if it isn't. This means you need to change your signatures to allow failures (e.g. by making the result types Maybes1), but the way I see it, you can't write a proper parser otherwise anyway.

With that done, your parseExpr function should simply invoke parseNumber without any prior checks. Then if that parse failed, it should try parseApp as the alternative. And if you ever add any other types of expression later, those should be tried in case parseApp fails and so. Basically a function like parseExpr should just be a big chain of "parseX orelse parseY orelse ...".

So now for the parseApp function. Applications start with a "(", right? So that's the first thing to check: that the current token is a "(", otherwise the parse fails. If the parse continues you want an expression, a comma, another expression, a ")" and an operator name. The two expressions you get by calling parseExpr and the other tokens you handle the same way as "(": you check that you have the right type of token and if so, consume it (i.e. let the rest of the function use the remaining tokens).

PS: This doesn't address your problem, but unless you were explicitly instructed to define `tokenize` as `String -> [String]`, I'd recommend defining `data TokenType = Number | Operator | LeftParen | RightParen` and `tokenize :: String -> [(String, TokenType)]`. That way you can more easily (and efficiently) check which type of token you have. For example instead of doing all isDigit xs, you could now just check whether the token type is Number.

1 In a real-world compiler, you'd probably want an Either or your own custom type, so you can carry around a proper error message in case of failure (as opposed to just Nothing), but for the beginning it's probably easiest not to worry about error messages.
Was This Post Helpful? 1
  • +
  • -

#4 NoYouDidnt  Icon User is offline

  • D.I.C Head

Reputation: 5
  • View blog
  • Posts: 55
  • Joined: 04-July 15

Re: Stuck writing a parser in Haskell

Posted 10 October 2016 - 05:06 AM

Thank you! I did what you said and now parseExpr is calling parseApp if parseNumber return Nothing and it works nicely. However I am still stuck writing the parseApp function.

I came this far (Havent made parseApp a Maybe type yet as there is no need until I proceed to the next tasks)

 
parseApp :: [String] -> (Ast, [String]) 
parseApp ("(" : xss) = let (a, b ) = parseExpr xss in
    if head b == "," then 
        let (c, d) = parseExpr (tail b ) in 
            if head d == ")" then (App (Name (head (drop 1 d))) [a, c], tail ( drop 2 d))
            else error "expect ')'"
    else (a, b )
parseApp _ = error "expect '('"



But it doesnt work for anything but simple expressions like "(10, 2)+;".

This is what happens currently:

λ > parse "(10, 2)+;"
Block [App (Name "+") [Number 10,Number 2]]
λ > parse "((1, 1)+, 4)-;"
Block [App (Name "+") [Number 1,Number 1]]
λ > parse "(4, (1, 1)+)-;"
Block [*** Exception: expect ')'
CallStack (from HasCallStack):
  error, called at Oblig12016.hs:21:18 in main:Main




Not sure how to write this. I have a vague idea of what to do / what is wrong, but I can't make it work whatever I try. Would be very thankful for any help writing this function
Was This Post Helpful? 0
  • +
  • -

#5 sepp2k  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 2506
  • View blog
  • Posts: 3,959
  • Joined: 21-June 11

Re: Stuck writing a parser in Haskell

Posted 10 October 2016 - 07:15 AM

When there's no , after the first expression, you return the expression before the comma. According to the grammar that should be an error though (unless <Expr> ::= (<Expr>) is a valid production and you just missed it in the grammar).

I also feel like your parse function should cause an error when the second element of the tuple (i.e. the remaining source) is not empty and that error message should include the remaining source. So you know that the whole string wasn't passed and which part couldn't be parsed.

It would also be helpful if your "expect" error messages also contain the actual string. That is replace error "expect ')'" with error ("expected ')', but got '" ++ head d ++ "'"). That's make finding mistakes much easier.

At a glance though, it looks like your problem is that you drop 3 token from the stream where you should be dropping only 2: tail (drop 2 d) should be tail (tail d) or just [ul]drop 2 d[/il]. If you use tail and drop 2, the result is the same as drop 3.

In general though you should avoid head and tail altogether and always take lists apart using pattern matching. That way, this kind of off-by-one error becomes virtually impossible:

-- Old code:
-- if head d == ")" then (App (Name (head (drop 1 d))) [a, c], tail ( drop 2 d))
-- else error "expect ')'"
case d of
  ")" : op : rest -> App (Name op) [a, c], rest
  token : _ -> error $ "expected ')', but got '" ++ token ++ "'"
  [] -> error "expected ')', but got end of file"



PS: It would help the readability of your code a lot if you used meaningful variable names rather than consecutive letters of the alphabet.

This post has been edited by sepp2k: 10 October 2016 - 07:33 AM

Was This Post Helpful? 1
  • +
  • -

#6 NoYouDidnt  Icon User is offline

  • D.I.C Head

Reputation: 5
  • View blog
  • Posts: 55
  • Joined: 04-July 15

Re: Stuck writing a parser in Haskell

Posted 10 October 2016 - 03:12 PM

Thank you so much!
Was This Post Helpful? 0
  • +
  • -

Page 1 of 1