Subscribe to The Madman Scribblings        RSS Feed
-----

Are there any 2-D Lexers / Parsers?

Icon 8 Comments
More of an Ask. Is there any 2-D Lexers / Parsers?

All parsers I know read the input text as if it was a stream of characters.

Left To Right, Top to Bottom

Example source text.
 
a ::= [ 1 2 3 ]
        4 5 6

        1 2 3
b ::= [ 4 5 6 ] 
        7 8 9
 



I can sort of define the grammar like so
            space ::= ' '
           spaces ::= space+
            digit ::= '0'...'9'
           digits ::= digit+
             plus ::= '+'
            minus ::= '-'
           number ::= (plus | minus)? digits
definition_symbol ::= "::="
   matrice_values :: = (spaces (number | identifier))+ spaces?  eol
          matrice ::= '[' matrice_values ']'
           define ::= indentfier spaces definition_symbol spaces
       matrice_id ::= define  matrice  spaces eol
           matrix ::= matrice_values?  matric_id matric_values? eol




The 2D Parser / Lexer

This would consider the source text as a two dimensional grid of characters
Now suppose I want a grammar that express the following, and it to mean to matix

        1 2 3       10  0  0
c ::= [ 4 5 6 ] + [  0  1  0 ]
        7 8 9        0  0  1 



The first line of it valid for the matrice definition in the 1D Parser.
When I really think it should be two separate matrices.

matric ::=   matrice_id  /\(matrice_values)? ('[' matrice_values ']') \/(matrice_values)? )  
add_matrice ::= define matric plus matric



Meaning of symbols
\/ Aligned below (to term on left)
/\ Aligned above (to term on right)


So are there any parser / lexer that can encode 2D positional relationships between terms?

8 Comments On This Entry

Page 1 of 1

ishkabible Icon

18 November 2012 - 01:06 PM
interesting idea, I don't think it would be very useful but it's definitely an interesting idea. I know of no languages that do this nor any lexer or parser generators that do this. there are some bigger issues to be worked out about how the characters actually come in and how do you tell the bottom of one thing from the top of another?

       1 2 3
c ::= [1 2 3]
       1 2 3
       1 2 3
d ::= [1 2 3]
       1 2 3



what does this mean? it's an ambiguity if you ask me.

say that the above issue is solved; there is a still the rather real issue of parsing this at all. the characters are stored in serial files yet we seek to imply that the production starts at a location ahead of the current location. If the goal is to have a nice syntax then perhaps it would be ideal to have a editor that could actually display the braces going from top to bottom. it could store it in a mark up or s-expression style language but the user need only see the pretty displayed version to edit it. you could write all sorts of neat mathematical equations in a WYSIWYG style and use all sorts of crazy syntax with a special editor.
0

jon.kiparsky Icon

18 November 2012 - 01:25 PM
A matrix is a linear array. The multidimensional addressing is really just math/syntactic sugar. See Cantor's work on equivalence of number of points on the line/points on the plane.

So you could write a parser with special notations for multidimensional matrices (why stop at 2?) or you could just define a grammar: the grammar would want to treat the incoming data as a linear array while allowing the user to consider it as a matrix in N dimensions. Converting it to the matrix, given the dimensions, is trivial.
Interesting question, then, is to define the grammar that allows those dimensions to be discoverable (by machine) and obvious (by eye) without being explicitly stated outside of the data. That way, changes to the data don't have to be made twice (once in representation, once in description) so you avoid a class of errors that would be easy and annoying.

I won't go to the trouble of suggesting that Lisp and XML have both got reasonable solutions to this, since this will already have occurred to you. There might be better ones, though.

The proposed grammar, as exemplified here:

        1 2 3       10  0  0
c ::= [ 4 5 6 ] + [  0  1  0 ]
        7 8 9        0  0  1 


is awkward in that backtracking is required. I'd need to store some number of lines, unparsed, until I get to the magic line that tells me the dimensions of the matrices being added.

I might want to add larger matrices:

7 8 9 0 0 1
        1 2 3       10  0  0
        7 8 9        0  0  1 
        1 2 3       10  0  0
        1 2 3       10  0  0
c ::= [ 4 5 6 ] + [  0  1  0 ]
        7 8 9        0  0  1 
        1 2 3       10  0  0
        7 8 9        0  0  1 
        1 2 3       10  0  0
        7 8 9        0  0  1 
        1 2 3       10  0  0




It's also not clear to me how this would extend to matrices of greater than 2 dimensions.
0

AdamSpeight2008 Icon

18 November 2012 - 04:02 PM
If we think of a lexeme being encircled by potentially other lexemes
 [ NW ][ N ][ NE ]
 [  W ][   ][ E  ]
 [ SW ][ S ][ SE ]



Then using a modified BFN for the 2D Grammer, then I can potential define the matrix lexeme like so.

 matrix_value ::= [ (numbers | identifier) ]

matrix_values ::= [ matrix_value ]
                  { E:= matrix_value? }

      matrix  ::= [ "[ " ][ matrix_values ][ "] "]
                  { N:= matrix_values? 
                    S:= matrix_values? }

      indexer ::= [ identifier ]
                  { SE:= indice }

0

ishkabible Icon

18 November 2012 - 07:01 PM
I'm sure that this is possible even if back tracking is required but it's still ambiguous. perhaps requiring a new line at the end of the set of possible new lines.

also forcing language constructs to fit in a single token is limiting, not freeing.
1

TheGDeveloper Icon

19 November 2012 - 06:35 AM
this can be done but it is useless. You do all this efford just to have weird text syntax
1

AdamSpeight2008 Icon

20 November 2012 - 02:06 AM
Let's presume the ambiguities are solvable.
ishkabible:=

Quote

... forcing language constructs to fit in a single token is limiting, not freeing.

How is it limiting? Can you expand on that premise?

expression ::= [ value | exponent ] 
exponent ::= [ expression ]{ nw:= ( expression ) }



TheGDeveloper:= Depends on the definition of 'weird'.

Haskell, F# and Python also have positional dependent language constructs, and they are considered "weird".
0

cfoley Icon

23 November 2012 - 05:31 AM
I know this isn't really what you were after, but Piet programs are crafted in 2D.
1

doveraudio Icon

17 December 2012 - 09:49 PM
matA = [0,0,1|0,1,0|1,0,0]
0
Page 1 of 1

Search My Blog

Recent Entries

Recent Comments