10 Replies - 2156 Views - Last Post: 14 December 2013 - 12:29 PM

#1 Gerry Rzeppa

  • New D.I.C Head

Reputation: 3
  • Posts: 6
  • Joined: 13-December 13

Natural Language Programming

Posted 13 December 2013 - 01:38 AM

Having programmed for many years in many languages, I often find myself thinking in a kind of natural-language pseudo-code, then translating it into whatever I'm working with at the time. So one day I thought: Why not simply code at a natural-language level and skip the translation step? I talked it over with my elder son, also a programmer, and we decided to test the theory. Specifically, we wanted to know:

1. Is it easier to program when you don’t have to translate your natural-language thoughts into an alternate syntax?

2. Can natural languages be parsed in a relatively “sloppy” manner (as humans apparently parse them) and still provide a stable enough environment for productive programming?

3. Can complex low-level programs (like compilers) be conveniently and efficiently written in high level languages (like English)?

And so we set about developing a "plain english" compiler in the interest of answering those questions. And we are happy to report that we can now answer each of those three questions, from direct experience, with a resounding “Yes!” Here are some details:

Our parser operates, we think, something like the parsing centers in the human brain. Consider, for example, a father saying to his baby son:

“Want to suck on this bottle, little guy?”

And the kid hears,

“blah, blah, SUCK, blah, blah, BOTTLE, blah, blah.”

But he properly responds because he’s got a “picture” of a bottle in the right side of his head connected to the word “bottle” on the left side, and a pre-existing “skill” near the back of his neck connected to the term “suck”. In other words, the kid matches what he can with the pictures (types) and skills (routines) he’s accumulated, and simply disregards the rest. Our compiler does very much the same thing, with new pictures (types) and skills (routines) being defined -- not by us, but -- by the programmer, as he writes new application code.
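In conventional terms, that matching strategy can be sketched in a few lines of Python. This is a toy illustration of the idea only; the names and word sets are invented for this post, not taken from our compiler:

```python
# Toy sketch of "sloppy" parsing: keep the words that match known
# skills (routines) and pictures (types); treat everything else as noise.

KNOWN_SKILLS = {"suck"}      # "skills" the listener already has
KNOWN_PICTURES = {"bottle"}  # "pictures" the listener already has

def sloppy_parse(sentence):
    """Return the recognized skills and pictures; ignore the 'blah's."""
    words = [w.strip("?,.!").lower() for w in sentence.split()]
    skills = [w for w in words if w in KNOWN_SKILLS]
    pictures = [w for w in words if w in KNOWN_PICTURES]
    return skills, pictures

skills, pictures = sloppy_parse("Want to suck on this bottle, little guy?")
# skills == ["suck"], pictures == ["bottle"]; the rest is "blah, blah"
```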

A typical type definition looks like this:

A polygon is a thing with some vertices.

Internally, the name “polygon” is now associated with a type of dynamically-allocated structure that contains a doubly-linked list of vertices. “Vertex” is defined elsewhere (before or after this definition) in a similar fashion; the plural is automatically understood.

A typical routine looks like this:

To append an x coord and a y coord to a polygon:
Create a vertex given the x and the y.
Append the vertex to the polygon’s vertices.


Note that formal names (proper nouns) are not required for parameters and variables. This, we believe, is a major insight. My real-world chair and table are never (in normal conversation) called “c” or “myTable” -- I refer to them simply as “the chair” and “the table”. Likewise here: “the vertex” and “the polygon” are the natural names for such things.

Note also that spaces are allowed in routine and variable “names” (like “x coord”). This is the 21st century, yes? And that “nicknames” are also allowed (such as “x” for “x coord”). And that possessives (“polygon’s vertices”) are used in a very natural way to reference “fields” within “records”.

Note, as well, that the word “given” could have been “using” or “with” or any other equivalent since our sloppy parsing focuses on the pictures (types) and skills (routines) needed for understanding, and ignores, as much as possible, the rest.
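For comparison with a conventional language, the two definitions above might look something like this in Python. This is a hand-written analogue for illustration only; the compiler actually stores the vertices in a doubly-linked list, and the names here are my own:

```python
class Vertex:
    """A vertex is a thing with an x coord and a y coord."""
    def __init__(self, x, y):
        self.x = x  # "x" serves as a nickname for "x coord"
        self.y = y

class Polygon:
    """A polygon is a thing with some vertices."""
    def __init__(self):
        self.vertices = []  # stands in for the doubly-linked list

def append_coords(polygon, x, y):
    """To append an x coord and a y coord to a polygon."""
    vertex = Vertex(x, y)            # Create a vertex given the x and the y.
    polygon.vertices.append(vertex)  # Append the vertex to the polygon's vertices.

polygon = Polygon()
append_coords(polygon, 3, 4)
# polygon.vertices now holds one vertex with x == 3 and y == 4
```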

At the lowest level, things look like this:

To add a number to another number:
Intel $8B85080000008B008B9D0C0000000103.


Note that in this case we have both the highest and lowest of languages — English and machine code (in hexadecimal) -- in a single routine. The insight here is that (like a typical math book) a program should be written primarily in a natural language, with appropriate snippets in more convenient syntaxes as (and only as) required.

We hope someday soon to extend the technology to include Plain Spanish, and Plain French, and Plain German, etc.

Anyway, if you're interested, you can download the whole thing here: http://www.osmosian.com/cal-3040.zip . It’s a small Windows program, less than a megabyte in size. No installation necessary; just unzip and execute. But it's a complete development environment, including a unique interface, a simplified file manager, an elegant text editor, a handy hexadecimal dumper, a native-code-generating compiler/linker, and even a wysiwyg page layout facility (that we used to produce the documentation). If you start with the "instructions.pdf" in the “documentation” directory, before you go ten pages you won't just be writing "Hello, World!" to the screen: you’ll be re-compiling the whole shebang in itself (in less than three seconds on a bottom-of-the-line machine from Walmart).

Take a look. Then let us know what you think here on the forum.

Thanks,

Gerry Rzeppa
Grand Negus of the Osmosian Order of Plain English Programmers

Dan Rzeppa
Prime Assembler of the Osmosian Order of Plain English Programmers


Replies To: Natural Language Programming

#2 mostyfriedman

  • The Algorithmi

Reputation: 727
  • Posts: 4,473
  • Joined: 24-October 08

Re: Natural Language Programming

Posted 14 December 2013 - 12:30 AM

Impressive; I like how you used declarative and procedural semantics. How do you handle ambiguity, though? I think you're using a pattern-matching technique: you primarily look for verbs and translate those to functions, and nouns translate to objects. I'm only guessing off the top of my head, though.

This post has been edited by mostyfriedman: 14 December 2013 - 12:34 AM


#3 Gerry Rzeppa

  • New D.I.C Head

Reputation: 3
  • Posts: 6
  • Joined: 13-December 13

Re: Natural Language Programming

Posted 14 December 2013 - 01:05 AM

mostyfriedman, on 14 December 2013 - 12:30 AM, said:

Impressive; I like how you used declarative and procedural semantics. How do you handle ambiguity, though?

We deal with ambiguity as you would when speaking to, say, an employee: when he does what you asked, you're done; when he doesn't, you elaborate. Likewise when we're coding. We code a few lines, then test: if it works as intended, we assume the compiler has arrived at the proper interpretation; if it doesn't work properly, we provide further clarification. We don't see this as a shortcoming. We're convinced that computers of the future will work in a similar way -- after all, eventually we want to get to the point where we "program" our machines simply by talking to them, yes? And sometimes they'll misunderstand, and need clarification, as humans do.


mostyfriedman, on 14 December 2013 - 12:30 AM, said:

I think you're using a pattern-matching technique. You primarily look for verbs and translate those to functions, and nouns translate to objects. I'm only guessing off the top of my head, though.

We actually make several passes at the code, compiling types first, then global variables, then routine headers. Most of that is standard recursive-descent parsing in accord with a standard EBNF definition of the language. See page 11 of the instructions for a summary of that parsing. Then the fun begins. We compile the routine bodies by breaking each sentence into phrases (at article, conjunction, and preposition boundaries); then we recursively interpret each phrase as a possible (1) string or numeric literal, (2) variable reference, (3) mathematical expression, or (4) irrelevant "noise words", and look for a matching routine header (compiled earlier) to call. I say "recursively interpret" because if one interpretation of the sentence doesn't yield a match, we try another, and another, etc., until we've exhausted the possibilities.
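The phrase-splitting step can be illustrated with a toy Python function. This is my simplification for this post; the marker-word lists are invented, and the real compiler does considerably more:

```python
ARTICLES = {"a", "an", "the"}
MARKERS = {"and", "or", "to", "on", "in", "with", "given", "using"}

def split_into_phrases(sentence):
    """Break a sentence into phrases at article, conjunction, and
    preposition boundaries, dropping the marker words themselves."""
    phrases, current = [], []
    for word in sentence.lower().rstrip(".").split():
        if word in ARTICLES:
            continue  # articles introduce a phrase; skip them
        if word in MARKERS:
            if current:  # a marker word closes the current phrase
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        phrases.append(" ".join(current))
    return phrases

split_into_phrases("Append the vertex to the polygon's vertices.")
# -> ["append vertex", "polygon's vertices"]
```

Each resulting phrase would then be interpreted as a literal, a variable reference, an expression, or noise, and matched against the routine headers compiled earlier.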

#4 mostyfriedman

  • The Algorithmi

Reputation: 727
  • Posts: 4,473
  • Joined: 24-October 08

Re: Natural Language Programming

Posted 14 December 2013 - 01:19 AM

Interesting. Are there mechanisms to reason about context?

#5 Gerry Rzeppa

  • New D.I.C Head

Reputation: 3
  • Posts: 6
  • Joined: 13-December 13

Re: Natural Language Programming

Posted 14 December 2013 - 01:49 AM

mostyfriedman, on 14 December 2013 - 01:19 AM, said:

Interesting. Are there mechanisms to reason about context?

Only very primitive ones at present. If, for example, you reference a "field" in a "record" without specifying the record-level component, the compiler will look within all the records in the immediate context for a suitably unambiguous field; or if you specify a call with variable types that are not directly supported by any of the existing routines, the compiler will recursively reduce those types to more basic types and attempt to call a compatible lower-level routine. Some string and number conversions also take place automatically. And all variables can be referenced by "nickname" ("the left side", for example, can be specified simply by saying "the left"). But that's all I can think of off the top of my head. We hope to add significant support for pronouns in the next version.
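Nickname resolution of that sort might be sketched like so. This is purely illustrative; the field names and the lookup rule are my guesses at the general shape, not the compiler's actual data structures:

```python
fields = {"left side": 10, "right side": 20, "top": 5}

def resolve(nickname, fields):
    """Find the unique field whose full name is, or begins with, the nickname."""
    matches = [name for name in fields
               if name == nickname or name.startswith(nickname + " ")]
    if len(matches) != 1:
        raise LookupError(f"'{nickname}' is ambiguous or unknown: {matches}")
    return fields[matches[0]]

resolve("left", fields)  # -> 10: "the left" means "the left side"
resolve("top", fields)   # -> 5: an exact match
```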

#6 mostyfriedman

  • The Algorithmi

Reputation: 727
  • Posts: 4,473
  • Joined: 24-October 08

Re: Natural Language Programming

Posted 14 December 2013 - 01:58 AM

How about time and space complexity? Will the programmer have full control over the amount of memory the program uses, as well as the running time of the code?

#7 Gerry Rzeppa

  • New D.I.C Head

Reputation: 3
  • Posts: 6
  • Joined: 13-December 13

Re: Natural Language Programming

Posted 14 December 2013 - 02:27 AM

mostyfriedman, on 14 December 2013 - 01:58 AM, said:

How about time and space complexity? Will the programmer have full control over the amount of memory the program uses, as well as the running time of the code?

The programmer has as much control over memory use and execution time as the Windows operating system allows :) Global variables are allocated at startup; local variables are allocated on the stack at the time of each call and discarded on exit; parameters are passed by reference (though you can "privatize" them to get a copy); dynamic variables are allocated and destroyed by the programmer (there's no automatic garbage collection, though we do provide a handy one-command "deep deallocation" for certain structures). Execution of the code is linear (except for loops and calls) and deterministic, but since Windows multitasks there is no guarantee regarding execution times. It's a very small and efficient program, however, considering the range of its functionality.

I think you'd enjoy the instruction manual, even if you don't want to run the program. Why not download it and see what it has to say?

#8 mostyfriedman

  • The Algorithmi

Reputation: 727
  • Posts: 4,473
  • Joined: 24-October 08

Re: Natural Language Programming

Posted 14 December 2013 - 02:30 AM

Thanks for the responses. I will play around with it and send in any questions :).

#9 jon.kiparsky

  • Pancakes!

Reputation: 7961
  • Posts: 13,580
  • Joined: 19-March 11

Re: Natural Language Programming

Posted 14 December 2013 - 07:45 AM

I hate to be the one to throw cold water on the party, but looking at your documentation, I came across this:

Quote

So you can see that my power is rooted in my simplicity. I parse sentences pretty much the same way you do. I look for marker words — articles, verbs, conjunctions, prepositions — then work my way around them. No involved grammars, no ridiculously complicated parse trees, no obscure keywords.

I'm afraid that's not how you or I parse sentences. In fact, "ridiculously complicated parse trees" are a vastly simplified version of the models developed to understand natural-language sentences. We don't just "look for marker words"; natural languages have syntax. So there's a difference between "John hit Mary" and "Mary hit John", to take an obvious case. Things get more interesting when we build more involved sentences, like

"This is the boy that John said he'd give one of his socks to, but he didn't."

Who does the final "he" refer to? What didn't he do? Who was to get the sock? How do we know these things? That's all syntax. So if you're not actually parsing sentences, you're not doing natural language. I suspect that you're doing something a lot more like Weizenbaum's Eliza, which is a fun toy but not something you could use for computation.

Quote

Our parser operates, we think, something like the parsing centers in the human brain. Consider, for example, a father saying to his baby son:

“Want to suck on this bottle, little guy?”

And the kid hears,

“blah, blah, SUCK, blah, blah, BOTTLE, blah, blah.”


No, this is more of a Gary Larson theory of language - it's much like the view that Augustine of Hippo proposed, but I don't think it's ever been proposed as a serious contender for a theory of language acquisition in the modern era. You have a little bootstrapping problem here. The child's input is not a delineated sequence of conveniently marked and isolated tokens, "Want" and "to" and "suck", etc. Instead, it's a continuous stream of sounds. The child's task is much more complicated and much more interesting than you're suggesting here. In any case, the point is: this is not how any human brain works on language.

So just from the point of view of the linguist, my suggestion would be "go study some linguistics".


So much for the linguistics. As a programmer, this looks horrible. Why would I want to type "A polygon is a thing with vertices" and hope that my testing reveals all of the errors of interpretation that this allows, when I can use a well-designed language (or even something like VB or ML) and know exactly what I'm going to get? That is, either "A polygon is a thing with vertices" is a precise construct - in which case I might as well learn a less atrocious syntax - or it's not a precise construct - in which case this is completely useless for programming.


Quote

We deal with ambiguity as you would when speaking to, say, an employee: when he does what you asked, you're done; when he doesn't, you elaborate.


Seriously? Why not just use a syntax that allows me to state my meaning unambiguously?

So from the point of view of the programmer, my suggestion would be that you study language design a little.

This post has been edited by jon.kiparsky: 14 December 2013 - 07:47 AM


#10 modi123_1

  • Suitor #2

Reputation: 9497
  • Posts: 35,844
  • Joined: 12-June 08

Re: Natural Language Programming

Posted 14 December 2013 - 09:01 AM

Moving to 'Share Your Project'.

#11 Gerry Rzeppa

  • New D.I.C Head

Reputation: 3
  • Posts: 6
  • Joined: 13-December 13

Re: Natural Language Programming

Posted 14 December 2013 - 12:29 PM

jon.kiparsky, on 14 December 2013 - 07:45 AM, said:

I hate to be the one to throw cold water on the party, but looking at your documentation, I came across this:

Quote

So you can see that my power is rooted in my simplicity. I parse sentences pretty much the same way you do. I look for marker words — articles, verbs, conjunctions, prepositions — then work my way around them. No involved grammars, no ridiculously complicated parse trees, no obscure keywords.

I'm afraid that's not how you or I parse sentences.

We believe that it is, based on years of study with small children. It appears that we have "buckets" in our heads for the who, what, when, where, why and how of a thought, and when someone speaks to us, we mentally divide the statement at certain marker words (like articles, conjunctions, and prepositions) and attempt to fill up the buckets with the resulting phrases. Which is why children will consistently reply to a statement like, "I'm going to the store" with a question like "When?" -- seeking information for the as-yet unfilled bucket(s).
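A toy version of the bucket idea in Python, with made-up marker rules, just to show the shape of the process rather than our actual implementation:

```python
def fill_buckets(sentence):
    """Divide a statement at marker words and drop the phrases into
    who/what/where/when buckets; unfilled buckets prompt questions."""
    buckets = {"who": None, "what": None, "where": None, "when": None}
    words = sentence.lower().rstrip(".").split()
    if words and words[0] in {"i", "i'm", "we", "you"}:
        buckets["who"] = words[0]
    verbs = [w for w in words if w.endswith("ing")]
    if verbs:
        buckets["what"] = verbs[0]
    if "to" in words:  # "to the store" fills the where bucket
        buckets["where"] = " ".join(words[words.index("to") + 1:])
    return buckets

buckets = fill_buckets("I'm going to the store.")
unfilled = [k for k, v in buckets.items() if v is None]
# buckets["where"] == "the store"; unfilled == ["when"], so the child asks "When?"
```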

jon.kiparsky, on 14 December 2013 - 07:45 AM, said:

In fact "ridiculously complicated parse trees" are a vastly simplified version of the models developed to understand natural-language sentences. We don't just "look for marker words", in fact natural languages have syntax.

Of course they do. But we contend that most of that syntax is idiomatic, learned by rote; while the marker-word/bucket processing is in operation at least a year before a child learns to speak; and many years before a child has a clear concept of nouns, verbs, and other formal language classifications.

jon.kiparsky, on 14 December 2013 - 07:45 AM, said:

So there's a difference between "John hit Mary" and "Mary hit John", to take an obvious case.

In some languages, yes; others use verb inflection to indicate subject and object and word order is less important, etc.

jon.kiparsky, on 14 December 2013 - 07:45 AM, said:

Things get more interesting when we build more involved sentences, like

"This is the boy that John said he'd give one of his socks to, but he didn't."

Who does the final "he" refer to? What didn't he do? Who was to get the sock? How do we know these things? That's all syntax.

It can be analyzed syntactically, yes. Or it can be interpreted with the marker-word/bucket approach -- which (we argue) is how most people would handle it. How else could the vast majority of Americans -- who have no idea what the word "antecedent" means, and who couldn't delineate a subordinate clause to save their lives -- understand anything?

jon.kiparsky, on 14 December 2013 - 07:45 AM, said:

So if you're not actually parsing sentences, you're not doing natural language.

Or, we're processing natural language as small children do (rather than the way a professional linguist would).

jon.kiparsky, on 14 December 2013 - 07:45 AM, said:

I suspect that you're doing something a lot more like Weizenbaum's Eliza...

Perhaps. But don't forget that Eliza also processes language as humans do. When I'm trying to make conversation with a stranger, for example, I let their words more-or-less wash over me until I hear a term that I think we might have in common; then I reply, as Eliza would, "Tell me about your vacation (or your piano, or your mother)." And I know this is what I do, because I've consciously watched myself doing it.
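In code, that conversational strategy is just a keyword scan; here is an Eliza-style toy in Python (illustrative only, with topics invented for the example):

```python
SHARED_TOPICS = {"vacation", "piano", "mother"}

def small_talk(utterance):
    """Let the words wash over us until a familiar topic appears."""
    for word in utterance.lower().split():
        word = word.strip("?,.!")
        if word in SHARED_TOPICS:
            return f"Tell me about your {word}."
    return "I see. Go on."

small_talk("We spent the whole vacation arguing about the piano.")
# -> "Tell me about your vacation."
```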

jon.kiparsky, on 14 December 2013 - 07:45 AM, said:

which is a fun toy but not something you could use for computation.

But we have used this technique for computation, and non-trivial computation at that: a complete development system including desktop, file manager, editor, dumper, native-code-generating compiler/linker, and wysiwyg page layout facility.

jon.kiparsky, on 14 December 2013 - 07:45 AM, said:

Quote

Our parser operates, we think, something like the parsing centers in the human brain. Consider, for example, a father saying to his baby son:

“Want to suck on this bottle, little guy?”

And the kid hears,

“blah, blah, SUCK, blah, blah, BOTTLE, blah, blah.”


No, this is more of a Gary Larson theory of language - it's much like the view that Augustine of Hippo proposed, but I don't think it's ever been proposed as a serious contender for a theory of language acquisition in the modern era. You have a little bootstrapping problem here. The child's input is not a delineated sequence of conveniently marked and isolated tokens, "What" and "to" and "suck" etc. Instead, it's a continuous stream of sounds. The child's task is much more complicated and much more interesting than you're suggesting here. In any case, the point is: this is not how any human brain works on language.

Glad you got the Gary Larson allusion. But I've studied a lot of kids in the process of understanding language, and I have a speaking brain of my own, and it appears to me that we actually do process language along these lines. And it worked, as I mentioned a moment ago, in a practical, real-world programming project.

jon.kiparsky, on 14 December 2013 - 07:45 AM, said:

So just from the point of view of the linguist, my suggestion would be "go study some linguistics".

Have done so, and will continue to do so. But I'll always be a fan of the simplest solution to a problem, and the solution that appears to be most similar to natural systems.

jon.kiparsky, on 14 December 2013 - 07:45 AM, said:

So much for the linguistics. As a programmer, this looks horrible. Why would I want to type "A polygon is a thing with vertices" and hope that my testing reveals all of the errors of interpretation that this allows, when I can use a well-designed language (or even something like VB or ML) and know exactly what I'm going to get? That is, either "A polygon is a thing with vertices" is a precise construct - in which case I might as well learn a less atrocious syntax - or it's not a precise construct - in which case this is completely useless for programming.

Alternately, why would I want to type "Runtime.getRuntime().exec("cls");" instead of "Clear the screen."? Our general position on the matter, as intimated in my original post, is that a program should be written more-or-less like a math book: a natural-language framework punctuated with specialized syntaxes as appropriate.

jon.kiparsky, on 14 December 2013 - 07:45 AM, said:

Quote

We deal with ambiguity as you would when speaking to, say, an employee: when he does what you asked, you're done; when he doesn't, you elaborate.


Seriously? Why not just use a syntax that allows me to state my meaning unambiguously?


Because your unambiguous syntax has to be memorized, while our occasionally-ambiguous syntax is already known. And because your unambiguous syntax doesn't get us closer to creating a HAL 9000, which our natural-language syntax just might.

jon.kiparsky, on 14 December 2013 - 07:45 AM, said:

So from the point of view of the programmer, my suggestion would be that you study language design a little.

Do you really think we could have written the program we're talking about if we hadn't studied, in significant depth, both linguistics and programming language design? I'm afraid you're mistaking our love of simplicity and our out-of-the-box thinking for lack of education. The proof is in the pudding. The thing works, and it has answered the three questions we posed at the outset. We can now say, with years of experience -- and a tangible product -- to back up the assertions:

1. It is easier (at least for us!) to program in a natural language rather than translate our thoughts into specialized syntaxes; and

2. We can parse English in a relatively sloppy, marker-word/bucket fashion and still produce a system that is sufficiently precise to write non-trivial computer programs; and that

3. Low level programs (like compilers) can indeed be conveniently and efficiently written in high-level languages (like English).

Thanks for the challenging remarks.
