Formal Theory of Complexity Analysis in Specific Programming Languages?

24 Replies - 1657 Views - Last Post: 01 June 2015 - 11:47 PM

#1 gonzaw

Posted 22 March 2015 - 05:41 PM

Hi.

Recently I've started to feel a little uneasy about complexity analysis in programming languages. It's started to feel arbitrary to me. By this I mean the analysis of the space/time complexity of programs in any given programming language.

Let me explain myself. Take for instance time complexity analysis. We all know Big O/Omega/Theta notation, and what it basically comes down to (a lower/upper bound on the asymptotic number of instructions a specific program/algorithm takes to execute on an arbitrary input, depending on the case).
But the connection between the maths behind it (e.g. the actual mathematical definition of O(f(n))) and the specific programming language you are analysing seems blurry to me.
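(For reference, the definition I mean is the standard one: f(n) = O(g(n)) iff there exist constants c > 0 and n_0 such that f(n) \le c \cdot g(n) for all n \ge n_0.)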
For instance, imagine I have this program below:
public int foo(int n){
    int res = 0;
    for(int i = 1; i <= n; i++){
        res += i;
    }
    return res;
}


We can easily analyse it and say it has O(n) complexity. But why? The usual argument seems to be a little ad hoc to the specific language. Like in this case, you'd say something like "Hmm, well, we can assume that we take the unit as an assignment or operator application, like ":=" or "+", so we try to count the number of times those are called. Also, the parameter is an integer 'n', so we know that the input is this 'n', so that will be the domain of our Big O function. Then, we know that for-loops represent an iteration of n instructions when they have the specific form "for(i = 1; i <= n; i++)", so we can figure out this whole function is O(n)"

But to me, all of the claims you would make in a statement like the above are not substantiated. I mean, they are intuitive and make sense, but there is no formal process that proves that this function has O(n) time complexity. You just basically figured that out of thin air.
But this is in no way a good way to reason about arbitrary programs. Arbitrary programs are not pure, don't have a neatly defined interface where the "inputs" are nicely shown, nor do they have the specific constructs (like this specific for-loop, assignments, etc.) that allow you to reason about their time complexity as neatly as you did above.
In those cases there is no way to reason about the asymptotic complexity of the program, other than trying some ad hoc reasoning that greatly simplifies it. But in that case it's very possible you get it wrong, and you have no way to verify it (other than thinking about it harder).

One of the problems I find is that this type of complexity analysis is defined abstractly. It's basically pure math, which we then try to "bring down" to specific languages and their behavior. Big O never says anything about for-loops or assignments; it only talks about mathematical functions. The rest is all on us. The translation between the mathematical realm of the analysis and the computational realm doesn't seem well defined to me.


Is it possible to have a formal theory of complexity analysis, at the very least for a specific programming language, or a subset of such a programming language? I want to be able to deconstruct a program into its abstract syntax tree, and be able to traverse it using logic rules to find out its time and space complexity in a sure way, just like a typechecker traverses the abstract syntax tree, following logic rules to find out if the program is correctly typed in a sure way.
Is this possible?
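To illustrate the kind of thing I'm imagining, here's a toy sketch (every name in it is made up, a real language would need far richer rules, and general programs run into undecidability fast): a "cost checker" that walks a tiny imperative AST by structural recursion, the way a typechecker does, and derives the exponent d of an O(n^d) bound:

interface Stmt {}

class Assign implements Stmt {}                 // e.g. "res += i;" -- the unit-cost axiom

class Seq implements Stmt {                     // s1; s2
    final Stmt first, second;
    Seq(Stmt first, Stmt second) { this.first = first; this.second = second; }
}

class LoopOverN implements Stmt {               // for(i = 1; i <= n; i++) { body }
    final Stmt body;
    LoopOverN(Stmt body) { this.body = body; }
}

class CostAnalyzer {
    // Returns d such that the statement runs in O(n^d),
    // by structural recursion over the AST.
    static int degree(Stmt s) {
        if (s instanceof Assign) return 0;                   // Theta(1) by axiom
        if (s instanceof Seq) {
            Seq q = (Seq) s;                                 // sequence: take the max degree
            return Math.max(degree(q.first), degree(q.second));
        }
        LoopOverN loop = (LoopOverN) s;                      // loop: n iterations multiply by n
        return 1 + degree(loop.body);
    }

    public static void main(String[] args) {
        // foo's body: res = 0; for(i = 1; i <= n; i++) { res += i; } return res;
        Stmt foo = new Seq(new Assign(), new Seq(new LoopOverN(new Assign()), new Assign()));
        System.out.println("O(n^" + degree(foo) + ")");      // prints O(n^1)
    }
}

Of course this only covers loops that visibly run n times, which is exactly the gap I'm complaining about; but that's the shape of the thing I'd want.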




Replies To: Formal Theory of Complexity Analysis in Specific Programming Languages?

#2 jon.kiparsky

Posted 22 March 2015 - 05:52 PM

Algorithmic complexity is generally applied to algorithms, not programs. This distinction is important: algorithms can be specified at a level abstract from implementation details, programs cannot.

Proofs about complexity of algorithms are therefore mathematical proofs. I suspect that they are not any more or less amenable to automated proof than other sorts of mathematical proofs.

Quote

there is no formal process that proves that this function has O(n) time complexity. You just basically figured that out of thin air.


Well, yes. Again, you're talking about a program, not an algorithm, but if you have a similarly linear algorithm, for example searching for an item in a linked list, it's easy enough to prove that the algorithm has O(n) complexity. In this case, it would be an induction on the number of elements in the list. I suppose you do have to "figure it out of thin air", and there isn't a formal process that gets you from the statement of the problem to the solution, but that's kind of the mathematician's job: figuring stuff out.
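To sketch that induction under a unit-cost assumption (the constants here are hypothetical): let T(k) be the worst-case cost of searching a list with k elements. Then T(0) = c_0 (report "not found") and T(k) = c + T(k - 1) for k >= 1 (one comparison plus a search of the tail). Unrolling gives T(n) = c \cdot n + c_0 = O(n), and the inductive step is exactly the recursive case.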

#3 macosxnerd101

Posted 22 March 2015 - 06:04 PM

Quote

It's basically pure math, which we then try to "bring down" to specific languages and their behavior. Big O never says anything about for-loops or assignments; it only talks about mathematical functions. The rest is all on us. The translation between the mathematical realm of the analysis and the computational realm doesn't seem well defined to me.


We define a function T(n) to describe the number of steps the algorithm will take with respect to the input parameter n.

In the case of your code, we have:
public int foo(int n){
    int res = 0;  //Theta(1)

    /**
     * i = 1: Theta(1)
     * i <= n: Theta(1)
     * i++: Theta(1)
     * n iterations of the loop
     */
    for(int i = 1; i <= n; i++){
        res += i; //Theta(1)
    }
    return res; //Theta(1)
}



So the initializations int res = 0; and i = 1; each contribute \Theta(1). Then the loop takes n iterations with 3 \cdot \Theta(1) = \Theta(1) time operations on the interior. So T(n) = \Theta(1) + \sum_{i=1}^{n} \Theta(1) + \Theta(1).

We then observe that \sum_{i=1}^{n} \Theta(1) = \Theta(n). So T(n) = \Theta(1) + \Theta(n) + \Theta(1) = \Theta(n).

I could have instead given you the exact runtime function and applied the limit test to prove T(n) = \Theta(n).
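For instance, charging one unit per assignment, comparison, increment, and return in the code above gives an exact count along the lines of T(n) = 2 + (n + 1) + n + n + 1 = 3n + 4, and \lim_{n \to \infty} T(n)/n = 3, which is finite and positive, so T(n) = \Theta(n). (The exact constants depend on the cost model; the limit test doesn't care.)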

Quote

I want to be able to deconstruct a program into its abstract syntax tree, and be able to traverse it using logic rules to find out its time and space complexity in a sure way, just like a typechecker traverses the abstract syntax tree, following logic rules to find out if the program is correctly typed in a sure way.

This is probably overkill. Tools like recursion tree analysis and the Master Theorem are probably better to use when possible. Or a direct argument like I did above.

Quote

Hmm, well, we can assume that we take the unit as an assignment or operator application, like ":=" or "+", so we try to count the number of times those are called.

Just as in any area of mathematics, we need assumptions or axioms with which to start. Does assignment have a specific cost as a function of n? If n = 100, does it take me more time to assign int res = 0; than if I have n = 1000? Even if you go down to the circuit level and apply asymptotic analysis there, you hit the same assumptions just at a lower level.

By the way, our theory of complexity is formal. We talk about algorithms in terms of RAM computations. You could instead analyze the complexity of Turing computations, if you wanted. The same goes for other models of computation. When you get into complexity classes (P, NP, etc.), they are defined in terms of Turing Machines. Even though we give verification or decision algorithms (when writing, say, NP-Completeness proofs) in terms of RAM computations, Turing Machines and RAM are polynomial-time equivalent.

#4 gonzaw

Posted 22 March 2015 - 06:13 PM

jon.kiparsky, on 22 March 2015 - 10:52 PM, said:

Algorithmic complexity is generally applied to algorithms, not programs. This distinction is important: algorithms can be specified at a level abstract from implementation details, programs cannot.

Proofs about complexity of algorithms are therefore mathematical proofs. I suspect that they are not any more or less amenable to automated proof than other sorts of mathematical proofs.


I'm not sure I get what the difference is between an algorithm and a program. Sure, if you take a specific Java/C/etc. program, it obviously has "implementation details". But you can also define programs in lambda calculus, or System F, or any other "theoretical" framework for programs. Those are not bound by any kind of implementation details at all. So in those cases, what is the difference between an algorithm and a program?

I think this "too abstract to be useful" focus is hurting us. There has been plenty of theoretical work done on programming languages. We can reason about computations in more ways than just "pseudocode" (as is done in the usual algorithm analysis).
Why can't we construct a new theory of complexity analysis using one of these systems, like System F, or the calculus of constructions, or something? Or create a subset of those systems that allows you to perform that kind of analysis?

jon.kiparsky, on 22 March 2015 - 10:52 PM, said:

I suppose you do have to "figure it out of thin air", and there isn't a formal process that gets you from the statement of the problem to the solution, but that's kind of the mathematician's job: figuring stuff out.


Well, if I had a handy mathematician in my pocket I could pull out whenever I have to figure out the complexity of a program, then there wouldn't be a problem. Sadly I don't have any :P

#5 macosxnerd101

Posted 22 March 2015 - 06:25 PM

Quote

I'm not sure I get what the difference is between an algorithm and a program. Sure, if you take a specific Java/C/etc. program, it obviously has "implementation details". But you can also define programs in lambda calculus, or System F, or any other "theoretical" framework for programs.

A program is a specific implementation that you can run. An algorithm is a bit more abstract. If we are talking about formal languages, we say that there exists an algorithm to solve a problem iff there exists a Turing Machine to decide the language formulation. That is, problems are equivalent to languages and algorithms are equivalent to Turing Machines.

There are many other models of computation, such as lambda calculus, RAM, and others. These are all no more powerful than the Turing model with regard to issues of computability. The complexity analysis doesn't change, though: you are still analyzing how many computations the model will take to decide a string with respect to the size of the input. The RAM model, for instance, is more efficient than the Turing model. There is no new theory to construct. And as I mentioned in my last post, any model polynomially equivalent to the Turing model (with respect to time complexity) is fine to use when dealing with complexity classes.

#6 gonzaw

Posted 22 March 2015 - 06:30 PM

macosxnerd101, on 22 March 2015 - 11:04 PM, said:

Quote

It's basically pure math, which we then try to "bring down" to specific languages and their behavior. Big O never says anything about for-loops or assignments; it only talks about mathematical functions. The rest is all on us. The translation between the mathematical realm of the analysis and the computational realm doesn't seem well defined to me.


We define a function T(n) to describe the number of steps the algorithm will take with respect to the input parameter n.


The thing is that "the number of steps the algorithm will take" and "input parameter" are too abstract for most problems. For instance, it assumes we are dealing with an imperative language that has a sequence of "steps" that the program can follow. Second, it assumes you can easily define what the "input parameter" is, which at times may be almost impossible (especially if it's not a real number/integer/natural).

Quote

Quote

I want to be able to deconstruct a program into its abstract syntax tree, and be able to traverse it using logic rules to find out its time and space complexity in a sure way, just like a typechecker traverses the abstract syntax tree, following logic rules to find out if the program is correctly typed in a sure way.

This is probably overkill. Tools like recursion tree analysis and the Master Theorem are probably better to use when possible. Or a direct argument like I did above.


My argument is that "direct arguments" are not viable in general, and that we need a different tool or method to perform this kind of analysis on the code/algorithms/programs we create. This program above is the easiest program ever, but when you deal with other complexities, especially ones that are inherent to a specific model of computation, or programming language (like recursion, side effects, mutability, and even something as simple as dynamic conditionals), trying to do a "direct argument" is not feasible.

The analysis of recursion done in this kind of complexity analysis always seemed too complex to me too. Trying to find out the complexity of something like mergesort needs you to make this weird analysis where you "iterate" over the number of steps the recursion takes to end. It seems fine when you apply structural recursion (recursing down a list or tree), or even something as simple as factorial. But if you have general recursion, again, it doesn't seem feasible (like trying to apply that analysis to Ackermann's function, for instance).

Quote

Quote

Hmm, well, we can assume that we take the unit as an assignment or operator application, like ":=" or "+", so we try to count the number of times those are called.

Just as in any area of mathematics, we need assumptions or axioms with which to start. Does assignment have a specific cost as a function of n? If n = 100, does it take me more time to assign int res = 0; than if I have n = 1000? Even if you go down to the circuit level and apply asymptotic analysis there, you hit the same assumptions just at a lower level.

By the way- our theory of complexity is formal. We talk about algorithms in terms of RAM computations. You could instead analyze the complexity of Turing computations, if you wanted. The same goes for other models of computations. When you get into complexity classes (P, NP, etc.), they are defined in terms of Turing Machines. Even though we give verification or decision algorithms (when writing say, NP-Completeness proofs) in terms of RAM computations, Turing Machines and RAM are polynomial time equivalent.


It may be formal regarding the model you chose (RAM computations), but it sure is not formal regarding the model everybody else uses (specific programming languages). Nobody programs directly in the RAM model. We need a theory that is based on the other equivalent models we do use, giving us tools to reason within the specific programming language itself.
Even if you could perfectly apply the formal theory for Turing machines, to be able to do ANY kind of formal analysis on any program you make, you would need to transform it into its corresponding Turing Machine first. You would NOT be able to perform the complexity analysis using the programming language the program was originally written in.

EDIT:

Quote

A program is a specific implementation that you can run. An algorithm is a bit more abstract. If we are talking about formal languages, we say that there exists an algorithm to solve a problem iff there exists a Turing Machine to decide the language formulation. That is, problems are equivalent to languages and algorithms are equivalent to Turing Machines.


I call that a program :P
If something is modelled with a Turing Machine, or with lambda calculus, or with any other equivalent model, then it's a program, whether it is a "specific implementation that you can run" or not.
If I take a piece of paper and write "(\f\g\x.f x) (\x.x)" then it is a program.



#7 macosxnerd101

Posted 22 March 2015 - 06:40 PM

Quote

The thing is that "the number of steps the algorithm will take" and "input parameter" are too abstract for most problems. For instance, it assumes we are dealing with an imperative language that has a sequence of "steps" that the program can follow. Second, it assumes you can easily define what the "input parameter" is, which at times may be almost impossible (especially if it's not a real number/integer/natural).

If you think they are too abstract, then you haven't defined your problem well enough. Every problem can be written as follows:

INSTANCE: Describe the inputs.

SOLUTION: Describe what you expect back.


Let's use the HAMILTONIAN CIRCUIT problem as an example:
INSTANCE: A simple, undirected graph G(V, E).

SOLUTION: YES if G contains a Hamiltonian circuit, and NO otherwise.

Now G has a size of O(V^2) using either the adjacency list or adjacency matrix representation.


Quote

My argument is that "direct arguments" are not viable in general, and that we need a different tool or method to perform this kind of analysis on the code/algorithms/programs we create.

Why not? I just used a direct argument on your example. If a direct argument works, why use something different? As Jon said, complexity analysis is probably just as amenable to automated proofs as other areas of math.

Quote

like recursion, side effects, mutability, and even something as simple as dynamic conditionals), trying to do a "direct argument" is not feasible.

When we have recursion, we derive T(n) as a recurrence and either analyze it using recursion trees or the Master theorem. When we have dynamic conditionals, we apply a worst-case analysis.

Quote

The analysis of recursion done in this kind of complexity analysis always seemed too complex to me too. Trying to find out the complexity of something like mergesort needs you to make this weird analysis where you "iterate" over the number of steps the recursion takes to end.

Just because you need more practice doesn't make the model faulty. I'm not sure why Mergesort is that perplexing. We divide the list in half and recurse on each of the halves. Then we merge all the elements. So T(n) = 2T(n/2) + Theta(n), where the Theta(n) step is the merge operation.
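Applying the Master Theorem to that recurrence: a = 2, b = 2, so n^{\log_b a} = n^{\log_2 2} = n, and f(n) = \Theta(n) matches it exactly. That is the "equal" case of the theorem, which gives T(n) = \Theta(n \log n).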

Quote

It may be formal regarding the model you chose (RAM computations), but it sure is not formal regarding the model everybody else uses (specific programming languages).

Nobody really does formal complexity analysis on a business problem. People may learn to eyeball things. Unless you're working in an area like computational biology, formal complexity isn't a big deal. And if you are in an area like computational biology and come across a significantly difficult problem, then it's probably worth dealing with it in the abstract to determine if it is NP-Complete.

Modern computers are some blend of RAM and linear-bounded automata. So our abstractions are good tools to use. Analyzing a Java program would be quite similar to analyzing a RAM computation.

Quote

You would NOT be able to perform the complexity analysis using the programming language the program was originally written in.

Again, I did exactly that for your example above. The bottom line is that when you analyze an algorithm using a specific model (e.g., RAM, Turing Machines), you seek to determine how many steps, asymptotically, the model will take with respect to the size of the input.

For snippets of Java code (or the like), we generally treat those as RAM computations because it makes sense to do so. You don't have a tape head moving one cell at a time to the left or right in order to access memory. You have random access.

#8 jon.kiparsky

Posted 22 March 2015 - 07:00 PM

gonzaw, on 22 March 2015 - 08:13 PM, said:

Well, if I had a handy mathematician in my pocket I could pull out whenever I have to figure out the complexity of a program, then there wouldn't be a problem. Sadly I don't have any :P


If you're doing complexity analysis on an algorithm, you're doing computer science. CS is a branch of math, so this makes you a mathematician. I suppose you might not have a mathematician in your pocket exactly, but you do have one handy.

I think the confusion I'm seeing is about what exactly this complexity analysis is for. In real-world applications, it's primarily used to gauge the practicality of a particular approach. If I tell you that I think you've got an O(n^2) algorithm somewhere in the code you've given me, I'm telling you something about the sort of behavior when the program scales. I'm not telling you anything about how long the program will take to execute for some particular input - that's the whole point of the abstractions built into the model in the first place.


Quote

The thing is that "the number of steps the algorithm will take" and "input parameter" are too abstract for most problems. For instance, it assumes we are dealing with an imperative language that has a sequence of "steps" that the program can follow.


Not at all. This is exactly why we're getting away from programs and getting to algorithms. If you have a problem about graphs, you're going to be able to reason about solutions to that problem in terms of the number of nodes and the number of edges. Once you find a solution which is as good as you can get - and this is generally provable as well - you can then turn to specific languages to implement that solution, and none of those languages will do better than that solution. And, in general, no language will force you to do worse than that solution. (In fact, I could remove the words "in general" if we're limiting ourselves to the Turing complete languages, since you could in principle write a Lisp interpreter in your favorite language and then solve the problem with that - or a Python interpreter, or a Java compiler, or whatever you like.)

#9 jon.kiparsky

Posted 22 March 2015 - 07:18 PM

macosxnerd101, on 22 March 2015 - 08:40 PM, said:

Nobody really does formal complexity analysis on a business problem. People may learn to eyeball things. Unless you're working in an area like computational biology, formal complexity isn't a big deal. And if you are in an area like computational biology and come across a significantly difficult problem, then it's probably worth dealing with it in the abstract to determine if it is NP-Complete.


To be precise about this: since complexity is basically additive - you can't remove it by adding more code - you analyze pieces of the code if you want to understand its performance. Generally this is not, as you say, a formal analysis, but a fairly close estimate. But you're not analyzing the program as a whole. This would be pointless, since you don't generally care about the program as a whole. You care about what's slow about it, and how to make it less slow. So you look for things that are likely to be slow - either by reasoning about the sorts of problems that are solved there, or by looking at the code - or you instrument the code and look for the actual time sinks and reason about why they're slow.

All of this is a lot easier if you've done the work to be good at complexity analysis, but I've never seen anyone do formal analysis of a problem in the workplace.

#10 macosxnerd101

Posted 22 March 2015 - 07:24 PM

Those are all good points. The complexity work one would do in this situation, though, is more in line with this tutorial than with formally deriving T(n) or writing an NP-Completeness proof.

#11 gonzaw

Posted 22 March 2015 - 08:01 PM

jon.kiparsky, on 23 March 2015 - 12:00 AM, said:

If you're doing complexity analysis on an algorithm, you're doing computer science. CS is a branch of math, so this makes you a mathematician. I suppose you might not have a mathematician in your pocket exactly, but you do have one handy.


I guess my dream of portable midget mathematicians will have to wait :(

Quote

Not at all. This is exactly why we're getting away from programs and getting to algorithms. If you have a problem about graphs, you're going to be able to reason about solutions to that problem in terms of the number of nodes and the number of edges. Once you find a solution which is as good as you can get - and this is generally provable as well - you can then turn to specific languages to implement that solution, and none of those languages will do better than that solution. And, in general, no language will force you to do worse than that solution. (In fact, I could remove the words "in general" if we're limiting ourselves to the Turing complete languages, since you could in principle write a Lisp interpreter in your favorite language and then solve the problem with that - or a Python interpreter, or a Java compiler, or whatever you like.)


Is this ideal? This assumes your problem has a nice mathematical abstraction to it (graphs, matrices, vectors, etc). What if it doesn't? Or at least not in the usual sense of "maths" (and not, like, homotopy type theory or stuff like that).
Plus, yes, your implementation would need to implement this algorithm. But this means you need to find a representation for graphs too. Do you use a transition/edge matrix? Or do you keep a list of vertices, each having a list of adjacent neighbours? Or do you choose a different implementation? The algorithm would behave differently in each implementation, right? In some maybe even worse.
So in the end, you would still need to have a second auxiliary analysis of your implementation, to see whether it checks out against the "ideal algorithm complexity" or not.
Why not cut out the middleman and do the analysis right there on the "implementation"?


I mean, this is one of those things I mentioned: there is a mismatch between the model the algorithms are based on (and that the analysis is done in), and the computational model we use to "implement" the problem (basically translating it into a computation).
At times this mismatch is enormous, like trying to reason about graphs using regular maths, and then trying to implement it in C using raw access to memory. That's as big a mismatch as you can get; there is basically no way to smoothly get from one representation of the model to the other.
So why can't we use theories where that mismatch is minimal? For instance, homotopy type theory plus calculus of constructions, or something similar? Define your mathematical models the same way you define your computations, and then your complexity analysis will be the same in both, since your modelling of the problem will be the same.
Or define a new theory that would still have some mismatch, but not such a big gap, and could still give you formal rules for defining that complexity in the new system.


Or straight up define a new system to analyze the complexity of programs. Yes, the usual complexity analysis is done on algorithms, but why couldn't you have a different type of analysis on generic programs? Forget about Big O, we'd be talking about a completely new system here.


macosxnerd101, on 23 March 2015 - 12:24 AM, said:

Those are all good points. The complexity work one would do in this situation, though, is more in line with this tutorial than with formally deriving T(n) or writing an NP-Completeness proof.


That kind of tutorial is what I'm talking about. It is specifically discussing how to determine Big O for a specific programming language; it's not discussing how to do it abstractly. Almost all "Big O tutorials" do the same.

macosxnerd101, on 22 March 2015 - 11:40 PM, said:

If you think they are too abstract, then you haven't defined your problem well enough. Every problem can be written as follows:

INSTANCE: Describe the inputs.

SOLUTION: Describe what you expect back.


Imagine I'm a developer at Facebook. Now I have to find a way to get all the likes a user has, but only the likes of the 15 most popular friends he has, and only of those friends who have visited at most 2 of the pages the original user has visited in the last 2 months.
What is the input of this algorithm? As in, what is the "n" if this algorithm has O(f(n)) complexity?

The problem may be perfectly identified or defined; it just may not map to this model as easily as you make it out to be.
Sure, if you only try to analyze algorithms over generic vectors/lists or graphs, then it'd be pretty easy to correctly "define" the problem. But I don't think it's ideal to try and blame the "problem" for not fitting into this nice little model, instead of modifying the model to adapt to the problem.



#12 macosxnerd101

Posted 22 March 2015 - 08:35 PM

Quote

Is this ideal? This assumes your problem has a nice mathematical abstraction to it (graphs, matrices, vectors, etc). What if it doesn't? Or at least not in the usual sense of "maths" (and not, like, homotopy type theory or stuff like that).

Your problem should be defined clearly enough that it does. And by "mathematical," I mean a formal statement of the problem where everything is clearly defined. It just so happens that graphs are commonly analyzed in CS classes.

Quote

That kind of tutorial is what I'm talking about. It is specifically discussing how to determine Big O for a specific programming language; it's not discussing how to do it abstractly. Almost all "Big O tutorials" do the same.

I wrote one that doesn't.


Quote

Or straight up define a new system to analyze the complexity of programs. Yes, the usual complexity analysis is done on algorithms, but why couldn't you have a different type of analysis on generic programs? Forget about Big O, we'd be talking about a completely new system here.

Complexity cares about how many steps asymptotically will be required when executing a sequence of commands. There is no new type of analysis to define. Big-O is just a notation, as are the other asymptotics. We use them where appropriate. They are not the analyses themselves.

Quote

Plus, yes, your implementation would need to implement this algorithm. But this means you need to find a representation for graphs too. Do you use a transition/edge matrix? Or do you keep a list of vertices, each having a list of adjacent neighbours? Or do you choose a different implementation? The algorithm would behave differently in each implementation, right? In some maybe even worse.

We can talk about algorithms based on underlying data structures. Quicksort is very inefficient on linked lists. Dijkstra's algorithm is more efficient with an adjacency list representation of a graph than with an adjacency matrix. Data structures do matter. But we can still talk about algorithms and data structures in a language-agnostic manner.
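To make the trade-off concrete, here's a small sketch (the names are mine, not from any library) of one primitive operation, enumerating the neighbours of a vertex v, under each representation:

import java.util.List;

class NeighborScan {
    // Adjacency matrix: checking every possible neighbour of v costs
    // Theta(|V|), no matter how few edges v actually has.
    static int countNeighborsMatrix(boolean[][] adj, int v) {
        int count = 0;
        for (int u = 0; u < adj.length; u++) {
            if (adj[v][u]) count++;
        }
        return count;
    }

    // Adjacency list: the same query costs Theta(deg(v)),
    // which is why sparse-graph algorithms prefer lists.
    static int countNeighborsList(List<List<Integer>> adj, int v) {
        return adj.get(v).size();
    }
}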

Quote

So in the end, you would still need to have a second auxiliary analysis of your implementation, to see whether it checks out against the "ideal algorithm complexity" or not.
Why not cut out the middleman and do the analysis right there on the "implementation"?

It's important to understand the pros and cons of underlying data structures so you know when to choose which one. This is why we don't just "cut out the middleman": sometimes we don't know which data structure is best until we actually do the analysis.

Quote

Imagine I'm a developer at Facebook. Now I have to find a way to get all the likes a user has, but only the likes of the 15 most popular friends he has, and only of those friends who have visited at most 2 of the pages the original user has visited in the last 2 months.
What is the input of this algorithm? As in, what is the "n" if this algorithm has O(f(n)) complexity?

Are you talking about a status? Or a page? Let's suppose a status on a personal page. So the input would be a collection of tuples (user_id, { set of pages visited in the last two months }), where the users specified are an initial user and that user's friends.

The n would be the number of friends the user has.

Quote

The problem may be perfectly identified or defined; it just may not map to this model as easily as you make it out to be.

I'm not saying any of this is easy. I'm saying it's possible and doable. Clearly defining components and describing how they work together is non-trivial. However, making the effort to define these things is important for not only mathematical analysis but also in understanding how the component works.

Quote

Sure, if you only try to analyze algorithms over generic vectors/lists or graphs, then it'd be pretty easy to correctly "define" the problem. But I don't think it's ideal to try and blame the "problem" for not fitting into this nice little model, instead of modifying the model to adapt to the problem.

How many NP-Completeness proofs have you worked through? Have you flipped through a book on Algorithm Analysis like Cormen? Have you read any of the literature on complexity? These things are not easy to do. These models are not simplistic. Complexity theory is involved. Analyzing algorithms is only a small part of it. The important results are in regards to which problems can (and cannot) be solved efficiently. What does it mean for a computer to be able to solve a problem efficiently? This is why we care about complexity.

Quote

I mean, this is one of those things I mentioned: there is a mismatch between the model the algorithms are based on (and that the analysis is done in), and the computational model we use to "implement" the problem (basically translating it into a computation).

I still don't see a mismatch that you are talking about.

Quote

At times this mismatch is enormous, like trying to reason about graphs using regular maths, and then trying to implement it in C using raw access to memory. That's as big a mismatch as you can get; there is basically no way to smoothly get from one representation of the model to the other.

Again, I still don't see the mismatch. I can implement both adjacency list and adjacency matrix representations in C or C++. Just because the implementation details are non-trivial doesn't mean the theory should be thrown out.

Quote

So why can't we use theories where that mismatch is minimal? For instance, homotopy type theory plus calculus of constructions, or something similar? Define your mathematical models the same way you define your computations, and then your complexity analysis will be the same in both, since your modelling of the problem will be the same.

So what are you attacking? Again, complexity analysis of algorithms counts the number of steps an algorithm will take asymptotically related to the size of the input. If we used homotopy type theory and calculus of constructions, we would still be doing the same thing regarding complexity.

#13 astonecipher

Posted 22 March 2015 - 08:50 PM

I thought Mac was the handy mathematician? He won't fit in my pocket, but my phone is small enough to fit.

#14 jon.kiparsky

Posted 22 March 2015 - 09:10 PM

gonzaw, on 22 March 2015 - 10:01 PM, said:

jon.kiparsky, on 23 March 2015 - 12:00 AM, said:

If you're doing complexity analysis on an algorithm, you're doing computer science. CS is a branch of math, so this makes you a mathematician. I suppose you might not have a mathematician in your pocket exactly, but you do have one handy.


I guess my dream of portable midget mathematicians will have to wait :(


I guess so. Well, maybe not - Mac, how tall are you? :)



Quote

This assumes your problem has a nice mathematical abstraction to it (graphs, matrices, vectors, etc). What if it doesn't? Or at least not in the usual sense of "maths" (and not, like, homotopy type theory or stuff like that).


Well, all programs are representations, and they're all mathematical abstractions.

Quote

Plus, yes, your implementation would need to implement this algorithm. But this means you need to find a representation for graphs too. Do you use a transition/edge matrix? Or do you keep a list of vertices, each having a list of adjacent neighbours? Or do you choose a different implementation?


Translation from one representation to another is pretty straightforward. I believe that cost is generally O(N), so if your algorithm is linear or worse, this doesn't add any complexity to the overall picture. So, for most algorithms, this is not a cost you're too concerned about.
Notice that this is a great example of the difference between complexity and clock time - it's certainly possible that this translation might take some large amount of time - but the time it takes will increase linearly, and not, for example, exponentially. By the same logic, even though sorting is not a costless operation, it's basically "free" for any algorithm that scales at O(n log n) or worse.
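As a sketch of the kind of translation I mean (assuming an undirected graph handed to us as an edge list; the names are invented for illustration):

import java.util.ArrayList;
import java.util.List;

class Representations {
    // Edge list -> adjacency list in one pass over the edges:
    // Theta(V + E) work, i.e. linear in the size of the graph, so it
    // vanishes under any algorithm that is linear or worse.
    static List<List<Integer>> toAdjacencyList(int vertexCount, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < vertexCount; v++) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);   // record each undirected edge in both directions
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }
}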


Quote

The algorithm would behave differently in each implementation, right? In some maybe even worse.


Well, not exactly. An algorithm is usually defined over a particular data structure. Binary search doesn't make sense if you're talking about a linked list; it only makes sense in an array-like structure with random access to sequential elements, and specifically such a structure with its contents sorted sequentially. So if the algorithm requires an adjacency matrix and you have an edge list, you can't apply that algorithm. Fortunately, this is an easy problem to solve, as I've already said.

Quote

So in the end, you would still need to have a second auxiliary analysis of your implementation, to see whether it checks out against the "ideal algorithm complexity" or not.
Why not cut out the middleman and do the analysis right there on the "implementation"?


Well, you need both. You need a good algorithm, since a good implementation of a lousy algorithm is still lousy. And then you need a good implementation, since a lousy implementation of a good algorithm is still lousy. Doing this in two steps is sensible since the general problem need only be solved once - we don't need to solve sorting for Pascal and for Python and for Java and for each other language. Tony Hoare gave us one solution that works in any language. (It's less suited to Lisp's native data structure, the linked list, so it might be harder to implement there, but it's certainly doable - we know this because Lisp is Turing complete.) So now we have a much simpler problem, which is to come up with an implementation of a known solution, and to show that this is a correct implementation. There is certainly some nontrivial work to do here, and this is why we use library solutions in preference to home-rolled ones, but it should be obvious that it's a lot easier to implement quicksort than it was to invent it.


Quote

Or straight up define a new system to analyze the complexity of programs. Yes, the usual complexity analysis is done on algorithms, but why couldn't you have a different type of analysis on generic programs? Forget about Big O, we'd be talking about a completely new system here.


Well, that sounds exciting. I mean, Big O is awesome, but tell me more about how your system works. If it's awesome too, maybe I'll use it.

Quote

Imagine I'm a developer at Facebook. Now I have to find a way to get all the likes a user has, but only the likes of the 15 most popular friends he has, and only of those friends who have visited at most 2 of the pages the original user has visited in the last 2 months.
What is the input of this algorithm? As in, what is the "n" if this algorithm has O(f(n)) complexity?


This is exactly what I mean about not analyzing a business program, but instead analyzing the parts of it. If we go into the problem and find out some details about the representations involved, we can easily determine the bounds on each part of the problem. This is where it's helpful to know the codebase.


Quote

The problem may be perfectly identified or defined; it just may not map to this model as easily as you make it out to be.
Sure, if you only try to analyze algorithms over generic vectors/lists or graphs, then it'd be pretty easy to correctly "define" the problem. But I don't think it's ideal to try and blame the "problem" for not fitting into this nice little model, instead of modifying the model to adapt to the problem.


The model is well suited to the job it's designed for. I think what you're trying to do here is optimization. You're talking about clock time, not complexity. The two areas are related, but again: time complexity doesn't tell you about the clock time that a process will take. It tells you how clock time changes as your input size changes.

#15 mojo666

Posted 22 March 2015 - 09:37 PM

Quote

The thing is that "the number of steps the algorithm will take" and "input parameter" are too abstract for most problems. For instance, it assumes we are dealing with an imperative language that has a sequence of "steps" that the program can follow.

Can you provide an example? The best I can think of are declarative languages that aren't used for implementing algorithms; rather, they use prebuilt algorithms to accomplish specific tasks (SQL, XSLT). Analysis cannot be done in such cases... which has caused quite a few problems for me.

Quote

Second, it assumes you can easily define what the "input parameter" is, which at times may be almost impossible (especially if it's not a real number/integer/natural).

If I don't know what the input is, how can I even write a program?

Quote

This program above is the easiest program ever, but when you deal with other complexities, especially ones that are inherent to a specific model of computation, or programming language (like recursion, side effects, mutability, and even something as simple as dynamic conditionals), trying to do a "direct argument" is not feasible.

What can we do with an implementation that we cannot do with an algorithm? Big-O is fine for all the things you listed. You are coming up with a function that tells you how many times the program repeats basic tasks with varying input. We are basically asking "how long will my algorithm run as the input gets bigger?" or "will my program still be effective when I have twice as many customers/vendors/users/products/etc.?"

Quote

Is this ideal? This assumes your problem has a nice mathematical abstraction to it (graphs, matrices, vectors, etc). What if it doesn't?

Then you can't write an algorithm or a program for it.

Quote

So in the end, you would still need to have a second auxiliary analysis of your implementation, to see whether it checks out against the "ideal algorithm complexity" or not.
Why not cut out the middleman and do the analysis right there on the "implementation"?


You can't write a program (implementation) before you have an algorithm. If the implementation is failing, you now have to see whether it is a problem with the implementation or with the algorithm, which means you need to perform analysis on the algorithm anyway. If it turns out the algorithm sucks, then you just wasted a lot of money having a programmer work on a fruitless idea. You can get away with this for simpler problems, but you seem to be implying more complex problems.

Quote

Or straight up define a new system to analyze the complexity of programs.... Forget about Big O, we'd be talking about a completely new system here.... That kind of tutorial is what I'm talking about. It is specifically discussing how to determine Big O for a specific programming language; it's not discussing how to do it abstractly. Almost all "Big O tutorials" do the same


specific programming language (C++) implementation: for (int i = 0; i <= n; i++) { std::cout << i; }
"abstract": loop from 0 to n and print each number

The analysis is the same process for both. Can you give an example of a problem that is drastically different between programming languages, or even between a language and the abstraction?

Quote

Imagine I'm a developer at Facebook. Now I have to find a way to get all the likes a user has, but only the likes of the 15 most popular friends he has, and only of those friends who have visited at most 2 of the pages the original user has visited in the last 2 months.
What is the input of this algorithm? As in, what is the "n" if this algorithm has O(f(n)) complexity?


O(f(likes_u, pages_u, friends_u, pages_f[, likes_p])). Big-O can have multiple variables. For example, radix sort is O(k*n)
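To make the multi-variable point concrete, here's a minimal LSD radix sort sketch (base 10, non-negative ints; one possible implementation, not the canonical one). It makes k counting passes, one per digit of the largest key, and each pass is Theta(n), hence O(k*n):

import java.util.Arrays;

class RadixSortDemo {
    static void radixSort(int[] a) {
        int max = Arrays.stream(a).max().orElse(0);
        for (int exp = 1; max / exp > 0; exp *= 10) {    // k passes, one per digit
            int[] out = new int[a.length];
            int[] count = new int[10];
            for (int x : a) count[(x / exp) % 10]++;     // Theta(n): tally the current digit
            for (int d = 1; d < 10; d++) count[d] += count[d - 1];
            for (int i = a.length - 1; i >= 0; i--)      // Theta(n): stable placement
                out[--count[(a[i] / exp) % 10]] = a[i];
            System.arraycopy(out, 0, a, 0, a.length);
        }
    }

    public static void main(String[] args) {
        int[] a = {170, 45, 75, 90, 802, 24, 2, 66};
        radixSort(a);
        System.out.println(Arrays.toString(a));          // [2, 24, 45, 66, 75, 90, 170, 802]
    }
}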

Quote

But I don't think it's ideal to try and blame the "problem" for not fitting into this nice little model, instead of modifying the model to adapt to the problem.


What problem doesn't fit into the model?
