0 Replies - 503 Views - Last Post: 14 October 2018 - 11:16 AM

#1 Skydiver   User is online

  • Code herder
  • member icon

Reputation: 7056
  • View blog
  • Posts: 23,986
  • Joined: 05-May 12

Binary heap (like linked list) is not optimal

Posted 14 October 2018 - 11:16 AM

To add to my crusade about how the way computer science is being taught needs updating, here is an article from the ACM published all the way back in 2010:

You're Doing It Wrong. Think you've mastered the art of server performance? Think again.


Would you believe me if I claimed that an algorithm that has been on the books as "optimal" for 46 years, which has been analyzed in excruciating detail by geniuses like Knuth and taught in all computer science courses in the world, can be optimized to run 10 times faster?
A quick browse of the mental catalog flipped up the binary-heap card, which not only sports an O(log2(n)) transaction performance, but also has a meta-data overhead of only a pointer to each object—which is important if you have 10 million+ objects.

Careful rereading of Knuth confirmed that this was the sensible choice, and the implementation was trivial: "Ponto facto, Cæsar transit," etc.

On a recent trip by night-train to Amsterdam, my mind wandered, and it struck me that Knuth might be terribly misleading on the performance of the binary heap, possibly even by an order of magnitude. On the way home, also on the train, I wrote a simulation that proved my hunch right.

Before any fundamentalist CS theoreticians choke on their coffees: don't panic! The P vs. NP situation is unchanged, and I have not found a systematic flaw in the quality of Knuth et al.'s reasoning. The findings of CS, as we know it, are still correct. They are just a lot less relevant and useful than you think—at least with respect to performance.
... The fact is, however, 46 years later most CS-educated professionals still ignore VM as a matter of routine. This is an embarrassment for CS as a discipline and profession, not to mention wasting enormous amounts of hardware and electricity.
It is, amazingly, the only conceptual model used in computer education, despite the fact that it has next to nothing to do with the execution environment on a modern computer. And just for the record: by modern, I mean VAX 11/780 or later.

The past 30 or 40 years of hardware and operating-systems development seems to have only marginally impinged on the agenda in CS departments' algorithmic analysis sections, and as far as my anecdotal evidence, it has totally failed to register in the education they provide.
On a modern multi-issue CPU, running at some gigahertz clock frequency, the worst-case loss is almost 10 million instructions per VM page fault. If you are running with a rotating disk, the number is more like 100 million instructions.

What good is an O(log2(n)) algorithm if those operations cause page faults and slow disk operations? For most relevant datasets an O(n) or even an O(n^2) algorithm, which avoids page faults, will run circles around it.

Performance analysis of algorithms will always be a cornerstone achievement of computer science, and like all of you, I really cherish the foldout chart with the tape-sorts in volume 3 of The Art of Computer Programming. But the results coming out of the CS department would be so much more interesting and useful if they applied to real computers and not just toys like ZX81, C64, and TRS-80.

Is This A Good Question/Topic? 1
  • +

Page 1 of 1