Function Challenge #2: New Years Challenge

  • (2 Pages)
  • +
  • 1
  • 2

18 Replies - 12022 Views - Last Post: 16 January 2011 - 01:51 PM

#1 Dogstopper  Icon User is offline

  • The Ninjaducky
  • member icon



Reputation: 2870
  • View blog
  • Posts: 11,025
  • Joined: 15-July 08

Function Challenge #2: New Years Challenge

Post icon  Posted 31 December 2010 - 08:10 PM

*
POPULAR

Welcome to the second function challenge! It is similar to the last one, but since it's New Years, I decided to shake it up a little bit. There are three problems, each progressing in difficulty. If you can, try all three. Thanks to mostyfriedman for the first two challenges!

*Note* This competition is always going to be available, but will be unpinned by January 15th.




Easy problem:

Given a set of integers represented as an array, print all the possible subsets of the set.
Example:
Input: {1, 2, 3}
Output:
[1, 2, 3]
[1, 2]
[1, 3]
[1]
[2, 3]
[2]
[3]
[]


Medium problem:

Given a string of characters, your job is to find a permutation of the string such that it is a palindrome, if more than one palindrome can be found, return the smallest one lexicographically. If no palindromes can be constructed, return "NOT FOUND".

examples:

Input:
"ABBAC"

Output:
"ABCBA"

Input:
"AB"

Output:
"NOT FOUND"

Hard Problem:

Document Distance

This is an easy problem if implemented a certain way compared to other potential ways. There are efficient methods and non-efficient methods of doing this. I have two documents attached - lewis.txt and Tom Sawyer.txt. Try to compare the Document Distance of these two files in the most efficient method possible.

You can find the methodology and information on Document Distance here:
http://www.andrew.cm...stance/lab.html

This process should take no longer than a minute or two to calculate. ;)

Text files:
Attached File  lewis.txt (1007.29K)
Number of downloads: 133
Attached File  Tom Sawyer.txt (390.07K)
Number of downloads: 1228

Good Luck!

If you have any ideas or suggestions for future challenges, feel free to PM me with it.

This post has been edited by Dogstopper: 31 December 2010 - 08:13 PM


Is This A Good Question/Topic? 5
  • +

Replies To: Function Challenge #2: New Years Challenge

#2 Raynes  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 611
  • View blog
  • Posts: 2,815
  • Joined: 05-January 09

Re: Function Challenge #2: New Years Challenge

Posted 01 January 2011 - 12:31 PM

(ns ctwo.core
  (:use [clojure.contrib.combinatorics :only [permutations subsets]]))

(defn find-palindrome [s]
  (if-let [result (first (filter #(= % (reverse %)) (permutations s)))]
    (apply str result)
    "NOT FOUND!"))

(defn find-subsets [coll]
  (doseq [sub (subsets coll)]
    (prn sub))) ; use prn because we want the collection printed all pretty-like.



My just-for-fun Clojure implementations of the first two problems.

First and foremost, I'm using a library in contrib for the hard parts. You never specified whether or not reader need to implement the relevant operations themselves. If this library wasn't in contrib, I wouldn't have bothered. Contrib libraries can sometimes make it into Clojure itself, thus what I used could be core functions in the future. Nonetheless, it's very apparent that I cheated and that it is easy to cheat. If you want to avoid these types of solutions, you might need to clarify. Before Adam gets his hands on it. ;)
Was This Post Helpful? 1
  • +
  • -

#3 mostyfriedman  Icon User is offline

  • The Algorithmi
  • member icon

Reputation: 727
  • View blog
  • Posts: 4,473
  • Joined: 24-October 08

Re: Function Challenge #2: New Years Challenge

Posted 01 January 2011 - 01:47 PM

Here's my solution for the first 2 problems, I will post the third one later

Spoiler

This post has been edited by mostyfriedman: 01 January 2011 - 01:51 PM

Was This Post Helpful? 1
  • +
  • -

#4 KYA  Icon User is offline

  • g++ jameson.cpp -o beverage
  • member icon

Reputation: 3101
  • View blog
  • Posts: 19,140
  • Joined: 14-September 07

Re: Function Challenge #2: New Years Challenge

Posted 01 January 2011 - 03:50 PM

Problem 3:

Spoiler



I'd probably make it prettier/a little more clean [like methods, heh] for an under 100 LOC solution.

O(n)
Was This Post Helpful? 2
  • +
  • -

#5 Dogstopper  Icon User is offline

  • The Ninjaducky
  • member icon



Reputation: 2870
  • View blog
  • Posts: 11,025
  • Joined: 15-July 08

Re: Function Challenge #2: New Years Challenge

Posted 01 January 2011 - 04:23 PM

View PostRaynes, on 01 January 2011 - 01:31 PM, said:

First and foremost, I'm using a library in contrib for the hard parts. You never specified whether or not reader need to implement the relevant operations themselves. If this library wasn't in contrib, I wouldn't have bothered. Contrib libraries can sometimes make it into Clojure itself, thus what I used could be core functions in the future. Nonetheless, it's very apparent that I cheated and that it is easy to cheat. If you want to avoid these types of solutions, you might need to clarify. Before Adam gets his hands on it. ;)


NO CHEATING! There. I said it.

I did say simply said to find a good implementation. Whether permutations are done through and external library or not. However, to be able to effectively judge whether or not it is fast or not (Big O), we have to know what the implementation looks like!

@KYA, Nice use of HashMaps! It reduced the efficiency from O(n2) to O(n)

This post has been edited by Dogstopper: 01 January 2011 - 04:24 PM

Was This Post Helpful? 0
  • +
  • -

#6 cfoley  Icon User is offline

  • Cabbage
  • member icon

Reputation: 1949
  • View blog
  • Posts: 4,048
  • Joined: 11-December 07

Re: Function Challenge #2: New Years Challenge

Posted 01 January 2011 - 06:51 PM

Here's my code for problem 2:

Spoiler

This post has been edited by cfoley: 02 January 2011 - 07:33 AM

Was This Post Helpful? 1
  • +
  • -

#7 skaoth  Icon User is offline

  • D.I.C Addict
  • member icon

Reputation: 91
  • View blog
  • Posts: 601
  • Joined: 07-November 07

Re: Function Challenge #2: New Years Challenge

Posted 01 January 2011 - 10:59 PM

code for #3
Spoiler

This post has been edited by skaoth: 01 January 2011 - 11:01 PM

Was This Post Helpful? 1
  • +
  • -

#8 Raynes  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 611
  • View blog
  • Posts: 2,815
  • Joined: 05-January 09

Re: Function Challenge #2: New Years Challenge

Posted 02 January 2011 - 12:02 AM

Here is my Clojure solution for 3.

Spoiler


And here is my output:

Quote

user=> (document-distance)
Count: 3257 3257
0.22798180494943945


I'm weirded out. I've scrutinized every character of KYA's code and my own code and I am almost absolutely certain that what we're doing is exactly the same. I'm just doing it in a functional way. Nonetheless, we're getting different outputs. I've tested my code against several different files including the test case (whose output I've documented above) and it appears to work just fine.

He noted that his frequency vector count was 3327, but when I ran his code on my machine against the exact same files, it gave me different output and said that the count was 3256. My code's count is 3257. What is up with this? What is the real correct answer here?

Note that I'm not saying that KYA's code is wrong. It's just as likely that my code is wrong. I've sat here for an hour trying to figure out if *I* did something wrong, but comparing our code, I'm pretty sure mine is solid enough.

Anyways, just a bizarre situation. Nonetheless, this is (once again) a just-for-fun solution because I have fun doing these challenges and it gives people a chance to compare Java to Clojure. Enjoy! :>

This post has been edited by Raynes: 02 January 2011 - 12:28 PM

Was This Post Helpful? 1
  • +
  • -

#9 Dogstopper  Icon User is offline

  • The Ninjaducky
  • member icon



Reputation: 2870
  • View blog
  • Posts: 11,025
  • Joined: 15-July 08

Re: Function Challenge #2: New Years Challenge

Posted 02 January 2011 - 05:32 AM

File lewis.txt : 15996 lines, 182355 words, 8530 distinct words
File Tom Sawyer.txt : 10272 lines, 73844 words, 7400 distinct words
The distance between the documents is: 0.565878 (radians)

That's the official program's output (the MIT solution). Unfortunately, it counts things like O'neil as to words - O and neil. However, as long as it is consistent, I would think it would be ok. KYA got really close to the correct answer.

Raynes, the prompt states:

Quote

A word is a sequence of letters [a..zA..Z] that might include digits [0..9] and the undescore character. All delimiters are thrown away and not kept as part of the word. Here are examples of words:

abcd
abcd12
abc_
a12cd
15111
We'll treat all upper-case letters as if they are lower-case, so that "CMU" and "cmu" are the same word.


That means that mine is potentially incorrect because I forgot about the last part...shoot.

This post has been edited by Dogstopper: 02 January 2011 - 05:36 AM

Was This Post Helpful? 0
  • +
  • -

#10 cfoley  Icon User is offline

  • Cabbage
  • member icon

Reputation: 1949
  • View blog
  • Posts: 4,048
  • Joined: 11-December 07

Re: Function Challenge #2: New Years Challenge

Posted 02 January 2011 - 07:33 AM

I fixed a bug in my code. :)
Was This Post Helpful? 0
  • +
  • -

#11 Raynes  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 611
  • View blog
  • Posts: 2,815
  • Joined: 05-January 09

Re: Function Challenge #2: New Years Challenge

Posted 02 January 2011 - 12:22 PM

Oh goody, so we were both wrong at some level? It was just bizarre. It feels like the only reason my output was different from KYA's is because of that single word difference in count. Somehow, I'm getting one extra word. I actually tested that theory by just removing a random word from both frequency vectors and got almost the exact same results as he did. Thus, either he or I are doing something magical and getting a frequency vector that is one element too large.

In any case, yes, it is consistent. I've tested it with several duplicate and non-duplicate files and it appears to give sensible distance measures on all of them.

Cheers!

EDIT: I read the header of that page as well. I actually originally did a little filtering to make sure it was only underscores, spaces, letters, and digits, but I removed that after I got weird results in order to compare them with KYA's.

EDIT2: I fixed the code tags in my previous post. I'm surprised nobody mentioned that I screwed those up. :P

This post has been edited by Raynes: 02 January 2011 - 12:29 PM

Was This Post Helpful? 0
  • +
  • -

#12 KYA  Icon User is offline

  • g++ jameson.cpp -o beverage
  • member icon

Reputation: 3101
  • View blog
  • Posts: 19,140
  • Joined: 14-September 07

Re: Function Challenge #2: New Years Challenge

Posted 02 January 2011 - 12:57 PM

Quote

File lewis.txt : 15996 lines, 182355 words, 8530 distinct words
File Tom Sawyer.txt : 10272 lines, 73844 words, 7400 distinct words
The distance between the documents is: 0.565878 (radians)


That explains my difference:

8539 7400 distinct words for lewis and sawyer respectfully.
Was This Post Helpful? 0
  • +
  • -

#13 skaoth  Icon User is offline

  • D.I.C Addict
  • member icon

Reputation: 91
  • View blog
  • Posts: 601
  • Joined: 07-November 07

Re: Function Challenge #2: New Years Challenge

Posted 02 January 2011 - 09:26 PM

@Raynes
Are you checking to make sure you are not counting empty strings. I found the samething to be happening. In this case the empty string was happening so frequently that it screwed up the document distance function
Was This Post Helpful? 0
  • +
  • -

#14 Raynes  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 611
  • View blog
  • Posts: 2,815
  • Joined: 05-January 09

Re: Function Challenge #2: New Years Challenge

Posted 05 January 2011 - 04:31 PM

View Postskaoth, on 03 January 2011 - 03:26 AM, said:

@Raynes
Are you checking to make sure you are not counting empty strings. I found the samething to be happening. In this case the empty string was happening so frequently that it screwed up the document distance function


That was it! Deadly easy to fix too.

Here is the amended code:

Spoiler


I also generalized it a bit so that it can be called on any two documents rather than hardcode the file names.

Here is the output:

Quote

user=> (document-distance "Tom Sawyer.txt" "lewis.txt")
Count: 3256 3256
0.5410870400062462

Was This Post Helpful? 0
  • +
  • -

#15 Dogstopper  Icon User is offline

  • The Ninjaducky
  • member icon



Reputation: 2870
  • View blog
  • Posts: 11,025
  • Joined: 15-July 08

Re: Function Challenge #2: New Years Challenge

Posted 05 January 2011 - 04:34 PM

Raynes, can you even WRITE Java code?
Was This Post Helpful? 0
  • +
  • -

  • (2 Pages)
  • +
  • 1
  • 2