7 Replies - 1496 Views - Last Post: 13 December 2012 - 11:56 AM

#1 marth17

  • New D.I.C Head

Reputation: 0
  • Posts: 4
  • Joined: 11-December 12

precision and recall on a spam filter

Posted 11 December 2012 - 10:00 PM

I have to write a program that uses the predefined functions 'recall' and 'precision' to measure the effectiveness of a spam filter. It's been a while since I've done any programming, so I'm a little rusty. So far all I have is 'import sys'.
The functions are defined in another program, so to get them into my program, would I do 'import [program]', or do I have to import the functions individually?

There are two data files. Each line has two fields separated by a tab. In the first file, the first field is the name of an email and the second field says whether it is actually spam: 1 if it is spam and 0 if it is not. The second file says what the filter thought: 1 if it thought the email was spam and 0 if it did not. So would I need to strip and compare the files? Do I need to use pickle? Not really sure where to begin on this one. Any help or info is appreciated!
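
A plain-text, tab-separated file like the one described above can be parsed with ordinary string methods; pickle is meant for serialized Python objects, so it should not be needed here. A minimal sketch, assuming each line looks like name<TAB>label and that any line whose last field is not 0 or 1 (for example a header row) can be skipped. The name `read_labels` and the sample rows are made up for illustration:

```python
def read_labels(lines):
    """Return a list of (email_name, label) pairs, with label as an int.

    `lines` can be an open file or any iterable of strings.
    """
    pairs = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines, header rows, and anything without a 0/1 label.
        if len(fields) < 2 or fields[-1].strip() not in ("0", "1"):
            continue
        pairs.append((fields[0], int(fields[-1])))
    return pairs


# Hypothetical sample data standing in for the real file contents.
sample = ["email\tspam\n", "msg001\t1\n", "msg002\t0\n", "\n"]
print(read_labels(sample))
```

With a real file, the same function could be given the open file object, ideally inside a `with` block so the file gets closed.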


Replies To: precision and recall on a spam filter

#2 Python_4_President

  • D.I.C Regular

Reputation: 53
  • Posts: 321
  • Joined: 13-August 11

Re: precision and recall on a spam filter

Posted 12 December 2012 - 11:41 AM

This ought to help you get started.

Say I have spammy.py, a module that contains two functions, mine and yours.

import spammy



This lets you say:
spammy.mine
spammy.yours



from spammy import mine
from spammy import yours



This lets you say:
mine
yours
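
Both import styles bind the same function object; they differ only in the name you type afterwards. A small check, using the standard library's `math` module as a stand-in because spammy.py is only a hypothetical example:

```python
import math            # access functions as math.sqrt
from math import sqrt  # access the function as plain sqrt

# Both names refer to the very same function object.
same_object = math.sqrt is sqrt
print(same_object)
print(math.sqrt(16.0), sqrt(16.0))
```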




with open("replace_this_with_path_to_your_file_truth_table.txt", 'r') as truth_file:
    file_truth_text = truth_file.readlines()  # list of lines, newlines included
for line in file_truth_text:
    print(line)



You can also loop over the file object directly:
with open("path/to/your/file", 'r') as truth_file:
    for line in truth_file:
        print(line)




Do some stuff with that and come back when you hit another wall you can't beat down.

#3 marth17

Re: precision and recall on a spam filter

Posted 12 December 2012 - 01:19 PM

Here's what I have so far:

import hw10_lib
from hw10_lib import precision
from hw10_lib import recall

actual = open("~/hw10/hw10.ref", 'r').readlines()
for line in actual:
        print (line)



I'm getting an IOError: [Errno 2] No such file or directory: '~/hw10/hw10.ref'
Which is odd, because I ran ls and the file is right there in the directory.

#4 Nallo

  • D.I.C Head

Reputation: 161
  • Posts: 247
  • Joined: 19-July 09

Re: precision and recall on a spam filter

Posted 12 December 2012 - 02:52 PM

Python's open() doesn't expand ~ to your home directory; that expansion is done by the shell. You have to spell the path out.

So use open("/home/marth17/hw10/hw10.ref") instead of open("~/hw10/hw10.ref"), assuming that /home/marth17 is your home directory.

#5 Nallo

Re: precision and recall on a spam filter

Posted 12 December 2012 - 03:01 PM

On second thought, it might be better to use Python's os.path module to build the path:

import os

home_path = os.path.expanduser("~")
full_path = os.path.join(home_path, "hw10/hw10.ref")
actual = open(full_path, "r").readlines()
...
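
A variation on the snippet above, as a sketch: `os.path.expanduser` also accepts the whole path in one go, and a `with` block closes the file when the read is done. The helper name `read_lines` is made up for illustration:

```python
import os


def read_lines(path):
    """Expand a leading ~ in `path`, then return the file's lines."""
    full_path = os.path.expanduser(path)
    with open(full_path, "r") as handle:
        return handle.readlines()


# Paths without a leading ~ pass through expanduser unchanged.
print(os.path.expanduser("/tmp/hw10.ref"))
```

It would be called as read_lines("~/hw10/hw10.ref"), assuming the file really lives under your home directory.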



#6 marth17

Re: precision and recall on a spam filter

Posted 12 December 2012 - 06:02 PM

OK, fixed that, and got it to print out hw10.ref. But now how do I compare that file with the other file in terms of precision and recall?
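
For reference, with spam as the positive class, precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP counts emails that are spam and were flagged, FP counts non-spam that was flagged, and FN counts spam that was missed. How hw10_lib's precision and recall expect their arguments is not shown in this thread, so the functions below are a hypothetical stand-alone sketch, assuming two equal-length lists of integer 0/1 labels in the same email order:

```python
def count_outcomes(actual, predicted):
    """Return (tp, fp, fn) for two equal-length lists of 0/1 ints."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    tp = fp = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == 1 and truth == 1:
            tp += 1  # spam, correctly flagged
        elif guess == 1 and truth == 0:
            fp += 1  # not spam, wrongly flagged
        elif guess == 0 and truth == 1:
            fn += 1  # spam, missed by the filter
    return tp, fp, fn


def precision_sketch(actual, predicted):
    tp, fp, _ = count_outcomes(actual, predicted)
    # Returning 0.0 when nothing was flagged is a convention chosen here.
    return float(tp) / (tp + fp) if (tp + fp) else 0.0


def recall_sketch(actual, predicted):
    tp, _, fn = count_outcomes(actual, predicted)
    # Returning 0.0 when there is no real spam is a convention chosen here.
    return float(tp) / (tp + fn) if (tp + fn) else 0.0


# Made-up labels: 4 real spam, 3 flagged, 2 of the flags correct.
actual = [1, 1, 1, 1, 0, 0]
predicted = [1, 1, 0, 0, 1, 0]
print(precision_sketch(actual, predicted))  # 2 / 3
print(recall_sketch(actual, predicted))     # 2 / 4 = 0.5
```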

#7 marth17

Re: precision and recall on a spam filter

Posted 12 December 2012 - 11:42 PM

When I use the predefined functions, I cannot get them to return anything other than 1.0.

I know this is not correct because I am supposed to get a precision of 0.529411764706.

Also, I am using pop because for some reason the first entry of each list is not a number, so I can't use append(int(...

here's what I have:

import hw10_lib
from hw10_lib import precision
from hw10_lib import recall

actual = []
for line in open("/path/hw10.ref", 'r'):
    actual.append(line.strip().split('\t')[-1])
actual.pop(0)

predicted = []
for line in open("/path/hw10.hyp", 'r'):
    predicted.append(line.strip().split('\t')[-1])
predicted.pop(0)

prec = precision(actual, predicted)
rec = recall(actual, predicted)

print ('Precision: ', prec)
print ('Recall: ', rec)


#8 Nallo

Re: precision and recall on a spam filter

Posted 13 December 2012 - 11:56 AM

You should show us the recall and precision functions. Otherwise there is no way to tell what went wrong.
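
One thing that can be checked without seeing the library, offered only as a guess: the lists built in the previous post contain the strings '0' and '1', not integers. If the library functions happen to test labels for truthiness, every non-empty string counts as true, which would be consistent with a result of 1.0. A sketch of the check and of a conversion to integers, using made-up values:

```python
# Labels as they come out of line.strip().split('\t')[-1]: strings.
raw = ["spam", "0", "1", "1", "0"]  # made-up values; first entry mimics a header

# Any non-empty string is truthy, including "0".
print(bool("0"), bool(0))

# Keep only real 0/1 labels and convert them to integers.
labels = [int(value) for value in raw if value in ("0", "1")]
print(labels)
```

Filtering on the value also drops the non-numeric first entry, so pop(0) would no longer be needed.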