set operations on sets with more than one of the same member

  • (3 Pages)
  • +
  • 1
  • 2
  • 3

38 Replies - 2056 Views - Last Post: 13 March 2018 - 01:54 PM Rate Topic: -----

#16 DK3250   User is offline

  • Pythonian
  • member icon

Reputation: 420
  • View blog
  • Posts: 1,341
  • Joined: 27-December 13

Re: set operations on sets with more than one of the same member

Posted 10 March 2018 - 12:29 AM

The code in post #14 was build direct from yours in post #13.
I still find it somewhat convoluted, and decided to start over.

I find this simpler and easier to read (and shorter..!!!):
lsts = {'e': ['a', 'b', 'c', 'a', 'd', 'b', 'a'],
        'f': ['1', '2', '3', '4', '1', '4', '3', '5', '4'],
        'g': ['qq', 'qq', 'rr', 'gg',  'qq', 'rr']}

def add_number(lst, item, number):
    indx = lst.index(item)
    lst[indx] += '_' + str(number)

def differentiate(lst):
    for pos, item in enumerate(lst):
        number = 0
        while item in lst[pos+1:]:
            add_number(lst, item, number)
            number += 1
    return lst

for key, lst in lsts.items():
    lsts[key] = differentiate(lst)

print(lsts)


EDIT: minor corrction

This post has been edited by DK3250: 10 March 2018 - 12:39 AM

Was This Post Helpful? 0
  • +
  • -

#17 jon.kiparsky   User is online

  • Beginner
  • member icon


Reputation: 11121
  • View blog
  • Posts: 19,065
  • Joined: 19-March 11

Re: set operations on sets with more than one of the same member

Posted 10 March 2018 - 12:52 AM

So I've hinted at this before, but the seed seems to have fallen on the stony ground. No matter, ever the optimist I raise the notion once again: you are trying to write functions about the behavior of a particular data structure, specifically a "multiset". I think it would be both a good exercise and good programming practice (in the "best practice" sense) to write a class representing this data structure, and to write the relevant functions governing this data structure directly into the class.
I am confident that this will bring you much more joy than writing this as free-standing functions. Go, write a Multiset class. Then I can start bugging you about writing tests for it.
Was This Post Helpful? 1
  • +
  • -

#18 ndc85430   User is offline

  • I think you'll find it's "Dr"
  • member icon

Reputation: 869
  • View blog
  • Posts: 3,491
  • Joined: 13-June 14

Re: set operations on sets with more than one of the same member

Posted 10 March 2018 - 01:03 AM

Or start the other way by thinking about how you would use such a class and write tests first!
Was This Post Helpful? 1
  • +
  • -

#19 DK3250   User is offline

  • Pythonian
  • member icon

Reputation: 420
  • View blog
  • Posts: 1,341
  • Joined: 27-December 13

Re: set operations on sets with more than one of the same member

Posted 10 March 2018 - 06:12 AM

I guess that jon.kiparskys remarks in post #17 are meant for OP.
As autodidact, however, I also feel challenged by this.
As for a formalized test, I am completely blank. Don't know anything about it - I just test as I think is best (adequate).

I have tried to make a "Multiset" class. I have no idea if this is anyway near what jon has in mind.
It is enclosed here under spoiler protection, if any other should want to give it a try...

Spoiler

EDIT: In mulitline description changed b to c because of DIC graphical artefact
EDIT: Updates in __init__ and __repr__

This post has been edited by DK3250: 10 March 2018 - 06:48 AM

Was This Post Helpful? 1
  • +
  • -

#20 jon.kiparsky   User is online

  • Beginner
  • member icon


Reputation: 11121
  • View blog
  • Posts: 19,065
  • Joined: 19-March 11

Re: set operations on sets with more than one of the same member

Posted 10 March 2018 - 09:17 AM

Cool, that's a start. However I would refer you to the definition of a multiset. In general, it should support the set operations, so I would want to see intersection and symmetric difference supported. These are sort of like your "in_both" and "xor" methods, but those don't quite hit the mark as they do not respect the number of occurrences, only the presence of an item.

So I might want to have
m1 = Multiset([1,1,2,3,4,4])
m2 = Multiset([1,2,2,4,4])
m3 = Multiset([1,2,3,4,4])
m4 = Multiset([1,3])

assert m1.intersection(m2) == m2.intersection(m1)  # this is a general expectation of sets, we should preserve it
assert m1.intersection(m2) == m3
assert m1.set_difference(m2) == m4


The assert statement is the basis for most automated testing: it evaluates an expression e as a boolean (using the standard truthiness convention) and either does nothing (if e is truthy) or raises an Assertionerror (if e is falsey)

If I evaluate a set of assertions in a try block, I can make myself a very simple test runner. Let's assume that I have a list of test cases which are simply functions which either raise an Assertionerror or do not. I'm going to be very permissive here and assume we don't mind other errors and exceptions - we assume those are intentional.
Note that my print functions assume python 3.

def run_tests(test_cases):
  failures = []
  for i, test_case in enumerate(test_cases, 1):
    try:
      test_case()
      print (".", end="")  # visual representation of success. You can watch your tests passing!
    except Assertionerror:
      failures.append(i)
      print ("F", end="") 
    except:  # intentional bare except, as we're going to pass on any other exception or error
      pass
  print_test_report(failures) 

def print_test_report(failures):
  if failures:
    print ("You had %d failing tests. Cases %s failed." % (len(failures), ", ".join(map(str, failures))))



Try converting your printed test cases to assertions and see if you can make this work for your tests. What you'd like is to be able to define a bunch of tests and call run_tests and just see it succeed or fail. You should not be evaluating success, as this requires a programmer to think and is therefore costly, inefficient, and error-prone.
Obviously, this is pretty minimal, and improving it might be a fun project (actually, there's a lot to learn there) but also obviously in the real world there are existing test runners which are quite effective, and you should use those.


I might also wonder if you've used the most efficient underlying representation. It seems to me that storing all of the data as a list is sort of the maximal representation and requires a certain amount of work for each operation. The Counter object provides you with a more effective representation. You might also consider making use of a defaultdict if you go this route, it's likely to simplify your computations a little.

EDIT: One way to think about this is, it should form an "accidental superclass" of the existing python set. That is, I would expect Multisets with no duplicates to behave exactly like python sets.
Was This Post Helpful? 1
  • +
  • -

#21 bobsmith76   User is offline

  • D.I.C Head

Reputation: 10
  • View blog
  • Posts: 244
  • Joined: 14-February 17

Re: set operations on sets with more than one of the same member

Posted 10 March 2018 - 04:51 PM

View Postjon.kiparsky, on 10 March 2018 - 12:52 AM, said:

So I've hinted at this before, but the seed seems to have fallen on the stony ground. No matter, ever the optimist I raise the notion once again: you are trying to write functions about the behavior of a particular data structure, specifically a "multiset". I think it would be both a good exercise and good programming practice (in the "best practice" sense) to write a class representing this data structure, and to write the relevant functions governing this data structure directly into the class.
I am confident that this will bring you much more joy than writing this as free-standing functions. Go, write a Multiset class. Then I can start bugging you about writing tests for it.



Why didn't you tell me about multisets before I rewrote the code? They seem to be exactly what I want. When you put 'multisets' in quotes in your first reply I didn't assume you were referring to some objects that actually existed in Python.

This post has been edited by bobsmith76: 10 March 2018 - 04:51 PM

Was This Post Helpful? 0
  • +
  • -

#22 jon.kiparsky   User is online

  • Beginner
  • member icon


Reputation: 11121
  • View blog
  • Posts: 19,065
  • Joined: 19-March 11

Re: set operations on sets with more than one of the same member

Posted 10 March 2018 - 09:14 PM

To be fair, I did mention them.
Was This Post Helpful? 0
  • +
  • -

#23 DK3250   User is offline

  • Pythonian
  • member icon

Reputation: 420
  • View blog
  • Posts: 1,341
  • Joined: 27-December 13

Re: set operations on sets with more than one of the same member

Posted 11 March 2018 - 03:17 AM

Thank you, jon.kiparsky, both for the crash course on testing and the feedback on my first attempt of writing a Multiset class.

It makes perfectly sense and I will soon make and post an updated version.

If anyone else work along:
Please DO NOT show you code (or use spoiler protection). I really want to work myself through this for optimized learning.

To bobsmith76: 'Multiset' is not an existing Python type; this is why we need to write it ourselves...
Was This Post Helpful? 0
  • +
  • -

#24 ndc85430   User is offline

  • I think you'll find it's "Dr"
  • member icon

Reputation: 869
  • View blog
  • Posts: 3,491
  • Joined: 13-June 14

Re: set operations on sets with more than one of the same member

Posted 11 March 2018 - 04:08 AM

Jon alludes to this in his post, but the Python standard library includes a module for writing unit tests. There are also third party testing libraries (nose, py.test and others probably).
Was This Post Helpful? 1
  • +
  • -

#25 DK3250   User is offline

  • Pythonian
  • member icon

Reputation: 420
  • View blog
  • Posts: 1,341
  • Joined: 27-December 13

Re: set operations on sets with more than one of the same member

Posted 11 March 2018 - 05:06 AM

View Postjon.kiparsky, on 10 March 2018 - 05:17 PM, said:

So I might want to have
m1 = Multiset([1,1,2,3,4,4])
m2 = Multiset([1,2,2,4,4])
m3 = Multiset([1,2,3,4,4])
m4 = Multiset([1,3])

assert m1.intersection(m2) == m2.intersection(m1)  # this is a general expectation of sets, we should preserve it
assert m1.intersection(m2) == m3
assert m1.set_difference(m2) == m4

@jon.kiparsky: While working with my second version of 'Multiset', I wonder if you made two small mistakes. I think it should be:
m1 = Multiset([1,1,2,3,4,4])
m2 = Multiset([1,2,2,4,4])
m3 = Multiset([1,2,4,4])  # no '3'
m4 = Multiset([1,2,3])  # added '2'

Sorry for being so detailed, but I need to be sure...
Was This Post Helpful? 1
  • +
  • -

#26 DK3250   User is offline

  • Pythonian
  • member icon

Reputation: 420
  • View blog
  • Posts: 1,341
  • Joined: 27-December 13

Re: set operations on sets with more than one of the same member

Posted 11 March 2018 - 06:51 AM

Ok, here is version 2 of 'Multiset', including my test in version 1.
I decided to stay with lists as the internal representation of data. This is simply to avoid too many changes at the same time.
Also, I am not so familiar with the collection module; so, one step at the time...

I'm not so happy with the __eq__ function, but it has been a hard Saturday evening, so with your permission :gun_bandana:/>/> I'll need to revert to this later.

Version 2, ready for feedback:
Spoiler

This post has been edited by DK3250: 11 March 2018 - 06:54 AM

Was This Post Helpful? 1
  • +
  • -

#27 jon.kiparsky   User is online

  • Beginner
  • member icon


Reputation: 11121
  • View blog
  • Posts: 19,065
  • Joined: 19-March 11

Re: set operations on sets with more than one of the same member

Posted 11 March 2018 - 11:09 AM

Your _frequence function is more or less reinventing the Counter, so that's cool. You could swap that out if you want but you're at least thinking along the right lines, representing the set's membership as a dict of {item: frequency}. defaultdict, again, would cut a few lines out of that as well:

Spoiler


or just

Spoiler


The rest of your code looks pretty sound, though again I think you'll find that if you use the self.frq as the basis of your representation you'll be able to get some real improvements in both time and space. I'm particularly not too hot on all the list copying. Doesn't actually make any difference until the sets get pretty huge, but it's nice to be better.
That being said, as a proof of concept, yeah, this is good. I think you clearly get the idea.

Little nitpicking, I would usually define the intersection and union and set difference functions and then let the dunder methods call those, but that's a matter of style. I don't know how deep you want me to go on the code review, but that's a first pass at least.

What do you think? Do you think you got where you wanted to with it? Did you enjoy writing it? Did you enjoy having a few tests on hand to drive your work? :)
If you were going to improve anything here, where would you go next?

Quote

I'm not so happy with the __eq__ function


Since you called that out in particular, yeah, that feels a little forced. One simple improvement would be to notice that you have an underlying representation which knows about equality. If sorted(self.lst) == sorted(other.lst) then that should do you, right? :)
Was This Post Helpful? 1
  • +
  • -

#28 jon.kiparsky   User is online

  • Beginner
  • member icon


Reputation: 11121
  • View blog
  • Posts: 19,065
  • Joined: 19-March 11

Re: set operations on sets with more than one of the same member

Posted 11 March 2018 - 11:17 AM

View Postndc85430, on 11 March 2018 - 06:08 AM, said:

Jon alludes to this in his post, but the Python standard library includes a module for writing unit tests. There are also third party testing libraries (nose, py.test and others probably).


Yes, thanks for bringing those into the discussion. I absolutely do not recommend reinventing the testing library for production code unless you can point to specific flaws in the existing libraries that you want to remedy.
That being said, writing a testing library is a pretty interesting pedagogical exercise, particularly if you start from something as rudimentary as what I've sketched out here. There's clearly plenty of room for improvement, but you have to decide what the improvements would be, which is kind of the fun part. It's nice to have a chance to be both the programmer and the product owner/client once in a while.

View PostDK3250, on 11 March 2018 - 07:06 AM, said:

@jon.kiparsky: While working with my second version of 'Multiset', I wonder if you made two small mistakes.


You are correct.
Was This Post Helpful? 0
  • +
  • -

#29 ndc85430   User is offline

  • I think you'll find it's "Dr"
  • member icon

Reputation: 869
  • View blog
  • Posts: 3,491
  • Joined: 13-June 14

Re: set operations on sets with more than one of the same member

Posted 11 March 2018 - 11:21 AM

Of course. IIRC, the second part of Beck's TDD book does just that (in Python too, I think).
Was This Post Helpful? 0
  • +
  • -

#30 jon.kiparsky   User is online

  • Beginner
  • member icon


Reputation: 11121
  • View blog
  • Posts: 19,065
  • Joined: 19-March 11

Re: set operations on sets with more than one of the same member

Posted 11 March 2018 - 11:22 AM

View PostDK3250, on 11 March 2018 - 05:17 AM, said:

To bobsmith76: 'Multiset' is not an existing Python type; this is why we need to write it ourselves...


Actually, I did a quick search on pypi and I see that there is an implementation out there. Of course, one of the reasons we use python is because this is usually true. Still a worthwhile exercise, IMO.
Was This Post Helpful? 0
  • +
  • -

  • (3 Pages)
  • +
  • 1
  • 2
  • 3