2 Replies - 154 Views - Last Post: 09 October 2018 - 07:21 AM

#1 chloeCodes

  • D.I.C Head

Reputation: 4
  • Posts: 215
  • Joined: 05-January 17

Python & MapReduce Model

Posted 09 October 2018 - 03:53 AM

I wrote my first parallel program using the MapReduce programming model:
import re

# Assumed definition: the post does not show the actual WORD_REGEX pattern
WORD_REGEX = re.compile(r"[\w']+")

# self is standard for every Python method (the equivalent of `this` in Java)
# _ refers to the key of the line (unused here)
# line contains the string line input passed to the mapper
def mapper(self, _, line):
    # WORD_REGEX obtains every word in the input line,
    # so words contains every word in the string
    words = WORD_REGEX.findall(line)
    # For each word in the line, emit the pair (word, 1)
    for word in words:
        yield word, 1
My understanding of the map function: in the map phase of the MapReduce model there are several map nodes.
The big data input is split into smaller chunks, and each chunk is sent to a separate map node. Each map node then
emits intermediate key/value pairs. For example, if I give the above mapper "Ibiza, Coimbra, Porto, Lisboa, Lisboa",
it will yield/emit: Ibiza 1, Coimbra 1, Porto 1, Lisboa 1, Lisboa 1
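To make the map phase concrete, here is a minimal local sketch (no Hadoop involved) that splits the input into chunks, runs a mapper over each chunk the way a separate map node would, and collects the intermediate (word, 1) pairs. The `[\w']+` pattern, the chunk size, and the helper names `split_into_chunks` and `run_map_phase` are assumptions for illustration; the mapper also drops `self` and the unused key so it can run standalone.

```python
import re
from typing import Iterator, List, Tuple

# Assumed pattern: the original post does not show how WORD_REGEX is defined
WORD_REGEX = re.compile(r"[\w']+")

def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Emit an intermediate (word, 1) pair for every word in the line
    for word in WORD_REGEX.findall(line):
        yield word, 1

def split_into_chunks(lines: List[str], chunk_size: int) -> List[List[str]]:
    # Hypothetical helper: stands in for Hadoop splitting the input across map nodes
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]

def run_map_phase(lines: List[str], chunk_size: int = 2) -> List[Tuple[str, int]]:
    # Hypothetical helper: collects the pairs every "map node" emits
    intermediate = []
    for chunk in split_into_chunks(lines, chunk_size):
        # Each chunk would be processed by a separate map node
        for line in chunk:
            intermediate.extend(mapper(line))
    return intermediate

pairs = run_map_phase(["Ibiza, Coimbra, Porto, Lisboa, Lisboa"])
print(pairs)
# [('Ibiza', 1), ('Coimbra', 1), ('Porto', 1), ('Lisboa', 1), ('Lisboa', 1)]
```

The mapper never adds anything up; repeated words such as Lisboa simply produce repeated pairs, and combining them is left to the reduce phase.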

Now the reduce function:
def reduce(self, word, counts):
    # For each unique key (word), calculate the sum of the number of occurrences
    yield word, sum(counts)
Output:
This would output Ibiza 1, Coimbra 1, Lisboa 2....

(The above code works when I run it with Hadoop.)
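Between the two phases, Hadoop sorts and groups the intermediate pairs by key, so each reduce call receives one word together with an iterator over its counts. A minimal local sketch of that shuffle-and-reduce step follows; sorting plus `itertools.groupby` is only a stand-in for Hadoop's shuffle, not how Hadoop implements it, and `reduce_counts` / `run_reduce_phase` are hypothetical names.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

def reduce_counts(word: str, counts: Iterable[int]) -> Iterator[Tuple[str, int]]:
    # For each unique word, sum the number of occurrences
    yield word, sum(counts)

def run_reduce_phase(pairs: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    results = []
    # Sort by key, then group equal keys: a stand-in for shuffle-and-sort
    for word, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        # counts is a lazy generator, like the values a reducer receives
        counts = (count for _, count in group)
        results.extend(reduce_counts(word, counts))
    return results

pairs = [("Ibiza", 1), ("Coimbra", 1), ("Porto", 1), ("Lisboa", 1), ("Lisboa", 1)]
print(run_reduce_phase(pairs))
# [('Coimbra', 1), ('Ibiza', 1), ('Lisboa', 2), ('Porto', 1)]
```

Because the pairs are sorted by key before grouping, the result comes out in alphabetical order rather than input order.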
Now I want to change the word count algorithm so that only words with 10 or more occurrences are printed.
After some thought, I decided I must modify the reduce() method, since that's where sum(counts) happens.
I tried including an if statement: if sum(values) > 10, then yield(word, sum(counts)). However, when I open the
resulting file, I see a 0 next to each word.
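One likely cause, assuming `counts` arrives as a one-shot iterator (reducers typically receive their values as a generator; mrjob does, for example), and treating `values` in the if statement as the same `counts` iterator: the first `sum(counts)` consumes every value, so the second `sum(counts)` sees an empty iterator and returns 0. A small reproduction with plain Python generators; `make_counts` and `reduce_attempt` are hypothetical names.

```python
from typing import Iterator, Tuple

def make_counts() -> Iterator[int]:
    # Stand-in for the values a reducer receives for one word: twelve 1s, yielded lazily
    return (1 for _ in range(12))

def reduce_attempt(word: str, counts: Iterator[int]) -> Iterator[Tuple[str, int]]:
    # The attempt described above: counts is summed twice
    if sum(counts) > 10:          # first sum consumes every value (12)
        yield word, sum(counts)   # second sum sees an empty iterator (0)

counts = make_counts()
first = sum(counts)
second = sum(counts)
print(first, second)                                  # 12 0
print(list(reduce_attempt("Lisboa", make_counts())))  # [('Lisboa', 0)]
```

Words at or below the threshold are filtered out as intended, but every word that passes the test is emitted with the exhausted second sum, which matches the 0 next to each word.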


How should I adjust the reduce() method so that only words that occur more than 10 times in the input text file
are output?
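A sketch of one way to adjust it: sum the iterator once, store the total, then test and emit that stored value. The post asks for both "10 or more" and "more than 10", so the threshold constant `MIN_COUNT` and the `>=` comparison below are assumptions; swap in `>` for "more than 10". The function name `reduce_filtered` is hypothetical, and generators stand in for a reducer's input so the sketch runs without Hadoop.

```python
from typing import Iterable, Iterator, Tuple

# Assumed threshold: "10 or more occurrences"; use > instead of >= for "more than 10"
MIN_COUNT = 10

def reduce_filtered(word: str, counts: Iterable[int]) -> Iterator[Tuple[str, int]]:
    # Consume the counts exactly once and keep the result
    total = sum(counts)
    # Only emit words that reach the threshold
    if total >= MIN_COUNT:
        yield word, total

frequent = list(reduce_filtered("Lisboa", (1 for _ in range(12))))
rare = list(reduce_filtered("Porto", (1 for _ in range(3))))
print(frequent, rare)  # [('Lisboa', 12)] []
```

If the job is written with mrjob, the same three lines would form the body of the job's reducer method, which mrjob expects to be named `reducer(self, word, counts)`.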


Replies To: Python & MapReduce Model

#2 andrewsw

  • head thrashing

Reputation: 6630
  • Posts: 27,105
  • Joined: 12-December 12

Re: Python & MapReduce Model

Posted 09 October 2018 - 06:46 AM

I think you will encourage responses if you can provide an SSCCE or something approaching one.

#3 chloeCodes

  • D.I.C Head

Reputation: 4
  • Posts: 215
  • Joined: 05-January 17

Re: Python & MapReduce Model

Posted 09 October 2018 - 07:21 AM

andrewsw, on 09 October 2018 - 06:46 AM, said:

I think you will encourage responses if you can provide an SSCCE or something approaching one.


Hehe you're right :) my q's are getting more technical as I advance xD

This post has been edited by chloeCodes: 09 October 2018 - 07:22 AM

