Most Common Words using LINQ

15 Replies - 1407 Views - Last Post: 17 June 2013 - 06:09 PM

#1 ianpb

Posted 15 June 2013 - 12:29 AM

I want to get the most common words in a text file using LINQ, but I haven't been successful so far. I'd appreciate your suggestions on this.

Thanks.

public void MostUsedWords()
{
    string sentence;
    sentence = txtParagraph.Text;
    char[] delimiters = new char[] { ' ', '.', '?', '!' };
    List<string> splitStr = sentence.Split(delimiters, StringSplitOptions.RemoveEmptyEntries).ToList();

    var orderedDic = splitStr.OrderByDescending(x => x).GroupBy(y => y.Count()).Take(1).ToArray();

    if (orderedDic.Length == 1)
    {
        txtFreqWord.Text = orderedDic[0].ToString();
    }
    else
    {
        string words = "";

        for (int i = 0; i < orderedDic.Length; i++)
        {
            if (i == 0)
            {
                words = orderedDic[i].ToString();
            }
            else
            {
                words += ", " + orderedDic[i].ToString();
            }
        }

        txtFreqWord.Text = words;
    }
}




#2 sepp2k

Posted 15 June 2013 - 02:36 AM

I don't see why you start by sorting the words. The words don't need to be sorted in order to use GroupBy on them and since you only care about the most common word, it doesn't really help you if the words are in alphabetical order.

You're then grouping the words by their length (calling Count() on a String returns its length), so the words with the same length will be in the same group. Since the length of a word is irrelevant to your problem, this will also not help you.

If you group the words by themselves instead, you'll get groups that only contain occurrences of the same word, so the length of each group would tell you how often each word occurred.
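
To make the point about grouping keys concrete, here is a minimal sketch. The class and method names are made up for illustration and are not from the thread; the delimiters are the ones used in the original question. Grouping each word by itself means the size of each group is that word's frequency:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingSketch
{
    // Hypothetical helper: maps each distinct word to the number of times it occurs.
    public static Dictionary<string, int> CountWords(string text)
    {
        char[] delimiters = { ' ', '.', '?', '!' };
        return text
            .Split(delimiters, StringSplitOptions.RemoveEmptyEntries)
            .GroupBy(word => word)                       // the key is the word itself
            .ToDictionary(g => g.Key, g => g.Count());   // group size = frequency
    }
}
```

Note that this comparison is case-sensitive, so "The" and "the" end up in different groups unless the words are normalized first.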

#3 AdamSpeight2008

Posted 15 June 2013 - 04:29 AM

Also, you are only returning a single group because you are using .Take(1).

#4 coder3788

Posted 15 June 2013 - 05:02 AM

Your problem solved with RegEx
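
A Regex-based approach might look something like the sketch below. This is an illustration under assumptions, not the code from this post: the `\w+` pattern, the lower-casing, and the helper name are all choices made for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexSketch
{
    // Hypothetical helper: returns the most frequent word, or null when the text has no words.
    public static string MostCommonWord(string text)
    {
        return Regex.Matches(text, @"\w+")              // every run of word characters
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())    // assumption: ignore case
            .GroupBy(w => w)
            .OrderByDescending(g => g.Count())
            .Select(g => g.Key)
            .FirstOrDefault();
    }
}
```

Compared with Split, the pattern also drops commas, quotes, and other punctuation that are not in a fixed delimiter list.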

#5 ianpb

Posted 15 June 2013 - 05:09 AM

Thanks, everyone, for the valuable input. I will make some adjustments and also try using Regex alongside the approach I have already started, for the sake of learning. Cheers.

#6 AdamSpeight2008

Posted 15 June 2013 - 05:19 AM

coder3788: That's a lot of code for this task.

This shows the power of LINQ (~9 LOC).
var words     = sentence.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
var wordFreqs = words.GroupBy( word => word).OrderByDescending( wordGroup => WordGroup.Count() ).ToArray();
var mostPop   = wordFreqs.FirstOrDefault();
Console.WriteLine("{0} Unique Words",wordFreqs.Count());
Console.WriteLine("{0} Total Words", word.Freqs.Sum( WordGroup => WordGroup.Count()));
foreach(var wordFreq In wordFreqs)
{
  Console.WriteLine("{0} x {1}",word.Count(), word.Key);
}
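
As posted, the snippet above mixes identifiers (`WordGroup` vs `wordGroup`, `word.Freqs` vs `wordFreqs`, `word` vs `wordFreq`) and writes `In` for `in`, so it will not compile. A compilable sketch of the same idea could look like this; the wrapper class, the method name, and returning the lines instead of printing them are assumptions made for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFrequencyReport
{
    // Hypothetical wrapper: builds the same three kinds of output lines as the snippet.
    public static List<string> Build(string sentence)
    {
        char[] delimiters = { ' ', '.', '?', '!' };
        var words = sentence.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
        var wordFreqs = words
            .GroupBy(word => word)
            .OrderByDescending(wordGroup => wordGroup.Count())
            .ToArray();

        var lines = new List<string>
        {
            string.Format("{0} Unique Words", wordFreqs.Length),
            string.Format("{0} Total Words", wordFreqs.Sum(wordGroup => wordGroup.Count()))
        };
        foreach (var wordFreq in wordFreqs)
        {
            // One line per distinct word: "<count> x <word>"
            lines.Add(string.Format("{0} x {1}", wordFreq.Count(), wordFreq.Key));
        }
        return lines;
    }
}
```

Each returned line can then be passed to Console.WriteLine or appended to a TextBox.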


#7 ianpb

Posted 15 June 2013 - 06:23 AM

AdamSpeight2008, on 15 June 2013 - 05:19 AM, said:

[quoted code from post #6 snipped]

@Adam, but how do I display only the Key value in the TextBox? I just need the top-frequency word(s), not the count (number of repetitions).

I ask because I just realized that each GroupBy result carries two things: a key (TKey) and its elements (TSource).

#8 andrewsw

Posted 15 June 2013 - 09:20 AM

Did you try anything? What amendment might you make to the following to change the output?

Console.WriteLine("{0} x {1}",word.Count(), word.Key);

You need to make a similar adjustment when writing to a TextBox.

#9 ianpb

Posted 15 June 2013 - 05:12 PM

andrewsw, on 15 June 2013 - 09:20 AM, said:

Did you try anything? What amendment might you make to the following to change the output?

Console.WriteLine("{0} x {1}",word.Count(), word.Key);

You need to make a similar adjustment when writing to a TextBox.


The conversion would be: txtFreqWord.Text = txtFreqWord.Text + word.Count() + word.Key + ",";

But that prints both the word and its repetition count, while I am trying to print only the top common word(s) without their counts. When I tried to use only word.Key, I got an error saying it cannot convert (string,string) to string, so I tried changing other things, such as changing var word to string, but nothing worked. I'd greatly appreciate your help, and sorry for asking about such basic things. Cheers.

#10 andrewsw

Posted 15 June 2013 - 06:33 PM

Note that Adam's example code should use wordFreq.Key in the foreach loop (and wordFreqs.Sum earlier in the code).

Anyway, usually ToString() works:

txtFreqWord.Text = txtFreqWord.Text + word.Key.ToString() + ",";

or wordFreq.Key if you have followed Adam's code too literally.
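
As an alternative to appending inside a loop, the keys can be projected with Select and joined in one step. The following is only a sketch: the class and method names are hypothetical, and the delimiters are the ones from the original question.

```csharp
using System;
using System.Linq;

public static class KeyJoinSketch
{
    // Hypothetical helper: distinct words, most frequent first, joined for display.
    public static string JoinKeys(string sentence)
    {
        char[] delimiters = { ' ', '.', '?', '!' };
        var keys = sentence
            .Split(delimiters, StringSplitOptions.RemoveEmptyEntries)
            .GroupBy(word => word)
            .OrderByDescending(wordFreq => wordFreq.Count())
            .Select(wordFreq => wordFreq.Key);   // Key is already a string here
        return string.Join(", ", keys);          // no trailing separator to trim
    }
}
```

In the form this could be used as txtFreqWord.Text = KeyJoinSketch.JoinKeys(txtParagraph.Text);. Because the grouping key is the word itself, Key is a string and the extra ToString() call is harmless but not required.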

#11 AdamSpeight2008

Posted 15 June 2013 - 09:04 PM

ianpb: Are you thinking about the code you write, or just copying and pasting my code example? Why are you including wordFreq.Count() in the string if you don't require it? Is it not obvious?

#12 ianpb

Posted 16 June 2013 - 03:11 AM

Andrew and Adam, thanks for the input, and sorry for the late reply; I just got back home. I wrote the code exactly as in Adam's post, but I didn't include the Count, because obviously I don't need it. The result is all of the parsed words in descending order of frequency.

But the output I need is only the top word(s).

This is the code I got from Adam:

public void MostUsedWords()
{
    string sentence;
    sentence = txtParagraph.Text;
    char[] delimiters = new char[] { ' ', '.', '?', '!' };
    var words = sentence.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);

    var wordFreqs = words.GroupBy(word => word).OrderByDescending(wordgroup => wordgroup.Count()).ToArray();
    var mostPop = wordFreqs.FirstOrDefault();

    foreach (var wordFreq in wordFreqs)
    {
        txtFreqWord.Text = txtFreqWord.Text + wordFreq.Key.ToString() + ", ";
    }
}



It outputs all of the parsed words in descending order, so I tried to tweak it with an if statement so that it would give only the top words, as below:

public void MostUsedWords()
        {
            string sentence;
            sentence = txtParagraph.Text;
            char[] delimiters = new char[] { ' ', '.', '?', '!' };
            var words = sentence.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);

            var wordFreqs = words.GroupBy(word => word).OrderByDescending(wordgroup => wordgroup.Count()).ToArray();
            var mostPop = wordFreqs.FirstOrDefault();

                 
        if (wordFreqs.Length == 1)
        {
            txtFreqWord.Text = wordFreqs.Key.ToArray[0];
        }
        else
        {
            string words = "";

            for (int i = 0; i < wordFreqs[i].Length; i++)
            {
                if (i == 0)
                {
                    words = wordFreqs[i];
                }
                else
                {
                    words += ", " + wordFreqs[i];

                }
            }

            txtFreqWord.Text = words;




But the logic behind that if statement needs input from an array, so that I can loop over it to get only the most common word(s). Or do you have any other suggestion?

Many thanks for your patience.

#13 ianpb

Posted 16 June 2013 - 03:23 AM

If I use FirstOrDefault, it gives only one top word, while there might be several other words with the same number of occurrences in the parsed text file. For example, instead of David, John, Albert, it would show only David, even though those three names occur the same number of times in the text.

#14 Skydiver

Posted 16 June 2013 - 07:07 PM

So in that case, you'll have to probe the results to see if there is a tie among the top contenders. Personally, I would use a while loop or a TakeWhile() extension method, but there are other approaches as well.
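
The while-loop variant could be sketched as follows. This is one possible reading of the suggestion, with hypothetical class and method names: sort the groups by descending count, then walk from the front until the count drops below the best one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TieLoopSketch
{
    // Hypothetical helper: returns every word that shares the highest count.
    public static List<string> TopWords(string sentence)
    {
        char[] delimiters = { ' ', '.', '?', '!' };
        var wordFreqs = sentence
            .Split(delimiters, StringSplitOptions.RemoveEmptyEntries)
            .GroupBy(word => word)
            .OrderByDescending(g => g.Count())
            .ToArray();

        var top = new List<string>();
        if (wordFreqs.Length == 0)
        {
            return top;                          // no words, so no winners
        }

        int best = wordFreqs[0].Count();         // highest count comes first after sorting
        int i = 0;
        while (i < wordFreqs.Length && wordFreqs[i].Count() == best)
        {
            top.Add(wordFreqs[i].Key);           // still tied with the leader
            i++;
        }
        return top;
    }
}
```

Because the array is sorted by count, the loop can stop at the first group with a smaller count instead of scanning everything.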

#15 AdamSpeight2008

Posted 17 June 2013 - 06:35 AM

Something like this:
.TakeWhile( wf => wf.Count() == mostPop.Count() )
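
Put together with the earlier grouping code, the TakeWhile idea might look like the sketch below. The class and method names are hypothetical; the comparison uses `==`, since a single `=` is an assignment in C# and would not compile inside the lambda.

```csharp
using System;
using System.Linq;

public static class TakeWhileSketch
{
    // Hypothetical helper: all words tied for the highest count, comma-separated.
    public static string TopWordsText(string sentence)
    {
        char[] delimiters = { ' ', '.', '?', '!' };
        var wordFreqs = sentence
            .Split(delimiters, StringSplitOptions.RemoveEmptyEntries)
            .GroupBy(word => word)
            .OrderByDescending(wordGroup => wordGroup.Count())
            .ToArray();

        var mostPop = wordFreqs.FirstOrDefault();
        if (mostPop == null)
        {
            return string.Empty;                 // empty input: nothing to show
        }

        int topCount = mostPop.Count();          // computed once, outside the lambda
        var topWords = wordFreqs
            .TakeWhile(wf => wf.Count() == topCount)
            .Select(wf => wf.Key);
        return string.Join(", ", topWords);
    }
}
```

For the example from post #13, a text in which David, John, and Albert each appear equally often (and more often than anything else) would yield all three names rather than only the first.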
