Word sorting program

140 Replies - 7218 Views - Last Post: 04 November 2013 - 03:14 PM

#31 lewm

  • D.I.C Head

Reputation: 7
  • View blog
  • Posts: 160
  • Joined: 29-March 13

Re: Word sorting program

Posted 04 September 2013 - 12:05 PM

OK, I'm completely lost on the array of pointers and how to assign memory. I can't get my head around how this would work. I still have to store all the words in memory, don't I? Or can I just save the address of each word in a pointer? If so, then when I use the pointer it still has to access the file, so I still have the same performance problem.

OK, can I start again by asking: what is the best way to read 200,000 words (or an unknown number of words) into memory so I can work with them?

This post has been edited by lewm: 04 September 2013 - 12:16 PM


#32 jimblumberg

Reputation: 4074
  • View blog
  • Posts: 12,571
  • Joined: 25-December 09

Re: Word sorting program

Posted 04 September 2013 - 12:46 PM

Post your current code. Also, quite a few posts ago (post #12) I asked some questions about your files, and I haven't seen any answers to them.

What is the size of your input file, and what does that file contain?

Show a small sample.

Is that file sorted before you start trying to add your new words into this file?

Jim

#33 lewm

Posted 04 September 2013 - 12:56 PM

OK, the file size is 1.5 MB. It contains a list of words in alphabetical order, which I am going to use for a game.
I got the words online. I keep adding words (a few hundred at a time) to the file, so I need my program to sort through it and remove duplicates. My original program did this but was slow.

Sample:
ABARTICULATION
ABASIA
ABASIC
ABATJOUR
ABATTIS
ABAXIAL
ABBATICAL
ABBATIS
ABBREVIATURE
ABCOULOMB
ABDAL
ABDERITE
ABDICANT
ABDITORY
ABDITOS
ABDOMINOCENTESIS
ABDOMINOSCOPE
ABDOMINOSCOPY
ABDOMINOUS
ABDOMINOUSNESS
ABDOMINOVESICAL
ABDUCE
ABDUCENT
ABDUCTIVE
ABECEDARIUS

This post has been edited by lewm: 04 September 2013 - 01:08 PM


#34 lewm

Posted 04 September 2013 - 01:12 PM

I don't know what you mean by sorted. I don't have any new code to post.

All I want to do now is read an unknown number of words into memory so I can work with them in a number of different ways: remove duplicates, sort into alphabetical order, etc. I want to write the sorted words into a new file.

This post has been edited by lewm: 04 September 2013 - 01:21 PM


#35 lewm

Posted 04 September 2013 - 01:24 PM

When I run the program I will be able to count how many words there are, so I suppose it's not an unknown number of words...

This post has been edited by lewm: 04 September 2013 - 01:25 PM


#36 jimblumberg

Posted 04 September 2013 - 01:28 PM

Okay, you have a few words to add to a sorted file. Take the words you want to insert and put them into an array, sort that array. Open your sorted word file and create another file to hold the combined word list.

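Read the first word from the file. If it compares less than the first word from your array, write the word from the file to the combined word list file. Continue doing this until you find a word from the file that compares either equal to or greater than the array element.

If they compare equal, write the word from the original file to the combined word list file and advance to the next array element. If the file word compares greater, write the array element to the combined file instead, advance to the next array element, and compare again. When either the file or the array runs out, copy whatever is left of the other.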
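
The merge described above could be sketched roughly as follows. This is only an illustration under some assumptions: both inputs are already sorted with `strcmp` ordering, there is one word per line, and words are shorter than 80 characters. The names `merge_words`, `emit` and `MAX_WORD` are made up for this sketch and do not come from the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 80   /* assumed upper bound on word length (hypothetical) */

/* Write w unless it repeats the word written just before it.
   Returns 1 if the word was written, 0 if it was a duplicate. */
static size_t emit(FILE *out, const char *w, char *last)
{
    if (strcmp(w, last) == 0)
        return 0;
    fprintf(out, "%s\n", w);
    snprintf(last, MAX_WORD, "%s", w);   /* remember it for the next call */
    return 1;
}

/* Merge the sorted word file `in` with the sorted array `add`,
   writing each distinct word once to `out`.
   Returns the number of words written. */
size_t merge_words(FILE *in, FILE *out, const char *const *add, size_t n_add)
{
    char line[MAX_WORD];
    char last[MAX_WORD] = "";
    size_t i = 0, written = 0;

    assert(in != NULL && out != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0')
            continue;                          /* skip blank lines */
        /* New words that sort at or before the file word go out first;
           emit() drops any that turn out to be duplicates. */
        while (i < n_add && strcmp(add[i], line) <= 0)
            written += emit(out, add[i++], last);
        written += emit(out, line, last);
    }
    /* Whatever is left in the array sorts after everything in the file. */
    while (i < n_add)
        written += emit(out, add[i++], last);
    return written;
}
```

Only one line of the big file is held in memory at a time, which is why this approach avoids loading all 200,000 words at once.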

Jim

#37 lewm

Posted 04 September 2013 - 01:34 PM

I thought of just checking for the new words I want to add, but that's cheating. Let's say the list isn't sorted and there are a ton of duplicates; handling that is my objective for this program.

#38 lewm

Posted 04 September 2013 - 01:41 PM

I want the program to go through an unsorted list of words, check for duplicates, arrange the words into alphabetical order, change them all to uppercase, and write them all to a new file.

I just need to know how to read all the words into memory, as I have already created individual programs that do all of this.
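
For the sorting and duplicate-removal steps listed above, one possible in-memory approach is sketched here. It assumes the words are already in memory as an array of `char *` (one pointer per word) and uses the standard library's `qsort`. The names `sort_unique` and `cmp_words` are hypothetical and were not posted in the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort hands the comparator pointers to the array elements,
   so each argument is really a pointer to a char pointer. */
static int cmp_words(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);
}

/* Sort words[0..n) alphabetically, then move one copy of each distinct
   word to the front. Returns the number of distinct words.
   Duplicates are swapped to the back rather than overwritten, so every
   original pointer is still in the array and can be freed later. */
size_t sort_unique(char **words, size_t n)
{
    assert(words != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(words, n, sizeof *words, cmp_words);

    size_t kept = 1;
    for (size_t i = 1; i < n; ++i) {
        if (strcmp(words[i], words[kept - 1]) != 0) {
            char *tmp = words[kept];   /* swap the new distinct word forward */
            words[kept] = words[i];
            words[i] = tmp;
            ++kept;
        }
    }
    return kept;
}
```

After the call, `words[0]` up to `words[kept - 1]` hold the sorted, duplicate-free list, ready to be written to the new file with one `fprintf` per word.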

#39 Adak

  • D.I.C Lover

Reputation: 331
  • View blog
  • Posts: 1,168
  • Joined: 01-April 11

Re: Word sorting program

Posted 04 September 2013 - 02:59 PM

Ha! You're doing the same thing I did a while back (create a large word list, remove duplicates, etc.).
What worked well for me was trimming the word list down to a known size (about 70,000 words) that could fit in an array all at once. That was easy (as you've done), but working with a very large number of words requires a different approach.

I'll show you the tricks I've learned in about 4 hours, after work.

This post has been edited by Adak: 04 September 2013 - 03:01 PM


#40 #define

  • Duke of Err

Reputation: 1346
  • View blog
  • Posts: 4,637
  • Joined: 19-February 09

Re: Word sorting program

Posted 04 September 2013 - 03:11 PM

You have a few options.

1) Try to learn about pointers, malloc and free, which can be tricky for beginners.

2) Split the words into several files, e.g. wordsA.txt, wordsB.txt, etc.

3) Read a few thousand words at a time and check them. Put any that are out of place in another file (a transaction file) while creating an updated words file. Sort the transaction file, then merge the main words file and the transaction file.
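
Option 1 might look roughly like the sketch below: an array of pointers that grows with `realloc`, with one `malloc` per word. It assumes one word per line and words shorter than 80 characters. `WordList`, `read_words` and `free_words` are names invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char **items;   /* array of pointers, one per word */
    size_t count;   /* words stored so far */
    size_t cap;     /* pointers the array currently has room for */
} WordList;

/* Read one word per line from fp into wl.
   Returns 0 on success, -1 if memory runs out (wl then holds the
   words read so far and should still be passed to free_words). */
int read_words(FILE *fp, WordList *wl)
{
    char line[80];

    assert(fp != NULL && wl != NULL);
    wl->items = NULL;
    wl->count = 0;
    wl->cap = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';    /* strip the line ending */
        if (line[0] == '\0')
            continue;                           /* skip blank lines */

        if (wl->count == wl->cap) {             /* pointer array is full: double it */
            size_t new_cap = wl->cap ? wl->cap * 2 : 1024;
            char **tmp = realloc(wl->items, new_cap * sizeof *tmp);
            if (tmp == NULL)
                return -1;
            wl->items = tmp;
            wl->cap = new_cap;
        }

        size_t len = strlen(line) + 1;          /* +1 for the terminating '\0' */
        char *copy = malloc(len);               /* exactly enough for this word */
        if (copy == NULL)
            return -1;
        memcpy(copy, line, len);
        wl->items[wl->count++] = copy;
    }
    return 0;
}

/* Release every word and the pointer array itself. */
void free_words(WordList *wl)
{
    for (size_t i = 0; i < wl->count; ++i)
        free(wl->items[i]);
    free(wl->items);
    wl->items = NULL;
    wl->count = 0;
    wl->cap = 0;
}
```

Once the words are loaded, the file is no longer touched: each pointer refers to a copy of the word in memory, so sorting only moves pointers around.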

#41 Adak

Posted 04 September 2013 - 07:28 PM

This is what I'd call "the EZ external sorter", since it uses the system sorter (which is VERY fast, btw), and then I added a duplicate word remover.

This program has only been *very* slightly checked over for bugs, so be sure and test it thoroughly, before you trust it.

The whole program is quite fast, but it's not been optimized.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
   char word1[80], word2[80], *p;
   int n = 0;
   FILE *fpIn, *fpOut;
   /* If you want to change the words to all uppercase letters, you should open the words file,
      here, and make those changes, before the sorting takes place.
   */

   /* for the sort command to work, the program must be in a directory
      which is "in the path". One such directory is C:\Users\YourUsersName */

   printf("enter 'sort' to sort all words in the word list\n");
   if (fgets(word1, sizeof(word1), stdin) == NULL)
      return 1;
   p = strchr(word1, '\n');
   if (p)
      *p = '\0';
   /* run the system sorter only if the user typed "sort" */
   if (strcmp(word1, "sort") == 0) {
      system("sort <words.txt >wordsSorted.txt");
   }

   //words are sorted, remove all the duplicate words
   fpIn = fopen("wordsSorted.txt", "r");
   fpOut = fopen("wordsFinal.txt", "w");
   if (!fpIn || !fpOut) {
      printf("Files did not open, terminating program.\n");
      return 1;
   }
   /* word1 holds the previous line; it is written once the next line differs */
   if (fgets(word1, sizeof(word1), fpIn) != NULL) {
      while (fgets(word2, sizeof(word2), fpIn) != NULL) {
         if (strcmp(word1, word2) != 0) {
            fprintf(fpOut, "%s", word1);   /* fgets kept the '\n', so none is added */
            ++n;
         }
         strcpy(word1, word2);
      }
      /* the last word read has not been written yet */
      fprintf(fpOut, "%s", word1);
      if (strchr(word1, '\n') == NULL)     /* the final line may lack a newline */
         fputc('\n', fpOut);
      ++n;
   }
   fclose(fpIn);
   fclose(fpOut);

   printf("Word list has %d unique words\n", n);
   return 0;
}




I noticed that some words in your posted sample, like "ABECEDARIUS ", have a space after them. You might want to code up a fix for that in the program. The problem is that "ABECEDARIUS " and "ABECEDARIUS" will be seen by the program as two different words, not as duplicates.
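
A small helper along these lines could handle the trailing-space problem. It is only a sketch, and `rtrim` is a made-up name; it strips spaces, tabs and line endings from the end of a string in place.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Remove trailing whitespace (spaces, tabs, '\r', '\n') from s in place.
   Returns s so the call can be nested inside another expression. */
char *rtrim(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    /* the cast keeps isspace() well defined for chars above 127 */
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        s[--len] = '\0';
    return s;
}
```

If this is called on each line right after `fgets`, the newline is stripped along with the spaces, so the word then has to be written back out with `"%s\n"` rather than `"%s"`.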

Remember to compile the above program first, and put it into a directory that is in the system path, as noted in the program. Otherwise, you'll get a "sort is not recognized as an internal or external command by the system", error.

The word files must be in the same directory as the exe program.

#42 Skydiver

  • Code herder

Reputation: 3576
  • View blog
  • Posts: 11,125
  • Joined: 05-May 12

Re: Word sorting program

Posted 05 September 2013 - 05:33 AM

lewm, on 04 September 2013 - 03:56 PM, said:

OK, the file size is 1.5 MB. It contains a list of words in alphabetical order

With a 32-bit compiler, and any modern OS that supports virtual memory, 1.5 MB will fit in "memory", since the addressable space for a 32-bit pointer is 4 GB.
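
To illustrate how small 1.5 MB is in practice, the whole file can be pulled into a single heap buffer in one go. The sketch below assumes the stream was opened in binary mode (`"rb"`) and supports seeking, which holds for ordinary disk files on mainstream systems. `read_all` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read the entire contents of fp into one malloc'd, '\0'-terminated buffer.
   Stores the byte count in *out_len when out_len is not NULL.
   Returns NULL on failure; the caller frees the buffer. */
char *read_all(FILE *fp, size_t *out_len)
{
    assert(fp != NULL);
    if (fseek(fp, 0, SEEK_END) != 0)     /* jump to the end to learn the size */
        return NULL;
    long size = ftell(fp);
    if (size < 0)
        return NULL;
    rewind(fp);                           /* back to the start for reading */

    char *buf = malloc((size_t)size + 1); /* +1 for the terminating '\0' */
    if (buf == NULL)
        return NULL;

    size_t got = fread(buf, 1, (size_t)size, fp);
    buf[got] = '\0';
    if (out_len != NULL)
        *out_len = got;
    return buf;
}
```

With the text in one buffer, each `'\n'` can be replaced by `'\0'` and a pointer saved to the start of every word, which gives an array of pointers without a separate allocation per word.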

#43 jimblumberg

Posted 05 September 2013 - 07:12 AM

Quote

OK, the file size is 1.5 MB. It contains a list of words in alphabetical order, which I am going to use for a game.

Quote

I don't know what you mean by sorted

Your file is sorted (in alphabetical order), so you don't need to load that large file into memory to add more items.

If the file isn't sorted, then you can use the method Adak mentioned in post #41 to sort the values. You may need to ensure all the items are in the proper case before you do the sort.

Jim

#44 lewm

Posted 05 September 2013 - 10:48 AM

Skydiver, on 05 September 2013 - 05:33 AM, said:

lewm, on 04 September 2013 - 03:56 PM, said:

OK, the file size is 1.5 MB. It contains a list of words in alphabetical order

With a 32-bit compiler, and any modern OS that supports virtual memory, 1.5 MB will fit in "memory", since the addressable space for a 32-bit pointer is 4 GB.

So where am I going wrong trying to get them all into memory with my second attempt?

#45 lewm

Posted 05 September 2013 - 11:02 AM

I don't mean to be rude, but I am still no closer to understanding how to get them ALL into memory if I wanted to.
I know a little about pointers and a little about malloc, but I've never used them together.
I'm using a 64-bit system with 8 GB of RAM.
Is there a way around using malloc other than splitting the words up into separate files? Could I use a 3-dimensional array or something?
Could someone help me understand how I would read them into memory using pointers and malloc?
The code that Adak provided looks very useful and is much appreciated, but I'd rather not use the system to sort; I'd rather do it myself if possible.