3 Replies - 948 Views - Last Post: 01 April 2013 - 07:48 AM

#1 Dude22

  • New D.I.C Head

Reputation: 0
  • Posts: 4
  • Joined: 31-March 13

program to spell check a file seg fault

Posted 31 March 2013 - 09:07 PM

Hi,

I am currently working through Harvard's CS50 course, and I need some help. I have 3 problem sets to go: the first is to create a spell checker, the second is to decompress Huffman-compressed files, and the third is to create a website for buying and selling stocks. All that is followed by a test and a final project!!!

Luckily, not all hope of finishing in time is lost! I am close to completing the site and am about halfway through the decompression program... The one I am here asking for help with is the spell checker.

With the help of the users on this forum:

http://cboard.cprogr...check-file.html

I got a program that is close to working. There was some miscommunication, however, about the provided programs that it must work with. That thread is now dead, and rather than open a new thread there, I thought I would try here. Properly explaining this might take a bit of text, so if you are prepared to help me continue to learn, grab a drink and settle in for a long(ish) read.

I will start by defining what the spell checker must do; in fact, I will copy this directly from the instructions:

-Alright, the challenge ahead of you is to implement load, check, size, and unload as efficiently as possible, in such a way that TIME IN load, TIME IN check, TIME IN size, and TIME IN unload are all minimized. To be sure, it's not obvious what it even means to be minimized, inasmuch as these benchmarks will certainly vary as you feed speller different values for dictionary and for text. But therein lies the challenge, if not the fun, of this problem set. This problem set is your chance to design. Although we invite you to minimize space, your ultimate enemy is time. But before you dive in, some specifications from us.

-You may not alter speller.c.

-You may alter dictionary.c (and, in fact, must in order to complete the implementations of load, check, size, and unload), but you may not alter the declarations of load, check, size, or unload.

-You may alter dictionary.h, but you may not alter the declarations of load, check, size, or unload.

-You may alter Makefile.

-You may add functions to dictionary.c or to files of your own creation so long as all of your code compiles via make.

-Your implementation of check must be case-insensitive. In other words, if foo is in dictionary, then check should return true given any capitalization thereof; none of foo, foO, fOo, fOO, Foo, FoO, FOo, and FOO should be considered misspelled.

-Capitalization aside, your implementation of check should only return true for words actually in dictionary. Beware hard-coding common words (e.g., the), lest we pass your implementation a dictionary without those same words. Moreover, the only possessives allowed are those actually in dictionary. In other words, even if foo is in dictionary, check should return false given foo's if foo's is not also in dictionary.

-You may assume that check will only be passed strings with alphabetical characters and/or apostrophes.

-You may assume that any dictionary passed to your program will be structured exactly like ours, lexicographically sorted from top to bottom with one word per line, each of which ends with \n. You may also assume that dictionary will contain at least one word, that no word will be longer than LENGTH (a constant defined in dictionary.h) characters, that no word will appear more than once, and that each word will contain only lowercase alphabetical characters and possibly apostrophes.

-Your spell-checker may only take text and, optionally, dictionary as input. Although you might be inclined (particularly if among those more comfortable) to "pre-process" our default dictionary in order to derive an "ideal hash function" for it, you may not save the output of any such pre-processing to disk in order to load it back into memory on subsequent runs of your spell-checker in order to gain an advantage.
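Taken together, the sorted one-word-per-line format and the case-insensitivity rule suggest one straightforward design: keep the words in a sorted array and lowercase each query before a binary search. The following is a sketch of that idea, not the required implementation. `lookup` is a hypothetical helper name (check itself must keep its `bool check(const char* word)` declaration), LENGTH is assumed to be 45 as in dictionary.h, and the array is assumed to be in `strcmp` byte order. That last point is worth confirming, or enforcing with `qsort` after loading, because "lexicographically sorted" may not mean byte order for words that contain apostrophes.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Assumed to match the LENGTH constant in dictionary.h.
#define LENGTH 45

// Hypothetical helper: returns true if word (in any capitalization) appears
// in words[0..n-1], which must hold lowercase entries sorted in strcmp order.
bool lookup(char **words, size_t n, const char *word)
{
    // Copy the query into a lowercase buffer; longer words cannot match.
    char lower[LENGTH + 1];
    size_t len = strlen(word);
    if (len > LENGTH)
    {
        return false;
    }
    for (size_t i = 0; i <= len; i++)   // <= also copies the '\0'
    {
        lower[i] = (char) tolower((unsigned char) word[i]);
    }

    // Binary search over the half-open range [lo, hi).
    size_t lo = 0, hi = n;
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(words[mid], lower);
        if (cmp < 0)
        {
            lo = mid + 1;
        }
        else if (cmp > 0)
        {
            hi = mid;
        }
        else
        {
            return true;
        }
    }
    return false;
}
```

Lowercasing a copy of the query, rather than comparing case-insensitively at every step, means the dictionary (already lowercase per the spec) never needs converting.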



So, here is the code they supply:

speller.c

/****************************************************************************
 * speller.c
 *
 * Computer Science 50
 * Problem Set 5
 *
 * Implements a spell-checker.
 ***************************************************************************/

#include <ctype.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

#include "dictionary.h"

// default dictionary
#define DICTIONARY "/home/cs50/pset5/dictionaries/large"

// prototype
double calculate(const struct rusage* b, const struct rusage* a);

int main(int argc, char* argv[])
{
    // check for correct number of args
    if (argc != 2 && argc != 3)
    {
        printf("Usage: speller [dictionary] text\n");
        return 1;
    }

    // structs for timing data
    struct rusage before, after;

    // benchmarks
    double ti_load = 0.0, ti_check = 0.0, ti_size = 0.0, ti_unload = 0.0;

    // determine dictionary to use
    char* dictionary = (argc == 3) ? argv[1] : DICTIONARY;

    // load dictionary
    getrusage(RUSAGE_SELF, &before);
    bool loaded = load(dictionary);
    getrusage(RUSAGE_SELF, &after);

    // abort if dictionary not loaded
    if (!loaded)
    {
        printf("Could not load %s.\n", dictionary);
        return 1;
    }

    // calculate time to load dictionary
    ti_load = calculate(&before, &after);

    // try to open text
    char* text = (argc == 3) ? argv[2] : argv[1];
    FILE* fp = fopen(text, "r");
    if (fp == NULL)
    {
        printf("Could not open %s.\n", text);
        unload();
        return 1;
    }

    // prepare to report misspellings
    printf("\nMISSPELLED WORDS\n\n");

    // prepare to spell-check
    int index = 0, misspellings = 0, words = 0;
    char word[LENGTH+1];

    // spell-check each word in text
    for (int c = fgetc(fp); c != EOF; c = fgetc(fp))
    {
        // allow only alphabetical characters and apostrophes
        if (isalpha(c) || (c == '\'' && index > 0))
        {
            // append character to word
            word[index] = c;
            index++;

            // ignore alphabetical strings too long to be words
            if (index > LENGTH)
            {
                // consume remainder of alphabetical string
                while ((c = fgetc(fp)) != EOF && isalpha(c));

                // prepare for new word
                index = 0;
            }
        }

        // ignore words with numbers (like MS Word can)
        else if (isdigit(c))
        {
            // consume remainder of alphanumeric string
            while ((c = fgetc(fp)) != EOF && isalnum(c));

            // prepare for new word
            index = 0;
        }

        // we must have found a whole word
        else if (index > 0)
        {
            // terminate current word
            word[index] = '\0';

            // update counter
            words++;

            // check word's spelling
            getrusage(RUSAGE_SELF, &before);
            bool misspelled = !check(word);
            getrusage(RUSAGE_SELF, &after);

            // update benchmark
            ti_check += calculate(&before, &after);

            // print word if misspelled
            if (misspelled)
            {
                printf("%s\n", word);
                misspellings++;
            }

            // prepare for next word
            index = 0;
        }
    }

    // check whether there was an error
    if (ferror(fp))
    {
        fclose(fp);
        printf("Error reading %s.\n", text);
        unload();
        return 1;
    }

    // close text
    fclose(fp);

    // determine dictionary's size
    getrusage(RUSAGE_SELF, &before);
    unsigned int n = size();
    getrusage(RUSAGE_SELF, &after);

    // calculate time to determine dictionary's size
    ti_size = calculate(&before, &after);

    // unload dictionary
    getrusage(RUSAGE_SELF, &before);
    bool unloaded = unload();
    getrusage(RUSAGE_SELF, &after);

    // abort if dictionary not unloaded
    if (!unloaded)
    {
        printf("Could not unload %s.\n", dictionary);
        return 1;
    }

    // calculate time to unload dictionary
    ti_unload = calculate(&before, &after);

    // report benchmarks
    printf("\nWORDS MISSPELLED:     %d\n", misspellings);
    printf("WORDS IN DICTIONARY:  %d\n", n);
    printf("WORDS IN TEXT:        %d\n", words);
    printf("TIME IN load:         %.2f\n", ti_load);
    printf("TIME IN check:        %.2f\n", ti_check);
    printf("TIME IN size:         %.2f\n", ti_size);
    printf("TIME IN unload:       %.2f\n", ti_unload);
    printf("TIME IN TOTAL:        %.2f\n\n", 
     ti_load + ti_check + ti_size + ti_unload);

    // that's all folks
    return 0;
}

/**
 * Returns number of seconds between b and a.
 */
double calculate(const struct rusage* b, const struct rusage* a)
{
    if (b == NULL || a == NULL)
    {
        return 0.0;
    }
    else
    {
        return ((((a->ru_utime.tv_sec * 1000000 + a->ru_utime.tv_usec) -
                 (b->ru_utime.tv_sec * 1000000 + b->ru_utime.tv_usec)) +
                ((a->ru_stime.tv_sec * 1000000 + a->ru_stime.tv_usec) -
                 (b->ru_stime.tv_sec * 1000000 + b->ru_stime.tv_usec)))
                / 1000000.0);
    }
}


dictionary.h

/****************************************************************************
 * dictionary.h
 *
 * Computer Science 50
 * Problem Set 5
 *
 * Declares a dictionary's functionality.
 ***************************************************************************/

#ifndef DICTIONARY_H
#define DICTIONARY_H

#include <stdbool.h>

// maximum length for a word
// (e.g., pneumonoultramicroscopicsilicovolcanoconiosis)
#define LENGTH 45

/**
 * Returns true if word is in dictionary else false.
 */
bool check(const char* word);

/**
 * Loads dictionary into memory.  Returns true if successful else false.
 */
bool load(const char* dictionary);

/**
 * Returns number of words in dictionary if loaded else 0 if not yet loaded.
 */
unsigned int size(void);

/**
 * Unloads dictionary from memory.  Returns true if successful else false.
 */
bool unload(void);

#endif // DICTIONARY_H




And here is MY code!

/****************************************************************************
 * dictionary.c
 *
 * Computer Science 50
 * Problem Set 5
 *
 * Implements a dictionary's functionality.
 ***************************************************************************/

#include <stdio.h>
#include "dictionary.h"
#include <string.h>
#include <stdlib.h>

#define MAXWORDS 26
#define DLENGTH 46

int count=0;
/**
 * Returns true if word is in dictionary else false.
 */
bool check(const char* word)
{
   char *words[LENGTH];
   int n =0;

   int j,lo=0,hi=n-1,mid;
   
   while(lo<=hi) 
   {
      mid=(lo+hi)/2; //printf("lo: %d  hi: %d  mid: %d\n",lo,hi,mid);getchar();
      j=strcmp(words[mid],word);
      if(j>0)
      { 
         hi=mid-1;
      } 
      else if(j<0)
      {
         lo=mid+1;
      }
      else
      {
         return true;
      }
   }
   return false;
}
/**
 * Loads dictionary into memory.  Returns true if successful else false.
 */
bool load(const char* dictionary)
{
 
    int input(FILE *fp,char *words[DLENGTH],int getData);

    char **words = NULL;
    char buff[BUFSIZ];  //BUFSIZ or BUFSIZE is a macro for your system - usually 256 or 512 char's in size. A "natural" buffer length, for your system.
 
    FILE *fp=fopen("test.txt","r");
   
    count=input(fp,words,0);   //just counting this time
    rewind(fp);              //going back to the start of the file
 
    //malloc the right number of words here
    words=malloc(count * sizeof(char *));
    for(int i=0;i<count;i++) 
    {
        words[i]=malloc(DLENGTH * sizeof(char));  //#define LENGTH  29
    }
    input(fp,words,1);   //now getting the words
  
    //all the other stuff, here (mostly calling some functions)
    
    printf("%s\n",buff);
 
    return 0;
}
    int input(FILE *fp, char *words[DLENGTH], int getData)
   { 
        int i=0;
        char buff[128];
        while((fgets(buff, BUFSIZ, fp)) != NULL) 
        {
             if(getData) 
             {
                  //remove the newline here
                  strcpy(words[i],buff);
             }
        ++i;
        }
        if(getData==1)
            return i;
        else
            return -1;
   }
 

/**
 * Returns number of words in dictionary if loaded else 0 if not yet loaded.
 */
unsigned int size(void)
{
//unsigned int count;
if (count > 1)
    return count;
else
    return 0;
}
/**
 * Unloads dictionary from memory.  Returns true if successful else false.
 */
bool unload(void)
{
    if(fclose(*fp)==0)
        return true;
    else
        return false;
}
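The `//remove the newline here` step left in `input()` can be done with `strcspn`, and passing the real size of the buffer to `fgets` keeps the read in bounds. As posted, `fgets` is told it may write `BUFSIZ` bytes into a 128-byte `buff`, and the C standard requires BUFSIZ to be at least 256. A minimal sketch, with `read_word` as a hypothetical helper name:

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: reads one line from fp into buf, which holds cap
// bytes, and strips the trailing newline. Returns 1 on success, 0 at EOF.
int read_word(FILE *fp, char *buf, size_t cap)
{
    // Pass the buffer's real size so fgets can never write past its end.
    if (fgets(buf, (int) cap, fp) == NULL)
    {
        return 0;
    }

    // strcspn gives the index of the first '\n' (or of the '\0' if none).
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

Called as `read_word(fp, buff, sizeof buff)`, the size passed always matches the array actually declared.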




When I try to compile it as it is now, I get this error:

jharvard@appliance (~/Dropbox/pset5): make
clang -ggdb -O0 -Qunused-arguments -std=c99 -Wall -Werror -c -o speller.o speller.c
clang -ggdb -O0 -Qunused-arguments -std=c99 -Wall -Werror -c -o dictionary.o dictionary.c
dictionary.c:114:16: error: use of undeclared identifier 'fp'
if(fclose(*fp)==0)
^
1 error generated.
make: *** [dictionary.o] Error 1

If I comment out that whole section of code (unloading from memory), it compiles successfully, but I get a seg fault when I try to run it.

If you have made it this far, thank you for reading. I am asking for some hints/help to make the code work. Just to be clear, I am NOT asking you to "do it for me", only for hints/help.

If there is anything I forgot to explain, or didn't properly explain, please let me know!

Thanks,
Josh


Replies To: program to spell check a file seg fault

#2 Skydiver

  • Code herder

Reputation: 3476
  • Posts: 10,721
  • Joined: 05-May 12

Re: program to spell check a file seg fault

Posted 01 April 2013 - 05:20 AM

I'll quote what jimblumberg wrote in that thread in the other forum since the advice still applies:

Quote

No, learn to properly pass the variables to and from your functions. And you should be calling close in the function where you opened the file, not in some function that doesn't even know if the file has already been closed.
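As an illustration of passing values back properly: in the posted code, `input()` returns -1 whenever `getData` is 0, so `count` becomes -1, and `malloc(count * sizeof(char *))` is asked for an enormous amount once -1 is converted to `size_t`. That call most likely returns NULL, and the later `words[i]` in `input()` then dereferences a NULL pointer, a plausible source of the seg fault. Below is a minimal sketch of a counting helper that always returns its result, assuming every word ends in '\n' as the spec guarantees. `count_lines` is a hypothetical name, and the caller that opened the file remains the one that closes it.

```c
#include <stdio.h>

// Hypothetical helper: counts the '\n'-terminated lines left in fp, then
// rewinds fp so the caller can read it again. The count always comes back
// through the return value; closing fp stays with the caller that opened it.
size_t count_lines(FILE *fp)
{
    size_t lines = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
    {
        if (c == '\n')
        {
            lines++;
        }
    }
    rewind(fp);
    return lines;
}
```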


Anyway, you forgot to mention where you were getting a segmentation fault after you commented out the code that was causing the compilation error. Is the call stack from gdb in the other thread still accurate, or has it changed?

#3 jimblumberg



Reputation: 3993
  • Posts: 12,322
  • Joined: 25-December 09

Re: program to spell check a file seg fault

Posted 01 April 2013 - 07:03 AM

Also look at your comments from your include file for unload:
/**
 * Unloads dictionary from memory.  Returns true if successful else false.
 */
bool unload(void);


Closing the file doesn't do anything with memory. Where are you freeing the memory you malloc'd? I think that is what unload is supposed to do, not play with the file.
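A sketch of an unload that frees what load allocated. It assumes load stored the words in a file-scope `char **words` array holding `count` entries, each separately malloc'd. That storage is an assumption: in the posted code, `words` is a local variable inside load and is lost when load returns.

```c
#include <stdbool.h>
#include <stdlib.h>

// Assumed file-scope storage that load() fills: one malloc'd array of
// count pointers, each pointing to a separately malloc'd word.
static char **words = NULL;
static unsigned int count = 0;

bool unload(void)
{
    // Free each word first, then the array that held the pointers.
    for (unsigned int i = 0; i < count; i++)
    {
        free(words[i]);
    }
    free(words);   // free(NULL) is a no-op, so this is safe if nothing loaded

    // Reset so a size() based on count reports 0 and a second unload() is harmless.
    words = NULL;
    count = 0;
    return true;
}
```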

Also, if you run your program through your debugger, you should find the exact spot where the program crashes.

Why are you calling fopen() in main() and load()? It really looks like load is the only function that should need file access and the file probably should only be opened in this function.
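A sketch of a load that opens only the path it is given (speller.c passes it in as `dictionary`), reads the words, and closes the file before returning. The file-scope `words`/`count` storage, the starting capacity of 1024, and the use of `fscanf` with a `%45s` width (matching LENGTH 45) are assumptions for illustration, not the assignment's required design.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45  // assumed to match dictionary.h

// Hypothetical file-scope storage, visible to check, size and unload.
static char **words = NULL;
static unsigned int count = 0;

bool load(const char *dictionary)
{
    // Open the path speller.c passes in, not a hard-coded file name.
    FILE *fp = fopen(dictionary, "r");
    if (fp == NULL)
    {
        return false;
    }

    unsigned int capacity = 0;
    char buf[LENGTH + 1];
    // "%45s" reads one whitespace-delimited word of at most LENGTH chars.
    while (fscanf(fp, "%45s", buf) == 1)
    {
        if (count == capacity)
        {
            // Grow geometrically so total copying stays linear in word count.
            unsigned int new_cap = capacity ? capacity * 2 : 1024;
            char **grown = realloc(words, new_cap * sizeof(char *));
            if (grown == NULL)
            {
                fclose(fp);   // words read so far stay allocated for unload()
                return false;
            }
            words = grown;
            capacity = new_cap;
        }
        words[count] = malloc(strlen(buf) + 1);
        if (words[count] == NULL)
        {
            fclose(fp);
            return false;
        }
        strcpy(words[count], buf);
        count++;
    }

    // Close the file in the same function that opened it.
    fclose(fp);
    return true;
}

unsigned int size(void)
{
    return count;
}
```

Because `words` and `count` live at file scope rather than inside load, check, size, and unload can all see what load read.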


Jim

#4 jimblumberg



Reputation: 3993
  • Posts: 12,322
  • Joined: 25-December 09

Re: program to spell check a file seg fault

Posted 01 April 2013 - 07:48 AM

Oh, and since you're now asking this question in multiple places at once, you'll probably find you get less and less useful answers, not more, especially since the post you linked has over 100 posts to begin with.

Jim
