Array: Checking for censored words and replacing words problem


25 Replies - 733 Views - Last Post: 24 September 2012 - 12:03 AM

#16 Skydiver

Posted 19 September 2012 - 12:06 AM

As for fixing the two problems:

You should only process the input that you got, so stop processing the string when you reach the null terminator. For most of the English-speaking world, this will make your program run without assertions about 90% of the time. (Strictly, it should be only about 50% of the time, since half of the char range, 128-255, holds the interesting characters with accents, umlauts, etc.)

The issue with the high bit being on will require a bit more work. I believe that is an effect of the compiler. Somebody correct me if I'm wrong, but I believe it is expected behavior for the compiler to sign-extend a char that has the high bit on when the char is converted to an int (on compilers where plain char is a signed type). isspace() actually takes an int as its parameter. A char with a hex value of 0xCC becomes 0xFFFFFFCC when sign-extended. So to fix this issue, you'll need to assign the char value to an integer and undo the sign extension, or prevent it altogether. I prefer to do this by masking against hex 0xFF, but there are other techniques which may be more efficient or clearer.

#17 Skydiver

Posted 19 September 2012 - 12:15 AM

rethc, on 19 September 2012 - 12:05 AM, said:

The error is caused by an unterminated string when isspace and ispunct are called. The strings are not being terminated before they are used. To fix it, just do what I suggested in post #7.

Nope. You are just masking the issue by zero filling the output array. As aresh had pointed out, this is a waste of CPU cycles.

If this were really just a matter of not null-terminating the string, then, as aresh pointed out, lines 35-36 null-terminate the output array (most of the time). Since the string is null-terminated but the assert still fires, the issue clearly is not a lack of null termination.

If you really think this is an issue of null termination, I invite you to run this code:
    char test[] = "Test \xCC me.";
    for(int i = 0; test[i] != 0; i++)
        isspace(test[i]);



Notice that the assert fires with the value of i not even being close to the end of the string yet.

#18 Charlwillia6

Posted 19 September 2012 - 12:20 AM

Skydiver, on 19 September 2012 - 12:06 AM, said:

As for fixing the two problems:

You should only process the input that you got, so stop processing the string when you reach the null terminator. For most of the English-speaking world, this will make your program run without assertions about 90% of the time. (Strictly, it should be only about 50% of the time, since half of the char range, 128-255, holds the interesting characters with accents, umlauts, etc.)

The issue with the high bit being on will require a bit more work. I believe that is an effect of the compiler. Somebody correct me if I'm wrong, but I believe it is expected behavior for the compiler to sign-extend a char that has the high bit on when the char is converted to an int (on compilers where plain char is a signed type). isspace() actually takes an int as its parameter. A char with a hex value of 0xCC becomes 0xFFFFFFCC when sign-extended. So to fix this issue, you'll need to assign the char value to an integer and undo the sign extension, or prevent it altogether. I prefer to do this by masking against hex 0xFF, but there are other techniques which may be more efficient or clearer.


Umm..ok. Well this seems like a lot just for a homework program in the first two weeks of class. I guess I am going about this all wrong then, because 90% of what you are saying is over my head. I wouldn't think using the isspace function would require all this. So is there a different work around in the code I can do then to accomplish what I am trying to do with this program? Anybody?

#19 Skydiver

Posted 19 September 2012 - 12:24 AM

Well, instead of using isspace(), you could check each character against space, tab, carriage return, line feed, and the other ASCII characters that are classified as whitespace.

But then you'll also need to do something similar for ispunct(). You need to check against comma, period, semicolon, colon, question mark, exclamation point, parentheses, braces, brackets, tilde, quotation marks, dashes, hyphens, etc.

#20 Charlwillia6

Posted 19 September 2012 - 12:39 AM

Skydiver, on 19 September 2012 - 12:24 AM, said:

Well, instead of using isspace(), you could check each character against space, tab, carriage return, line feed, and the other ASCII characters that are classified as whitespace.

But then you'll also need to do something similar for ispunct(). You need to check against comma, period, semicolon, colon, question mark, exclamation point, parentheses, braces, brackets, tilde, quotation marks, dashes, hyphens, etc.


I figured that would be the case. Why would using the ctype.h library be "advised" then for this assignment if the possibility of isspace not working is an issue? If isspace is meant to check for whitespaces, you'd think it wouldn't be so limited as to just checking an array of characters? I really don't understand what is so complicated about my code that I can't just use isspace for this simple search. Yes, I studied bits and bytes, and how they relate to int, long, double, and so on, but no, I have not learned how they relate to things like this, or hex, or any of that. So can you explain in beginner terms what the big issue is and why I can't use isspace? Is there something in beginner C++ that I can do to use isspace?

You have been very helpful so far, and I do appreciate your time, I just can't seem to grasp why this program is being so difficult and what I am missing that could make this a little easier for the concepts I have been taught.

#21 rethc

Posted 19 September 2012 - 12:40 AM

Skydiver, on 19 September 2012 - 12:15 AM, said:

rethc, on 19 September 2012 - 12:05 AM, said:

The error is caused by an unterminated string when isspace and ispunct are called. The strings are not being terminated before they are used. To fix it, just do what I suggested in post #7.

Nope. You are just masking the issue by zero filling the output array. As aresh had pointed out, this is a waste of CPU cycles.

If this were really just a matter of not null-terminating the string, then, as aresh pointed out, lines 35-36 null-terminate the output array (most of the time). Since the string is null-terminated but the assert still fires, the issue clearly is not a lack of null termination.

If you really think this is an issue of null termination, I invite you to run this code:
    char test[] = "Test \xCC me.";
    for(int i = 0; test[i] != 0; i++)
        isspace(test[i]);



Notice that the assert fires with the value of i not even being close to the end of the string yet.


I'm just learning programming myself and remember from last semester that you should initialize C strings as empty before you use them. I just tried emptying the string first and the error went away, so I thought it was the cause of the issue. Thanks for correcting me.

This post has been edited by rethc: 19 September 2012 - 12:43 AM


#22 Skydiver

Posted 19 September 2012 - 02:22 AM

Charlwillia6, on 19 September 2012 - 12:39 AM, said:

Why would using the ctype.h library be "advised" then for this assignment if the possibility of isspace not working is an issue? If isspace is meant to check for whitespaces, you'd think it wouldn't be so limited as to just checking an array of characters?


One of the more modern adages in programming is "don't reinvent the wheel". If there is something readily available in your standard library, you should take advantage of it. (For some reason, though, professors always want their students to reinvent complex number wrapper classes, linked lists, trees, hash tables, etc., but that's wandering off topic.)

There is nothing wrong with isspace() as it stands alone and as originally conceived. The traditional way to write a C program in the old Kernighan and Ritchie style was along the lines of:
int input;        // an int, not a char, so that EOF can be told apart from data
char word[256];
int len = 0;
// file is assumed to be a FILE* that is already open for reading
while ((input = fgetc(file)) != EOF)
{
    if (isspace(input))
    {
        // do something with spaces
    }
    else if (isalpha(input))
    {
        // only at this point convert int to character
        if (len < 255)
            word[len++] = (char)input;
    }
    else if (isdigit(input))
    {
        // do something with digits
    }
    // ...
}



The various ctype macros/functions expect to take an integer as input. If I recall correctly, the various ctype helpers also used to be implemented as macros rather than function calls. The helpers work efficiently by using the value as an index into a lookup table. The code correctly asserts that it shouldn't index into the array at a negative index and cause a buffer underrun (like the unfortunate code previously posted), nor should it index beyond the end of the allocated lookup table and cause a buffer overrun.

(Additionally, C used to assume that the programmer is intimately aware of the data types he is playing with, the functions and macros he is using, and how he is passing values around to a macro or a function.)

This isn't quite how it is implemented, but view this as a primitive implementation of isspace() the old fashioned way:
#define isspace(_ch) lookupTable[_ch]

// ones in the lookup table indicate true
int lookupTable[256];

lookupTable['\t'] = 1;    // tab is a space
lookupTable['\r'] = 1;    // carriage return is a space
lookupTable['\n'] = 1;    // line feed is a space
:
lookupTable[' '] = 1;    // space is a space
:
lookupTable['0'] = 0;
lookupTable['1'] = 0;
:
lookupTable['A'] = 0;
lookupTable['B'] = 0;
:
lookupTable['a'] = 0;
lookupTable['b'] = 0;



So the way the isspace() macro works is that the parameter you pass in is used directly as an index to quickly look something up in the array. isspace(' ') will return non-zero, while isspace('A') will return 0. As you can see, it would be bad to pass in an index that is less than 0 or more than 255, because that reads outside the array and gives unpredictable results. There is an implicit assumption that the programmer will stay within the valid range.

With C++, the implementation changed to use functions rather than macros. So the same isspace() could be naively implemented as:
int isspace(int ch)
{
    // assume 0 <= ch <= 255
    assert(ch >= 0);     // report an error in debug mode if ch is less than zero 
    assert(ch <= 255);   // report an error in debug mode if ch is greater than 255
    return lookupTable[ch];
}

// ones in the lookup table indicate true
int lookupTable[256];

lookupTable['\t'] = 1;    // tab is a space
lookupTable['\r'] = 1;    // carriage return is a space
lookupTable['\n'] = 1;    // line feed is a space
:
lookupTable[' '] = 1;    // space is a space
:
lookupTable['0'] = 0;
lookupTable['1'] = 0;
:
lookupTable['A'] = 0;
lookupTable['B'] = 0;
:
lookupTable['a'] = 0;
lookupTable['b'] = 0;


Notice the change from a macro to a function and, within the body of the function, the assert()s that enforce the assumptions made by the code.

Charlwillia6, on 19 September 2012 - 12:39 AM, said:

So can you explain in beginner terms what the big issue is and why I can't use isspace? Is there something in beginner C++ that I can do to use isspace?


Things only start getting weird if you have code that looks like this:
int input;
char ch;
char word[256];
int len = 0;
while ((input = fgetc(file)) != EOF)
{
    // convert an int into a char
    ch = input;

    // In the various checks below, the compiler converts the char back to an int
    if (isspace(ch))
    {
        // do something with spaces
    }
    else if (isalpha(ch))
    {
        // do something with word letters
        if (len < 255)
            word[len++] = ch;
    }
    else if (isdigit(ch))
    {
        // do something with digits
    }
    // ...
}



The reason for the weirdness is that ch is a char but the parameter is an int, so the compiler has to convert the char to an int. The typical rule is that if you have a signed data type (which plain char is on many compilers), the compiler should preserve the sign of the value. So, for example, -1 stored as a char is 0xFF in hexadecimal. Stored as a 32-bit integer, it is 0xFFFFFFFF in hexadecimal.

So the issue with your code is this: aresh advised against zero-filling your input and output arrays, so after you get your input, you copy it into the output array only up to the null terminator. You then start to process the output array but blow past the null terminator. Since arrays contain random data when first allocated, eventually you stumble across a value that gets sign-extended.

You can use isspace(). You just have to be aware that it takes an integer, and the integer value must be in the range 0 to 255 (or be EOF).

How you go about ensuring your values are within that range is something that you should first try to explore and learn yourself. As I hinted before, one option is to use a bit mask, but it's not the only option.

#23 JackOfAllTrades

Posted 19 September 2012 - 02:57 AM

Brutal honesty time.

For whatever reason, you did not learn what you needed to learn in your prerequisite class(es). You need to address this with your professor, so that you do not fail. It's early in the semester; chances are it's not going to get any easier as it goes on.

#24 #define

Posted 19 September 2012 - 03:53 PM

Hi, when i is 0, Input[i-1] is out of bounds.

	for(i = 0; Input[i-1] != '\0'; i++)
		Output[i] = Input[i];





So slightly change the loop.

  for(i = 0; Input[i] != '\0'; i++)
  {
    Output[i] = Input[i];
  }
  /* add the null */
  Output[i] = Input[i];



That is the same as using the strcpy function.
The i variable now equals the length of the input string (same as strlen).


Instead of the second loop using the number 100 :-

  for(i = 0; i < 100; i++)
  {




the loop can stop at the end of the string, or ideally a little before it, so that the array is not checked beyond its bounds.

  /* check for words on output string */
  for(i = 0; Output[i] != '\0'; i++)
  {





Instead of finding a space and then checking the word before it, find a space and check the word after it.

    // cast to unsigned char so a char with the high bit set is not passed as a negative int
    if(isspace((unsigned char)Output[i]) || ispunct((unsigned char)Output[i]))
    {
      if(Output[i+1] == 'H' || Output[i+1] == 'h')
        if(Output[i+2] == 'e' && Output[i+3] == 'l' && Output[i+4] == 'l')
          if(isspace((unsigned char)Output[i+5]) || ispunct((unsigned char)Output[i+5]))
          {
            //cout << "i=" << i << endl;
            Output[i+3] = 'c';
            Output[i+4] = 'k';
          }
    }



This code doesn't check for words at the beginning or end of the array. That ability could be added.

#25 Charlwillia6

Posted 19 September 2012 - 07:38 PM

Thank you, #define. I understand exactly what you are saying. I talked to my professor today, and he gave me a little bit more to add to my algorithm. When I work on the code, I will post it, and hopefully get it all figured out.

#26 Charlwillia6

Posted 24 September 2012 - 12:03 AM

I have finally finished the solution to this program. If anyone needs the solution, pm me. Thanks for all the help.