Generating letter frequencies from standard input in C

This is a pretty frequently asked question here, so I thought I'd post some code and see if anyone finds it. It counts upper- and lower-case letters separately and uses functions to gather and print the data. This could have gone into the tutorials, but I figured I needed to finally create a blog post.

To use it, either type the data in at the terminal and signal end-of-file (Ctrl-D on *nix systems, Ctrl-Z followed by Enter on Windows), or redirect a file to standard input on the command line (./freq < freq.c).

#include <stdio.h>
#include <stdlib.h>

#define ALPHABET_LEN 26

// Populate the passed-in counts array from standard input.
void calculateCounts(unsigned int counts[])
{
    // BUFSIZ is a macro defined in stdio.h and is system-dependent.
    // Initializing the array with = {0}, which is ONLY doable 
    // at array declaration time, ensures the buffer is in a
    // known state.
    char buf[BUFSIZ] = { 0 };

    // fgets returns NULL once we've reached the end of the file/input.
    // It reads at most BUFSIZ - 1 characters, stopping early after a
    // newline, and always null-terminates what it read. A line longer
    // than the buffer is simply delivered across several calls.
    while (fgets(buf, sizeof(buf), stdin) != NULL)
    {
        unsigned int i = 0;
        char c;

        // We have a line to process in buf.
        // Go through each character, and if it's an alphabetic character
        // increment its count in the counts array.
        while ((c = buf[i++]) != '\0')
        {
            if (c >= 'A' && c <= 'Z')
            {
                // This is an upper-case character, with an ASCII code
                // between 65 ('A') and 90 ('Z'). These will come first 
                // in our counts array, so we will subtract the ASCII value of 
                // the first upper-case character, 'A' (65) from the character's
                // actual ASCII value to yield the index of that character's
                // count in the array; i.e., 0 through 25.
                counts[c - 'A']++;
            }
            else if (c >= 'a' && c <= 'z')
            {
                // This is a lower-case character, with an ASCII code 
                // between 97 ('a') and 122 ('z'). These follow the upper-case
                // character counts, so we will do as above, except using the 
                // ASCII character value of the first lower-case character, 'a'
                // and adding the ALPHABET_LEN to slot it after the upper-case
                // counts.
                counts[c - 'a' + ALPHABET_LEN]++;
            }
        }
    }
}

// Print the passed-in counts array. Notice that for const-correctness
// I pass this as a const array; we're not modifying the array within
// the function, making this the "right thing to do", to quote 
// Wilford Brimley.
void printCounts(const unsigned int counts[])
{
    unsigned int i = 0;
    char letter;
    for (; i < ALPHABET_LEN * 2; ++i)
    {
        // Now we need to use the index value to get the ASCII character
        // I would usually use the ternary operator for this within the
        // printf call, 
        // i.e., (i < ALPHABET_LEN ? i + 'A' : i % ALPHABET_LEN + 'a'),
        // but for clarification/explanation I've expanded 
        // the process into an actual if/else conditional.
        if (i < ALPHABET_LEN)
        { 
            // We're in the upper-case character count list here,
            // so add the value of the first upper-case character, 'A',
            // to the index to get the actual alphabetic character.
            letter = i + 'A';
        }
        else
        {
            // Here we're in the lower-case character count list.
            // So, to get the character we'll use the modulus operator (%) 
            // on the ALPHABET_LEN to get the zero-based index to the lower-case
            // list, then add the ASCII value of the first lower-case letter
            // to get the actual alphabetic character.
            letter = i % ALPHABET_LEN + 'a';
        }

        // Print the letter and the count based on 
        // the index into the count array.
        printf("%c: %u\n", letter, counts[i]);
    }
}

int main(void)
{
    // Initialize our letter counts to 0 
    // Initializing the array with = { 0 }, which is ONLY doable 
    // at array declaration time, ensures that all our counts are
    // 0 at the start of the program. If you do not do this, then
    // we cannot be sure that the array is zeroed-out, i.e., that
    // all the slots in the array are initially set to 0.

    // To accomplish the same thing AFTER the array has been declared,
    // one must either use the memset function, which takes a size in bytes,
    // i.e., memset(counts, 0, sizeof(counts)),
    // or loop through the array setting each value to 0 individually,
    // i.e.,
    //     unsigned int i = 0;
    //     for (; i < sizeof(counts)/sizeof(counts[0]); ++i) counts[i] = 0;
    // Note that using memset requires the inclusion of the string.h header.
    unsigned int counts[ALPHABET_LEN * 2] = { 0 };

    calculateCounts(counts);

    printCounts(counts);

    return 0;
}
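
The comments in main mention zeroing the array after it has been declared. Here is a minimal sketch of that, together with a string-based variant of the counting loop that is easier to experiment with than one tied to stdin. The names resetCounts, countString and COUNTS_LEN are hypothetical and are not part of the program above; the layout (upper-case in slots 0 through 25, lower-case in 26 through 51) and the ASCII assumption are the same.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ALPHABET_LEN 26
#define COUNTS_LEN (ALPHABET_LEN * 2) // hypothetical name for the array length

// Hypothetical helper: zero an already-declared counts array.
// Inside a function, counts is really a pointer, so sizeof(counts) would
// give the pointer size rather than the array size. The element count is
// passed in and converted to bytes, because memset works in bytes.
void resetCounts(unsigned int counts[], size_t len)
{
    assert(counts != NULL);
    memset(counts, 0, len * sizeof counts[0]);
}

// Hypothetical helper: tally the letters of a null-terminated string
// using the same index arithmetic as calculateCounts (assumes ASCII).
void countString(const char *s, unsigned int counts[])
{
    assert(s != NULL && counts != NULL);
    for (; *s != '\0'; ++s)
    {
        if (*s >= 'A' && *s <= 'Z')
        {
            counts[*s - 'A']++;
        }
        else if (*s >= 'a' && *s <= 'z')
        {
            counts[*s - 'a' + ALPHABET_LEN]++;
        }
    }
}
```

To try it, declare unsigned int counts[COUNTS_LEN], call resetCounts(counts, COUNTS_LEN), then countString("Hello, World", counts). The slot for 'l', counts['l' - 'a' + ALPHABET_LEN], ends up at 3, and calling resetCounts again puts it back to 0.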
