Monday 17 January 2011

The C Programming Language (K&R) 01x0F—Histogram of Words—Exercise 1-13


I’m well past chapter 4 in “The C Programming Language” by Brian Kernighan and Dennis Ritchie aka K&R. I making notes as I go and reread chapters and sections at least twice, but it’s a challenge to keep pace with this blog. I’m lagging behind. No surprise, but I will make it.


Exercise 1-13.

Write a program to print a histogram of the lengths of words in its input. It is easy to draw the histogram with the bars horizontal; a vertical orientation is more challenging.

This program will build off two programs I’ve already written. Entry 01x0C and 01x0E.

In order to count the length of words, you have to parse words. That will be done based on the code in 01x0C. The histogram will be based on 01x0E. In that one I grouped characters into broad categories. Word lengths can be grouped: 1 to 3; 4 to 6; 7 to 9; >10.

The Pseudocode.

While not EOF
get character from keyboard

if character is not whitespace, increment word length counter

if character is whitespace and previous character is not whitespace, end of word, add count to array otherwise ignore it

Display histogram

Somehow I will cut and paste code form the previous two projects to make this one.

*    *    *

A couple of issues.

First, I tried to initialize an array with this code:

     int WordLengthCount[15];

     // Initialize with zero.
     for (c = 0; c < 16; ++c)
           WordLengthCount[c] = 0;


When I ran the code, the program crashed. I hadn’t seen that behaviour before and wasn’t certain what was causing it. I added debug break points and found that the loop was causing the problem. But why? The index counter. It was trying to place a value in WordLengthCount[15] which is not in the array. It’s outside. Change the limit from 16 to 15 to make it work. I should have known better because I do know the indices of the array go from 0 to 14 for a count of 15.

Also, from the debugging environment, it was clear the array had garbage values. Assigning zero for each element was needed. I can’t find a short cut to initializing values. You can declare the array with a set of {0, 0, …, 0} but you have to get the count right and it’s east to be wrong with a long string of zeroes.

Second, instead of grouping the word length, I could use the word length as the index for the counter. It makes for easier coding. The only issue is to have the last category be for words with 14 or more characters and therefore must test the length before storing any values in the array otherwise it will probably crash.

Third, the code to display the histogram, uses a for loop inside a for loop. It’s easy to get the counters mixed up.

There is another way to write this program. Find each word and call a function to return its length and store that value. There’s usually more than one way to write code.


Sample Code.

I am using Visual C++ 2010 and created the sample code as a console application.

// Function prototypes
bool IsWhitespace(char);

// The standard library includes the system function.
#include <cstdlib>

// C++ standard I/O library.
#include <cstdio>

int main()
{
     // Character input.
     char c = 0;
     // Previous character.
     char prevChar = 0;
     // Word Length.
     int WordLength = 0;
     int WordLengthCount[15];

     // Initialize with zero.
     for (c = 0; c < 15; ++c)
           WordLengthCount[c] = 0;

     while ((c = getchar()) != EOF) {
           if (IsWhitespace(c))
                if (IsWhitespace(prevChar))
                     // Ignore extra whitespace.
                     ; // Null stmt.
                else {
                     // Prev char not whitespace.
                     // Means end of word.
                     // Store length in array.
                     if (WordLength < 14)
                           ++WordLengthCount[WordLength];
                     else
                           // Words 14 chars or longer.
                           ++WordLengthCount[14];
                     // Reset for new word.
                     WordLength = 0;
                }
           else {
                // Not whitespace, therefore not end of word.
                // Increment counter for word length.
                ++WordLength;
           }

           // Set prev char to current.
           prevChar = c;

     } // end while

     // Display histogram.
     printf("\n\nHISTOGRAM - Word length\n");

     // Counter.
     int i;

     // Loop through the 14 word length categories.
     for (c = 1; c < 15; ++c) {
           // Label
           printf("\n%3d  ", c);
           // No. of occurrences for this word length.
           for (i = 1; i <= WordLengthCount[c]; ++i)
                printf("*");
     }   
     printf("\n\n");

     // Keep console window open.
     system("pause");

     // Return some value.
     return 0;

} // end main

bool IsWhitespace(char t)
{
     // Return true for whitespace chars.

     // Space.
     if (t == 32)
     return true;

     // Newline.
     if (t == 10)
           return true;

     // Tab.
     if (t == 9)
           return true;

     // If we get here, no whitespace.
     return false;
} // IsWhitespace


Output.




No comments:

Post a Comment