Friday 14 January 2011

The C Programming Language (K&R) 01x0B—Word Counting (p. 21)

So far, as I read the book “The C Programming Language” by Brian Kernighan and Dennis Ritchie aka K&R, I have seen some pseudo code, but no flowcharts. (Should pseudo code be one word or two or hyphenated? Answer: it doesn’t matter. But flowchart is one word, don’t ask me why.)

If you don’t know what pseudo-code is, well, I think the name says it all. It’s sort-of-like code, but not a computer language. You can’t enter pseudocode into a compiler and expect to get a machine-executable file. Although some are trying.

(Did you see that? I got in all three versions. The key is to be understood and consistent. Anyone who thinks one of the three is right and the others wrong needs to take a pill.)

So why write pseudo code?

Development.

Software is like, no, is a product. It has a development cycle and a life cycle. One leads to the other. The development process is about bringing the product to market, while the latter deals with its life in the marketplace, even if there are no commercial transactions.

Pseudocode fits into the development cycle. You start with a goal. What is the goal of this code? Be clear on the objective then create a solution to reach it. If you’re not clear on the objective, the rest will be wasted time.

After you’ve defined what it is you want the code to do, you write pseudocode. Well, I think it’s a good idea to write pseudocode. It gets your mind focused on structuring the code to solve the problem. As you go along, you may discover a new angle, in which case it’s easier to rewrite the pseudocode than the actual code. Also, if you write your pseudocode down, it becomes a point of reference for the future. You can read it and know what this chunk of code is supposed to do. Given time, you won’t remember.

So while it may seem an unnecessary step because it uses up time, it can actually save time.

There’s pseudocode and there are flowcharts. We used to have plastic stencils to create these elaborate flowcharts about what was going on. I don’t find them useful and they weren’t a thing of beauty. I don’t see them used anymore and I say good riddance.


K&R, p. 21 Word Counting

The next program in K&R is designed to count words. It also counts the number of characters and lines entered. Those tasks have been dealt with and are easy. Words? Different matter.

Start with: what is a word?

It’s a generic term and computers need specifics. A word, here, is a sequence of characters separated by whitespace, that is, a space, tab or newline. In ASCII those are 32, 9 and 10; as C character constants they are ' ', '\t' and '\n'.
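To make that definition concrete, here is a minimal sketch of a whitespace test. The function name is_word_separator is my own invention, not something from K&R, and the numeric codes in the comment only hold on ASCII systems.

```c
#include <stdbool.h>

/* Hypothetical helper (not from K&R): true for the three characters
   this post treats as whitespace.
   On an ASCII system: ' ' is 32, '\t' is 9, '\n' is 10. */
bool is_word_separator(int c)
{
    return c == ' ' || c == '\t' || c == '\n';
}
```

The standard library’s isspace() in <ctype.h> does a similar job, but it also accepts a few more characters (such as carriage return and form feed), so it is not an exact match for the definition used here.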

But how do you know when a new word started? If the previous character is whitespace and the current character is not whitespace, you have the start of a new word so increment the word counter. The K&R code does that.

How did they implement it? They use, effectively, a Boolean: either IN a word or OUT. That can be read as “what was the previous character?”, with IN meaning not whitespace and OUT meaning whitespace.

If the current character is whitespace (if (c == ' ' || c == '\n' || c == '\t')), don’t increment the word counter and set the state to OUT (i.e., the previous character is whitespace).

If it’s not whitespace, increment the counter only if the previous character was whitespace (i.e., state is OUT), then set the state to IN.

I prefer my approach because it makes it clear what the “state” is, but it’s about the code and it’s not about the code.
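For comparison, here is a rough sketch of the “previous character” reading of the same idea. This is not K&R’s listing. The name count_words_prev is made up, and I have assumed the input is a string rather than a stream read with getchar(), purely so the function is easy to try out.

```c
#include <stddef.h>

/* Hypothetical helper (not from K&R): counts words in a NUL-terminated
   string by comparing each character with the one before it. */
size_t count_words_prev(const char *text)
{
    size_t nw = 0;
    int prev = ' ';   /* assumption: treat the start of input as if
                         it were preceded by whitespace */

    for (; *text != '\0'; ++text) {
        int c = (unsigned char)*text;
        int c_is_space = (c == ' ' || c == '\n' || c == '\t');
        int prev_is_space = (prev == ' ' || prev == '\n' || prev == '\t');

        /* A word starts where whitespace is followed by non-whitespace. */
        if (!c_is_space && prev_is_space)
            ++nw;

        prev = c;
    }
    return nw;
}
```

The trade-off is that this version tests the previous character for whitespace all over again on every pass, while the IN/OUT flag remembers the answer in a single variable.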

Here is their code.

#include <stdio.h>

#define IN 1 /* inside a word */
#define OUT 0 /* outside a word */

/* count lines, words, and characters in input */
main()
{
    int c, nl, nw, nc, state;

    state = OUT;
    nl = nw = nc = 0;

    while ((c = getchar()) != EOF) {
        ++nc;
        if (c == '\n')
            ++nl;

        if (c == ' ' || c == '\n' || c == '\t')
            state = OUT;
        else if (state == OUT) {
            state = IN;
            ++nw;
        }
    }

    printf("%d %d %d\n", nl, nw, nc);
}

But why all this code to make sense of the characters a user types? Why bother?
 
In most modern GUI, event-driven apps, it doesn’t matter. A user enters a number in a box. The box is designed to only accept numbers. It has become automated, but this edition of the book came out in 1988, when GUIs were not commonplace. Programs were started by typing the name of the program on a command line followed by some arguments such as:

>MYPROGRAM /f:Filename

Given such an environment, it was the programmer’s responsibility to parse the arguments passed to the program and act on them. Beyond that, compilers have to make sense of the source code. In both instances, a programmer had to write code to parse the data. Break it up into chunks the computer could understand and act upon.
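As a rough illustration of that kind of argument parsing, here is a small sketch that looks for an argument shaped like the /f:Filename example above. The function name find_filename_arg is hypothetical, and the exact-match, lowercase-only handling of “/f:” is my assumption, not a rule any particular program followed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the text after "/f:" in the first
   matching argument, or NULL if no such argument is present. */
const char *find_filename_arg(int argc, char *argv[])
{
    int i;

    for (i = 1; i < argc; ++i) {          /* argv[0] is the program name */
        if (strncmp(argv[i], "/f:", 3) == 0)
            return argv[i] + 3;           /* skip past the "/f:" prefix */
    }
    return NULL;
}
```

Called from main(int argc, char *argv[]) as find_filename_arg(argc, argv), it would hand back a pointer to “Filename” for the command line shown above.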

Compilers still have to parse the source code, but the internet created new parsing objectives. An HTML file is a mixture of code, markup and text. That requires parsing. Google’s search algorithms require parsing. It goes from there.
 
