OpSys Fall 2006 - HW1

HW1 - Unix C Programming, readline, regex, environment variables, fork() and exec()

Due Date: Wed, 9/20 by 11:59PM

Submit to WebCT drop box labeled HW1

- Unix C mini-shell.

The objectives of this assignment are:

  1. Make sure everyone can create, compile and run C programs on the CS lab FreeBSD machines.
  2. Understand how to use libraries such as readline and regex.
  3. Understand what an environment variable is, and how to create, change and access one from a C program.
  4. Understand fork() and exec().

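You are to write a program that interacts with the user via a command prompt (your program prompts for a command, reads a command as a line of text and prints out any results). The actual commands you need to support are rather simple: setting a variable (name = value), printing a variable (print name) and looking up a string in the dictionary (lookup string). Each of them appears in the sample session below.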

Below is a sample session. The text after each prompt was typed by a human user; lines without a prompt are output of the program.

> ./hw1
prompt> hello fred
Invalid command
prompt> fred=blah
prompt> print blah
blah has no value...
prompt> print fred
fred = blah
prompt> opsys =   true
prompt> print opsys
opsys = true
prompt> lookup ookk
bookkeeper
bookkeepers
bookkeeper's
bookkeeping
bookkeeping's

prompt> print PATH
PATH = /usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/sbin:/sbin:
prompt> fred = hi there joe
Invalid command
prompt>

You can include a quit command if you want, or from Unix you can simply hit ^D (Ctrl-d) to indicate EOF to the program (and readline will return NULL).

IMPORTANT!: You must be able to print out the value of any existing environment variable, not just the new ones created by the user. Some environment variables you can expect to already have values include: PATH, HOME, PWD. From the Unix shell you can use the command "env" (or "printenv") to print out the value of all your environment variables.

- The readline library

Your program must use the GNU readline library to get input from the user. The readline library provides functions that make it possible for the user to scroll back through previous commands, search through previous commands, edit previous commands, etc. Many programs, including the bash shell, use readline to handle user input; this is why you can hit up-arrow in bash to recall the previous command entered.

You can get the details of the functions provided by the readline library by issuing the command "man readline" at the Unix prompt. If you don't know how to use the "man" command, try "man man".

Basically the readline library provides a function named (oddly enough): readline() that will read input from standard input and allow the user to poke through any history that has been given to the readline library. readline() returns char * pointing to the user input, or a NULL pointer indicating that it has found EOF. The string returned by readline is null terminated and has been allocated from the heap - this means you need to free this memory when you are done with it.

To use the readline library you must tell the linker to include it: add -lreadline to your compile line (see below for an example). On the CS FreeBSD machines you also need to tell the linker where to find the readline library, so add -L/usr/local/lib to the compile line as well.

The code shown below is a simple example of using the readline library, including the insertion of each line entered into the readline history. This program doesn't do anything with each line it gets, it just shows how to use readline(). This code is also available here: simprl.c.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <readline/readline.h>
#include <readline/history.h>

/* Simple example of using gnu readline to get lines of input from
   a user. Needs to be linked with -lreadline

  add_history tells the readline library to add the line to its
  internal history, so that using up-arrow (or ^p) will allow the user
  to see/edit previous lines.
*/

int main(int argc, char **argv) {
  char *s;

  while ((s = readline("prompt> ")) != NULL) {   /* NULL means EOF (^D) */
    add_history(s);   /* adds the line to the readline history buffer */
    free(s);          /* clean up! */
  }
  return(0);
}

How to compile and test this code:

  • To build an executable on a CS FreeBSD machine:

    gcc -Wall -o simprl -L/usr/local/lib simprl.c -lreadline

    on many other OSs (including Linux) you don't need the -I or -L options (readline is installed in the usual places so the compiler doesn't need to be told anything special):

    gcc -Wall -o simprl simprl.c -lreadline 
  • To run the program:
    ./simprl
    type in lines of text, verify that you can scroll through previous lines with the arrow keys (or ^P and ^N). Hit ^D on a line by itself to quit.

- Environment Variables

Every Unix process includes an array of strings called the 'environment'. Each of these strings is of the form "name=value" (this is by convention, there is no requirement that every string in the environment be of this form). There are various functions available for accessing these strings by the "name" contained in each one. These name/value pairs are referred to as environment variables. Below is a list of some of the functions you can use to access/manipulate these environment variables:

  • setenv,putenv: change/set the value of a variable given the name.
  • getenv: get the value of an environment variable given the name.
  • unsetenv: remove an environment variable.

There is also a global variable named environ that can be used to access the environment strings.

For the details on any of the above, use the Unix man command ("man environ" or "man setenv", ...).
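
The functions listed above can be tried out with a few small wrappers. The sketch below is only an illustration: the helper names set_var(), print_var() and count_env() are made up for this example (they are not part of any library), and the output format simply copies the sample session. Only setenv(), getenv() and the environ variable are the real thing.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declarations (setenv) */
#include <stdio.h>
#include <stdlib.h>

extern char **environ;   /* the environment strings, terminated by NULL */

/* Hypothetical helper: give name the value "value", replacing any
   existing value. Returns 0 on success, -1 on failure (as setenv does). */
int set_var(const char *name, const char *value) {
  return setenv(name, value, 1);   /* 1 = overwrite if already set */
}

/* Hypothetical helper: print "name = value" if the variable exists,
   or a note if it does not. Returns 1 if a value was found, else 0. */
int print_var(const char *name) {
  const char *value = getenv(name);   /* NULL if name is not set */
  if (value == NULL) {
    printf("%s has no value...\n", name);
    return 0;
  }
  printf("%s = %s\n", name, value);
  return 1;
}

/* Hypothetical helper: count the strings currently in the environment. */
int count_env(void) {
  int n = 0;
  while (environ != NULL && environ[n] != NULL) {
    n++;
  }
  return n;
}
```

Note that getenv() returns a pointer into the environment itself, not a fresh copy, so the caller must not free it. Because getenv() looks at the whole environment, print_var("PATH") works for inherited variables just as it does for ones set with set_var().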

- Regular Expressions and the POSIX regex library

Part of this assignment also involves using regular expressions to parse the user input. The idea is to get exposure to a non-trivial library (as well as to learn how to use some very useful functions!).

Regular expressions are used by Unix in lots of places. The file name patterns used at the command line (something like ls *.c) are a simpler relative of regular expressions, and many Unix commands are built around regular expressions (commands like sed, grep, awk, perl and many more).

There are a number of flavors of regular expressions, that is, different languages for specifying complex patterns to be matched. The purpose of this assignment is not to force you to learn all about the languages used to express regular expressions, but rather to use them. I am providing sample code that can be used to parse something very similar to the more complex command required for your homework (the set command you need to support). You are expected to modify this code to handle your set command, and to come up with regular expressions for the other commands based on the example I've provided. Note that there is lots of information on the web about regular expressions that can help you understand the language of regular expressions.

The sample code below uses the POSIX regular expression handling functions (there are others, including the GNU functions and BSD functions - feel free to use any that you find useful). The code includes lots of comments to get you started; for the assignment you need to customize this code to handle parsing of the commands entered by the user.

NOTE: For this assignment it is certainly not difficult to parse the user input using more traditional means, and we are certainly not claiming that using regular expressions is the best way (in fact this is a rather "heavyweight" approach!). You are required to use regular expressions simply to expose you to them and to get you used to figuring out how/where to get information about C libraries.

You can get lots of information about the regular expression handling functions by looking at the following man pages: regex and re_format.

The sample program shown below uses a regular expression to parse argv[1], the first command line parameter specified by the user when the code is run. The regular expression used will match any string that looks like: "var = value", where var is made up entirely of alphabetic characters, value is made up entirely of alphanumeric characters, and the whitespace around the '=' can be missing or any number of spaces. If the program finds that the string entered as argv[1] is matched by the regular expression, it prints out the variable name and value (extracted by the regular expression). This code is also available here: testregex.c. You can build this program with the following command line:

gcc -Wall -o testregex testregex.c

and then test it by running it with the string to be matched as a single quoted argument, for example ./testregex "path = hello123". Note that the regular expression functions used in the code below are part of standard libraries, so you don't need to do anything special to tell the compiler you want to include them. Dave will be going over this code in some detail during class.

#include <stdio.h>
#include <stdlib.h>	     /* exit() */
#include <string.h>	     /* for strncpy() */
#include <sys/types.h>   /* needed by regex */
#include <regex.h>       /* regular expression library */

/* Sample of using POSIX regular expression library.
   This attempts to match a regular expression to the first command line argument.
*/

/* this function extracts the part of a string that was matched by a
regular expression as indicated by the regmatch_t argument. New memory
is allocated for a copy of the matched substring and the new copy is
null terminated. This function returns NULL if the regmatch_t
indicates that no match was made */

char * get_match(regmatch_t m,const char *input) {
  char *match=NULL;
  int len;
  /* if no match specified, return NULL */
  if (m.rm_so==-1) {
	return(NULL);
  }

  /* len is the length of the substring that was matched */

  len = m.rm_eo-m.rm_so; 
 
  /* allocate enough memory for a copy of the resulting substring */
  match = (char *) malloc(len + 1);
  if (match==NULL) {
	fprintf(stderr,"Error allocating memory in get_match\n");
	exit(1);
  }

  /* copy the substring */
  strncpy(match,input+m.rm_so,len);

  /* null terminate the copy of the substring! */
  match[len]=0;
  return(match);
}



/* Example of using regular expression library from C.
   For details on  the regular expression library, try
   "man regex"
*/

int main(int argc, char **argv) {
  char *s;
  int i;

  /* here the sample regular expression is defined. For more information
     about POSIX regular expressions you can use "man 7 regex" for a 
     complete description, or google for POSIX regex and get more than
     you want... 

     This regular expression will match any string that looks roughly like
     this:  "name = value", where name consists only of alphabetic
     characters and value only of alphanumeric characters. There can be any number of spaces
     between the name and the '=', and between the '=' and the value. There must
     be nothing else in the string (or it won't match!). Here are some 
     strings that will match: "PROMPT = Hello"   "Count    = 22" "fred=1234joe"
     strings that won't match: " noleadingspace = allowed" "123=456"

     Here is the breakdown of this regular expression:
         ^  matches the beginning of the string. This simply forces the
            next part of the regular expression to match the first character
            (otherwise there could be anything before the first alphabetic char).

         [[:alpha:]]+  this matches any sequence of alphabetic characters. 
         the [[:alpha:]] actually says match one alphabetic character, and the +
         means match at least one.

         The [[:space:]]* means match any sequence of 0 or more spaces (whitespace).
             the * actually means "0 or more".

        The = matches '=' (only one).

         [[:alnum:]]+  this matches any sequence of alphanumeric characters. 
            (+ means one or more).

        The $ matches the end of the string. This means the string must end in
        something that matches the [[:alnum:]]+ right before the $.

        The parentheses are special, they don't actually match any characters 
        in the string, instead they tell the regular expression to "remember" the
        part of the string that matched the part of the regular expression that is
        in parentheses. This is actually the main reason we are using the 
        regular expression, we want to know what part of the string matches each
        parenthesized section of the regular expression. The first parenthesized
         part will be the "name" and the second will be the "value" in "name = value".

        We also use the regular expression to find out if the entire string is of
        the right form (if not then there will be no matches - we can say the string
        is not legal).

  */

  const char *regular_expression = "^([[:alpha:]]+)[[:space:]]*=[[:space:]]*([[:alnum:]]+)$";

  regex_t pattbuf;		/* where the 'compiled' regular expression is stored */

  regmatch_t matches[10];   /* where we will get the offsets of all matches */

  /* make sure we got a command line argument! */
  if (argc<2) {
	printf("You must supply an argument (the string to be matched).\n");
	printf("For example: %s \"path = hello123\"\n",argv[0]);
	exit(1);
  }

  /* compile the regular expression (POSIX extended regular expression syntax) */
  if (regcomp(&pattbuf, regular_expression,REG_EXTENDED)) {
	/* some problem with the regular expression - this is fatal... */
	fprintf(stderr,"Error - pattern won't compile\n");
	exit(1);
  }


  /* regexec returns 0 on a match; anything else is no match or an error */
  if (regexec(&pattbuf,argv[1],10,matches,0) != 0) {
	printf("No match found - illegal input\n");
  } else {
	/* some matches found - print them out */
	/* first match is for the whole string, we don't care about that one!
       remaining matches are for the parts of the regular expression
       that are in parentheses */
	i=1;
	/* stop at the first unused slot, and never run past the array */
	while (i<10 && (s = get_match(matches[i],argv[1])) != NULL) {
	  printf("Match %d: <%s>\n",i,s);
	  free(s);
	  i++;
	}
  }

  /* free up the compiled regular expression */
  regfree(&pattbuf);

  return(0);
}
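
The same approach carries over to the other commands. As a rough sketch only (not the required solution), here is how a pattern for the print command might look. The helper name parse_print() is made up for this example, and the pattern assumes, as in the sample code above, that a variable name is purely alphabetic.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>   /* needed by regex on some systems */
#include <regex.h>

/* Hypothetical helper: if line looks like "print name", return a
   malloc'd copy of name (the caller must free it); otherwise return NULL. */
char *parse_print(const char *line) {
  /* assumed pattern: the word print, at least one space, an alphabetic
     name, optional trailing whitespace, and nothing else */
  const char *pattern = "^print[[:space:]]+([[:alpha:]]+)[[:space:]]*$";
  regex_t re;
  regmatch_t m[2];   /* m[0] is the whole match, m[1] is the name */
  char *name = NULL;

  if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
    return NULL;     /* the pattern would not compile */
  }
  if (regexec(&re, line, 2, m, 0) == 0) {
    size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
    name = malloc(len + 1);
    if (name != NULL) {
      memcpy(name, line + m[1].rm_so, len);
      name[len] = '\0';   /* null terminate the copy */
    }
  }
  regfree(&re);
  return name;
}
```

With this pattern, "print PATH" yields "PATH" while "print" alone or "print a b" yields NULL. A name containing digits or an underscore would not match [[:alpha:]]+, so a wider bracket expression would be needed if such names must be printable. Compiling the pattern on every call keeps the sketch short; compiling it once at startup is the more usual arrangement.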

- The lookup command

Your lookup command should generate a list of all the lines from the file /usr/share/dict/words that contain the string specified by the user. Your program must use the Unix grep command to do this, and it must run grep like this (assume the command you read from the user is "lookup foo"):

/usr/bin/grep -i foo /usr/share/dict/words

You need to have your program call fork() to create a new process that can be used to run the grep command, and then have the child process call exec (to do this you need to build an argument list to pass to exec()). You can use any of the exec functions.
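
The fork/exec steps described above can be sketched roughly as follows. The helper names run_and_wait() and lookup() are made up for this example, and the choice of execv() is just one option (any of the exec functions will do). Having the parent wait for the child is an assumption of this sketch: the assignment text only asks for fork and exec, but without a wait the next prompt can show up before grep's output.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declarations */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: run the program argv[0] with arguments argv[1..]
   in a child process and wait for it. argv must end with a NULL entry.
   Returns the child's exit status, or -1 if something went wrong. */
int run_and_wait(char *const argv[]) {
  pid_t pid;
  int status;

  fflush(stdout);   /* don't let the child inherit unwritten output */
  pid = fork();
  if (pid < 0) {
    perror("fork");
    return -1;
  }
  if (pid == 0) {
    /* child: replace this process with the requested program */
    execv(argv[0], argv);
    perror("execv");   /* only reached if the exec failed */
    _exit(127);
  }
  /* parent: wait until the child is done before prompting again */
  if (waitpid(pid, &status, 0) < 0) {
    perror("waitpid");
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper for "lookup word": builds the argument list for
   the grep command line given in the assignment and runs it. */
int lookup(const char *word) {
  char *argv[] = { "/usr/bin/grep", "-i", (char *) word,
                   "/usr/share/dict/words", NULL };
  return run_and_wait(argv);
}
```

The argument list starts with the program name itself (argv[0]) and must end with a NULL pointer; forgetting the NULL is a common source of crashes. The child calls _exit() rather than exit() after a failed exec so that it does not flush the parent's stdio buffers a second time.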

- Project Requirements

The following are the requirements for the project:

  • Your program must compile and run on the CS department FreeBSD machines (freebsd.remote.cs.rpi.edu).

  • Your program must use the readline library to get input from the user. The user must be able to scroll back through the previous commands they have entered (since the program was started - no persistence between runs is expected).

  • Your program must correctly set and display the value of environment variables, including those inherited by your program from the shell (variables like PATH and HOME).

  • Your program must correctly run the lookup command as described above (you must fork and exec!). Do not read /usr/share/dict/words yourself!

  • Your submission must include a text file named README that includes the following information:

    • Your name
    • A 1 line description of each file you are submitting
    • Instructions on how to build your program (Makefile preferred, this will be required for future assignments).
    • A description of any problems you had, known bugs or deficiencies.
    • Anything else you want us to know.
  • For this assignment, you are not required to include a Makefile that can be used to build your submission. However, you probably should consider this (check the HW1 FAQ for information about Make and a sample Makefile), as all subsequent assignments will require a Makefile!

Grading: Grades will be based on the formula below. Note that to get full credit we must be able to understand your code (it must be commented!).

  30%  Proper handling of environment variables (can be set and printed)
  10%  Use readline properly
  20%  Correct lookup command
  20%  Use regular expressions to do at least some parsing (you don't have to use regular expressions for everything, just prove you can use them to do something useful.)
  20%  Code quality (comments, organization, how hard is it to understand?).

You can get partial credit for any part (for example if you don't get all the commands working properly).

If your code does not compile and run under FreeBSD on the CS machines, you will lose at least 50% (the remaining 50% partial credit will be awarded based on visual inspection of the code).

- How to Submit

Log in to WebCT at webct.rpi.edu using your RCS id and password. Once you get to MyWebCT click on "Operating Systems", and from there go to the homework drop boxes. Submit your files (individually, zipped or tarred) to the drop box labeled HW1.

- Resources