Last Updated: 2021-03-14 Sun 22:01

CSCI 4061 HW08: mmap() / Basic Signals and Handlers

CODE DISTRIBUTION: hw08-code.zip

CHANGELOG: Empty

1 Rationale

Files are often stored in a binary format for efficient storage and access. Unlike the more familiar text formats, binary formats require binary file I/O to manipulate them, frequently the low-level Unix read() / write() calls. They also often require jumping to different positions in the file, which can be done via the lseek() system call. These techniques are explored in this HW.
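As a rough illustration of the binary I/O described above, the sketch below writes an array of structs as raw bytes, then uses lseek() to jump straight to one record and read() it back. The record_t struct and both function names are hypothetical and are not taken from the codepack; the actual file format for this HW is defined in department.h.

```c
// Sketch only: record_t, write_records(), and read_record() are
// hypothetical names used for illustration, not codepack code.
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
  int  id;
  char name[28];
} record_t;

// Write count records to fname as raw bytes; returns 0 on success, -1 on error
int write_records(const char *fname, const record_t *recs, int count) {
  int fd = open(fname, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd == -1) { return -1; }
  ssize_t nbytes = (ssize_t) (count * sizeof(record_t)); // sizeof() gives bytes per record
  ssize_t written = write(fd, recs, nbytes);
  close(fd);
  return written == nbytes ? 0 : -1;
}

// Read record number idx (0-based) from fname into *out; returns 0 on
// success, -1 on error or if the record lies beyond the end of the file
int read_record(const char *fname, int idx, record_t *out) {
  int fd = open(fname, O_RDONLY);
  if (fd == -1) { return -1; }
  off_t pos = (off_t) idx * sizeof(record_t);            // byte position of the record
  if (lseek(fd, pos, SEEK_SET) == (off_t) -1) { close(fd); return -1; }
  ssize_t nread = read(fd, out, sizeof(record_t));       // bytes land directly in the struct
  close(fd);
  return nread == (ssize_t) sizeof(record_t) ? 0 : -1;
}
```

Because every record has the same size, the position of record idx is simply idx * sizeof(record_t), so no earlier records need to be read to reach it.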

A viable alternative to file I/O is to make use of memory mapped files through mmap(). This system call exposes a file as a pointer into operating-system-managed memory which holds parts of the file. While equivalent in power to standard I/O, mmap() avoids the need for intermediate buffers and allows pointer arithmetic to be used to locate and alter data in the file.
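A minimal sketch of the idea follows, assuming a file that is only read: the file is mapped with mmap() and its bytes are then examined through an ordinary pointer rather than through read() into a buffer. The helper count_byte() is hypothetical and is not part of the codepack.

```c
// Sketch only: count_byte() is a hypothetical helper for illustration.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Count occurrences of the byte target in fname; returns -1 on error
long count_byte(const char *fname, char target) {
  int fd = open(fname, O_RDONLY);
  if (fd == -1) { return -1; }

  struct stat st;                                  // fstat() reports the file size
  if (fstat(fd, &st) == -1) { close(fd); return -1; }
  if (st.st_size == 0) { close(fd); return 0; }    // mmap() fails for a length of 0

  char *bytes = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
  close(fd);                                       // the mapping remains valid after close()
  if (bytes == MAP_FAILED) { return -1; }          // failure is MAP_FAILED, not NULL

  long count = 0;
  for (off_t i = 0; i < st.st_size; i++) {         // index the file like an array
    if (bytes[i] == target) { count++; }
  }
  munmap(bytes, st.st_size);                       // release the mapping
  return count;
}
```

For example, calling count_byte() on a file containing the text "banana" with target 'a' would return 3.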

Signals are one of the simplest forms of communication between processes. They are essential for management of running programs and can also be used for other purposes. This HW explores the C function kill(), which sends signals, and the signal handler setup function signal(), which allows signals to be caught and handled. It assumes basic familiarity with the shell commands kill and pkill, which also send signals, primarily to terminate misbehaving processes.
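The pairing of signal() and kill() can be sketched as follows, assuming a single-threaded process that sends a signal to itself. The names got_signal, handle_usr1(), and signal_self() are hypothetical; the codepack programs send signals between a parent and a child instead.

```c
// Sketch only: all names below are hypothetical, for illustration.
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;  // written only by the handler

// Handler: record which signal arrived
static void handle_usr1(int signum) {
  got_signal = signum;
}

// Install a handler for SIGUSR1, send SIGUSR1 to this process, and
// return the signal number the handler saw (or -1 on error)
int signal_self(void) {
  got_signal = 0;
  if (signal(SIGUSR1, handle_usr1) == SIG_ERR) { return -1; }
  if (kill(getpid(), SIGUSR1) == -1) { return -1; }
  // In a single-threaded process with SIGUSR1 unblocked, the signal is
  // delivered before kill() returns, so the handler has already run here.
  return got_signal;
}
```

Without the call to signal(), the default action for SIGUSR1 would terminate the process instead of running the handler.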

1.1 Associated Reading

Stevens/Rago Ch 3 covers basic I/O functions like read() / write() as well as lseek() in Ch 3.6. These functions work equally well for text and binary data.

Stevens/Rago Ch 14 discusses advanced I/O techniques with Ch 14.8 covering mmap() for creating a memory mapped file. Optionally, Bryant and O'Hallaron's "Computer Systems: A Programmer's Perspective" also has some coverage of mmap() in section 9.8.4. This textbook is mentioned as it is the required text for CSCI 2021, a prerequisite to CSCI 4061.

Ch 10 of Stevens/Rago discusses basics of signals and signal handlers.

1.2 Grading Policy

Credit for this HW is earned by taking the associated Quiz which is linked under Gradescope. The quiz will ask questions similar to those in the QUESTIONS.txt file, and those who complete all answers in QUESTIONS.txt should have no trouble with the quiz.

See the full policy in the syllabus.

2 Codepack

The codepack for the HW contains the following files:

File                     Description
QUESTIONS.txt            Questions to answer
binfiles-mmap/           Directory for Problems 1-2
Makefile                 Makefile to build Problem 1-2 programs
department.h             Header file for programs
make_dept_directory.c    Problem 1-2 program to create data file
cse_depts.dat.bk         Backup of data file created in Problem 1-2
print_department_read.c  Problem 1 program to analyze
print_department_mmap.c  Problem 2 program to analyze
signals/                 Problem 3 directory
circle_of_life.c         Problem 3 code to analyze
birth_death.c            Problem 3 code to analyze
no_interruptions.c       File with signal handler for Problem 3

3 What to Understand

Ensure that you understand

  • How data in files can be directly read() into arrays and structs.
  • Use of the lseek() system call to move to a desired byte position in a file
  • Use of mmap() to create a memory mapped file for reading
  • How to send signals to other processes with the kill() function
  • How a process can detect whether a child was signaled and if the signal was terminal
  • How processes can set up simple signal handlers and which signals cannot be handled.
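
The last three points can be sketched together as below. This is only an illustration under stated assumptions: signal_child() and cannot_handle() are hypothetical helpers, and signal_child() assumes the signal passed in has a default action that terminates the child.

```c
// Sketch only: signal_child() and cannot_handle() are hypothetical
// helpers for illustration, not codepack code.
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a child that just waits for signals, send it sig, and return the
// number of the signal that terminated it (0 if it exited normally,
// -1 on error). Assumes the default action for sig is termination.
int signal_child(int sig) {
  pid_t pid = fork();
  if (pid == -1) { return -1; }
  if (pid == 0) {                       // child: block until a signal arrives
    while (1) { pause(); }
  }
  if (kill(pid, sig) == -1) { return -1; }
  int status;
  if (waitpid(pid, &status, 0) == -1) { return -1; }
  if (WIFSIGNALED(status)) {            // true if a signal ended the child
    return WTERMSIG(status);            // which signal it was
  }
  return 0;
}

// Return 1 if the system refuses to change the disposition of sig,
// 0 if the change was accepted (the old disposition is then restored)
int cannot_handle(int sig) {
  void (*old)(int) = signal(sig, SIG_IGN);
  if (old == SIG_ERR) { return 1; }
  signal(sig, old);
  return 0;
}
```

On a typical Unix system, cannot_handle() reports 1 for SIGKILL and SIGSTOP, which can be neither caught nor ignored, and 0 for signals such as SIGTERM.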

4 Questions

                           _________________

                            HW 08 QUESTIONS
                           _________________


- Name: (FILL THIS in)
- NetID: (THE kauf0095 IN kauf0095@umn.edu)

Write your answers to the questions below directly in this text file.
HW quiz questions will be related to the questions in this file.


PROBLEM 1: Binary File Format w/ Read
=====================================

A
~

  Compile all programs in the directory `binfiles-mmap/' with the
  provided `Makefile'.  Run the command
  ,----
  | ./make_dept_directory cse_depts.dat
  `----

  to create the `cse_depts.dat' binary file. Examine the source code for
  this program along with the header `department.h'.
  - What system calls are used in `make_dept_directory.c' to create this
    file?
  - How is the `sizeof()' operator used to simplify some of the
    computations in `make_dept_directory.c'?
  - What data is in `cse_depts.dat' and how is it ordered?


B
~

  Run the `print_department_read' program which takes a binary data file
  and a department code to print.  Show a few examples of running this
  program with valid command line arguments. Include demo runs that
  - Use the `cse_depts.dat' with known and unknown department codes
  - Use a file other than `cse_depts.dat'


C
~

  Study the source code for `print_department_read' and describe how it
  initially prints the table of offsets shown below.
  ,----
  | Dept Name: CS Offset: 104
  | Dept Name: EE Offset: 2152
  | Dept Name: IT Offset: 3688
  `----
  What specific sequence of calls leads to this information?


D
~

  What system call is used to skip immediately to the location in the
  file where desired contacts are located? What arguments does this
  system call take? Consult the manual entry for this function to find
  out how else it can be used.


PROBLEM 2: mmap() and binary files
==================================

  An alternative to using standard I/O functions is "memory mapped"
  files through the system call `mmap()'. The program
  `print_department_mmap.c' provides the same functionality as the
  previous `print_department_read.c' but uses a different mechanism.


(A)
~~~

  Early in `print_department_mmap.c' an `open()' call is used as in the
  previous program but it is followed shortly by a call to `mmap()' in
  the lines
  ,----
  |   char *file_bytes =
  |     mmap(NULL, size, PROT_READ, MAP_SHARED,
  |          fd, 0);
  `----
  Look up reference documentation on `mmap()' and describe some of the
  arguments to it including the `NULL' and `size' arguments. Also
  describe its return value.


(B)
~~~

  The initial setup of the program uses `mmap()' to assign a pointer to
  variable `char *file_bytes'.  This pointer will refer directly to the
  bytes of the binary file.

  Examine the lines
  ,----
  |   ////////////////////////////////////////////////////////////////////////////////
  |   // CHECK the file_header_t struct for integrity, size of department array
  |   file_header_t *header = (file_header_t *) file_bytes; // binary header struct is first thing in the file
  `----

  Explain what is happening here: what value will the variable `header'
  get and how is it used in subsequent lines?


(C)
~~~

  After finishing with the file header, the next section of the program
  begins with the following.
  ,----
  |   ////////////////////////////////////////////////////////////////////////////////
  |   // SEARCH the array of department offsets for the department named
  |   // on the command line
  | 
  |   dept_offset_t *offsets =           // after file header, array of dept_offset_t structures
  |     (dept_offset_t *) (file_bytes + sizeof(file_header_t));
  | 
  `----

  Explain what value the `offsets' variable is assigned and how it is
  used in the remainder of the SEARCH section.


(D)
~~~

  The final phase of the program begins below
  ,----
  |   ////////////////////////////////////////////////////////////////////////////////
  |   // PRINT out all personnel in the specified department
  |   ...
  |   contact_t *dept_contacts = (contact_t *) (file_bytes + offset);
  `----
  Describe what value `dept_contacts' is assigned and how the final
  phase uses it.


PROBLEM 3: `birth_death.c'
==========================

A
~

  Compile `circle_of_life.c' to the program `circle_of_life' and run
  it. Examine the results and feel free to terminate execution
  early. Examine the source code if desired though it is merely a
  print/sleep loop.

  Compile `birth_death.c' to the program `birth_death'. This program is
  invoked with two arguments, another program name and a "lifetime"
  which is an integer number of seconds. Run it like
  ,----
  | $> ./birth_death ./circle_of_life 4
  `----

  and show the output below.


B
~

  Examine the source code for `birth_death.c' and determine the system
  call the parent program (`birth_death') uses to send signals to the
  child program. Paste this line below and explain which signal is being
  sent.


C
~

  `birth_death.c' waits for a child to finish then outputs what signal
  caused it to be terminated if that was the cause of death. Paste below
  the lines of code which determine if a child was terminated due to a
  signal and mention the macros used for this purpose.


D
~

  Compile the program `no_interruptions.c' and run it with
  `birth_death'. Show your results below.

  Note that you may need to send signals to `no_interruptions' to
  forcibly end it. The `pkill' command is useful for this as in
  ,----
  | pkill no_inter        # send TERM signal to proc name matching "no_inter"
  | pkill -KILL no_inter  # send KILL signal to proc name matching "no_inter"
  `----


E
~

  Examine the `no_interruptions.c' code and describe how it is able to
  avoid being killed when receiving the interrupt and TERM signals. Show
  the lines of code used to accomplish this signal handling.

Author: Chris Kauffman (kauffman@umn.edu)
Date: 2021-03-14 Sun 22:01