Last Updated: 2023-01-31 Tue 09:58

CSCI 2021 HW03: Basic I/O + Binary Integers

CODE DISTRIBUTION: hw03-code.zip

  • Download the code distribution
  • See further setup instructions below

CHANGELOG:

1 Rationale

Reading data from files is an essential skill in any programming environment and can be a bit trickier in C due to its reliance on manual memory management via malloc(). This HW discusses a standard two-pass input strategy for dealing with such situations.

Modern computers represent all data using bits, collections of 1's and 0's. Integer data pervades many applications so understanding the bit-level representation of integers lends insight into their strengths and limitations. The two's complement representation for integers is by far the most common used by hardware and is therefore considered specially in this HW. C provides facilities to manipulate data at the bit level using logical and shift operations which are also discussed.

Associated Reading / Preparation

Most C references will contain information on C's standard I/O libraries which discuss functions like fopen(), fscanf(), rewind() and fclose().

Bryant and O'Hallaron: Ch 2.1-2.3 on binary representations of integers in unsigned and two's complement signed format. This page also includes a summary of the Two's Complement system.

Grading Policy

Credit for this HW is earned by taking the associated HW Quiz which is linked under Gradescope. The quiz will ask similar questions as those that are present in the QUESTIONS.txt file and those that complete all answers in QUESTIONS.txt should have no trouble with the quiz.

Homework and Quizzes are open resource/open collaboration. You must submit your own work but you may freely discuss HW topics with other members of the class.

See the full policies in the course syllabus.

2 Codepack

The codepack for the HW contains the following files:

File Description
QUESTIONS.txt Questions to answer
twopass.c Code for Problem 1
nums1.txt / nums2.txt Data files for Problem 2
convert.c C file for Problem 4
twos_complement.c Simple demonstration of C code for twos complement

3 Two's Complement Arithmetic

Understanding two's complement is necessary to complete a problem QUESTIONS.txt. The short presentation below gives the basics of two's complement but you should study the material presented in our textbook and lecture to supplement it.

The principles of two's complement representation for integers are as follows.

  • Fix a number of bits to represent integers such as 8 bits or 32 bits
  • Zero is the bit string of all 0s. In 8 bits this is

    0000 0000
    
  • Positive numbers always start with a 0. In 8 bits +91 would be

    0101 1011
    
  • Negative numbers always start with a 1. In 8 bits, the following is a negative number

    1010 0101
    
  • To determine the value of negative number in two's complement or negate it:

    1. Invert the bits: ones become zeros, zeros to ones. The C operator invert is the tilde ~ symbol.
    2. Add 1

    This process on the above negative number is as follows.

      ~ 1010 0101 = negative number
      -----------
        0101 1010 = +90
      +         1
      -----------
        0101 1011 = +91
    

    So the above negative number is -91 as 91 is its twos complement.

  • The same process works to convert from positive to negative.

      ~ 0101 1011 = +91
      -----------
      ~ 1010 0100 = negative number
      +         1
      -----------
      ~ 1010 0101 = -91
    

    So the above negative number is -91 as 91 is its twos complement.

  • C's unary negation operator performs these two steps of inverting and adding 1:

    int p = 91;  // 0101 1011
    int n = -p;  // 1010 0101
    
  • The context for interpreting bits matters. When interpreting bits as an unsigned quantity, each bit represents a power of 2 that should be added.

      unsigned char p = 0b10100101;
      //   1   0    1   0   0   1   0   1
      // 128 + 0 + 32 + 0 + 0 + 4 + 0 + 1
      // = 165
    
  • In contrast, the same bits when interpreted in a signed context are a negative quantity. In the 2's complement representation, the highest order bit is associated with a negative power of two.

      char p = 0b10100101;  // signed quantity, negative due to leading 1
      //    1   0    1   0   0   1   0   1
      // -128 + 0 + 32 + 0 + 0 + 4 + 0 + 1
      // = -91
    
  • Due to 0 starting with a 0, there is one more bit string available for negative numbers than for positive numbers. This explains the asymmetric range of C data types. 8-bit characters have a range of -128 to +127 which are the following bit strings

      +127 = 0111 1111
      -128 = 1000 0000
    
  • Arithmetic in twos complement has some nice properties such as going from 0 to -1:

        0000 0000 = zero
      + 1111 1111 = -1
      -----------
        1111 1111 = -1
    

    More details of arithmetic will be discussed later.

  • For wider bit widths than 8, extend the sign bit for numbers left as in the following

     8-bit +91: 0101 1011
    16-bit +91: 0000 0000 0101 1011
     8-bit -91: 1010 0101
    16-bit -91: 1111 1111 1010 0101
    

4 Octal and Hexadecimal Representation of Numbers

The base-10 numbering system is taught rigorously from an early age and is familiar to most students. We have seen in the earlier Two's Complement system that numbers can also written in base-2 as Binary. This is indicative of numbers being independent of any particular notation or base. Binary is frequently used in computer science but is cumbersome as it takes a LOT of digits to write a number. When bits must be written, often they are written in a base-8 Octal format or base-16 Hexadecimal format. This is in part due to their compactness: the same bits are shown with fewer printed characters. Here is a short example.

|---------+--------------+----------------+----------------+-----------------+-----------------------|
| Decimal | Byte = 8bits | Byte by 4      | Hexadecimal    | Byte by 3       | Octal                 |
|---------+--------------+----------------+----------------+-----------------+-----------------------|
|      87 |     01010111 | bin: 0101 0111 | 57 = 5*16 + 7  | bin: 01 010 111 | 127 = 1*8^2 + 2*8 + 7 |
|         |              | hex: 5    7    | hex  dec       | oct: 1  2   7   | oct   dec             |
|         |              |                |                |                 |                       |
|      60 |     00111100 | bin: 0011 1100 | 3C = 3*16 + 12 | bin: 00 111 100 | 074 = 0*8^2 + 7*8 + 4 |
|         |              | hex: 3    C=12 | hex  dec       | oct: 0  7   4   | oct   dec             |
|         |              |                |                |                 |                       |
|     226 |     11100010 | bin: 1110 0010 | E2 = 14*16 + 2 | bin: 11 100 010 | 342 = 3*8^2 + 4*8 + 2 |
|         |              | hex: E=14 2    | hex  dec       | oct: 3  4   2   | oct   dec             |
|---------+--------------+----------------+----------------+-----------------+-----------------------|

Several items are of note:

  • The same number is written in 4 forms, Decimal, Binary, Hexadecimal, and Octal
  • A single Hexadecimal digit ranges from decimal 0-15. Values larger than 9 are written as letters: A=10, B=11, C=12, D=13, E=14, F=15
  • Converting between Binary and Hexadecimal is easiest when grouping bits by 4: each 4 bits corresponds to one hexadecimal digit
  • Octal digits ranger from 0-7
  • Converting between Binary and Octal is easiest when grouping bits by 3: each 3 bits corresponds to one octal digit

5 What to Understand

Ensure that you understand

  • Basics of reading data from files and allocating space for it
  • Signed vs unsigned representations
  • Two's complement representation of integers
  • Conversions between binary, decimal, octal, and hexadecimal notation

6 Questions

Analyze the files in the provided codepack and answer the questions given in QUESTIONS.txt.

                           _________________

                            HW 03 QUESTIONS
                           _________________


Write your answers to the questions below directly in this text file to
prepare for the associated quiz. Credit for the HW is earned by
completing the associated online quiz on Gradescope.


PROBLEM 1: Two-Pass Input and malloc()
======================================

  One frequently encounters the following situation when reading files
  - The file contains some data that is to be read into a dynamically
    allocated array
  - There is no indication of how much data is in the file

  C programs commonly use a *two-pass* input strategy to handle this
  situation
  1. Read through the file once to count items in the file, the allocate
     memory for an array of that size
  2. Back up to the beginning of the file and read data from the file
     into the array

  The provided program `twopass.c' demonstrates this technique along
  with several other input / formatting techniques.


A
~

  Compile and run the `twopass.c' program on the two provided text files
  `nums1.txt' and `nums2.txt'.  Paste the output the program produces
  below.


B
~

  Examine the function `read_all_doubles()' in `twopass.c' and describe
  which lines/blocks of code carry out the following actions. Use the
  provided comments in the file as a guide.

  - Opens the file for reading and checks for errors
  - Counts all the doubles in the file
  - Allocates memory for doubles in the file
  - Moves the file read position to the beginning of the file
  - Closes the file when reading is complete


C
~

  In your answer to the previous problem, you should have identified a
  use of `malloc()' in `read_all_doubles()'.  Where is this memory
  `free()''d and why? /Note: This question is similar to a lecture
  exercise on when to free() memory that has been malloc()'d./


D
~

  Examine how the doubles read from file are printed in the `main()'
  function. Discuss below the format specifiers used by `printf()' and
  the width/precision modifiers that are used to get a "table-like"
  appearance.  You may wish to consult the Manual page on `printf()'
  such as the one here:

  <https://man7.org/linux/man-pages/man3/printf.3.html>


Optional Enrichment: Reading without Storing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  One minor irritation in `read_all_doubles()' is the following code:
  ,----
  |   double tmp;                              // temporary space to read one double
  |   ...;
  |   while(1){                                // first input pass: indefinite loop to count doubles
  |     int ret = fscanf(fin, "%lf", &tmp);    // try to read a double
  |     ...;
  |   }
  `----
  This loop is counting the doubles in an input file but discarding
  them.  Avoiding the need to allocate the `tmp' variable is desirable
  and can be done using some more advanced format specifiers in
  `fscanf() / scanf()' that indicate "read a double but do not store it
  anywhere". This allows the counting loop to happen without need for
  any additional space for doubles.

  Look how to get `fscanf()' to parse items like doubles without
  requiring space to store the result.


PROBLEM 2: Number conversions
=============================

A
~

  Fill in the following table of equivalences.  Filling in the table
  from top to bottom is advantageous as earlier rows can sometimes be
  used to infer lower values. Feel free to make use of any ASCII table
  or the table.c program provided in the week 3 lecture code pack.

  ,----
  | |-----+------+-----+-----------+--------------|
  | | Dec | Hex  | Oct | Binary    | Char         |
  | |-----+------+-----+-----------+--------------|
  | |   9 | 0x09 |  11 | 0000 1001 | TAB          |
  | |  10 | ?    |   ? | ?         | \n (newline) |
  | |   ? | 0x20 |   ? | ?         | SPACE        |
  | |   ? | ?    |   ? | 0011 0010 | ?            |
  | |  65 | 0x41 | 101 | 0100 0001 | A            |
  | |  66 | ?    |   ? | ?         | ?            |
  | |   ? | 0x4F | 117 | ?         | O            |
  | |  80 | ?    |   ? | ?         | P            |
  | |  91 | ?    | 133 | 0101 1011 | [            |
  | |  97 | 0x61 | 141 | ?         | ?            |
  | |   ? | ?    | 172 | 0111 1010 | z            |
  | | 145 | 0x91 | 221 | ?         | none         |
  | | 160 | ?    |   ? | 1010 0000 | none         |
  | | 180 | 0xB4 | 264 | ?         | none         |
  | | 255 | ?    |   ? | ?         | none         |
  | |-----+------+-----+-----------+--------------|
  `----


B
~

  Fill in the bits, hex, and decimal values for the given examples. The
  first example is completed for you. Assume all of these are 32 bit
  unsigned integers.
  ,----
  |   COMPLETED
  |   Binary:   0000 0000  0000 0000  0001 1000  1110 1001  
  |             0    0     0    0     1    8     E    9
  |   Hex   :   0018E9
  |   Decimal:  6377
  | 
  | 
  |   NUMBER 1
  |   Binary:   0000 0000  0010 1111  0011 1010  1000 1101  
  |             ?
  |   Hex   :   ??
  |   Decimal:  ??
  | 
  | 
  |   NUMBER 2
  |   Binary:   ??  
  |             7    F     8    3     5    A     0    B
  |   Hex   :   7F835A0B
  |   Decimal:  ??
  `----


PROBLEM 3: Signed Integer Conversions
=====================================

A
~

  Apply the steps involved in converting the following positive binary
  number to it's two's complement negation in 8-bit signed
  format. Recall the steps are
  - Invert the bits
  - Add one
  Apply these steps to the following number:
  ,----
  | 0111 1100  = 0x7C = 124 (decimal)
  `----

  The result is the two's complement representation of -124.

  Convert back to positive via the same process
  - Invert the bits
  - Add one
  to show that the original bits are gotten back.


B
~

  Complete the following table of equivalences assuming 8-bit
  twos-complement signed integers. The rightmost column is the inverse
  of the binary representation: flip 1's to 0's, and vice versa.

  ,----
  | |------+------+-----------+-----------|
  | |  Dec |  Hex | Binary    | Inverse   |
  | |------+------+-----------+-----------|
  | |   +5 | 0x05 | 0000 0101 | 1111 1010 |
  | |   -5 |    ? | 1111 1011 | ?         |
  | |  +32 | 0x20 | ?         | ?         |
  | |  -32 | 0xE0 | ?         | 0001 1111 |
  | | +127 | 0x7F | ?         | ?         |
  | | -127 | 0x81 | ?         | ?         |
  | | -128 |    ? | 1000 0000 | ?         |
  | |   +2 |    ? | ?         | ?         |
  | |   -2 | 0xFE | ?         | ?         |
  | |   +1 | 0x01 | 0000 0001 | ?         |
  | |   -1 |    ? | 1111 1111 | ?         |
  | |    0 |    ? | ?         | ?         |
  | |------+------+-----------+-----------|
  `----


PROBLEM 4: Converting Strings to Numbers
========================================

  Inspect the program in the code pack called `convert.c'.  Compile and
  run it using
  ,----
  | > gcc convert.c
  | > ./a.out
  `----

  Describe briefly what kind of conversion is being done by the
  `convert()' function given.
  - A. What kind of data is input?
  - B. What result is produced by the function?
  - C. How is a success versus an error reported?
  - D. Why is this kind of conversion needed?
  - E. What built-in C function (useful for the assignment) does this
    conversion function use and how is its calling convention different
    from convert()?

Author: Chris Kauffman (kauffman@umn.edu)
Date: 2023-01-31 Tue 09:58