Last Updated: 2023-03-28 Tue 17:03

CSCI 2021 Lab10: Timing Code and Machine Speed

CODE DISTRIBUTION: lab10-code.zip

CHANGELOG: Empty

1 Rationale

Differences and oddities in CPU architecture are often revealed by how long certain programs take to run. This lab explores these issues using a program that is also part of HW10. The code implements several "micro-benchmarks", each of which repeatedly performs arithmetic operations with slight variations. Timing these benchmarks, combined with some knowledge of CPU architecture, shows some of the low-level implementation differences between processors.

Grading Policy

Credit for this Lab is earned by completing the exercises here and submitting a Zip of the work to Gradescope. Students are responsible for checking that the results produced locally via make test are reflected on Gradescope after submitting their completed Zip. Successful completion earns 1 Engagement Point.

Lab Exercises are open resource/open collaboration and students are encouraged to cooperate on labs. Students may submit work as groups of up to 5 to Gradescope: one person submits then adds the names of their group members to the submission.

See the full policies in the course syllabus.

2 Codepack

The codepack for the lab contains the following files:

File                 State     Description
QUESTIONS.txt        EDIT      Questions to answer: fill in the multiple choice selections in this file.
superscalar_main.c   Provided  C code, also in HW10, used to observe CPU timing differences
superscalar_funcs.c  Provided  Functions implementing the benchmarks run by superscalar_main
Makefile             Build     Enables make test and make zip
QUESTIONS.txt.bk     Backup    Backup copy of the original file to help revert if needed
QUESTIONS.md5        Testing   Checksum for answers in questions file
test_quiz_filter     Testing   Filter to extract answers from questions file, used in testing
test_lab10.org       Testing   Tests for this lab
testy                Testing   Test running scripts

3 Timing on Two Machines

HW10 has students time the execution speed of several different arithmetic functions. This is done with the time utility; your lab demoer will briefly discuss it and the information it provides about program runs. This information is also covered in HW10, posted here:

https://www-users.cs.umn.edu/~kauffman/2021/hw10.html

Timing code is interesting because one fully expects the results to vary from one computer to the next. Lab leaders will demonstrate this by timing the same program runs on two different processors available through the CSE Labs system:

  1. csel-plate01.cselabs.umn.edu, a server in the Keller Hall machine room (SSH only)
  2. csel-kh1260-NN.cselabs.umn.edu machines such as csel-kh1260-01, desktop workstations in Keller 1-260 (physical access or SSH login)

Running the same programs on these two machines will lead to different times, sometimes in ways that are quite surprising.

Lab staff will focus their presentation on timing the superscalar_main program which is central to HW10. It is used to run small integer benchmarks that repeatedly add / multiply in different combinations. Timings of these operations reveal peculiarities of some processors.

NOTE: Staff will show results on both csel-plate01 and csel-kh1260-NN machines during the lab, but HW10 focuses on timing only on csel-kh1260-NN machines.

4 Using the time and lscpu Utilities

Staff will briefly discuss the time utility and the 3 types of times it reports. They may time programs like the following and explain why the reported times differ.

> time make                    # build a program using a makefile
...
> time ls -lR /sys > /dev/null # recursive listing of /sys system directory
...
> time ping -c 3 google.com    # contact google 3 times to see if it is responding
...

Staff will mention which of the measures reported by time is most important for evaluating integer arithmetic code like superscalar_main.

Make sure to download the superscalar_main application from the HW10 specification here:

https://www-users.cs.umn.edu/~kauffman/2021/hw10.html

Staff will compile the program and time runs of it as students will do in HW10, such as:

> make
...
> time ./superscalar_main 1 30 add1_diff
...

When timing programs, it helps to know something about the CPU running them. On Linux systems this information is reported by the `lscpu` utility, which can be run by typing

> lscpu
...

and reports a variety of information including BogoMIPS, a "crude measure of CPU speed" which can be used to roughly compare processor clock speed.

5 QUESTIONS.txt File Contents

Below are the contents of the QUESTIONS.txt file for the lab. Follow the instructions in it to complete the QUIZ and CODE questions for the lab.

                           __________________

                            LAB 10 QUESTIONS
                           __________________





Lab Instructions
================

  Follow the instructions below to experiment with topics related to
  this lab.
  - For sections marked QUIZ, fill in an (X) for the appropriate
    response in this file. Use the command `make test-quiz' to see if
    all of your answers are correct.
  - For sections marked CODE, complete the code indicated. Use the
    command `make test-code' to check if your code is complete.
  - DO NOT CHANGE any parts of this file except the QUIZ sections as it
    may interfere with the tests otherwise.
  - If your `QUESTIONS.txt' file seems corrupted, restore it by copying
    over the `QUESTIONS.txt.bk' backup file.
  - When you complete the exercises, check your answers with `make test'
    and if all is well, create a zip file with `make zip' and upload it
    to Gradescope. Ensure that the Autograder there reflects your local
    results.
  - IF YOU WORK IN A GROUP only one member needs to submit and then add
    the names of their group members.


QUIZ Timing Code
================

  Using the HW10 code pack which contains the `superscalar_main'
  benchmark program, answer the following questions concerning timing on
  several lab machines. You will need to SSH into several machines to
  complete the questions.


time utility
~~~~~~~~~~~~

  On a run of the program such as
  ,----
  | > time ./superscalar_main 1 30 add1_diff
  | ...
  `----
  which of the reported times is the most relevant to understanding
  processor speed?
  - ( ) The `real' time as it reports how many seconds the user has to
    wait for the program to complete
  - ( ) The `user' time which is the number of seconds that the CPU
    spends executing the code in the user's program
  - ( ) The `sys' time because it indicates how much time the program
    spends in OS system calls


Processor types
~~~~~~~~~~~~~~~

  Use the `lscpu' utility on these two machines:
  - csel-plate01.cselabs.umn.edu : a server machine
  - csel-kh1260-10.cselabs.umn.edu : a desktop lab machine
  Analyze the output to determine the types of processors and their
  relative processing speed according to the "BogoMIPS" measure.
  - ( ) `csel-plate01' and `csel-kh1260-NN' both have AMD processors and
    the BogoMIPS measure indicates `csel-plate01' is faster
  - ( ) `csel-plate01' and `csel-kh1260-NN' both have Intel processors
    and the BogoMIPS measure indicates `csel-kh1260-NN' is faster
  - ( ) `csel-plate01' has Intel processors and `csel-kh1260-NN' has AMD
    processors and the BogoMIPS measure indicates `csel-plate01' is
    faster
  - ( ) `csel-plate01' has AMD processors and `csel-kh1260-NN' has Intel
    processors and the BogoMIPS measure indicates `csel-kh1260-NN' is
    faster


Timings using `superscalar_main'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Compile the `superscalar_main' program using the provided `Makefile'
  and time runs of it on both `csel-plate01' and `csel-kh1260-25' using
  the following commands:
  ,----
  | >> make
  | gcc -Wall -Werror -g  -Og -o superscalar_main superscalar_main.c superscalar_funcs.c
  | 
  | >> time ./superscalar_main 1 30 add1_diff
  `----

  According to what you observe for this, which of the following best
  reflects the outcome of the runs between the two machines?
  - ( ) `csel-plate01' takes about 0.91s to run while `csel-kh1260-NN'
    takes about 0.63s to run indicating `csel-kh1260-NN' is faster
  - ( ) `csel-plate01' takes about 0.50s to run while `csel-kh1260-NN'
    takes about 0.85s to run indicating `csel-plate01' is faster
  - ( ) `csel-plate01' takes about 1.99s to run while `csel-kh1260-NN'
    takes about 0.25s to run indicating `csel-kh1260-NN' is faster
  - ( ) `csel-plate01' takes about 0.10s to run while `csel-kh1260-NN'
    takes about 1.15s to run indicating `csel-plate01' is faster


Analysis of Benchmarks
~~~~~~~~~~~~~~~~~~~~~~

  Among the micro-benchmarks implemented in `superscalar_main' are the
  following two:
  ,----
  |   add2_diff : add 2 times in same loop; different destination variables
  |   add2_same : add 2 times in same loop; same destination variable
  `----
  Find the code for the two functions that implement these benchmarks in
  the file `superscalar_funcs.c' (each benchmark has a function named
  for it).

  Analyze the code and CHECK ALL OF THE BELOW ITEMS that are true.
  - ( ) Both `add2_diff()' and `add2_same()' have loops that repeatedly
    perform arithmetic operations
  - ( ) `add2_diff()' will loop fewer times than `add2_same()' for the
    same function parameters / command line arguments
  - ( ) Both `add2_diff()' and `add2_same()' dereference pointers in
    their loops so interact with main memory every iteration
  - ( ) Both `add2_diff()' and `add2_same()' primarily work on registers
    in their loops as there are no memory references in the loop body
  - ( ) The biggest difference between them is that `add2_diff()' adds
    each iteration to different variables/registers while `add2_same()'
    adds to the same variable/register each iteration
  - ( ) The biggest difference between them is that `add2_diff()' adds
    twice each iteration while `add2_same()' adds once each iteration


Timing Mysteries
~~~~~~~~~~~~~~~~

  Time runs of the two benchmarks above by running these commands.
  ,----
  | time ./superscalar_main 1 30 add2_diff
  | time ./superscalar_main 1 30 add2_same
  `----
  Perform the timing on BOTH `csel-plate01' and `csel-kh1260-NN' and
  report on the relations below.

  On `csel-plate01'
  - ( ) csel-plate01: time for `add2_diff < add2_same'
  - ( ) csel-plate01: time for `add2_diff > add2_same'
  - ( ) csel-plate01: time for `add2_diff = add2_same'

  On `csel-kh1260-NN'
  - ( ) csel-kh1260-NN: time for `add2_diff < add2_same'
  - ( ) csel-kh1260-NN: time for `add2_diff > add2_same'
  - ( ) csel-kh1260-NN: time for `add2_diff = add2_same'

  These results should seem strange to you and require further
  discussion, which will come in lecture.


CODE None
=========

  None for this lab: analyze the provided superscalar_main.c and
  superscalar_funcs.c to learn several interesting techniques such as
  how to create an array of *function pointers* and select one to run.

6 Submission

Follow the instructions at the end of Lab01 if you need a refresher on how to upload your completed lab zip to Gradescope.


Author: Chris Kauffman (kauffman@umn.edu)
Date: 2023-03-28 Tue 17:03