CSCI 2021 Lab11: Timing Strides

Due: 11:59pm Tue 11-Apr-2023 on Gradescope
Approximately 1.00% of total grade

CODE DISTRIBUTION: lab11-code.zip

CHANGELOG: Empty

1. Rationale
2. Codepack
3. Using the clock() Function
4. QUESTIONS.txt File Contents
5. Submission

1 Rationale

While command line utilities like time can provide overall information on how long a program takes to run, often one wishes to time individual sections of code. The clock() function enables this through reporting the current "moment" in time for the user CPU clock. It is typically used via the pattern:

Call clock() to get the START time
Do some sort of computation that takes a while
Call clock() to get the FINISH time
Compute ELAPSED = FINISH - START

This pattern can be repeated throughout a code and is often used in benchmarks to time and compare approaches to a problem.

This lab introduces the use of clock() on a problem involving structs that are arranged differently in memory. The timing allows one to observe how struct layout affects performance which reinforces the central idea of efficient memory access: sequential access patterns are faster than better than strided or random access patterns.

Grading Policy

Credit for this Lab is earned by completing the exercises here and submitting a Zip of the work to Gradescope. Students are responsible to check that the results produced locally via make test are reflected on Gradescope after submitting their completed Zip. Successful completion earns 1 Engagement Point.

Lab Exercises are open resource/open collaboration and students are encouraged to cooperate on labs. Students may submit work as groups of up to 5 to Gradescope: one person submits then adds the names of their group members to the submission.

See the full policies in the course syllabus.

2 Codepack

The codepack for the HW contains the following files:

File		Description
`QUESTIONS.txt`	EDIT	Questions to answer: fill in the multiple choice selections in this file.
`struct_stride.c`	EDIT	C code in HW10 code pack that is used to observe CPU timing differences
`Makefile`	Build	Enables `make test` and `make zip`
`QUESTIONS.txt.bk`	Backup	Backup copy of the original file to help revert if needed
`QUESTIONS.md5`	Testing	Checksum for answers in questions file
`test_quiz_filter`	Testing	Filter to extract answers from Questions file, used in testing
`test_lab11.org`	Testing	Tests for this lab
`testy`	Testing	Test running scripts

3 Using the `clock()` Function

The code block below illustrates the basic usage pattern for the clock() function.

#include <time.h>               // for clock() and clock_t

{
  clock_t begin = clock();      // current cpu moment

  Perform computation that takes a while;

  clock_t end = clock();        // later cpu moment

  double cpu_time =             // convert into seconds
    ((double) (end-begin)) / CLOCKS_PER_SEC;

  printf("Elapsed CPU Time: %f second\n", cpu_time);
}

A caveats are worth pointing out.

The clock_t type is often a large integer type like unsigned long which is why one can perform subtraction using it. Don't rely on this being the case and just use the type indicated as shown.
clock() itself returns a number corresponding to the number of CPU "ticks" which have occurred while the program runs. This requires conversion into the number of seconds shown above. It makes use of the CLOCKS_PER_SECOND constant which is included via time.h.
The time computed by this method is equivalent to the user time reported by the time utility: it is how much CPU time the user program has used. It is relevant to timing computational loops but is complemented by "wall time" which requires use of different timing functions like gettimeofday() to compute.
WARNING: Timing code runs is inherently noisy and will vary from one run to the next. clock() is reliable for timing computations that take around 0.001 seconds (a thousandth of a second). For times shorter than that, the variations in timing will likely be nearly as large as the total time which makes timing shorter activities unreliable.

Adjust program parameters like the number of loop iterations so reported times are at least 1e-03 seconds. Ideally times should be larger, in the 1e-01 second range to be trustworthy.

Time your code on the machine login01.cselabs.umn.edu.

While other machines may reflect the same speeds, login01 is known to show the intended behavior.

4 QUESTIONS.txt File Contents

Below are the contents of the QUESTIONS.txt file for the lab. Follow the instructions in it to complete the QUIZ and CODE questions for the lab.

                           __________________

                            LAB 11 QUESTIONS
                           __________________





Lab Instructions
================

  Follow the instructions below to experiment with topics related to
  this lab.
  - For sections marked QUIZ, fill in an (X) for the appropriate
    response in this file. Use the command `make test-quiz' to see if
    all of your answers are correct.
  - For sections marked CODE, complete the code indicated. Use the
    command `make test-code' to check if your code is complete.
  - DO NOT CHANGE any parts of this file except the QUIZ sections as it
    may interfere with the tests otherwise.
  - If your `QUESTIONS.txt' file seems corrupted, restore it by copying
    over the `QUESTIONS.txt.bk' backup file.
  - When you complete the exercises, check your answers with `make test'
    and if all is well, create a zip file with `make zip' and upload it
    to Gradescope. Ensure that the Autograder there reflects your local
    results.
  - IF YOU WORK IN A GROUP only one member needs to submit and then add
    the names of their group.


CODE: `struct_stride.c' Program
===============================

  The provided `struct_stride.c' program has a number of TODO items in
  it related to timing several computations and reporting their results.
  It is best to complete these items first and then attempt to answer
  the quiz questions as some questions require running the program and
  observing timing results.

  Use the lab's description of how the `clock()' function works to
  complete TODO items and print the results.

  When completed, the program can be run as show below:
  ,----
  | >> ./struct_stride 
  | usage: ./struct_stride <arr_length> <num_iters>
  | 
  | >> ./struct_stride 10000000 100
  | method: int_field_base CPU time: 1.2345e-01 sec   sum: 0
  | method: arr_field_base CPU time: 1.2345e-01 sec   sum: 0
  | method: int_field_optm CPU time: 1.2345e-01 sec   sum: 0
  | method: arr_field_optm CPU time: 1.2345e-01 sec   sum: 0
  `----

  NOTE: the timing information has intentionally been changed to read
  1.2345e-01 as calculating this timing information is part of the lab.


QUIZ: Timing `struct_stride.c' Runs
===================================

  *Time your code on the machine login01.cselabs.umn.edu.*

  While other machines may reflect the same speeds, login01 is known to
  show the intended behavior.


Relative Speed of Struct Layouts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  After adding in calls to `clock()' and code the print times, run the
  `struct_strid' program.

  Run the program with a large array and a modest amount of array
  iterations such as using the following parameters:
  ,----
  | ./struct_stride 10000000 100
  `----

  Examine the times reported.

  Which option below reflects the relative speeds of the
  layout/algorithm combinations?
  ,----
  |   ------SLOWEST--------------------------------------------FASTEST-----
  | - ( ) arr_field_base > arr_field_optm > int_field_base > int_field_optm 
  | - ( ) int_field_base > int_field_optm > arr_field_base > arr_field_optm
  | - ( ) arr_field_base > int_field_base > arr_field_optm > int_field_optm 
  | - ( ) int_field_base > arr_field_base > int_field_optm > arr_field_optm
  `----


int_field_base VS arr_field_base
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Examine the differences between the two types of structs that are used
  in `struct_stride.c' called `int_field_t' and `arr_field_t'.

  Now examine the first 2 code blocks that use these structs called,
  `int_field_base' and `arr_field_base'. Both involve arrays and structs
  which store an equal number of positive and negative
  integers. However, they differ in the overall layout of those
  integers.  Both use loops sum the "a" numbers first then sum the "b"
  numbers, then combine them for the total sum.

  Which of the following are possible explanations for the timing
  difference between `int_field_base' and `arr_field_base'?
  - ( ) `int_field_base' must perform more loop iterations than
    `arr_field_base' which will making it slower.
  - ( ) `arr_field_base' uses more memory to store then number than
    `int_field_base' and this additional memory increases speed.
  - ( ) `int_field_base' has a memory layout that is ABABABAB so when
    adding A elements, there is a "stride" through
    memory. `arr_field_base' has a layout like `AAAAABBBBB' so adding
    elements has no stride.
  - ( ) `int_field_base' uses struct field access. The assembly
    instructions to access array fields are slower than the assembly
    instructions that access array elements. This makes `arr_field_base'
    faster due to its use of plain integer arrays.


BASE vs OPTM versions
~~~~~~~~~~~~~~~~~~~~~

  The last two layout/algorithm sections are labeled "optm" as they
  perform a simple code transformation from their "base" version.

  Select ALL of the items below that are accomplished with this
  transformation.

  - ( ) Fewer loop checks/increments are needed as there is only one
    loop instead of 2.
  - ( ) The number of loop iterations is lowered for all loops in the
    optm version.
  - ( ) The memory stride is eliminated for the int_field_optm as both
    a/b elements are added each iteration.
  - ( ) The algorithmic complexity is reduced from O(N^2) in the "base"
    to O(N) in the "optm" version.

5 Submission

Follow the instructions at the end of Lab01 if you need a refresher on how to upload your completed lab zip to Gradescope.