CSCI 5451 Assignment 1: Parallel Architecture and Communication

Due: Thu 02-Feb-2023 by 11:59 pm
Approximately 12.5% of total grade
Submit to Gradescope in PDF format
You may work in groups of 1 or 2 and submit one assignment per group.

Note: some of the math fonts used throughout the assignment may not display correctly. I have tested the display on Google Chrome and Mozilla Firefox and seen it work when connected to the network. Notify Prof Kauffman via email if it does not render properly.

CHANGELOG: Empty

1. (15pts) Problem 1: Store-and-forward Routing (Grama 2.26)
2. (10pts) Problem 2: Embedding a Ring Network in a Tree Network
3. (10pts) Problem 3: Task-dependence Graphs (Grama 3.2)
4. (15pts) Problem 4: Parallel Histogram
5. (15pts) Problem 5: Parallel One-to-All Broadcast in 2D Torus
6. (15pts) Problem 6: Bring the Heat
7. (15pts) Problem 7: C Proficiency / K-Means Preparation
8. (5pts) Statement of Effort
9. Submission Instructions

1 (15pts) Problem 1: Store-and-forward Routing (Grama 2.26)

Consider the routing of messages in a parallel computer that uses store-and-forward routing. In such a network, the cost of sending a single message of size \(m\) from \(P_{source}\) to \(P_{destination}\) via a path of length \(d\) is \(t_s + t_w \times d \times m\). An alternate way of sending a message of size m is as follows. The user breaks the message into \(k\) parts each of size \(m/k\), and then sends these \(k\) distinct messages one by one from \(P_{source}\) to \(P_{destination}\). For this new method, derive the expression for time to transfer a message of size \(m\) to a node \(d\) hops away under the following two cases:

Assume that another message can be sent from \(P_{source}\) as soon as the previous message has reached the next node in the path.
Assume that another message can be sent from \(P_{source}\) only after the previous message has reached \(P_{destination}\).

For each case, comment on the value of this expression as the value of \(k\) varies between 1 and \(m\). Also, what is the optimal value of \(k\) if \(t_s\) is very large, or if \(t_s = 0\)?

Clarification: To start sending any message, a cost of \(t_s\) must be paid. For example sending \(m=10\) words in its entirety costs \[t_s + t_w \times d \times 10\] Sending the same message broken into two 5-word chunks costs \[(t_s + t_w \times d \times 5) + (t_s + t_w \times d \times 5)\] if the sending does not overlap.

Example

To increase the clarity, below is a concrete example of the problem setting with communication according to both sets of assumptions. Studying it may help with visualizing the nature of the problem.

* ****** EXAMPLE PROBLEM PARAMETERS ****** *
m=10 : 10 words of data
k=2  : message in two parts, 5 words each [A5 B5]
d=3  : Source to Destination is 3 hops

* ****** A ASSUMPTIONS ****** *
After S sends A5, can immediately send B5 to "follow" it

TIME=0
S:0 ---- 1 ---- 2 ---- D:3
A5,B5

Initiate sending A5

TIME=1*5*t_w + t_s
S:0 ---- 1 ---- 2 ---- D:3
B5      A5

Initiate sending B5

TIME=2*5*t_w + t_s + t_s
S:0 ---- 1 ---- 2 ---- D:3
        B5     A5

TIME=3*5*t_w + t_s + t_s
S:0 ---- 1 ---- 2 ---- D:3
               B5      A5

TIME=4*5*t_w + t_s + t_s
S:0 ---- 1 ---- 2 ---- D:3
                       A5,B5

* ****** B ASSUMPTIONS ****** *
After sending A5, S must wait until it arrives at D before send B5.

TIME=0
S:0 ---- 1 ---- 2 ---- D:3
A5,B5

Initiate sending A5

TIME=1*5*t_w + t_s
S:0 ---- 1 ---- 2 ---- D:3
B5      A5

TIME=2*5*t_w + t_s
S:0 ---- 1 ---- 2 ---- D:3
B5             A5

TIME=3*5*t_w + t_s
S:0 ---- 1 ---- 2 ---- D:3
B5                     A5

A5 Arrived
Initiate sending B5

TIME=4*5*t_w + t_s + t_s
S:0 ---- 1 ---- 2 ---- D:3
        B5             A5

TIME=5*5*t_w + t_s + t_s
S:0 ---- 1 ---- 2 ---- D:3
               B5      A5

TIME=6*5*t_w + t_s + t_s
S:0 ---- 1 ---- 2 ---- D:3
                       A5,B5

2 (10pts) Problem 2: Embedding a Ring Network in a Tree Network

Section 2.7 of Grama discusses a number techniques to evaluate how well one type of network interconnect may be embedded in another. Two important characteristics of such embedding are identified in Grama 2.7:

Congestion: The maximum number of links in the source network mapped to a single link in the destination network.
Dilation: The maximum number of links traversed in the destination network which correspond to a single link in the source network.

We briefly discussed this at a high level in lecture indicating that embedding a 2D Mesh into Hypercube is possible as the former has many fewer links than the latter. We were able to informally show that mapping a source 2D Mesh network to a destination Hypercube network would yield a Congestion of 1 (all links in the mesh had a unique link in the Hypercube) and a Dilation of 1 (adjacent procs in the Mesh were adjacent in the Hypercube).

Below is a diagram of an 8-node Ring network (linear array with wrap-around) and an 8-node Binary Tree network (with intermediary switches as squares).

Discuss how one can map/embed the Ring to the Tree. Be detailed, perhaps annotating the diagrams to show the specific mapping. Give the numeric values for Congestion and Dilation in this mapping.

Generalize your answer for a \(2^D\) node Ring mapped to a \(2^D\) node Tree. Provide an analytic expression for Congestion and Dilation. Informally justify your answers (no need for a formal proof).

3 (10pts) Problem 3: Task-dependence Graphs (Grama 3.2)

For the task graphs given in Figure 3.42 (below), determine the following:

Maximum degree of concurrency.
Critical path length.
Maximum achievable speedup over one process assuming that an arbitrarily large number of processors is available.
The minimum number of processors needed to obtain the maximum possible speedup.
The maximum achievable speedup if the number of processors is limited to 2, 4, and 8.

Assume that each task node takes an equal amount of time to execute.

Figure 3.42. Task-dependency graphs for this Problem.

4 (15pts) Problem 4: Parallel Histogram

This problem is a variant of Grama 3.19 + 3.20 which discusses Bucket Sort but focuses on the simpler task of computing Histograms.

Consider parallelizing a simple histogram algorithm on a distributed memory computer.

Input consists of an array A[] of N random integers in the range {0...(R-1)}.
The algorithm should compute the array of counts H[] where element H[i] is the total number of times integer i appeared in input array A[].
All P processors on the parallel computer are assumed to have access to the entire array A[] which can be loaded from permanent storage. However if the size N gets large enough, only parts of A[] can be held in memory for an individual processor.
At the end of the algorithm execution, at least one processor must contain the entire array H[] which may involve some communication.

EXAMPLE:

Input: R=4 and A[] of size N=10 is A[] = {0, 2, 1, 3, 2, 1, 1, 0, 0, 1}
Output: H[] of size R=4 is H[] = {3, 4, 2, 1} as H[i] is the number of occurrences of i throughout A[]

Describe two ways to decompose this problem into a parallel program. Compare these two.

Describe a decomposition based on partitioning the input data (i.e., the array A[]) and an appropriate mapping onto p processors. Describe briefly how the resulting parallel algorithm would work.
Describe a decomposition based on partitioning the output data (i.e., the array H[]) and an appropriate mapping onto p processors. Describe briefly how the resulting parallel algorithm would work.
Discuss the advantages and disadvantages of these two parallel formulations. Describe circumstances under which each is preferred. Such circumstances should consider
- P the number of processors
- R the number of different data possibilities (size of H[])
- A[] the input array and n its size
- Potentially also the cost to communicate between the processors

5 (15pts) Problem 5: Parallel One-to-All Broadcast in 2D Torus

Assume processor 0 has a message that must be sent to all other processors in a distributed memory parallel computer. Describe an efficient algorithm to do this in a 2D Torus (wrap-around mesh) with R rows and C columns.

Account for the following in your answers.

A processor can send a message to any other processor. Any processor between the source and destination does not receive the message, only forwards it along the path. However, once a processor has actually received the message, it can subsequently be a source and send it to other processors.
Processors can be indexed with 2D coordinates as in processor (0,2) sends a message to processor (5,4)
Give rough pseudocode for how this algorithm will look
Give a cost estimate of this communication in terms of the number or rows R and columns C in the network. The cost of a single communication between nodes that are \(d\) hops away is \[t_{single} = t_s + t_w \times m \times d\] You may assume that R and C are powers of two in your analysis to simplify the expression for the total communication time \(t_{comm}\).

6 (15pts) Problem 6: Bring the Heat

Consider a simple physical simulation of heat transfer. A rod of material is insulated except at its ends to which are attached two constant sources of heat/cold.

The rod is broken into discrete chunks, sometimes referred to as finite elements, and it is assumed that throughout each element the temperature is uniform. The reservoir elements of heat/cold at the left and right remain at a constant temperature. For the other elements, the temperature is updated at each time step by examining the temperature of its neighbors compared to itself. This dependence is indicated in the lower part of the diagram and the exact update for an element is given in the code below.

The following C code implements a simple version of this simulation. A row of the matrix H contains the temperature of all elements in the rod at a time step with the leftmost and rightmost elements remaining constant. Ultimately, the program computes the entire matrix H which contains temperatures of all elements at all time steps. Examine this code carefully.

#include <stdio.h>
#include <stdlib.h>
//  HEAT TRANSFER SIMULATION
//  
//  Simple physical simulation of a rod connected at the left and right
//  ends to constant temperature heat/cold sources.  All positions on
//  the rod are set to an initial temperature.  Each time step, that
//  temperature is altered by computing the difference between a cells
//  temperature and its left and right neighbors.  A constant k
//  (thermal conductivity) adjusts these differences before altering
//  the heat at a cell.  Use the following model to compute the heat
//  for a position on the rod according to the finite difference
//  method.
// 
//     left_diff  = H[t][p] - H[t][p-1];
//     right_diff = H[t][p] - H[t][p+1];
//     delta = -k*( left_diff + right_diff )
//     H[t+1][p] = H[t][p] + delta
//  
//  Substituting the above, one can get the following
// 
//    H[t+1][p] = H[t][p] + k*H[t][p-1] - 2*k*H[t][p] + k*H[t][p+1]
// 
//  The matrix H is computed for all time steps and all positions on
//  the rod and displayed after running the simulation.  The simulation
//  is run for a fixed number of time steps rather than until
//  temperatures reach steady state.

int main(int argc, char **argv){
  int max_time = 50;            // Number of time steps to simulate 
  int width = 20;               // Number of cells in the rod 
  double initial_temp = 50.0;   // Initial temp of internal cells  
  double L_bound_temp = 20.0;   // Constant temp at Left end of rod 
  double R_bound_temp = 10.0;   // Constant temp at Right end of rod 
  double k = 0.5;               // thermal conductivity constant 
  double **H;                   // 2D array of temps at times/locations  

  // Allocate memory  
  H = malloc(sizeof(double*)*max_time); 
  int t,p;
  for(t=0;t<max_time;t++){
    H[t] = malloc(sizeof(double*)*width);
  }

  // Initialize constant left/right boundary temperatures 
  for(t=0; t<max_time; t++){
    H[t][0] = L_bound_temp;
    H[t][width-1] = R_bound_temp;
  }

  // Initialize temperatures at time 0 
  t = 0;
  for(p=1; p<width-1; p++){
    H[t][p] = initial_temp;
  }

  // Simulate the temperature changes for internal cells
  for(t=0; t<max_time-1; t++){
    for(p=1; p<width-1; p++){
      double left_diff  = H[t][p] - H[t][p-1];
      double right_diff = H[t][p] - H[t][p+1];
      double delta = -k*( left_diff + right_diff );
      H[t+1][p] = H[t][p] + delta;
    }
  }

  // Print results 
  printf("Temperature results for 1D rod\n");
  printf("Time step increases going down rows\n");
  printf("Position on rod changes going accross columns\n");

  // Column headers 
  printf("%3s| ","");
  for(p=0; p<width; p++){
    printf("%5d ",p);
  }
  printf("\n");
  printf("%3s+-","---");
  for(p=0; p<width; p++){
    printf("------");
  }
  printf("\n");
  // Row headers and data 
  for(t=0; t<max_time; t++){
    printf("%3d| ",t);
    for(p=0; p<width; p++){
      printf("%5.1f ",H[t][p]);
    }
    printf("\n");
  }

  return 0;
}

Answer the following questions about how to parallelize this code.

A typical approach to parallelizing programs is to select a loop and split iterations of work between available processors. Describe how one might do this for the heat program. Make sure to indicate any loops for which this approach is not feasible and any which seem more viable to you.
Describe how you would divide the data for the heat transfer problem among many processors in a distributed memory implementation to facilitate efficient communication and processing. Describe a network architecture that seems to fit this problem well and balances the cost of the network well.

7 (15pts) Problem 7: C Proficiency / K-Means Preparation

CODEPACK: kmeans-code.zip

FILE	DESCRIPTION
`kmeans.py`	Python kmeans clustering program to convert to C
`python-demo.txt`	Demo of running the Python version, C version should produce similar results
`filestats.c`	Parsing routine to help process data files in C
`sample-mnist-data/`	Directory with data files for clustering
`kmeans_serial.c`	CREATE this file with your C version

In the next Assignment, you will be responsible for coding a parallel implementation of K-Means clustering. This will be coded in the C language. Many students may be rusty or uncertain about their C coding skills. To prepare for the parallel coding exercise, convert the a Python program to a C version. The provided Python code is downloadable with associated data files in the codepack linked at the top of this problem.

Brief Overview of K-Means Clustering

K-Means clustering will be discussed in lecture several times to introduce it and later to discuss various approaches to parallelizing the serial algorithm. For a additional details, examine the Wikipedia article on K-Means Clustering. Below is a brief overview of the central ideas.

The algorithm can be summarized as follows:

- Input: data + #of clusters desired
- Output: assignment of each data to a cluster + cluster centers 
- Algorithm: Iterates between
  1. Calculate cluster centers
  2. Calculate cluster assignments
  Until no data changes clusters

Each point to be clustered is described as a vector of numerical features. In the below sample picture, each data has 2 such features so can be easily plotted in 2 dimensions. The figure starts with two random cluster centers, then iterates above between assigning data to clusters, then recomputing cluster centers, then assigning, etc.

Figure 1: Example of progression of K-means clustering algorithm. Source: K-Means by Chris Piech. Based on a handout by Andrew Ng.

K-means requires one to specify how many clusters to group data into. Our K-means clustering initially assigns each data randomly to one cluster. Then the cluster centers are computed by summing the features of each data in a cluster and dividing by the number of data in the cluster. Finally, each data is compared against each cluster center and re-assigned to it if it is closer than its current cluster's center. The distance can be a variety of metrics but Euclidean distance is common.

The algorithm iterates between calculating new cluster centers and then re-assigning each data to new clusters. Under very mild conditions, cluster assignments will converge and this can be detected by counting how many data are assigned to different clusters than in the last iteration. When this count is 0, the algorithm has converged.

For our application, we will be clustering small images of hand-written digits so the features for the intensities of pixels in the image. A small image file like this one:

Figure 2: Sample of a small hand-written digit in graymap

may have pixel data in 2 dimensions like this (here written in the Portable Gray Map [PGM] format):

P2    # file type
5 7   # 5 columns by 7 rows
15    # max intensity 15
0  0  0  0  0
0 10 15 15  0
0 14  0  0  0
0 15 15  9  0
4  0  0  12 3
0 13  14 9  0
0  0  8  0  0

Such a data point is written as a 5x7 = 35 element vector by concatenating the rows of pixels as in:

0  0  0  0  0  0 10 15 15  0  0 14  0  0  0  0 15 15  9  0  4  0  0  12 3  0 13  14 9  0  0  0  8  0  0

Our application will cluster such collections of such images. Each image appears as its own line in a text file starting with a label as a digit of 0, 1, 2,... 9. This is followed by a colon, then then image pixels:

5 : 0  0  0  0  0  0 10 15 15  0  0 14  0  0  0  0 15 15  9  0  4  0  0  12 3  0 13  14 9  0  0  0  8  0  0
5 : 0  0  0  0  0  1 12 15 13  0  0 12  0  0  0  0 14 14  8  1  3  0  0  15 6  0 12  11 9  0  0  0  8  2  0
3 : 0  0  0  0  0  0 10 15 15  0  0  0  0 14  0  0  4 15 11  0  0  0  0  12 3  0 13  14 9  0  0  0  8  0  0 
...

The data we will be clustering are images from the well-known MNIST Database which are 28x28 pixel graymap images with a range of 0 for black to 255 for white. The original binary formats have been converted to text formats like those about for somewhat easier interpretability and to processing.

Notes On Implementation

The Python version has 2 "classes" at the top which represent the main datatype associated with the problem. These should become C structs in the C version as standard C does not support classes and they are unnecessary in this case anyway.
All numerical calculations should be done in floating point format using C's 32-bit float type. It is recommended for performance reasons that the integer pixel data be read in as floats so that one can avoid doing unnecessary conversions.
Python automatically allocates data and can extend its lists while C involves manual memory allocation. In this application, after setting up data from the input file, no further memory allocation is required so most of the application logic will be identical between the Python and the C versions.
Mirror the functions and flow used in the Python version closely in your C version. Some Pythonesque function such as split() and append() methods are not available in C. Instead, pre-allocating appropriate amounts of memory and using standard input functions like fscanf() will suffice.
When reading in data from the input files, it may useful to employ the following function which calculates the lines and total tokens in a data file. It is provided in the indicated source file in the problem codepack.
Using this C function allows one to calculate how much memory must be allocated for the data structure that will hold the input file: each line of the input files starts with an integer then a colon then 784 integers (28x28 pixels).

Python Code

The Python code version is provided below but is also in the codepack.

#!/usr/bin/env python3
#
# usage: kmeans.py <datafile> <nclust> [savedir] [maxiter]

import struct
import sys
import os
import math

class KMData:                      # Data set to be clustered
  ndata    = 0                     # count of data
  dim      = 0                     # dimension of features for data
  features = []                    # pointers to individual features
  assigns  = []                    # cluster to which data is assigned
  labels   = []                    # label for data if available
  nlabels  = []                    # max value of labels +1, number 0,1,...,nlabel0

class KMClust:                     # Cluster information
  nclust   = 0                     # number of clusters, the "k" in kmeans
  dim      = 0                     # dimension of features for data
  features = []                    # 2D indexing for individual cluster center features
  counts   = []                    # number of data in each cluster
  

# Load a data set from the named file. Data should be formatted as a
# text file as:
# 
# 7 :  84 185 159 151  60  36   0   0   0   0   0   0
# 2 :   0  77 251 210  25   0   0   0 122 248 253  65
# 1 :   0   0   0   0   0   0   0   0   0  45 244 150
# 0 :   0   0   0   0 110 190 251 251 251 253 169 109
# 4 :   0   0   0   4 195 231   0   0   0   0   0   0
# 1 :   0   0   0   0   0   0   0   0   0  81 254 254
# 4 :   0  20 189 253 147   0   0   0   0   0   0   0
# 9 :   0   0   0   0  91 224 253 253  19   0   0   0
# 5 :   0   0   0   0   0   0  63 253 253 253 253 253
# 9 :   0   0   0   0   0   0   0  36  56 137 201 199
#
# with the lead number being an optional correct label for the data
# and remaining numbers being floating point values that are space
# separated which are the feature vector for each data. The abve
# example does not have any fractional values for features but it
# could.
def kmdata_load(datafile):
  data = KMData()
  with open(datafile,"r") as fin:
    for line in fin:
      data.ndata += 1
      tokens = line.split()
      data.labels.append(int(tokens[0]))
      feats = []
      for f in tokens[2:]:
        feats.append(float(f))
      data.features.append(feats)
  
  data.dim = len(data.features[0])
  data.nlabels = max(data.labels)+1
  return data

# Allocate space for clusters in an object
def kmclust_new(nclust, dim):
  clust = KMClust()
  clust.nclust = nclust
  clust.dim = dim
  for c in range(nclust):
    clust.features.append([0.0] * dim)
    clust.counts.append(0)
  return clust
  

# Save clust centers in the PGM (portable gray map) image format;
# plain text and can be displayed in many image viewers. File names re
# cent_0000.pgm and so on.
def save_pgm_files(clust,savedir):
  dim_root = int(math.sqrt(clust.dim))
  if clust.dim % dim_root == 0:                    # check if this looks like a square image
    print(f"Saving cluster centers to {savedir}/cent_0000.pgm ...")
    maxfeat = max(map(max,clust.features))
    for c in range(clust.nclust):
      outfile=f"{savedir}/cent_{c:04}.pgm"
      with open(outfile,"w") as pgm:               # output the cluster centers as
        print("P2",file=pgm)                       # pgm files, a simple image format which
        print(f"{dim_root} {dim_root}",file=pgm)   # can be viewed in most image
        print(f"{maxfeat:.0f}",file=pgm)           # viewers to show how the cluster center
        for d in range(clust.dim):                 # actually appears; nomacs is a good viwer
          if d > 0 and d%dim_root == 0:
            print(file=pgm)
          print(f"{clust.features[c][d]:3.0f} ",end="",file=pgm)
        print(file=pgm)

#### MAIN FUNCTION ####
def main():
  if len(sys.argv) < 3:
    sys.exit("usage: kmeans.py <datafile> <nclust> [savedir] [maxiter]")
  
  datafile = sys.argv[1]
  nclust = int(sys.argv[2])
  savedir = "."
  MAXITER = 100                        # bounds the iterations
  
  if len(sys.argv) > 3:                # create save directory if specified
    savedir = sys.argv[3]
    os.system(f"mkdir -p {savedir}")
  
  if len(sys.argv) > 4:
    MAXITER = int(sys.argv[4])
  
  print("datafile:",datafile)
  print("nclust:",nclust)
  print("savedir:",savedir)
  

  data = kmdata_load(datafile)         # read in the data file, allocate cluster space
  clust = kmclust_new(nclust, data.dim)

  print("ndata:",data.ndata)
  print("dim:",data.dim)
  print()
  
  for i in range(data.ndata):          # random, regular initial cluster assignment
    c = i % clust.nclust
    data.assigns.append(c)

  for c in range(clust.nclust):
    icount = data.ndata / clust.nclust;
    extra = 0
    if c < (data.ndata % clust.nclust):
      extra = 1                        # extras in earlier clusters
    clust.counts[c] = icount + extra;
  
  
  ################################################################################
  # THE MAIN ALGORITHM
  curiter = 1                          # current iteration
  nchanges = data.ndata                # check for changes in cluster assignment; 0 is converged
  print(f"==CLUSTERING: MAXITER {MAXITER}==")
  print(f"ITER NCHANGE CLUST_COUNTS")
  
  while nchanges > 0 and curiter <= MAXITER: # loop until convergence

    # DETERMINE NEW CLUSTER CENTERS
    for c in range(clust.nclust):      # reset cluster centers to 0.0
      for d in range(clust.dim):
        clust.features[c][d] = 0.0
  
    for i in range(data.ndata):        # sum up data in each cluster
      c = data.assigns[i]
      for d in range(clust.dim):
        clust.features[c][d] += data.features[i][d]
  
    for c in range(clust.nclust):      # divide by ndatas of data to get mean of cluster center
      if clust.counts[c] > 0:
        for d in range(clust.dim):        
          clust.features[c][d] = clust.features[c][d] / clust.counts[c]
        
    # DETERMINE NEW CLUSTER ASSIGNMENTS FOR EACH DATA
    for c in range(clust.nclust):      # reset cluster counts to 0
      clust.counts[c] = 0              # re-init here to support first iteration
  
    nchanges = 0
    for i in range(data.ndata):        # iterate over all data
      best_clust = None
      best_distsq = float("inf")
      for c in range(clust.nclust):    # compare data to each cluster and assign to closest
        distsq = 0.0
        for d in range(clust.dim):     # calculate squared distance to each data dimension
          diff = data.features[i][d] - clust.features[c][d]
          distsq += diff*diff
          # print(f"DBG: curiter {curiter} i {i} c {c} d {d} diff {diff:.6f}")
        if distsq < best_distsq:       # if closer to this cluster than current best
          best_clust = c
          best_distsq = distsq
      clust.counts[best_clust] += 1
      if best_clust != data.assigns[i]:  # assigning data to a different cluster?
        # print(f"DBG: data {i} changing clusters: {data_assign[i]} to {best_clust}")
        nchanges += 1                    # indicate cluster assignment has changed
        data.assigns[i] = best_clust     # assign to new cluster
    
    # Print iteration information at the end of the iter
    print(f"{curiter:3}: {nchanges:5} |", end="") 
    for c in range(nclust):
      print(f" {clust.counts[c]:4}",end="")
    print()
    curiter += 1
  
  # Loop has converged
  if curiter > MAXITER:
    print(f"WARNING: maximum iteration {MAXITER} exceeded, may not have conveged")
  else:
    print(f"CONVERGED: after {curiter} iterations")
  print()
  

  ################################################################################
  # CLEANUP + OUTPUT

  # CONFUSION MATRIX
  confusion = []                         # confusion matrix: labels * clusters big
  for i in range(data.nlabels):
    confusion.append([0] * nclust)
  
  for i in range(data.ndata):            # count which labels in which clusters
    confusion[data.labels[i]][data.assigns[i]] += 1
  
  print("==CONFUSION MATRIX + COUNTS==")
  print("LABEL \ CLUST")
  
  print(f"{'':2} ",end="")               # confusion matrix header
  for j in range(clust.nclust):
    print(f" {j:>4}",end="")
  print(f" {'TOT':>4}")
  
  for i in range(data.nlabels):          # each row of confusion matrix
    print(f"{i:>2}:",end="")
    tot = 0
    for j in range(clust.nclust):
      print(f" {confusion[i][j]:>4}",end="")
      tot += confusion[i][j]
    print(f" {tot:>4}")
  
  print("TOT",end="")                    # final total row of confusion matrix
  tot = 0
  for c in range(clust.nclust):
    print(f" {clust.counts[c]:>4}",end="")
    tot += clust.counts[c]
  print(f" {tot:>4}")
  print()
  
  # LABEL FILE OUTPUT
  outfile = f"{savedir}/labels.txt"    
  print(f"Saving cluster labels to file {outfile}")
  with open(outfile,"w") as fout:
    for i in range(data.ndata):
      print(f"{data.labels[i]:2} {data.assigns[i]:2}",file=fout)
  

  # SAVE PGM FILES CONDITIONALLY
  save_pgm_files(clust,savedir)
### END MAIN ###      
  
main()

Sample Runs of kmeans.py

A demo of the Python version is shown below.

# Demo of running the python kmeans clustering on several provided
# data files. The C version you construct should behave similarly.

>> cd kmeans-code

>> ls
filestats.c  kmeans.py	sample-mnist-data

>> chmod u+x kmeans.py

>> ./kmeans.py
usage: kmeans.py <datafile> <nclust> [savedir] [maxiter]

>> ./kmeans.py sample-mnist-data/digits_all_1e2.txt 10 outdir
datafile: sample-mnist-data/digits_all_1e2.txt
nclust: 10
savedir: outdir
ndata: 100
dim: 784

==CLUSTERING: MAXITER 100==
ITER NCHANGE CLUST_COUNTS
  1:    35 |    9   14    9    7   10    6   15   10    8   12
  2:    15 |    8   14    9    6    9    6   16   12    8   12
  3:     8 |    8   15    9    6   10    4   16   15    8    9
  4:     4 |    9   15    9    7   10    4   14   15    8    9
  5:     2 |    9   14    9    7   10    4   14   16    8    9
  6:     0 |    9   14    9    7   10    4   14   16    8    9
CONVERGED: after 7 iterations

==CONFUSION MATRIX + COUNTS==
LABEL \ CLUST
       0    1    2    3    4    5    6    7    8    9  TOT
 0:    0    1    0    4    0    1    0    0    2    0    8
 1:    0    0    0    0    9    0    0    5    0    0   14
 2:    0    3    1    0    0    1    0    3    0    0    8
 3:    4    0    1    0    1    0    0    0    5    0   11
 4:    0    0    3    1    0    1    8    1    0    0   14
 5:    0    0    1    2    0    1    0    2    1    0    7
 6:    0    9    1    0    0    0    0    0    0    0   10
 7:    5    0    0    0    0    0    3    3    0    4   15
 8:    0    1    0    0    0    0    0    1    0    0    2
 9:    0    0    2    0    0    0    3    1    0    5   11
TOT    9   14    9    7   10    4   14   16    8    9  100

Saving cluster labels to file outdir/labels.txt
Saving cluster centers to outdir/cent_0000.pgm ...
>> ls outdir/
cent_0000.pgm  cent_0002.pgm  cent_0004.pgm  cent_0006.pgm  cent_0008.pgm  labels.txt
cent_0001.pgm  cent_0003.pgm  cent_0005.pgm  cent_0007.pgm  cent_0009.pgm
>> ./kmeans.py sample-mnist-data/digits_1x9_2e3.txt 4 two-nine-dir
datafile: sample-mnist-data/digits_1x9_2e3.txt
nclust: 4
savedir: two-nine-dir
ndata: 2000
dim: 784

==CLUSTERING: MAXITER 100==
ITER NCHANGE CLUST_COUNTS
  1:  1398 |  530  393  548  529
  2:   591 |  734  359  369  538
  3:   180 |  738  397  294  571
  4:   119 |  674  423  322  581
  5:    86 |  612  435  379  574
  6:    68 |  563  444  423  570
  7:    54 |  522  451  462  565
  8:    33 |  501  456  480  563
  9:    17 |  488  458  490  564
 10:     8 |  481  459  496  564
 11:     5 |  476  459  501  564
 12:     5 |  471  459  506  564
 13:     4 |  468  460  508  564
 14:     5 |  464  461  511  564
 15:     5 |  461  462  514  563
 16:     2 |  460  463  514  563
 17:     1 |  459  463  515  563
 18:     1 |  458  463  516  563
 19:     0 |  458  463  516  563
CONVERGED: after 20 iterations

==CONFUSION MATRIX + COUNTS==
LABEL \ CLUST
       0    1    2    3  TOT
 0:    0    0    0    0    0
 1:    1  450    1  548 1000
 2:    0    0    0    0    0
 3:    0    0    0    0    0
 4:    0    0    0    0    0
 5:    0    0    0    0    0
 6:    0    0    0    0    0
 7:    0    0    0    0    0
 8:    0    0    0    0    0
 9:  457   13  515   15 1000
TOT  458  463  516  563 2000

Saving cluster labels to file two-nine-dir/labels.txt
Saving cluster centers to two-nine-dir/cent_0000.pgm ...
>> ls two-nine-dir/
cent_0000.pgm  cent_0001.pgm  cent_0002.pgm  cent_0003.pgm  labels.txt

Timing Results to Provide

Time your both the provided Python version and your C version of the K-means clustering. Provide timing on any Linux/Unix machine using the following commands.

## Time Python and C versions on with these parameters
## dataset:  sample-mnist-data/digits_3x7_2e3.txt 
## numclust: 4

################################################################################
## TIME PYTHON VERSION on digits_3x7_2e3.txt 4 clusters
>> time ./kmeans.py sample-mnist-data/digits_3x7_2e3.txt 4 outdir1
datafile: sample-mnist-data/digits_3x7_2e3.txt
nclust: 4
savedir: outdir1
ndata: 2000
dim: 784

==CLUSTERING: MAXITER 100==
ITER NCHANGE CLUST_COUNTS
  1:  1427 |  450  503  240  807
  2:   521 |  247  535  247  971
...
TOT  465  518  450  567 2000

Saving cluster labels to file outdir/labels.txt
Saving cluster centers to outdir/cent_0000.pbm ...

real	XXXXXX  # include all output including these last timing numbers
user	XXXXXX
sys	XXXXXX

################################################################################
## TIME C VERSION on digits_3x7_2e3.txt 4 clusters
>> time ./kmeans_serial sample-mnist-data/digits_3x7_2e3.txt 4 outdir2
datafile: sample-mnist-data/digits_3x7_2e3.txt
nclust: 4
savedir: outdir2
ndata: 2000
dim: 784

==CLUSTERING: MAXITER 100==
ITER NCHANGE CLUST_COUNTS
  1:  1427 |  450  503  240  807
  2:   521 |  247  535  247  971
...
TOT  465  518  450  567 2000

Saving cluster labels to file outdir2/labels.txt
Saving cluster centers to outdir2/cent_0000.pbm ...

real	XXXXXX  # include all output including these last timing numbers
user	XXXXXX
sys	XXXXXX

C is often used when speed is important for an application so expect a speed bump over the Python version. The size of the bump will vary but should be substantial in most cases.

Submission Notes

Paste the entirety of your C code into the document that you have written your other answers in. If your code is spread across several files, include all of them.
Also paste the timing results of your C version on the datasets indicated compared to the Python results.
Your solution will not be run by graders, only manually inspected to determine if it appears sane.
This serial version will serve as your starting point for A2 in which you must parallelize the code so failure to complete this problem now will put you at a disadvantage for the next Assignment. No serial C version will be provided to students for A2, only the Python version.

8 (5pts) Statement of Effort

Include the following Statement of Effort at the end of your document.

I / We affirm that all parties listed below have contributed to each solution presented in this document. All parties are capable of describing how the solutions were derived, how they apply to the problem, and that they were created in accordance with the course's PRIME DIRECTIVE.

Signed,

YOUR NAME(S)

Including this section will earn full credit. Failure to include this statement of effort may result in additional penalties aside from the loss of points indicated.

9 Submission Instructions

Complete the problems in each section labelled Problem
This is a pair assignment: you may select one partner to work with on this assignment as a group of 2. You may also opt to work alone as a group of 1.
Only 1 member of your group should submit the HW Writeup on Gradescope. If you are working in a partnership, add your partner after submitting as described below.

Make sure the Writeup has all group member's information in it:

  CS 5451 A1
  Group of 2
  Turanga Leela tleela4@umn.edu 
  Philip J Fry pfry99@umn.edu

Submit a PDF (portable document format, .pdf) of your work. Whatever program you write your answers in (Microsoft Word, Apple Words, Google Docs, Latex, etc.) make sure you can export or "save as" a PDF.
Submit your work to Gradescope under Assignment 1
- Log into Gradescope
- Assignment 1
- Select your PDF for upload
- Use the Gradescope system to indicate which of your answers are on which page of your PDF submission - this will make things easier on your grader (you want a happy grader).
- After uploading, add your partner as a group member if you had one.