Name: ME 759 - Assignment 4 - Solved

High Performance Computing for Engineering Applications

All commands or code must work on Euler with only the cuda module loaded unless specified otherwise. Commands and/or code may behave differently on your computer.

Consider using a formatter like clang-format.

* Before you begin, copy the provided files from HW04 of the ME759-2020 repo. Do not change any of the provided files because we will write clean copies over them.

(a) Implement in a file called matmul.cu the matmul and matmul_kernel functions as declared and described in matmul.cuh.
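A CPU reference can be useful for checking the values your GPU code produces on small inputs. The following is a minimal sketch and is not part of the provided files: the name matmul_reference, the float element type, and the use of std::vector are assumptions made for illustration. The declarations in matmul.cuh remain authoritative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not from matmul.cuh): computes C = A * B for
// n x n matrices stored as 1D row-major arrays, so element (i, j) of a matrix
// lives at index i * n + j.
std::vector<float> matmul_reference(const std::vector<float>& A,
                                    const std::vector<float>& B,
                                    std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    std::vector<float> C(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const float a = A[i * n + k];  // element (i, k) of A
            for (std::size_t j = 0; j < n; ++j) {
                C[i * n + j] += a * B[k * n + j];  // accumulate into (i, j)
            }
        }
    }
    return C;
}
```

Comparing the last element of this result with the last element printed by your GPU program is a quick sanity check. Expect small floating-point differences, since the order of the additions may differ.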

Write a program task1.cu which does the following:
- Creates matrices (as 1D row major arrays) A and B of size n*n in managed (aka unified) memory.
- Fills those matrices however you like.
- Calls your matmul function.
- Prints the last element of the resulting matrix.
- Prints the time taken to perform the multiplication in milliseconds using CUDA events.
- Compile: nvcc task1.cu matmul.cu -Xcompiler -O3 -Xcompiler -Wall -Xptxas -O3 -o task1
- Run (where n and threads_per_block are positive integers): ./task1 n threads_per_block
- Example expected output:

11.36

1.23
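Because n and threads_per_block both come from the command line, the number of output elements n*n is generally not a multiple of threads_per_block. The sketch below shows the usual ceiling division for choosing a block count. It assumes a one-dimensional launch with one thread per output element, which is an assumption here and not something stated above, and blocks_needed is a hypothetical helper name. Follow matmul.cuh if it specifies a different launch configuration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: smallest number of blocks such that
// blocks * threads_per_block >= total (ceiling division).
std::size_t blocks_needed(std::size_t total, std::size_t threads_per_block) {
    assert(threads_per_block > 0);
    return (total + threads_per_block - 1) / threads_per_block;
}
```

With this rounding, the last block may contain threads whose global index is n*n or larger, so a kernel launched this way needs a bounds check before it writes its output.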

On an Euler compute node, run task1 for each value n = 2⁵, 2⁶, ..., 2¹⁵ and generate a plot task1.pdf which plots the time taken by your algorithm as a function of n when threads_per_block = 1024. Overlay another plot which plots the same relationship with a different choice of threads_per_block.

(a) Implement in a file called stencil.cu the stencil and stencil_kernel functions as declared and described in stencil.cuh. These functions should produce the 1D convolution of image and mask:

$$ \text{output}[i] = \sum_{j=-R}^{R} \text{image}[i + j] \cdot \text{mask}[j + R], \qquad i = 0, \dots, n - 1 $$

Assume that image[i] = 0 when i < 0 or i > n − 1. Pay close attention to what data you are asked to store and compute in shared memory.
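The formula and the boundary rule can be written out directly on the CPU, which gives a reference for checking the GPU output on small inputs. This is a minimal sketch and not the required implementation: the name stencil_reference, the float element type, and the use of std::vector are assumptions made for illustration, and it does not model shared memory at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not from stencil.cuh):
// output[i] = sum over j in [-R, R] of image[i + j] * mask[j + R],
// where image is treated as 0 outside the index range [0, n - 1].
std::vector<float> stencil_reference(const std::vector<float>& image,
                                     const std::vector<float>& mask,
                                     long R) {
    assert(R >= 0);
    assert(mask.size() == static_cast<std::size_t>(2 * R + 1));
    const long n = static_cast<long>(image.size());
    std::vector<float> output(image.size(), 0.0f);
    for (long i = 0; i < n; ++i) {
        float sum = 0.0f;
        for (long j = -R; j <= R; ++j) {
            const long idx = i + j;
            if (idx >= 0 && idx < n) {  // out-of-range image entries count as 0
                sum += image[static_cast<std::size_t>(idx)] *
                       mask[static_cast<std::size_t>(j + R)];
            }
        }
        output[static_cast<std::size_t>(i)] = sum;
    }
    return output;
}
```

For example, with image = {1, 2, 3, 4}, mask = {1, 1, 1}, and R = 1, the two boundary outputs each lose one neighbor to the zero padding, giving {3, 6, 9, 7}.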

Write a program task2.cu which does the following:
- Creates arrays image (length n), output (length n), and mask (length 2 * R + 1) all in managed memory.
- Fills those arrays however you like.
- Calls your stencil function.
- Prints the last element of the resulting array.
- Prints the time taken to perform the convolution in milliseconds using CUDA events.
- Compile: nvcc task2.cu stencil.cu -Xcompiler -O3 -Xcompiler -Wall -Xptxas -O3 -o task2
- Run (where n, R, and threads_per_block are positive integers): ./task2 n R threads_per_block

Example expected output:

11.36

1.23

On an Euler compute node, run task2 for each value n = 2¹⁰, 2¹¹, ..., 2³¹ and generate a plot task2.pdf which plots the time taken by your algorithm as a function of n when threads_per_block = 1024 and R = 128. Overlay another plot which plots the same relationship with a different choice of threads_per_block.
