I would recommend starting by getting a general idea of how backprop works for convolutions. For this I have created the Miro board below:
https://miro.com/app/board/uXjVIYa0oMo=/?share_link_id=877962336717
It is based on this video.
The basic principle is that a convolution is essentially a sum of products: each feature map element is the sum of the input tensor elements under the filter, each multiplied by the corresponding filter element.
For example, if
Then all the feature map calculations would look like this:
$z_1 = w_{1,1} f_1 + w_{1,2} f_2 + \cdots + w_{1,9} f_9$
$z_2 = w_{2,1} f_1 + w_{2,2} f_2 + \cdots + w_{2,9} f_9$
…
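
To make the sum-of-products view concrete, here is a minimal pure-Python sketch. It assumes that $f_1, \dots, f_9$ are the elements of a flattened 3x3 filter and that $w_{i,j}$ is the $j$-th input element under the filter at position $i$; that is my reading of the equations above, so adjust if your notation differs. The 4x4 input, the filter values, the stride of 1 with no padding, and the helper names `extract_windows`, `forward` and `filter_gradient` are all made up for illustration. Because each $z_i$ is linear in the filter, $\partial z_i / \partial f_j = w_{i,j}$, so by the chain rule the gradient for the filter is just $\sum_i \frac{\partial L}{\partial z_i} w_{i,j}$.

```python
# Illustrative sketch only: input size, filter values and helper names are assumptions.

def extract_windows(image, k):
    """Return every k x k window of `image` (stride 1, no padding), flattened row by row."""
    rows, cols = len(image), len(image[0])
    windows = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            windows.append([image[r + i][c + j] for i in range(k) for j in range(k)])
    return windows

def forward(windows, f):
    """z_i = w_{i,1} f_1 + ... + w_{i,9} f_9 for every window i."""
    return [sum(w_ij * f_j for w_ij, f_j in zip(w, f)) for w in windows]

def filter_gradient(windows, dz):
    """dL/df_j = sum_i dL/dz_i * w_{i,j}, because dz_i/df_j = w_{i,j}."""
    n = len(windows[0])
    return [sum(dz[i] * windows[i][j] for i in range(len(windows))) for j in range(n)]

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
f = [1, 0, -1,
     1, 0, -1,
     1, 0, -1]  # flattened 3x3 filter, hypothetical values

windows = extract_windows(image, 3)    # 4 windows -> 2x2 feature map
z = forward(windows, f)                # flattened feature map
dz = [1.0] * len(z)                    # pretend L = sum(z), so dL/dz_i = 1
grad_f = filter_gradient(windows, dz)  # one gradient entry per filter element

print(z)
print(grad_f)
```

With $\partial L / \partial z_i = 1$, each entry of `grad_f` is simply the sum of the input elements that the corresponding filter element touched across all windows, which is the same pattern the board walks through.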