I would recommend starting by getting a general idea of how backprop works for convolutions. For this I have created the Miro board below:

General idea: https://miro.com/app/board/uXjVIYa0oMo=/?share_link_id=877962336717. It is based on this video.

The basic principle comes from the fact that each output of a convolution is essentially a sum of input tensor elements multiplied by convolution filter elements.

For example, if

Then all feature map calculations would look like this:

$z_1 = w_{1,1} f_1 + w_{1,2} f_2 + \cdots + w_{1,9} f_9$

$z_2 = w_{2,1} f_1 + w_{2,2} f_2 + \cdots + w_{2,9} f_9$
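
To make the sums above concrete, here is a minimal NumPy sketch. It rests on an assumption that the text does not spell out: I read $w_{i,j}$ as the $j$-th element of the $i$-th 3x3 input patch and $f_j$ as the $j$-th filter element. The 4x4 input, the helper `extract_patches`, and the upstream gradient `dz` are hypothetical and only there for illustration. Under that reading the whole feature map is one matrix-vector product, and the backward pass follows directly from it.

```python
import numpy as np

def extract_patches(x, k):
    """Flatten every k x k window of a 2-D array into one row (stride 1, no padding)."""
    h, w = x.shape
    rows = []
    for r in range(h - k + 1):
        for c in range(w - k + 1):
            rows.append(x[r:r + k, c:c + k].ravel())
    return np.stack(rows)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))     # hypothetical 4x4 input
filt = rng.standard_normal((3, 3))  # hypothetical 3x3 filter

# Assumption: W[i, j] plays the role of w_{i+1, j+1}, f[j] the role of f_{j+1}.
W = extract_patches(x, 3)           # shape (4, 9): one row per output position
f = filt.ravel()                    # shape (9,)

# Forward pass: z_i = sum_j w_{i,j} * f_j
z = W @ f

# Backward pass, given some upstream gradient dL/dz (random here, for illustration)
dz = rng.standard_normal(z.shape)
df = W.T @ dz                       # dL/df_j = sum_i dL/dz_i * w_{i,j}
dW = np.outer(dz, f)                # dL/dw_{i,j} = dL/dz_i * f_j
```

Note that `dW` is the gradient with respect to the patch entries. Because neighbouring patches overlap, getting the gradient with respect to the original input `x` would additionally require summing the contributions of every patch that contains a given input element.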