My layman's interpretation of CNN - this was the sentence I was wondering about:
"When designing a neural network architecture for execution at Akida, one should take into account a number of additional limitations concerning the layer parameters (e.g., maximum convolution size is 7, while stride 2 is supported for convolution size 3 only) and their sequence."
In convolution, the system takes manageable-size bites of the pixel data for processing.
The processor examines small segments of the whole pixel array to see if there is a pattern it recognizes. It does this by looking at each pixel in association with its neighbouring pixels.
Think of an array of pixels in a camera photoreceptor, say 100 columns x 100 rows (10,000 pixels). Starting at the top-left corner, draw a box enclosing, say, 7x7 pixels. That is what the convolution size means.
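To make that concrete, here is a minimal sketch in Python/NumPy (purely illustrative, with made-up values - not how Akida actually executes it in hardware). It takes the 7x7 "box" of pixels at the top-left corner and applies a 7x7 set of weights to it, which is what one convolution step amounts to:

```python
import numpy as np

# A stand-in 100 x 100 single-channel "photo" of random pixel values.
image = np.random.rand(100, 100)

# A 7 x 7 filter (kernel) of weights; in a real network these are learned.
kernel = np.random.rand(7, 7)

# The "box" at the top-left corner: rows 0-6, columns 0-6.
patch = image[0:7, 0:7]

# One convolution step: multiply each pixel by the matching weight and sum.
output_value = np.sum(patch * kernel)
print(output_value)
```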
Now move the box 1 pixel to the right - that is a stride of 1.
This results in the 7 pixels in the first column being excluded from the next processing step and a new set of 7 pixels to the right being included.
So a stride of 2 moves the box 2 columns to the right, dropping 2 columns of 7 pixels on the left and adding 2 columns of 7 pixels on the right.
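A quick sketch of what that does to the number of places the box stops as it moves across the 100 columns (again just an illustration, assuming the box only stops where it fits entirely inside the array):

```python
# Starting column index of the 7-wide box for each horizontal step across 100 columns.
width, box = 100, 7
stride_1_positions = list(range(0, width - box + 1, 1))  # 0, 1, 2, ... 93
stride_2_positions = list(range(0, width - box + 1, 2))  # 0, 2, 4, ... 92
print(len(stride_1_positions), len(stride_2_positions))  # 94 stops vs 47 stops
```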
When the box has scanned across the whole of the first 7 rows, it can drop down 2 rows (the same stride of 2, applied vertically) and repeat the process until the whole array has been covered.
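Putting the whole scan together, here is a rough sketch (same illustrative NumPy set-up as above, stride of 2 in both directions) that slides the 7x7 box over the full 100x100 array and collects one output value per stop, producing the reduced grid of results that the next layer works on:

```python
import numpy as np

image = np.random.rand(100, 100)   # the 100 x 100 pixel array
kernel = np.random.rand(7, 7)      # the 7 x 7 box of weights
stride = 2

# Number of stops the box makes in each direction:
# (input size - box size) // stride + 1  ->  (100 - 7) // 2 + 1 = 47
steps = (image.shape[0] - kernel.shape[0]) // stride + 1
feature_map = np.zeros((steps, steps))

for row in range(steps):
    for col in range(steps):
        r, c = row * stride, col * stride       # top-left corner of the box
        patch = image[r:r + 7, c:c + 7]         # the 7 x 7 bite of pixels
        feature_map[row, col] = np.sum(patch * kernel)

print(feature_map.shape)  # (47, 47) output values from the 100 x 100 input
```

So the stride not only decides how much the box overlaps itself between steps, it also sets how much smaller the output grid is than the input: stride 1 would give a 94x94 result here, stride 2 gives 47x47.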