A low-complexity parallel filter architecture for image processing applications is presented. The proposed approach combines transposition with custom polyphase sub-filter design in the ‘Fast FIR’ filter implementation technique to exploit the correlation among input samples: intermediate signals that exhibit low entropy are processed through low-complexity multipliers. Despite its lower complexity, the proposed approximate design performs very close to the corresponding full-precision implementation, as the provided experimental results show. The experiments also show that related efforts reported in the literature are extremely sensitive to approximation and yield unusable results because the error propagates without bound. The proposed approximate design, on the other hand, confines the error to as few as 4% of the data samples while maintaining high Signal-to-Noise Ratio (SNR) values on average. This is achieved by designing custom polyphase sub-filters without a recursive stage, and the approach is shown to be feasible irrespective of the number of kernel taps. The design maps efficiently onto a Xilinx Field Programmable Gate Array (FPGA) and reduces chip resource usage by at least 6% for kernels with asymmetric fixed coefficients and 29% for symmetric ones.
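For orientation, the polyphase idea underlying the ‘Fast FIR’ technique can be sketched in a few lines. The sketch below is the textbook 2-parallel decomposition only, written under stated assumptions: it is not the transposed architecture, the custom sub-filters, or the approximate multipliers proposed here, and the names `convolve` and `fast_fir_2parallel` are hypothetical. It splits the kernel and the input into even and odd phases and forms the output from three half-length sub-filters rather than four; the third sub-filter operates on the sum of the two input phases.

```python
def convolve(a, b):
    """Direct-form linear convolution of two sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def fast_fir_2parallel(h, x):
    """Textbook 2-parallel Fast FIR sketch (illustrative, not the proposed design).

    Assumes len(h) and len(x) are both even. Returns the full linear
    convolution of h and x using three half-length sub-filters.
    """
    if len(h) % 2 or len(x) % 2:
        raise ValueError("this sketch assumes even-length kernel and input")

    # Polyphase split: even-indexed and odd-indexed samples.
    h0, h1 = h[0::2], h[1::2]
    x0, x1 = x[0::2], x[1::2]

    # Three sub-filters instead of the four a direct polyphase form needs.
    a = convolve(h0, x0)                        # H0 * X0
    b = convolve(h1, x1)                        # H1 * X1
    c = convolve([p + q for p, q in zip(h0, h1)],
                 [p + q for p, q in zip(x0, x1)])  # (H0 + H1) * (X0 + X1)

    y = [0] * (len(h) + len(x) - 1)
    for k in range(len(a)):
        y[2 * k] += a[k]                        # even outputs: H0*X0 ...
        y[2 * k + 2] += b[k]                    # ... plus H1*X1 delayed by one block
        y[2 * k + 1] += c[k] - a[k] - b[k]      # odd outputs: H0*X1 + H1*X0
    return y


if __name__ == "__main__":
    h = [1, 2, 3, 4]
    x = [5, 6, 7, 8, 9, 10]
    print(fast_fir_2parallel(h, x) == convolve(h, x))
```

With exact arithmetic the result matches direct convolution. Once multipliers are approximated, how the sub-filter outputs are recombined determines how far an error can spread, which is the concern the proposed design addresses.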