# Mean and variance statistic for image processing on FPGA 

Sunny Arief Sudiro ${ }^{\mathbf{1} \text { * }}$, Aqwam Rosadi Kardian ${ }^{1}$, Sarifuddin Madenda ${ }^{2}$, Lingga Hermanto ${ }^{2}$<br>${ }^{1}$ STMIK Jakarta STI\&K Jl. BRI No. 17 Radio Dalam Kebayoran Baru Jakarta Indonesia<br>${ }^{2}$ Gunadarma University Jl. Margonda Raya No. 100 Depok Jawa Barat Indonesia


#### Abstract

Statistical formulas applied to image data are commonly used in image processing. In software, evaluating these formulas and accessing data stored in memory is an easy task, but a hardware implementation is more difficult because of its many constraints. This article presents an effective and efficient hardware implementation of the mean \& variance statistical formulas on an FPGA device. The circuit proposed in this article for both formulas needs only two addition components (in two accumulators), two shift-right registers used as divisor circuits, one subtractor and one multiplier. In the experiment, processing an image of $8 \times 8$ pixels needs 64 clock cycles to complete the mean \& variance calculations. Some other designs need more than 1024 addition components, so this design is more efficient.


Keywords: Mean, Variance, FPGA, Accumulator, Counter.

## 1. INTRODUCTION

A computer process for manipulating and analysing images is known as digital image processing. Basic statistical formulas used in computer-based image processing include the histogram, variance and mean. All of these are closely related to feature-based pattern recognition functions used to recognize objects.

Much research has addressed the implementation of functions involving large data and statistical calculation on FPGA devices. Variance calculation has been implemented on an FPGA device for image fusion, with computation speed as one of the reasons. This FPGA-based approach provides fast, compact and low-power image fusion (Gade and Khope, 2016).

Computation time, or processing speed, is one of the important factors when processing large amounts of data. Hardware implementation is one solution. FPGA acceleration is presented in Irturk et al. (2008) for processing Markowitz's mean variance framework, giving a speed-up of 221 times compared with a software implementation. Another study of FPGA acceleration is proposed in Betkaoui et al. (2011). That approach focuses on a re-configurable architecture in the FPGA and on partitioning data for processing large graph data, and it reduces the execution time of a common bio-informatics algorithm from 120 minutes to 12 minutes, or 10 times faster.

Mean calculation alone on an FPGA device is proposed by Kardian et al. (2016). That article presents a method for mean calculation that needs 1 addition operation, 1 division and 64 cycles for an $8 \times 8$ image. The present work extends the method described in Kardian et al. (2016).

An interesting work is presented in Ismael and Mahmood (2017), which proposes an FPGA-based implementation of 6 statistical operations: mean, variance, standard deviation, Root Mean Square (RMS), covariance and Mean Square Error (MSE). This approach uses 1090 slices, 220 slice flip-flops and 1988 Look-Up Tables (LUTs) for all 6 functions. The mean calculation has an error of $1.062 \%$ and the variance calculation an error of $0.193 \%$.

Sudiro et al., International Journal of Applied Science and Engineering, 18(1), 2020115

Another hardware method that calculates the mean \& variance in an iterative manner is described in Bailey and Laiber (2013). This method has been implemented efficiently in a hardware-based architecture. It distributes the divisions over several iteration steps instead of using combinational dividers, which enables the running statistics to be calculated with simple hardware elements. For a 0.3 MP image using a hardware shifter, this design uses 131 registers and 597 LUTs.

## 2. MEAN AND VARIANCE STATISTIC FORMULA IN IMAGE PROCESSING

A diagram drawn from the frequency with which every intensity value appears among all pixel elements of an image is known as a histogram (Martinez and Martinez, 2002; Woods et al., 2005). The first-order statistic is the mean value and the second-order statistic is the variance value. These values are commonly used for image feature extraction and segmentation. The mean value represents the intensity of the image over all pixels, while the variance value represents the contrast of the image (Kardian et al., 2016). Equation (1) is the formula for the mean $(\mu)$, and Equation (2) for $\sigma^{2}$ is the normalized second-order statistical formula, a feature that can be used to show the level of contrast in the image (Martinez and Martinez, 2002).

$$
\begin{align*}
\mu &= \sum_{i=0}^{L} i \, p(i), \quad \text{where } p(i)=\frac{H(i)}{N M} \tag{1}\\
\sigma^{2} &= \sum_{i=0}^{255}(i-\mu)^{2} p(i)=\frac{1}{N \times M} \sum_{i=0}^{255}(i-\mu)^{2} H(i), \quad i=f(n, m) \tag{2}
\end{align*}
$$

## 3. PROPOSED ALGORITHM

Based on Equations (1) and (2), for mean and variance value calculation, we can optimize these equations to Equations (3) and (4), and then propose the implementation using FPGA.

$$
\begin{align*}
\mu &= \sum_{i=0}^{255} i \, p(i)=\frac{1}{N \times M} \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} f(n, m) \tag{3}\\
\sigma^{2} &= \left(\frac{1}{N \times M} \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} i^{2}\right)-\mu^{2}, \quad \text{if } i=f(n, m) \tag{4}
\end{align*}
$$
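The step from Equation (2) to Equation (4) is not spelled out in the text. A short derivation, assuming only the definitions above and that the probabilities $p(i)$ sum to one, might read as follows. The last line uses the fact that $H(i)$ counts the pixels with $f(n, m)=i$.

$$
\begin{aligned}
\sigma^{2} &= \sum_{i=0}^{255}(i-\mu)^{2} \, p(i) \\
&= \sum_{i=0}^{255} i^{2} \, p(i)-2 \mu \sum_{i=0}^{255} i \, p(i)+\mu^{2} \sum_{i=0}^{255} p(i) \\
&= \sum_{i=0}^{255} i^{2} \, p(i)-2 \mu^{2}+\mu^{2} \\
&= \frac{1}{N \times M} \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} f(n, m)^{2}-\mu^{2}
\end{aligned}
$$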

Pseudo code 1. Mean formula (first version, based on Equations (1) and (2))

```
% f is the image data; convert to double so that f(i,j)+1 cannot saturate at 255
f = double(f);
h = zeros(256, 1);
[N, M] = size(f);
for i = 1:N                          % histogram H(i) calculation
    for j = 1:M
        h(f(i, j) + 1) = h(f(i, j) + 1) + 1;
    end
end
N1 = N * M;
p = zeros(256, 1);
for i = 1:256                        % probability p(i) calculation
    p(i) = h(i) / N1;
end
Mean = 0;
for i = 1:256                        % mean (mu) calculation
    Mean = Mean + p(i) * (i - 1);
end
Mean
```

Pseudo code 2. Mean and variance formula (proposed version, based on Equations (3) and (4))

```
% f is the image data; convert to double so that the sums cannot saturate
f = double(f);
[M, N, ~] = size(f);
htsum = 0;                           % running sum of pixel values
vsum = 0;                            % running sum of squared pixel values
for i = 1:M
    for j = 1:N
        htsum = htsum + f(i, j);
        vsum = vsum + f(i, j)^2;
    end
end
optimalmean = htsum / (M * N);               % mean (mu) calculation
varians = vsum / (M * N) - optimalmean^2;    % variance (sigma^2) calculation
```
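As a cross-check of the two pseudo codes, the following Python sketch implements both the histogram-based route and the single-pass route and lets the results be compared on the same data. The function names and the test image are illustrative assumptions and do not come from the article. The histogram version is extended here to the variance of Equation (2), which pseudo code 1 does not compute.

```python
def mean_var_histogram(image):
    """Mean and variance via histogram and probabilities (Equations (1) and (2))."""
    pixels = [p for row in image for p in row]
    total = len(pixels)
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1                      # histogram H(i)
    prob = [h / total for h in hist]      # probability p(i)
    mean = sum(i * prob[i] for i in range(256))
    var = sum((i - mean) ** 2 * prob[i] for i in range(256))
    return mean, var


def mean_var_single_pass(image):
    """Mean and variance with two running sums (Equations (3) and (4))."""
    htsum = 0     # sum of pixel values
    vsum = 0      # sum of squared pixel values
    count = 0
    for row in image:
        for p in row:
            htsum += p
            vsum += p * p
            count += 1
    mean = htsum / count
    return mean, vsum / count - mean ** 2


# Hypothetical 8 x 8 test image with values in 0..255 (not taken from the article).
image = [[(7 * r + 13 * c) % 256 for c in range(8)] for r in range(8)]
m1, v1 = mean_var_histogram(image)
m2, v2 = mean_var_single_pass(image)
print(m1, v1)
print(m2, v2)
```

Both routes should agree up to floating-point rounding. The single-pass version touches each pixel once and needs no 256-entry table.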



Fig. 2. The component block diagram (a) First design (Kardian et al., 2016) and (b) Second/proposed design

Pseudo code 1 is the pseudo code for the mean calculation based on Equations (1) and (2). It takes three steps: the histogram $H(i)$, the probability $p(i)$ and finally the mean value $\mu$. Pseudo code 2, based on Equations (3) and (4), is the algorithm for the mean and the variance $\sigma^{2}$ (compare Equation (2)). This algorithm needs only one step. Because it is based on the optimized formulas, it reduces the number of calculation steps (and the operation time, by the $2 \times 256$ clock cycles of the probability and mean loops) and the number of arithmetic operations such as addition, multiplication and division.

Fig. 1 shows the mean \& variance results for the same data (Im) using the basic algorithm (mean $=14.5156$, variance $=235.7185$) and the optimized algorithm, which gives the same values before rounding (optimalmeanrounded $=14$, variansrounded $=250$ after rounding). The Matlab results are used for comparison with the values produced by the FPGA component, which works only with fixed (integer) values and not floating-point values (hence the "optimalmeanrounded" and "variansrounded" values).
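The gap between the floating-point variance (235.7185) and the rounded one (250) can be reproduced with integer arithmetic. The sketch below is an assumption-laden illustration: the running sums 929 and 28571 are not stated in the article but are inferred from the reported mean and variance for 64 pixels, and the truncation order (shift each sum, then subtract the squared integer mean) is one ordering that matches the reported values.

```python
PIXELS = 64   # 8 x 8 image
SHIFT = 6     # 2**6 == 64, so a 6-bit right shift divides by 64

# Running sums inferred from the reported Matlab values (assumption, not from the article).
htsum = 929     # sum of pixel values: 929 / 64 = 14.515625
vsum = 28571    # sum of squared pixel values

# Floating-point reference, as in the Matlab evaluation.
mean_float = htsum / PIXELS
var_float = vsum / PIXELS - mean_float ** 2

# Integer-only version: each division is a right shift that drops the fraction.
mean_int = htsum >> SHIFT
var_int = (vsum >> SHIFT) - mean_int * mean_int

print(round(mean_float, 4), round(var_float, 4))  # 14.5156 235.7185
print(mean_int, var_int)                          # 14 250
```

Under these assumptions most of the variance difference comes from squaring the truncated mean (196 instead of about 210.70), not from truncating the sum of squares.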


Fig. 1. Matlab results for both algorithms

## 4. HARDWARE IMPLEMENTATION

Fig. 2(a) shows the design based on pseudo code 1. This design uses a selector to pick the intensity value of each pixel from the image. The intensity values range from 0 to 255, and each value is sent to its own accumulator (there are 256 accumulators with 256 addition components). The values of all these accumulators are then used as inputs to a further 256 additions, so 512 additions are needed in this method (Kardian et al., 2016). Finally, to obtain the mean value, the sum is divided by the number of pixels, which in this example is $8 \times 8=64$. A 6-bit shift-right register is used to perform the division. Fig. 2(b) shows the design of the component based on the optimized formula, which reduces the number of addition and multiplication operations. Dn is the input data stream coming from the image to be processed.

Fig. 3 is the entity diagram of the component and Fig. 4 is its schematic diagram. The proposed design is a revised design based on the diagram in Fig. 2(b). For the mean and variance calculation we do not have to calculate the histogram; instead, in the same operation, one accumulator keeps the total of the pixel values (htsum), see pseudo code 2.


Fig. 3. The entity of component



Fig. 4. Component RTL schematic diagram


Fig. 5. Behavioural simulation results

We use a second accumulator to keep the second-order value of each pixel element ($i^{2}$). This reduces the number of mathematical operations, which in turn reduces the processing time and the number of components. The components in the design are: two adders in two accumulators, two shift-right registers, a multiplier and a subtractor. The proposed design has the same processing time ($N \times M=64$ clock cycles) for an $8 \times 8$ image, and the resulting mean and variance values are similar. A counter and buffers are used to hold the values and, after 64 clocks, send out the final mean and variance values.
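The streaming behaviour described here (one pixel per clock, two accumulators, a counter that releases the result after 64 clocks) can be modelled in software. The following Python class is a minimal sketch under stated assumptions: the class name, the reset after each block and the integer truncation order are illustrative choices, not details confirmed by the article.

```python
class MeanVarianceUnit:
    """Cycle-level software model of a two-accumulator mean/variance unit (illustrative)."""

    def __init__(self, n_pixels=64, shift=6):
        assert n_pixels == 1 << shift, "the shift must divide by exactly n_pixels"
        self.n_pixels = n_pixels
        self.shift = shift
        self.acc_sum = 0    # accumulator for the sum of pixel values (htsum)
        self.acc_sq = 0     # accumulator for the sum of squared pixel values
        self.counter = 0    # counts the pixels received so far

    def clock(self, dn):
        """Consume one pixel; return (mean, variance) on the last clock, else None."""
        self.acc_sum += dn
        self.acc_sq += dn * dn
        self.counter += 1
        if self.counter < self.n_pixels:
            return None
        mean = self.acc_sum >> self.shift
        variance = (self.acc_sq >> self.shift) - mean * mean
        # Assumed behaviour: clear the state so the next block can start.
        self.acc_sum = self.acc_sq = self.counter = 0
        return mean, variance


unit = MeanVarianceUnit()
stream = [(3 * k) % 256 for k in range(64)]   # hypothetical pixel stream
outputs = [unit.clock(dn) for dn in stream]
print(outputs[-1])
```

In this model only the 64th call returns a result, which mirrors the role of the counter and buffers in the text.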

Fig. 4 shows the schematic diagram of the component. One multiplier computes $i^{2}$ and sends it to the upper accumulator (second-order value), while the lower accumulator holds the htsum value. A counter and a comparator are used to determine whether the operation is completed after 64 cycles (for an $8 \times 8$ image) and to send out the correct mean and variance values. Where division is needed, the shift registers of Fig. 2 are realised by data mapping (see the two square blocks on the right of Fig. 4). In VHDL code, for example, this is written as shown in Equation (5).

$$
\begin{equation*}
\texttt{A <= B(27 downto 6);} \quad \texttt{-- A = B / 64} \tag{5}
\end{equation*}
$$
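The data mapping of Equation (5) drops the six least significant bits of a 28-bit value, which for unsigned values equals an integer division by 64. The Python sketch below illustrates this equivalence; `slice_downto` is a hypothetical helper that mimics the VHDL slice, and the input value is an arbitrary example.

```python
def slice_downto(value, high, low):
    """Return bits high..low of an unsigned value, like VHDL value(high downto low)."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


b = 28571                      # arbitrary example value that fits in 28 bits
a = slice_downto(b, 27, 6)     # corresponds to A <= B(27 downto 6)
print(a, b // 64)
```

Because the mapping is pure wiring, it needs no divider logic; the fractional part of the quotient is simply discarded.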

Finally, this design needs only: one DFF, two multipliers, one counter, one comparator, two adders in two accumulators, one subtractor and two buffers with data mapping.

## 5. SIMULATION RESULT

Ten images of size $8 \times 8$ with grayscale pixel values (0 to 255) are used in the simulation. Fig. 5 shows the behavioural simulation result in the ISE software. For the same data used in the Matlab evaluation, the simulated results are 14 (mean) and 250 (variance). All processing is done when 64 clock cycles are completed, and it runs simultaneously with the arrival of the input data (data in). Compared with the Matlab evaluation result, where the mean value is 14.5, there is a difference of 0.5, or an error of $(0.5 / 14) \times 100 \% = 3.57 \%$. For the variance, the Matlab result is 235.72, so the difference is 14.28, or an error of $(14.28 / 235.72) \times 100 \% = 6.06 \%$.

Table 1. Experimental results for 10 data matrices

| No | Matlab mean | Matlab variance | FPGA mean | FPGA variance | Mean error | Variance error |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1. | 161.1563 | 1245.7000 | 161 | 1245 | $0.10 \%$ | $0.06 \%$ |
| 2. | 58.4219 | 79.7126 | 58 | 79 | $0.72 \%$ | $0.89 \%$ |
| 3. | 105.4844 | 994.7490 | 105 | 994 | $0.46 \%$ | $0.08 \%$ |
| 4. | 136.2969 | 156.9900 | 136 | 156 | $0.22 \%$ | $0.63 \%$ |
| 5. | 138.4844 | 49.2881 | 138 | 49 | $0.35 \%$ | $0.58 \%$ |
| 6. | 67.1563 | 394.1000 | 67 | 394 | $0.23 \%$ | $0.15 \%$ |
| 7. | 35.6250 | 203.0449 | 35 | 203 | $1.75 \%$ | $0.02 \%$ |
| 8. | 76.7344 | 142.6700 | 76 | 142 | $0.96 \%$ | $0.47 \%$ |
| 9. | 22.4688 | 107.7178 | 22 | 107 | $2.09 \%$ | $0.67 \%$ |
| 10. | 203.3594 | 18.1365 | 203 | 18 | $0.18 \%$ | $0.75 \%$ |
|  |  |  |  | Average error rate | $0.71 \%$ | $0.43 \%$ |

Table 2. Comparison with other research

|  | Properties | Previous approach (Kardian et al., 2016) | Previous approach (Ismael and Mahmood, 2017) | Previous approach (Bailey and Laiber, 2013) | Proposed approach |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 1. | Number of Slices | 16 | 1090 | NA | 30 |
| 2. | Number of Slice Flip-Flops | 24 | 220 | NA | 38 |
| 3. | Number of LUTs | 31 | 1988 | 597 | 53 |



Fig. 6. Logic utilization summary

The error value is calculated using Equation (6), and Table 1 presents the error values for all 10 experimental data sets.

$$
\begin{equation*}
\text{ErrorValue}=\frac{\left|\text{FPGAValue}-\text{Matlabresult}\right|}{\text{Matlabresult}} \times 100 \% \tag{6}
\end{equation*}
$$

Fig. 6 shows the design summary of the component and the occupation of logic elements in the FPGA device. The experimental results for 10 other data matrices (coming from 10 blocks or cropped images) can be seen in Table 1; the difference between the Matlab and FPGA values is very small, less than 0.99.
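As an illustration of Equation (6), the following Python sketch recomputes the error values for the first row of Table 1. The function name `error_value` is an assumption made for this example.

```python
def error_value(fpga_value, matlab_result):
    """Relative error in percent, following Equation (6)."""
    return abs(fpga_value - matlab_result) / matlab_result * 100


# First row of Table 1: Matlab (161.1563, 1245.7000), FPGA (161, 1245).
mean_error = error_value(161, 161.1563)
variance_error = error_value(1245, 1245.7000)
print(f"{mean_error:.2f} % {variance_error:.2f} %")   # 0.10 % 0.06 %
```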

Based on Table 1, we calculate the mean square error (MSE) for the mean and variance values using Equation (7). We obtain an MSE of 0.175 for the mean value and 0.3347 for the variance value.

$$
\begin{equation*}
M S E=\frac{1}{n} \sum_{i=1}^{n}\left(Y^{\prime}-Y\right)^{2} \tag{7}
\end{equation*}
$$


$$
\begin{aligned}
\text{MSE}_{\text{Mean}} &= \frac{1}{10}\left((161.1563-161)^{2}+(58.4219-58)^{2}\right. \\
& \quad +(105.4844-105)^{2}+(136.2969-136)^{2} \\
& \quad +(138.4844-138)^{2}+(67.1563-67)^{2} \\
& \quad +(35.6250-35)^{2}+(76.7344-76)^{2} \\
& \quad \left.+(22.4688-22)^{2}+(203.3594-203)^{2}\right) \\
\text{MSE}_{\text{Mean}} &= 0.175 \\
\text{MSE}_{\text{Variance}} &= \frac{1}{10}\left((1245.7000-1245)^{2}+(79.7126-79)^{2}\right. \\
& \quad +(994.7490-994)^{2}+(156.9900-156)^{2} \\
& \quad +(49.2881-49)^{2}+(394.1000-394)^{2} \\
& \quad +(203.0449-203)^{2}+(142.6700-142)^{2} \\
& \quad \left.+(107.7178-107)^{2}+(18.1365-18)^{2}\right) \\
\text{MSE}_{\text{Variance}} &= 0.3347
\end{aligned}
$$

Compared with the previous result (Kardian et al., 2016), our approach needs slightly more FPGA resources, but it provides two functions (mean and variance calculation, not only the mean calculation), see Table 2. Compared with the result from Bailey and Laiber (2013), this approach needs fewer resources; NA here means that their article gives no information on the use of FPGA slices and slice flip-flops. The approach of Ismael and Mahmood (2017) covers 6 functions; if we calculate the average resources used per function we obtain $1090 / 6 \approx 181.7$ slices, $220 / 6 \approx 36.7$ flip-flops and $1988 / 6 \approx 331.3$ LUTs for each function, so our result still needs fewer resources.

## 6. CONCLUSION

A hardware-based component on an FPGA device for efficient mean and variance calculation has been obtained. This approach needs two adders and two shift-right registers. For an $8 \times 8$ image, 64 clock cycles are needed to calculate the mean and variance values. Overall, the component needs only 53 4-input LUTs and 38 slice flip-flops. The difference (error) of the variance value compared with the Matlab result over the 10 experimental data sets is $0.43 \%$, because the FPGA design works with fixed (integer) values rather than floating-point values.

## ACKNOWLEDGMENT

This research is part of a dissertation research project and is supported by Yayasan Pendidikan Gunadarma and Yayasan Ilmu Komputer Jakarta.

## REFERENCES

Bailey, D.G., Laiber, K.M.J. 2013. Efficient hardware calculation of running statistics. 28th International Conference on Image and Vision Computing New Zealand (IVCNZ 2013), 1-6.
Betkaoui, B., Thomas, D.B., Luk, W., Przulj, N. 2011. A framework for FPGA acceleration of large graph problems: Graphlet counting case study. International Conference on Field-Programmable Technology, 1-8.
Gade, P.B., Khope, S.R. 2016. FPGA based multifocus image fusion using variance method. International Research Journal of Engineering and Technology (IRJET), 3.
Irturk, A., Benson, B., Laptev, N., Kastner, R. 2008. FPGA acceleration of mean variance framework for optimal asset allocation. Workshop on High Performance Computational Finance at SC08 International Conference for High Performance Computing, Networking, Storage and Analysis, 1-8.
Ismael, S.F., Mahmood, B.S. 2017. A novel way to design and implement statistical operations based on FPGA. International Journal of Computer Applications (09758887), 167.

Kardian, A.R., Sudiro, S.A., Madenda, S. 2016. Efficient implementation of mean formula for image processing using FPGA device. 1st International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, ISBN: 978-602-60280-0-6.
Martinez, W.L., Martinez, A.R. 2002. Computational statistics handbook with MATLAB. Chapman \& Hall/CRC, USA.
Woods, R.E., Eddins, S.L., Gonzales, R.C. 2005. Digital image processing using MATLAB. Pearson Education.

