Someone emailed me asking about the flood data results and the disaster-event prediction after reading my published proceedings. He asked whether the prediction formula could be utilized in his work calculating flood events in his country, particularly around a specific location on the African continent. I was hesitant to answer, not knowing the nature of his work or his academic background. However, I did find out that he was doing research for an advanced degree and holds a qualification in some sort of satellite engineering. Interesting!

I have advised him to do the following things:

- **Collect data** on hydrology/water/flood/drainage **from the agency or authority** that manages flood disaster response (or related samplings within the scope of your study).
- **Collect data from global sources** such as NASA rainfall records, cloud-movement data, and other equivalent international data portals.
- **Collect data from any other department or agency** that may play a direct or indirect role in flood events, such as meteorology, geospatial mapping, welfare, etc.

Because he was also interested in Big Data as a scope of study, I told him that obtaining those data would indeed contribute to the abundance of his data sets.

Days later, he replied to my email telling me that he had obtained the data as I had suggested. Then he repeated his question: *“Can I use the prediction formula you suggested in the paper this time?”*

Nope. He still didn’t get the gist of big data and the event-prediction method that I suggested in my paper.

The truth is, when you have your data sets and you want to predict the probability of an event occurring, it is advisable to **follow the statistical hypothesis** procedure and the steps of deducing or inducing your statement(s) before concluding whether an event is likely to return and recur. The prediction should be made case by case.

By the generic steps, I mean you should undergo **the process of data collection** and the method of drawing a conclusion from the available data by proving whether your objectives hold. This is as illustrated in the diagram above: *Big data and event prediction via hypothesis testing*. It is one of the many approaches you can use when you are dealing with your big data: you have your problems and you have the objectives of the project you are aiming at, but you are unsure of the direction you should navigate from this point.

From the diagram above, it is clear that you cannot avoid the tedious steps of research methods, which include the following:

- Identify your research objectives,
- State your problem statements,
- Identify your hypothesis statements, which will eventually lead you to pin down your research problem and decide on the structure of your research design,
- Draw your scope of research,
- Collect the data, then examine its behavior and properties using statistical analysis and hypothesis testing. At the end of this process, if the data tests as normal, you can formulate the prediction formula for the event to occur, and lastly
- Draw your conclusion from the steps above.
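The normality check in the data-collection step above can be sketched in plain Python. This is only an illustrative sketch: the Jarque–Bera statistic is one common normality test (it compares the sample's skewness and excess kurtosis against those of a normal curve), and the `rainfall` sample here is simulated, not real flood data.

```python
import random
import statistics

def jarque_bera(data):
    """Jarque-Bera statistic: measures how far the sample's skewness
    and excess kurtosis deviate from those of a normal distribution."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance (population)
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0  # excess kurtosis (normal curve has 0)
    return n / 6.0 * (skew ** 2 + kurt ** 2 / 4.0)

# Hypothetical sample: simulated annual peak rainfall readings (mm).
random.seed(42)
rainfall = [random.gauss(250.0, 40.0) for _ in range(200)]

jb = jarque_bera(rainfall)
# Under normality, JB follows a chi-square distribution with 2 degrees
# of freedom; the 5% critical value is about 5.99.
is_normal = jb < 5.99
```

If `is_normal` comes out true, the data is feasible for the parametric steps that follow; otherwise you would revisit the data or choose a non-parametric route.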

When identifying the hypothesis statements, it is wise to write down: (1) the list of contributing factors of your thematic problem, (2) the major problem to study, and (3) the consequences that might repeatedly occur if the problem is not solved.

Simultaneously, during statistical analysis and hypothesis testing, it is advisable to **test the data's normality** so that the data and the problem you are dealing with are feasible, and so that you **can conclude whether to accept or reject the null hypothesis** that you have drawn. Consequently, whichever hypothesis you accept, the data should also be **tested for correlation and regression**. This determines whether the strengths and weaknesses of the relationship between your chosen parameters and domain fall within the scope of study, and it makes formulating the prediction equation much more meaningful.

The approach in fact works for both the social sciences and physical engineering. Though you might come up with different approaches to manipulating big data to formulate the event prediction, be it machine learning, artificial intelligence, or any other branch of science, nothing beats the precision of testing the normality of the event via its distribution curve and its statistical properties and behavior.
