3 Business understanding
In the business understanding phase, you:
- Define your (business) goal
- Frame the problem (regression, classification,…)
- Choose a performance measure
- Outline the data processing components (the data pipeline)
First of all, we take a look at the big picture and define the objective of our data science project in business terms.
In our example, the goal is to build a model of housing prices in California. In particular, the model should learn from California census data and, given some predictor variables, be able to predict the median house price in any district (a district has a population of 600 to 3,000 people). Because the training data contain the target value (the median house price) for each district, this is a supervised learning problem; and because that target is a numerical value, we should use a regression model. As the performance measure for this regression problem, we use the root mean square error (RMSE).
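The RMSE is the square root of the average squared difference between predicted and actual values: RMSE = sqrt((1/m) · Σᵢ (ŷᵢ − yᵢ)²), where m is the number of districts, ŷᵢ is the predicted and yᵢ the actual median house price of district i. Squaring the errors before averaging penalizes large errors more heavily than small ones, and taking the square root brings the result back to the same units as the target (prices). The following is a minimal pure-Python sketch; the function name `rmse` and the sample prices are illustrative assumptions, not values from the census data.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between actual (y_true) and predicted (y_pred) values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    # Squared error for each district, then the mean, then the square root.
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative prices for two hypothetical districts, each off by 10,000.
print(rmse([200_000, 150_000], [210_000, 140_000]))  # → 10000.0
```

In practice, a library routine such as scikit-learn's `mean_squared_error` (followed by a square root) would typically be used instead of a hand-written function.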
Let’s assume that the model’s output (a prediction of a district’s median house price) will be fed, along with other data, to another analytics system. This downstream system will determine whether it is worth investing in a given area. The data processing components (also called the data pipeline) are shown in Figure 3.1 (you can use Google’s architectural templates to draw the data pipeline).
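The hand-off described above, where our model's prediction becomes one input of a downstream investment system, can be sketched by treating each component as a function. This is a minimal sketch under stated assumptions: the names `run_pipeline`, `price_model`, `other_data` and `investment_system`, the feature name `median_income`, and the toy model and decision rule in the usage example are all hypothetical placeholders, not the actual components of the pipeline.

```python
from typing import Callable, Dict, List

# Hypothetical record type: one district's predictor variables, keyed by name.
District = Dict[str, float]


def run_pipeline(
    districts: List[District],
    price_model: Callable[[District], float],
    other_data: Callable[[District], Dict[str, float]],
    investment_system: Callable[[Dict[str, float]], bool],
) -> List[bool]:
    """Pass each district through the placeholder components in order.

    1. price_model predicts the district's median house price (our model).
    2. other_data supplies the additional inputs the downstream system uses.
    3. investment_system decides whether the area is worth investing in.
    """
    decisions: List[bool] = []
    for district in districts:
        predicted_price = price_model(district)
        # The downstream system sees our prediction alongside its other inputs.
        downstream_input = {**other_data(district), "predicted_median_price": predicted_price}
        decisions.append(investment_system(downstream_input))
    return decisions


# Toy placeholders, purely illustrative (hypothetical model and decision rule).
decisions = run_pipeline(
    [{"median_income": 3.0}, {"median_income": 8.0}],
    price_model=lambda d: 50_000 * d["median_income"],
    other_data=lambda d: {},
    investment_system=lambda x: x["predicted_median_price"] < 300_000,
)
print(decisions)  # → [True, False]
```

Keeping each component behind a plain function signature reflects the point of the pipeline view: the regression model can be retrained or replaced without changing the downstream system, as long as it keeps producing a predicted median house price per district.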