5.2.3 Data processing
Data is not really useful until it has been converted into meaningful information. It is helpful to think of data as simple facts. These facts can be accurate (e.g. the truck weighs 2,400 kilograms), imprecise (e.g. the truck is heavy) or even inaccurate (e.g. the truck weighs 20 kilograms), depending on how they were collected or derived. In the real world, we often have to deal with imprecise or even wrong data, and a lot of effort is spent refining and improving methods of collecting it. Let's discuss what we need to do with the data we collect to arrive at a good (enough) set of facts.
Post processing
The data we collect, whether through surveys or electronic sensors, might require some manipulation to be converted into the information we are interested in. For example, loop detectors built into the road create an electronic signal whenever a vehicle passes over them. This signal needs to be cleaned and converted into the type of information we want, namely vehicle occupancy, i.e. the state of a vehicle being over the loop detector. With more sophistication, the signals can be distinguished by vehicle type. After this post-processing step, we know the number and type of vehicles passing the loop detector over time.
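To make this concrete, the short Python sketch below converts a hypothetical binary loop-detector signal into a vehicle count, an occupancy fraction and a crude type split based on pulse duration. The 10 Hz sample rate, the signal itself and the 0.5-second threshold are all illustrative assumptions, not properties of real detector hardware.

```python
# A minimal post-processing sketch, assuming a hypothetical binary signal
# sampled at 10 Hz (1 = a vehicle is over the loop, 0 = the loop is free).
SAMPLE_RATE_HZ = 10

raw_signal = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0]

def extract_pulses(signal):
    """Return the duration (in samples) of each contiguous occupancy pulse."""
    pulses, current = [], 0
    for sample in signal:
        if sample:
            current += 1
        elif current:
            pulses.append(current)
            current = 0
    if current:
        pulses.append(current)
    return pulses

pulses = extract_pulses(raw_signal)
count = len(pulses)
occupancy = sum(raw_signal) / len(raw_signal)  # fraction of time the loop is occupied

# A crude type split: longer pulses suggest longer vehicles (at similar speeds).
types = ["truck" if p / SAMPLE_RATE_HZ > 0.5 else "car" for p in pulses]

print(f"vehicles: {count}, occupancy: {occupancy:.0%}, types: {types}")
```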
Data fusion
In many cases, data needs to be combined with other types of data to produce the information we need. For example, license plate captures could be combined with a vehicle registration database to determine the make and model of each vehicle. License plate information by itself yields very little interesting information, but if we combine vehicle data with traffic counts via the number plates, we can assess which types of vehicles use which types of roads. If we link this to land use data, we might find out why these trips occur. An important challenge is that all these data originate from different administrative and technical systems; combining them requires a lot of processing work to make the formats and categorisations fit.
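A minimal sketch of such a fusion step, using Python with pandas and two made-up tables (roadside plate captures and a registration extract) with hypothetical column names, might look as follows. Note how the plate formats must be harmonised before the join, which is exactly the kind of processing work mentioned above.

```python
# A data-fusion sketch with pandas; all records and column names are invented.
import pandas as pd

captures = pd.DataFrame({
    "plate": ["AB-12-CD", "EF-34-GH", "AB-12-CD"],
    "road":  ["ring road", "city centre", "city centre"],
})
registry = pd.DataFrame({
    "plate": ["AB12CD", "EF34GH"],
    "vehicle_type": ["van", "rigid truck"],
})

# Harmonise formats before joining: real registries rarely match the
# roadside capture format exactly (spacing, dashes, country prefixes and so on).
captures["plate"] = captures["plate"].str.replace("-", "").str.upper()
registry["plate"] = registry["plate"].str.upper()

fused = captures.merge(registry, on="plate", how="left")

# Which vehicle type uses which type of road?
print(fused.groupby(["road", "vehicle_type"]).size())
```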
Data validation
Data validation needs to be part of the process of collecting and processing data, in order to understand and improve the reliability, precision and accuracy of the data used for further analysis. Data is validated against its own internal rules (e.g. adherence to format rules, plausibility checks, handling of outliers) or through external means (e.g. comparison with validated data sets, or, at a later stage, comparison of the intermediate or final outputs of subsequent analyses with externally validated output data). When working with data processing, data fusion or analytics (see further down), one must have a clear view of which data is regarded as reliable, or "ground truth", data.
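The sketch below illustrates internal validation rules on hypothetical weigh-in-motion records: a format rule for the plate and a plausibility range for the measured weight. The specific rules and thresholds are invented for illustration.

```python
# A minimal internal-validation sketch; records, format rule and
# plausibility range are all assumptions for the example.
records = [
    {"plate": "AB12CD", "weight_kg": 2400},
    {"plate": "??",     "weight_kg": 2600},   # fails the format rule
    {"plate": "EF34GH", "weight_kg": 20},     # implausible weight
    {"plate": "IJ56KL", "weight_kg": 2500},
]

def is_valid(record):
    # Format rule: plates here are assumed to be 6 alphanumeric characters.
    if len(record["plate"]) != 6 or not record["plate"].isalnum():
        return False
    # Plausibility check: trucks below 500 kg or above 60,000 kg are suspect.
    if not 500 <= record["weight_kg"] <= 60_000:
        return False
    return True

clean = [r for r in records if is_valid(r)]
print(f"{len(clean)} of {len(records)} records pass validation")
```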
Data completion
Incomplete data sets are frequent, mostly due to sampling and surveying limitations. They can still be used in research if the missing data is estimated. Estimation approaches can be statistical (usually based on entropy maximisation or equivalent methods) and/or based on a structural model of flows (for example, a gravity model or a trip generation model). A well-known problem is the estimation of origin/destination matrices (also called O/D table synthesis), where counts of passing vehicles are used in conjunction with route choice models to statistically estimate all flows inside an area.
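As a rough illustration of such an estimation, the sketch below completes an O/D matrix by iterative proportional fitting (the Furness method, one of the entropy-maximising approaches mentioned above). The seed matrix and the origin/destination totals are made-up numbers.

```python
# O/D table completion by iterative proportional fitting; all inputs invented.
seed = [
    [10.0, 20.0, 10.0],
    [20.0, 10.0, 20.0],
    [10.0, 20.0, 10.0],
]
origin_totals = [100.0, 200.0, 100.0]   # e.g. counts of vehicles leaving each zone
dest_totals = [150.0, 100.0, 150.0]     # e.g. counts of vehicles arriving per zone

matrix = [row[:] for row in seed]
for _ in range(50):  # alternate row and column scaling until it converges
    for i, row in enumerate(matrix):
        factor = origin_totals[i] / sum(row)
        matrix[i] = [v * factor for v in row]
    for j in range(len(dest_totals)):
        factor = dest_totals[j] / sum(row[j] for row in matrix)
        for row in matrix:
            row[j] *= factor

for row in matrix:
    print([round(v, 1) for v in row])  # estimated flows, consistent with both totals
```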
Visualisation
The visualisation of data is often an important aspect of understanding the data collected. Data is typically represented in graphs (usually bars or lines) or, if it contains geospatial information, in maps. A good visualisation allows us to identify expected or unexpected trends and to visually spot outliers or errors in the data. Maps are especially important for assessing links to land use, relating to access to activities, the liveability of residential areas, congestion problems and environmental effects.
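A small example, using matplotlib and invented hourly truck counts, shows how even a simple bar chart can make a morning delivery peak visible at a glance.

```python
# A visualisation sketch; the hourly truck counts are made-up example data.
import matplotlib.pyplot as plt

hours = list(range(6, 12))
truck_counts = [12, 45, 80, 65, 40, 30]  # hypothetical loop-detector output

plt.bar(hours, truck_counts)
plt.xlabel("Hour of day")
plt.ylabel("Trucks counted")
plt.title("Truck traffic at an inner-city loop detector (example data)")
plt.show()
```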
Analytics
Data analytics relates to the discovery and use of patterns in data, nowadays often in the context of streaming data, i.e. a continuous measurement. Analytics can serve (1) the description of patterns, (2) the prediction of patterns and (3) prescriptive purposes, i.e. using patterns in the data to steer a system towards desired behaviour. Analytics usually exploits simple statistical relations (or correlations), which can be based on a structural mathematical model of the system inspired by theory (e.g. a model of mode choice behaviour). Sometimes patterns are sought using artificial intelligence techniques, for example neural networks that memorise patterns in the data. Such models are called data-driven.
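As a minimal predictive example, the sketch below fits a linear relation between the number of shops in a zone and observed delivery trips (a toy trip generation model) and uses it to predict trips for a new zone. All numbers are invented.

```python
# A simple predictive-analytics sketch using numpy; data points are invented.
import numpy as np

shops = np.array([5, 10, 20, 40, 80])
deliveries = np.array([12, 25, 48, 95, 180])  # observed delivery trips per day

# Fit deliveries ≈ a * shops + b, a simple statistical relation.
a, b = np.polyfit(shops, deliveries, deg=1)
print(f"model: trips = {a:.2f} * shops + {b:.2f}")

# Predict delivery trips for a zone with 50 shops.
print(f"predicted trips for 50 shops: {a * 50 + b:.0f}")
```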
Sustainable Urban Freight Transport: a Global Perspective by TU Delft OpenCourseWare is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://ocw.tudelft.nl/courses/sustainable-urban-freight-transport-global-perspective/.