IOTSim: Modeling and Simulation in the IoT and Big Data Era
Recent studies have shown that we generate 2.5 quintillion (2.5 × 10^18) bytes of data per day (Cisco and IBM), and this is set to explode to 40 zettabytes (40 × 10^21 bytes) by 2020, which is about 5,200 gigabytes for every person on earth. Much of this data is and will be generated by the Internet of Things (IoT). The IoT comprises billions of Internet Connected Devices (ICDs), or ‘things’, each of which can sense, communicate, compute and potentially control, influence or reconfigure its surroundings. ICDs can be sensors, RFIDs, social media, clickstreams, remote sensing satellites, business transactions, actuators (such as machines/equipment fitted with sensors and deployed for mining, oil exploration, or manufacturing operations), lab instruments (e.g., high energy physics synchrotron), and smart consumer appliances (TV, phone, etc.). The vision of IoT is to allow ‘things’ to be connected anytime, anywhere, with anything and anyone, ideally using any path, any network and any service. This vision has recently given rise to the notion of IoT big data applications that are capable of analysing billions of data streams and tens of years of historical data to provide the knowledge required to support timely decision making. These IoT big data applications need to process and manage streaming and multidimensional data from geographically distributed data sources that generate data in a range of formats and with different levels of reliability.
A hard challenge in designing and developing IoT big data applications is how to manage performance, dependability, energy and cost trade-offs by optimising configuration across the IoT device layer (e.g., sensors, social media, etc.), the cloud-based hardware resource layers (e.g., CPU, storage, and network) and the big data processing framework software layers (e.g., Apache Hadoop, Apache Storm, Apache Mahout). This optimisation needs to accommodate application requirements (e.g., analytics result delay, cost, throughput) while addressing complexities such as resource contention, heterogeneous data flows, uncertain resource needs, and lack of fault tolerance.
There is therefore a requirement to develop an approach that can help engineers and researchers to analyse the impact of the above complexities, as well as of IoT device, software and hardware configuration interdependencies, upon the final behaviour achievable by an IoT application. It can be challenging to conduct such a study in a real computing environment for the following reasons:
- It is not cost-effective to procure or rent a large-scale datacentre resource pool that will accurately reflect realistic application deployment and let practitioners experiment with dynamic hardware and software resource configurations.
- Frequently changing experiment configurations in a large-scale real test bed involves a lot of manual configuration, making the performance analysis itself time-consuming and expensive. As a result, the reproduction of results becomes extremely difficult, making most of the experiments non-repeatable.
- It is extremely hard to incorporate and control different types of failure behaviours and benchmarks across heterogeneous software and hardware resource types in a real test bed environment (e.g., Amazon AWS, Open Cirrus and Microsoft Azure).
In contrast, simulation-based approaches to performance testing and benchmarking have significant advantages, including: (i) multiple IoT big data application developers and researchers can perform tests in a controllable and repeatable manner; (ii) finding performance bottlenecks in a simulated environment is easier than in real-world test beds; (iii) it is simpler to experiment with various IoT device, hardware resource and big data processing framework configurations and to collect insights about the impact of each design choice on the performance guarantees (service level agreements); (iv) developers and researchers can share their simulation datasets and environment setups, leading to better validation of hypotheses and reproducibility of results; and (v) multiple big data processing frameworks and diverse workload scenarios can be instantiated.
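As a rough illustration of advantages (i) and (iii) above, the following minimal Python sketch sweeps a hardware configuration (the number of virtual machines) in a toy simulation and reports the resulting time and cost. It is not part of IOTSim and does not use its API; the function names, the task-duration range and the price are all assumptions made for this example. Because the random number generator is seeded, every run of the same configuration gives the same result.

```python
import heapq
import random


def simulate_makespan(num_tasks, num_vms, seed):
    """Toy model: assign independent tasks, in order, to the earliest-free VM."""
    rng = random.Random(seed)  # private, seeded RNG so that runs are repeatable
    # Hypothetical workload: each task takes between 5 and 15 simulated seconds.
    durations = [rng.uniform(5.0, 15.0) for _ in range(num_tasks)]
    # Min-heap holding the time at which each VM next becomes free.
    vm_free_at = [0.0] * num_vms
    heapq.heapify(vm_free_at)
    for duration in durations:
        start = heapq.heappop(vm_free_at)
        heapq.heappush(vm_free_at, start + duration)
    # The job finishes when the last VM becomes idle.
    return max(vm_free_at)


def sweep(num_tasks, vm_counts, price_per_vm_second, seed):
    """Evaluate the same workload under several VM counts."""
    results = []
    for num_vms in vm_counts:
        makespan = simulate_makespan(num_tasks, num_vms, seed)
        cost = num_vms * makespan * price_per_vm_second
        results.append({"vms": num_vms, "makespan": makespan, "cost": cost})
    return results


if __name__ == "__main__":
    for row in sweep(200, [2, 4, 8, 16], 0.0001, seed=42):
        print(f"{row['vms']:>2} VMs: makespan {row['makespan']:.1f} s, "
              f"cost {row['cost']:.4f}")
```

Changing a configuration here is a one-line edit rather than a redeployment, and sharing the seed and parameters is enough for someone else to reproduce the numbers.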
At Newcastle University, United Kingdom, in collaboration with the Australian National University, Australia, and the University of Tasmania, Australia, we are developing IOTSim to provide such a simulation-based approach. It will support the following novel features:
- modelling of heterogeneous IoT device types, gateways, and network devices;
- modelling of heterogeneous IoT device virtualization and resource management technologies;
- modelling of heterogeneous data programming abstractions such as Map/Reduce in Hadoop, Continuous Query operators in Storm, transactional operators in MySQL and Cassandra, etc.;
- modelling of heterogeneous data flows (e.g., static, streams, and transactions), workload processing (batch processing in Hadoop, continuous stream processing in Storm, and transaction processing in MySQL and Cassandra), and hardware resource configurations;
- ‘Evaluation Templates’ that incorporate details on application-level performance constraints, fault-injection models, big data processing benchmarks and configurations relevant to specific application types (e.g., credit card fraud detection, emergency management, etc.);
- failure injection models at all layers, including the IoT device, software, and hardware layers.
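To give a feel for how some of the features listed above might fit together, here is a minimal discrete-event sketch in Python. It is only an illustration under stated assumptions and does not reflect IOTSim's actual design or API: the `DeviceType` and `EvaluationTemplate` classes, the `run` function and all parameter names are hypothetical. It models heterogeneous device types that emit readings at different rates, injects device-layer failures as randomly lost readings, processes the surviving readings on a single server, and checks the result against an application-level delay constraint.

```python
import heapq
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceType:
    name: str
    period_s: float       # simulated seconds between two readings
    failure_prob: float   # injected fault: probability that a reading is lost


@dataclass(frozen=True)
class EvaluationTemplate:
    duration_s: float         # simulated time horizon
    service_time_s: float     # processing time per reading on the server
    max_mean_delay_s: float   # application-level performance constraint


def run(devices, template, seed=0):
    rng = random.Random(seed)  # seeded so that an experiment is repeatable
    # Event queue of (time, sequence number, device); the unique sequence
    # number breaks ties so that devices themselves are never compared.
    events = []
    for seq, dev in enumerate(devices):
        heapq.heappush(events, (dev.period_s, seq, dev))
    seq = len(devices)
    server_free_at = 0.0
    delays = []
    lost = 0
    while events:
        now, _, dev = heapq.heappop(events)
        if now > template.duration_s:
            break  # events are popped in time order, so all others are later
        if rng.random() < dev.failure_prob:
            lost += 1  # failure injected at the device layer
        else:
            start = max(now, server_free_at)  # wait if the server is busy
            server_free_at = start + template.service_time_s
            delays.append(server_free_at - now)
        # Schedule the next reading from the same device.
        heapq.heappush(events, (now + dev.period_s, seq, dev))
        seq += 1
    mean_delay = sum(delays) / len(delays) if delays else None
    return {
        "delivered": len(delays),
        "lost": lost,
        "mean_delay_s": mean_delay,
        "meets_constraint": (mean_delay is not None
                             and mean_delay <= template.max_mean_delay_s),
    }


if __name__ == "__main__":
    devices = [DeviceType("temperature", 0.5, 0.02),
               DeviceType("camera", 2.0, 0.10)]
    template = EvaluationTemplate(duration_s=60.0, service_time_s=0.2,
                                  max_mean_delay_s=1.0)
    print(run(devices, template, seed=42))
```

A real simulator would also need to model gateways, networks, virtualized resources and distinct processing frameworks; this sketch collapses all of those into one server with a fixed service time.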
This is an open-source project. We will make the simulator tool and its code freely available for others to use, and we would be interested in hearing from those who wish to join the project.
- Rajiv Ranjan
- Paul Watson
- Xuezhi Zeng (Australian National University)
- Saurabh Kumar Garg (University of Tasmania, Australia)
- Chang Liu
- Meisong Wang
- Devki Nandan Jha
- Nigel Thomas
- Matt Forshaw
- Prem Prakash Jayaraman (RMIT, Australia)
- Dimitrios Georgakopoulos (RMIT, Australia)
- X. Zeng, S. K. Garg, P. Strazdins, P. Jayaraman, D. Georgakopoulos and R. Ranjan, “IOTSim: a Simulator for Analysing IoT Applications,” Journal of Systems Architecture, Elsevier. [ISI Impact Factor: 0.44] (Accepted July 2016)
- R. Ranjan, “Modeling and Simulation in Performance Optimization of Big Data Processing Frameworks,” IEEE Cloud Computing, Volume 1, Issue 4, BlueSkies Column, IEEE Computer Society.
- X. Zeng, R. Ranjan, P. Strazdins, S. Garg, and L. Wang, “Cross SLA Management for Cloud-hosted Big Data Analytics Applications,” The 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 2015, IEEE Computer Society.