Mainstream headlines have already established that big data and the companies that wield it like puppets on strings are evil, and the governments that allow this are suspect. But why and, just as importantly, how? What actually is data and why is it big? What happens when data is transmitted, and where does it go? What is the “cloud”?
Before we continue on our exploration of smart cities, we pause for an explainer session with web developer Thomas Gorissen that breaks down this abstract notion into tiny, digestible packets.
Christina J. Chua: Why is it that smart cities have only become possible in recent times?
Thomas Gorissen: I think networking and Internet Protocols are a large part of that. They made it incredibly cheap to transfer data. The economics of the widespread adoption of the Internet have really brought the prices down, so it is very affordable to set up digital systems for all kinds of data gathering and analytics functions that were previously just way too expensive. Along the same lines, the power cost of computing has gotten very, very low, so you can actually do very heavy tasks like big data analysis, machine learning and other memory-intensive tasks that 10 years ago you would have needed a supercomputer for.
Chua: Wait, what’s “big data”?
Gorissen: A lot of data.
Chua: What exactly do you mean? How do you actually quantify big data as big?
Gorissen: Okay, big data is as much a hype word as AI, but I think what it wants to describe is leveraging a large amount of similar data points to find insights, learnings and ways to enhance services and products, or optimise logistics or other functions of a business or government. For big data, it’s hard to quantify where it starts.
Chua: So it's just a lot of different data points and then a computer will crunch all that together and spit out some analysis.
Gorissen: Yes, in a way. But it also involves a lot of people to set up these systems. It requires a lot of knowledge about stochastic and statistical analysis to be able to shape software and systems that can take a look at data and actually give you insightful information. The amount of data collected these days is so large that you can really only make sense of it using a computer and analytical software tools. Going through each data point one by one in a written book would be impossible these days.
So, going back to smart cities, the cost of networking, the cost of computing, the cost of digital storage and memory, as well as the cost of the hardware itself, really have gotten so low that it makes sense to use computing for way more problems and tasks than ever before. So we’re really only constrained right now by the number of people (human resources and engineers) available to implement all these functions.
Chua: I get it that big data is the core and heart of the smart city. How is data sent from place to place, city to city?
Gorissen: Think of it this way. You have a friend who works in a large company like a bank, and you want to send him a package, just for him. So what you do is you take a carton and you put inside it what you would like to send your friend, and write his name on it with a message just for him. Then you put that package in another package, which you wrap to give to the postal service, and you put the address of the bank on there. So now you kind of have a package in a package, each with its own address. One address is to tell the postal service where to deliver your package and the other one is for the company to route the goods to your friend within the company. So you have to think about the Internet a little bit like that too, just multiplied by a lot more packages within packages, going all the way down to the ones and zeros made out of whatever you write on your computer.
Chua: A little bit like a Russian doll.
Gorissen: Yes, it’s a big, fun postal institution that works in real time.
To get your request to one of those servers that you want to access a website from, your computer needs to translate this request into ones and zeros that can be sent over a wire or your Wi-Fi radio connection. And to do that, it uses different layers to translate it down into ones and zeros. Take HTTP, which you might have seen in your web browser: that is the Hypertext Transfer Protocol, and underneath that protocol is TCP/IP, a session-based protocol that the US military invented at some point to connect large numbers of computers in a growing network with IP addresses.[1] Underneath that, you have other networking layers and eventually the physical layer, which is the wire, where it's just ones and zeros: the electricity is turned on and off thousands of times per second to signal the data to a receiver. On the receiving side, the data then goes all the way back up, meaning there’s a listener that listens to all these ones and zeros coming in and then converts them back into the networking layer packages, then the TCP/IP packages, and then back into HTTP packages.
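To make the package-in-a-package idea concrete, here is a minimal Python sketch of a message being wrapped layer by layer, turned into ones and zeros, and unwrapped again on the receiving side. It is only an illustration: the header format (`NAME:length|`), the `LINK` layer name and all function names are invented for this example and are not the real HTTP, TCP, IP or link-layer wire formats.

```python
# Toy model of layered encapsulation. The header format is invented for
# illustration and is NOT the real HTTP, TCP, IP or link-layer wire format.

LAYERS = ["HTTP", "TCP", "IP", "LINK"]  # top of the stack first


def wrap(layer: str, payload: bytes) -> bytes:
    """Put the payload inside a new 'package' labelled with the layer name."""
    header = f"{layer}:{len(payload)}|".encode("ascii")
    return header + payload


def unwrap(layer: str, packet: bytes) -> bytes:
    """Open one 'package' and return what was inside it."""
    header, _, rest = packet.partition(b"|")
    name, length = header.decode("ascii").split(":")
    if name != layer:
        raise ValueError(f"expected a {layer} package, got {name}")
    return rest[: int(length)]


def to_bits(data: bytes) -> str:
    """Spell every byte out as eight ones and zeros."""
    return "".join(f"{byte:08b}" for byte in data)


def from_bits(bits: str) -> bytes:
    """Rebuild the bytes from a string of ones and zeros."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def send(message: bytes) -> str:
    """Wrap the message once per layer, then turn it into bits for the wire."""
    packet = message
    for layer in LAYERS:
        packet = wrap(layer, packet)
    return to_bits(packet)


def receive(bits: str) -> bytes:
    """Listen to the bits, then unwrap the layers from the outside in."""
    packet = from_bits(bits)
    for layer in reversed(LAYERS):
        packet = unwrap(layer, packet)
    return packet
```

Calling `receive(send(b"GET /index.html"))` returns the original bytes, which mirrors the description above: the sender goes down through the layers, and the receiver climbs back up through them in reverse order.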
Adeline Setiawan: Let’s just say that I send out a Facebook post, how does Facebook actually package that message into packages and post it to all my friends?
Gorissen: Well, you interact with the Facebook server, or servers sometimes, and they are sitting in between. The Internet benefits from the fact that digital information can be copied and pasted elsewhere for almost no cost, which is what made Internet companies so fundamentally scalable in comparison to, for example, CDs, where you have to create a physical item every time you want to sell a copy. So when you send a message to a group, the Facebook server is very likely to create a multitude of copies of your message to send to these other people. So they all get transferred individually and through different mechanisms, but in the end you just need to send one message.
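The "send once, copy many times" idea can be sketched in a few lines of Python. This is a toy model only, not a description of how Facebook's systems actually work: the `ToyServer` class, its `inboxes` field and its `post` method are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyServer:
    """Hypothetical server that sits between a sender and their friends."""

    inboxes: Dict[str, List[str]] = field(default_factory=dict)

    def post(self, author: str, text: str, friends: List[str]) -> int:
        """Receive ONE message and place a copy in every friend's inbox."""
        for friend in friends:
            self.inboxes.setdefault(friend, []).append(f"{author}: {text}")
        return len(friends)  # number of copies the server made


server = ToyServer()
copies = server.post("adeline", "Hello from Singapore", ["ann", "ben", "cy"])
```

The sender calls `post` once, and the server does the copying. Making each copy is just one more list append, which is the near-zero cost of duplication described above.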
Chua: You mentioned servers... Where is all this data stored and how secure are these data centres?
Gorissen: Data centres are pretty secure. They’re high-security facilities and very few people are allowed to enter. There are actually laws and regulations around this too. There are background checks before you even get permission to go there, there are gates and controls, and you’re going to be screened and searched for anything that you might bring into them. They are totally sealed off.
Chua: And virtually — how secure are they?
Gorissen: That’s a better question, actually. Virtually, they’re as secure as the software that runs on them and the hardware that runs them. It really depends on what software and what hardware you install, as every piece of equipment that you buy might have known or unknown bugs somewhere that could be exploited. But you know, nothing is 100% secure. On the software side, it's really hard because the complexity is incredibly high, and one of the biggest challenges of our times is, actually, to secure data properly. It's like an infinite task to fight complexity.
Chua: So when we hear about data hacks, do they happen at the data centre?
Gorissen: I think that most hacks happen on your local computer or on your device. The most common hacks are social hacks, things like phishing, where people receive emails or messages on WhatsApp or SMS that attempt to impersonate someone, misleading you into thinking that the sender is somebody else so that you reveal your password or other security credentials. With that information, an attacker can then access your bank accounts or services.
Even the recent SingHealth hack[2] here was quite interesting. As far as I hear, there was somebody who befriended a worker and, while his colleague was absent, took over control of the computer that he was logged into and sent some data out. So this was also a social hack.
In the end, security is a function of economics: how valuable the thing you're protecting is defines how much you would invest in protecting it. These days, the software and physical security of data is so good that it is very expensive to attempt to breach it. There are some institutions, governments or malicious actors that sit on so-called exploits, which are pieces of code that could penetrate existing software, and they keep them secret to be able to sell them on the black market for millions to somebody who could see value in them and use them to exploit a system. There have been crazy examples of that in the past, such as intruding into nuclear facilities and making equipment installed in these facilities change its behaviour subtly so that people would think the equipment was broken. Researchers guessed that it was a multimillion-dollar operation to create this exploit and make it work. So it really depends... But the interesting part is that these things are so expensive that it's mostly easier to do these kinds of social hacks, because individuals and the way they use their devices are the weakest link.
Setiawan: At the end of it, what does the hacker get? Is it just a string of information?
Gorissen: Malicious actors could be motivated by all kinds of things. Sometimes it's really just a certain volume of data that they're looking for, like a massive number of email addresses that they could sell or use. Sometimes it's lists of not very well-secured computers that they could take over, to run their own software on them as actors in bigger attacks later on, like sort of dormant robots, if you will. It could be that they're looking for some specific kind of information that somebody's willing to pay a lot of money for. A Head of State’s health status, for example, could be an interesting data point that somebody might put money on the line to know. So the motivations are as colourful as the ways to do it.
Setiawan: Let’s talk about small data, if you could say such a thing. There are a lot of people who say smart cities are also about the Internet of Things (IoT).
Gorissen: The Internet of Things describes small, distributed computing devices. In a way, your phone is an IoT device, and anything that has a chipset inside it that can run a bit of software is a generic computing machine. Your printer, your smartwatch, even your Wi-Fi router is a small generic computing machine. You can actually run all kinds of software on it; it doesn’t technically need to be router software. These days, it could be almost any device that has a power cable, from fridges to coffee machines to smart meters that manage and track your energy consumption. Essentially, an IoT device is a device that has a specific function and that now can run software and be connected to the Internet.
Chua: And they all collect data from us, right?
Gorissen: Yes. It can be data about environmental happenings, scenarios and situations, from tide flows, to the way animals breed, to the weather, to...
Setiawan: To how much we use the washing machine!
Gorissen: To how much water you consume, yes. It's not so much what the individual does, but the relevant information is more about what we as a group do, because that has the most effect on the future. Typically, you install all these devices to see trends, monitor situations and learn from them to affect the future.
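The point about group behaviour mattering more than individual behaviour can be illustrated with a small Python sketch. It assumes a handful of made-up water-meter readings; the household names, the numbers and the `average_per_day` helper are all hypothetical.

```python
from collections import defaultdict

# (household id, day, litres used): made-up readings for illustration only
READINGS = [
    ("home-a", "Mon", 310), ("home-b", "Mon", 290), ("home-c", "Mon", 300),
    ("home-a", "Tue", 330), ("home-b", "Tue", 310), ("home-c", "Tue", 320),
]


def average_per_day(readings):
    """Collapse individual readings into one average per day for the group."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for _household, day, litres in readings:  # who it was is deliberately ignored
        totals[day] += litres
        counts[day] += 1
    return {day: totals[day] / counts[day] for day in totals}


trend = average_per_day(READINGS)
```

The result has one number per day rather than one per household, so what is left to study is the trend of the group over time.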
Thomas Gorissen is a web engineer, a digital product advisor and event organiser who has spent the last 20 years re-thinking, designing and building web applications in industries such as e-commerce, advertising and communication. Throughout this time he has built digital products, teams and companies in Germany, the US, Panama and now Singapore. Today he uses this experience to consult for and advise young technology-driven companies, and mentors at Asia's leading tech incubators and venture firms. For seven years he has organised Southeast Asia's largest and most influential web developer conference, JSConf.Asia, which has lifted Southeast Asia and Singapore onto the global stage of web development.
[1] The network first used to implement the TCP/IP protocol suite, the Advanced Research Projects Agency Network (ARPANET), was initially funded by the United States Department of Defense in the 1960s to 70s.
[2] In 2018, hackers stole the personal particulars of 1.5 million patients of SingHealth, Singapore's largest group of public healthcare institutions and clinics.