1 00:00:08,040 --> 00:00:11,480 It sometimes seems we're being deluged with data. 2 00:00:13,320 --> 00:00:16,200 Wave upon wave of news and messages. 3 00:00:17,720 --> 00:00:19,640 Submerged by step counts. 4 00:00:21,400 --> 00:00:26,560 Constantly bailing out to make room for more. 5 00:00:26,560 --> 00:00:31,480 We buy it, surf it, occasionally drown in it 6 00:00:31,480 --> 00:00:37,680 and with modern technology quantify ourselves and everything else with it. 7 00:00:37,680 --> 00:00:41,320 Data is the new currency of our time. 8 00:00:43,240 --> 00:00:46,880 Data has become almost a magic word for...anything. 9 00:00:46,880 --> 00:00:53,320 Crime and lunacy and literacy and religion and...drunkenness. 10 00:00:55,600 --> 00:00:58,840 You name it, somebody was gathering information about it. 11 00:00:58,840 --> 00:01:01,840 It offers the ability to be transformationally positive. 12 00:01:03,720 --> 00:01:07,880 It's, in one sense, just the reduction in uncertainty. 13 00:01:07,880 --> 00:01:10,600 So what exactly is data? 14 00:01:10,600 --> 00:01:15,480 How is it captured, stored, shared and made sense of? 15 00:01:17,160 --> 00:01:21,920 The engineers of the data age are people that most of us have never heard of, 16 00:01:21,920 --> 00:01:25,400 despite the fact that they brought about 17 00:01:25,400 --> 00:01:29,320 a technological and philosophical revolution, 18 00:01:29,320 --> 00:01:34,200 and created a digital world that the mind boggles to comprehend. 19 00:01:36,200 --> 00:01:39,480 This is the story of THE word of our times... 20 00:01:41,720 --> 00:01:46,040 ..how the constant flow of more and better data has transformed society... 21 00:01:49,200 --> 00:01:53,600 ..and is even changing our sense of ourselves. 22 00:01:53,600 --> 00:01:55,720 I can't believe this is my life now. 23 00:01:59,480 --> 00:02:02,000 So come on in, because the water's lovely. 
24 00:02:13,600 --> 00:02:15,240 My name is Hannah Fry, 25 00:02:15,240 --> 00:02:19,280 I'm a mathematician, and I'd like to begin with a confession. 26 00:02:20,800 --> 00:02:23,600 I haven't always loved data. 27 00:02:23,600 --> 00:02:27,440 The truth is mathematicians just don't really like data that much. 28 00:02:27,440 --> 00:02:31,160 And for most of my professional life I was quite happy sitting in 29 00:02:31,160 --> 00:02:35,360 a windowless room with my equations, describing the world around me. 30 00:02:35,360 --> 00:02:40,320 You can capture the arc of a perfect free kick or the beautiful 31 00:02:40,320 --> 00:02:42,640 aerodynamics of a race car. 32 00:02:42,640 --> 00:02:48,160 The mathematics of the real world is clean and ordered and elegant, 33 00:02:48,160 --> 00:02:51,760 everything that data absolutely isn't. 34 00:02:51,760 --> 00:02:54,280 There was one moment that helped to change my mind. 35 00:02:57,040 --> 00:03:02,040 It was in 2011 when I came across a little game that a teenage Wikipedia 36 00:03:02,040 --> 00:03:04,880 user called Mark J had invented. 37 00:03:04,880 --> 00:03:09,680 Now, Mark noticed that if you hit the first link in the main text of any 38 00:03:09,680 --> 00:03:15,120 Wikipedia page and then do the same for the next page, a pattern emerges. 39 00:03:15,120 --> 00:03:17,560 So the page for data, for example, 40 00:03:17,560 --> 00:03:20,240 links from "set" to "maths" 41 00:03:20,240 --> 00:03:25,480 to "quantity" to "property" and then "philosophy", 42 00:03:25,480 --> 00:03:29,280 which after a few more links will loop back onto itself. 43 00:03:29,280 --> 00:03:32,320 Now, the page "egg" ends up in the same place, 44 00:03:32,320 --> 00:03:37,600 and even that famously philosophical boyband One Direction will take you 45 00:03:37,600 --> 00:03:39,840 all the way through to "philosophy", 46 00:03:39,840 --> 00:03:42,760 although you have to go through "science" to get there. 
47 00:03:44,400 --> 00:03:50,040 The same goes for "fungi", or "hairspray", 48 00:03:50,040 --> 00:03:55,200 "marmalade", even "mice", "dust" and "socks". 49 00:03:55,200 --> 00:03:59,440 It was a very strange finding and it called for some statistics. 50 00:04:02,280 --> 00:04:05,320 Another Wikipedia user, Il Mare, 51 00:04:05,320 --> 00:04:09,600 wrote a computer program to try and investigate this phenomenon. 52 00:04:09,600 --> 00:04:16,640 Now, he discovered, amazingly, that for almost 95% of Wikipedia pages, 53 00:04:16,640 --> 00:04:20,840 you will end up getting to "philosophy" eventually. 54 00:04:20,840 --> 00:04:25,520 Now, that's pretty cool, but how did it change my mind about data? 55 00:04:25,520 --> 00:04:32,200 Well, the pattern that Mark J discovered and the data that was captured and analysed, 56 00:04:32,200 --> 00:04:37,120 it revealed a hidden mathematical structure, 57 00:04:37,120 --> 00:04:42,400 because Wikipedia is just a network with loops and chains hidden all over the place 58 00:04:42,400 --> 00:04:47,040 and it's something that can be described beautifully using mathematics. 59 00:04:49,320 --> 00:04:56,000 For me this was the perfect example of how there are two parallel universes. 60 00:04:56,000 --> 00:04:58,560 There's the tangible, noisy, messy one, 61 00:04:58,560 --> 00:05:02,840 the one that you can see and touch and experience. 62 00:05:02,840 --> 00:05:07,680 But there's also the mathematical one, where I think the key to our 63 00:05:07,680 --> 00:05:09,480 understanding lies. 64 00:05:09,480 --> 00:05:13,760 And data is the bridge between those two universes. 65 00:05:15,360 --> 00:05:20,040 Our understanding of everything from cities to crime, 66 00:05:20,040 --> 00:05:21,800 global trade, 67 00:05:21,800 --> 00:05:24,520 migration and even disease... 68 00:05:25,800 --> 00:05:28,160 ..it's all underpinned by data. 69 00:05:30,920 --> 00:05:32,480 Take this for example. 
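The first-link game described above is, at heart, a walk through a directed graph: from each page you follow exactly one outgoing edge, until the chain either reaches "philosophy" or closes a loop. A minimal sketch of that chase, using a made-up miniature link table rather than real Wikipedia data:

```python
# Toy model of the "first link leads to philosophy" game.
# first_link maps each page to the first link in its main text;
# this tiny graph is hypothetical, not scraped from Wikipedia.
first_link = {
    "egg": "food",
    "food": "substance",
    "substance": "matter",
    "matter": "physics",
    "physics": "science",
    "science": "knowledge",
    "knowledge": "philosophy",
    "philosophy": "reality",
    "reality": "philosophy",   # near "philosophy", the chain loops back
}

def chase(page, target="philosophy", max_hops=100):
    """Follow first links until reaching the target, a dead end, or a loop."""
    path = [page]
    seen = {page}
    while page != target and len(path) <= max_hops:
        page = first_link.get(page)
        if page is None or page in seen:   # dead end, or a loop before target
            return path, False
        seen.add(page)
        path.append(page)
    return path, page == target

path, reached = chase("egg")
print(" -> ".join(path), "| reached philosophy:", reached)
```

The `seen` set is what detects the loops the presenter mentions: once a page repeats without hitting "philosophy", the chain can never get there.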
70 00:05:33,680 --> 00:05:37,360 Rural Wiltshire and a dairy farm 71 00:05:37,360 --> 00:05:41,400 gathering data from its cows wearing 72 00:05:41,400 --> 00:05:42,400 pedometers. 73 00:05:44,440 --> 00:05:46,840 We can't be out here 24-7. 74 00:05:46,840 --> 00:05:51,080 The pedometers help us to have our eyes and ears everywhere. 75 00:05:51,080 --> 00:05:53,920 It turns out when cows go into heat 76 00:05:53,920 --> 00:05:56,920 they move around a lot more than normal. 77 00:05:56,920 --> 00:06:00,720 Constant monitoring of their steps and some background mathematics 78 00:06:00,720 --> 00:06:04,720 reveal the prime time for insemination. 79 00:06:04,720 --> 00:06:07,280 We'll be able to look at the data 80 00:06:07,280 --> 00:06:09,360 and within 24 hours there'll be a 81 00:06:09,360 --> 00:06:12,920 greater chance of getting her in calf. 82 00:06:12,920 --> 00:06:15,880 Data-driven farming is now big business, 83 00:06:15,880 --> 00:06:19,840 turning a centuries-old way of life into precision science. 84 00:06:23,480 --> 00:06:26,320 Pretty much every industry you can think of 85 00:06:26,320 --> 00:06:28,160 now relies on data. 86 00:06:32,560 --> 00:06:34,800 We all agree that we are undergoing 87 00:06:34,800 --> 00:06:36,840 a major revolution in human history. 88 00:06:46,400 --> 00:06:49,040 The digital world replacing the analogue world. 89 00:06:49,040 --> 00:06:50,720 A world based on data, 90 00:06:50,720 --> 00:06:55,480 one made of code rather than a world made of biological or physical 91 00:06:55,480 --> 00:06:57,960 material, that is extraordinary. 92 00:06:57,960 --> 00:07:00,240 Why philosophy at this stage? 93 00:07:00,240 --> 00:07:03,040 Because when you face extraordinary challenges, 94 00:07:03,040 --> 00:07:05,320 the worst thing you can do is to get close to it. 95 00:07:05,320 --> 00:07:07,720 You need to take a long run-up. 96 00:07:07,720 --> 00:07:09,680 The bigger the gap, the longer the run-up. 
97 00:07:09,680 --> 00:07:11,640 And the run-up is called philosophy. 98 00:07:13,240 --> 00:07:15,960 In the spirit of taking a long run-up, 99 00:07:15,960 --> 00:07:17,840 we'll start with the word itself. 100 00:07:20,320 --> 00:07:23,600 "Data" is originally from the Latin "datum", 101 00:07:23,600 --> 00:07:26,040 meaning "that which is given". 102 00:07:26,040 --> 00:07:28,480 Data can be descriptions... 103 00:07:29,880 --> 00:07:33,520 ..counts, or measures... 104 00:07:35,720 --> 00:07:36,960 ..of anything... 105 00:07:39,200 --> 00:07:41,040 ..in any format. 106 00:07:43,480 --> 00:07:47,160 It's anything that when analysed becomes information, 107 00:07:47,160 --> 00:07:50,120 which in turn is the raw material for knowledge, 108 00:07:50,120 --> 00:07:52,400 the only true path to wisdom. 109 00:07:54,320 --> 00:07:56,080 Look at the data on data. 110 00:07:58,440 --> 00:08:00,960 And before the scientific and industrial revolution, 111 00:08:00,960 --> 00:08:03,640 the word barely gets a look in, in English. 112 00:08:05,040 --> 00:08:08,240 But then it starts to appear in print 113 00:08:08,240 --> 00:08:11,560 as scientists and the state gather, 114 00:08:11,560 --> 00:08:13,520 observe and create more and more of it. 115 00:08:14,560 --> 00:08:18,760 This arrival of the age of data would change everything. 116 00:08:24,720 --> 00:08:27,560 Industrial Revolution Britain. 117 00:08:27,560 --> 00:08:28,640 For Victorians, 118 00:08:28,640 --> 00:08:32,840 booming industry and the growth of major cities were changing both the 119 00:08:32,840 --> 00:08:36,640 landscape and daily life beyond recognition. 120 00:08:37,920 --> 00:08:41,560 Into this scene stepped an unlikely man of numbers, 121 00:08:41,560 --> 00:08:43,000 William Farr, 122 00:08:43,000 --> 00:08:48,200 one of the first people to manage data on an industrial scale. 
123 00:08:48,200 --> 00:08:50,480 William Farr had a quite unusual upbringing, 124 00:08:50,480 --> 00:08:53,480 in that he was actually the son of a farm labourer who had managed to 125 00:08:53,480 --> 00:08:56,720 get a medical education, which was really very unusual for someone of his class. 126 00:09:05,520 --> 00:09:08,400 Farr very quickly became absorbed in the study of statistics. 127 00:09:08,400 --> 00:09:12,240 He was particularly interested, as you might expect for somebody with medical training, 128 00:09:12,240 --> 00:09:14,680 in public health, life expectancy and causes of death. 129 00:09:16,040 --> 00:09:18,320 For anyone interested in statistics, 130 00:09:18,320 --> 00:09:20,160 there was only one place to be. 131 00:09:21,800 --> 00:09:24,360 Somerset House in London was home to 132 00:09:24,360 --> 00:09:26,440 the General Register Office, 133 00:09:26,440 --> 00:09:28,040 where, in 1839, 134 00:09:28,040 --> 00:09:30,800 Farr found his dream job. 135 00:09:30,800 --> 00:09:34,960 From up there in the north wing, William Farr, the apothecary, 136 00:09:34,960 --> 00:09:37,800 medical journalist and top statistician, 137 00:09:37,800 --> 00:09:39,280 would really rule the roost. 138 00:09:39,280 --> 00:09:41,520 Now, this place was almost like a factory. 139 00:09:41,520 --> 00:09:44,920 Here, they would collect, process and analyse 140 00:09:44,920 --> 00:09:47,480 vast amounts of data. 141 00:09:47,480 --> 00:09:51,000 So in would come the census returns, the records of every single birth, 142 00:09:51,000 --> 00:09:55,640 death and marriage in the country, and out would come the big picture, 143 00:09:55,640 --> 00:09:58,360 the usable information that could help 144 00:09:58,360 --> 00:10:01,160 inform policy and reform society. 
145 00:10:02,280 --> 00:10:05,680 I think it's sometimes difficult for us to remember just how little people knew 146 00:10:05,680 --> 00:10:08,840 in the early 19th century about the changes that Britain was going through. 147 00:10:08,840 --> 00:10:12,200 So when Farr did an analysis of population density and death 148 00:10:12,200 --> 00:10:15,600 rate, he was able to show that life expectancy in Liverpool 149 00:10:15,600 --> 00:10:17,080 was absolutely atrocious. 150 00:10:17,080 --> 00:10:19,280 It was far, far worse than the surrounding areas. 151 00:10:19,280 --> 00:10:22,360 This came as a surprise to a lot of people who believed that Liverpool, 152 00:10:22,360 --> 00:10:24,960 a coastal town, was actually quite a salubrious place to live. 153 00:10:27,600 --> 00:10:32,520 At Somerset House, Farr spearheaded a revolution in the systematic 154 00:10:32,520 --> 00:10:34,480 collection of data 155 00:10:34,480 --> 00:10:37,960 to uncover the real picture of this changing society. 156 00:10:39,360 --> 00:10:44,320 Its scale and ambition was described in a newspaper at the time. 157 00:10:44,320 --> 00:10:47,960 "In arched chambers of immense strength and extent 158 00:10:47,960 --> 00:10:49,280 "are, in many volumes, the 159 00:10:49,280 --> 00:10:54,520 "genuine certificates of upwards of 28 million persons 160 00:10:54,520 --> 00:10:59,360 "born into life, married or passed into the grave." 161 00:10:59,360 --> 00:11:02,880 Here, every person was recorded equally. 162 00:11:02,880 --> 00:11:04,560 A revolutionary idea. 163 00:11:07,600 --> 00:11:11,920 "Here are to be found the records of nonentities, 164 00:11:11,920 --> 00:11:15,480 "side-by-side with those once learned in the law 165 00:11:15,480 --> 00:11:16,960 "or distinguished in 166 00:11:16,960 --> 00:11:19,280 "literature, art or science." 
167 00:11:20,640 --> 00:11:26,200 But what really motivated William Farr was not just data collection, 168 00:11:26,200 --> 00:11:31,080 it was the possibility that data gathered could be analysed to help 169 00:11:31,080 --> 00:11:33,760 overcome society's greatest ill. 170 00:11:35,880 --> 00:11:40,600 Cholera was probably the most feared of all of the Victorian diseases. 171 00:11:40,600 --> 00:11:44,240 The terrifying thing was that you could wake up in the morning and feel absolutely fine, 172 00:11:44,240 --> 00:11:45,840 and then be dead by the evening. 173 00:11:47,480 --> 00:11:51,120 Between the 1830s and the 1860s, 174 00:11:51,120 --> 00:11:53,960 tens of thousands died in London alone. 175 00:11:56,920 --> 00:12:00,240 The control of infectious diseases like cholera, 176 00:12:00,240 --> 00:12:02,440 which no-one fully understood, 177 00:12:02,440 --> 00:12:05,360 became the greatest public-health issue of the time. 178 00:12:07,000 --> 00:12:09,160 However great London might have looked back then, 179 00:12:09,160 --> 00:12:12,840 it would have smelled absolutely terrible. 180 00:12:12,840 --> 00:12:16,520 At that point, the Victorians didn't have really a great way of disposing 181 00:12:16,520 --> 00:12:19,720 of human waste, so it would have flowed down the gutters 182 00:12:19,720 --> 00:12:21,360 into open sewers 183 00:12:21,360 --> 00:12:23,000 and out into the Thames. 184 00:12:23,000 --> 00:12:26,080 Now, the city smelt so bad that it 185 00:12:26,080 --> 00:12:29,400 was pretty plausible that the foul air 186 00:12:29,400 --> 00:12:31,600 was responsible for carrying the disease. 187 00:12:33,600 --> 00:12:37,280 Farr collected a huge range of data during each 188 00:12:37,280 --> 00:12:39,200 cholera outbreak to try to 189 00:12:39,200 --> 00:12:43,480 identify what put people most at risk from the bad air. 
190 00:12:44,640 --> 00:12:48,280 He used income-tax data to try and measure the affluence of the different 191 00:12:48,280 --> 00:12:50,240 boroughs that were affected by cholera. 192 00:12:50,240 --> 00:12:53,560 He asked his friends at the Royal Observatory to provide data on the 193 00:12:53,560 --> 00:12:55,320 temperature and climatic conditions. 194 00:12:56,360 --> 00:12:59,560 But the one that he thought was most convincing was about the topography. 195 00:12:59,560 --> 00:13:02,480 It was about the elevation above the Thames. 196 00:13:02,480 --> 00:13:08,080 Using the data, Farr suggested a mathematical law of elevation. 197 00:13:08,080 --> 00:13:13,280 Its equations described how cholera mortality falls the higher you live 198 00:13:13,280 --> 00:13:14,560 above the Thames. 199 00:13:15,800 --> 00:13:18,600 Now, he published his report in 1852, 200 00:13:18,600 --> 00:13:20,760 which the Lancet described as one of 201 00:13:20,760 --> 00:13:23,280 the most remarkable productions of 202 00:13:23,280 --> 00:13:26,200 type and pen in any age and country. 203 00:13:28,080 --> 00:13:30,840 The only problem was that Farr's work, 204 00:13:30,840 --> 00:13:33,240 although elegant and meticulous, 205 00:13:33,240 --> 00:13:35,440 was fundamentally flawed. 206 00:13:38,080 --> 00:13:40,760 Farr stuck to the prevailing theory 207 00:13:40,760 --> 00:13:43,600 that cholera was spread by air. 208 00:13:43,600 --> 00:13:46,080 Such is the power of the status quo. 209 00:13:46,080 --> 00:13:47,920 But in 1866, 210 00:13:47,920 --> 00:13:53,880 5,500 people died in just one square mile of London's East End, 211 00:13:53,880 --> 00:13:56,440 and that data made Farr change his mind. 212 00:13:58,800 --> 00:14:01,480 When Farr came to write his next report, 213 00:14:01,480 --> 00:14:04,320 the data told a different story 214 00:14:04,320 --> 00:14:06,200 which proved the turning point in 215 00:14:06,200 --> 00:14:08,240 combating the disease. 
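Farr's "law of elevation" related cholera deaths inversely to height above the river. As an illustration only, the shape of such a relationship can be sketched with an assumed inverse form and made-up constants (k and c below are not Farr's published coefficients):

```python
# Illustrative sketch of an inverse "law of elevation" of the kind Farr
# proposed: cholera mortality falling as elevation above the Thames rises.
# The functional form and the constants k and c are assumptions chosen
# for illustration, not Farr's actual published figures.

def mortality_per_10000(elevation_ft, k=1000.0, c=10.0):
    """Assumed inverse law: deaths per 10,000 ~ k / (elevation + c)."""
    return k / (elevation_ft + c)

for e in (0, 10, 40, 90, 340):
    print(f"{e:>4} ft above the Thames -> {mortality_per_10000(e):6.1f} per 10,000")
```

Whatever the constants, any law of this shape predicts the pattern Farr saw in his tables: districts by the river suffer most, and mortality tails off with height.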
216 00:14:08,240 --> 00:14:13,760 The common factor among those who died was not elevation or air 217 00:14:13,760 --> 00:14:16,520 but sewage-contaminated drinking water. 218 00:14:19,280 --> 00:14:20,640 With this new report, 219 00:14:20,640 --> 00:14:23,240 Farr may seem to have contradicted 220 00:14:23,240 --> 00:14:26,080 much of his own work, but I think that 221 00:14:26,080 --> 00:14:30,160 this is the perfect example of what data can do. 222 00:14:30,160 --> 00:14:33,880 It provides that bridge essential to scientific discovery, 223 00:14:33,880 --> 00:14:36,760 from theory to proof, problem to solution. 224 00:14:38,560 --> 00:14:41,680 Good data, even in huge volumes, 225 00:14:41,680 --> 00:14:45,640 does not guarantee that you will arrive at the truth. 226 00:14:45,640 --> 00:14:49,560 But, eventually, when the weight of the data tips the balance, 227 00:14:49,560 --> 00:14:52,320 even the strongest-held beliefs can be overcome. 228 00:14:55,920 --> 00:14:59,800 Of course, it was the weight of the data itself which, 229 00:14:59,800 --> 00:15:01,800 with the dawn of the 20th century, 230 00:15:01,800 --> 00:15:04,680 was becoming increasingly hard to manage. 231 00:15:06,600 --> 00:15:11,040 Data stored long form in things like census ledgers could take the best 232 00:15:11,040 --> 00:15:13,640 part of a decade to process, 233 00:15:13,640 --> 00:15:16,320 meaning the stats were often out of date. 234 00:15:17,560 --> 00:15:20,120 When you're dealing with figures like these, it's one thing. 235 00:15:22,840 --> 00:15:26,520 But when you're counting the population like this it's quite a different matter. 236 00:15:26,520 --> 00:15:31,040 A deceptively simple solution got what's now called the 237 00:15:31,040 --> 00:15:34,120 information revolution under way, 238 00:15:34,120 --> 00:15:38,800 encoding data as holes punched in cards. 
239 00:15:38,800 --> 00:15:41,560 These cards are passed over sorting machines, 240 00:15:41,560 --> 00:15:44,280 each of which handles 22,000 cards a minute. 241 00:15:46,880 --> 00:15:51,360 By the 1950s, data processing and simple calculations were 242 00:15:51,360 --> 00:15:53,560 routinely mechanised, 243 00:15:53,560 --> 00:15:56,240 laying the groundwork for the next generation of 244 00:15:56,240 --> 00:15:57,800 data-processing machines. 245 00:16:01,280 --> 00:16:04,280 They would be put to pioneering work 246 00:16:04,280 --> 00:16:06,280 in a rather unlikely place. 247 00:16:08,840 --> 00:16:12,240 In a grand London dining hall, a group of men and women, 248 00:16:12,240 --> 00:16:16,720 many in their 80s and 90s, have gathered for a special work reunion. 249 00:16:19,320 --> 00:16:22,440 At its peak, their employer, J Lyons, 250 00:16:22,440 --> 00:16:25,320 purveyor of fine British tea and cakes, 251 00:16:25,320 --> 00:16:27,720 had hundreds of tea shops nationwide. 252 00:16:28,720 --> 00:16:30,800 There are hundreds of items of food. 253 00:16:30,800 --> 00:16:34,960 All these in a varying quantity each day are delivered to a precise 254 00:16:34,960 --> 00:16:36,120 timetable to the tea shops. 255 00:16:38,960 --> 00:16:43,520 These people aren't former J Lyons bakers or tea-shop managers. 256 00:16:43,520 --> 00:16:47,640 They were hired for their mathematical skills. 257 00:16:47,640 --> 00:16:51,680 Lyons had a huge amount of data which has to be processed, 258 00:16:51,680 --> 00:16:54,640 often very low-value data. 259 00:16:54,640 --> 00:16:57,320 So, for example, the transaction from a tea shop 260 00:16:57,320 --> 00:16:58,400 would be a cup of tea. 261 00:16:59,480 --> 00:17:02,520 But each one had a voucher and had to be recorded, 262 00:17:02,520 --> 00:17:05,160 and had to go to the accounts 263 00:17:05,160 --> 00:17:08,840 for business reasons and for management reasons. 
264 00:17:08,840 --> 00:17:13,080 Every calculation you did, you not only had to do it twice, 265 00:17:13,080 --> 00:17:16,000 but you had to get it checked by someone else as well. 266 00:17:16,000 --> 00:17:19,520 The handling of these millions and millions of pieces of data, 267 00:17:19,520 --> 00:17:23,600 the storage of that data, are the key to the business problem. 268 00:17:26,040 --> 00:17:30,840 The Lyons team took the world by surprise when, in 1951, 269 00:17:30,840 --> 00:17:33,960 they unveiled the Lyons Electronic Office, 270 00:17:33,960 --> 00:17:35,160 or Leo for short. 271 00:17:39,320 --> 00:17:42,880 At this point, only a handful of computers existed, 272 00:17:42,880 --> 00:17:46,560 and they were used solely for scientific and military research, 273 00:17:46,560 --> 00:17:51,720 so a business computer was a radical reimagining of what this brand-new 274 00:17:51,720 --> 00:17:53,160 technology could be for. 275 00:17:56,120 --> 00:17:59,280 Each manageress has a standing order depending on the day of the week. 276 00:18:00,680 --> 00:18:04,120 She speaks by telephone to head office, where her variations are 277 00:18:04,120 --> 00:18:05,760 taken quickly onto cards. 278 00:18:05,760 --> 00:18:08,800 What the girl hears, she punches. 279 00:18:08,800 --> 00:18:10,640 The programme is fed first, 280 00:18:10,640 --> 00:18:14,560 laying down the sequence for the multiplicity of calculations Leo will perform. 281 00:18:15,800 --> 00:18:16,960 It was the first 282 00:18:16,960 --> 00:18:19,280 opportunity to process large volumes 283 00:18:19,280 --> 00:18:20,760 of clerical work, 284 00:18:20,760 --> 00:18:22,760 take all the hard work out of it, 285 00:18:22,760 --> 00:18:25,560 and put it on an automatic system. 
286 00:18:25,560 --> 00:18:28,600 Before Leo, working out an employee's pay 287 00:18:28,600 --> 00:18:31,440 took an experienced clerk eight minutes, 288 00:18:31,440 --> 00:18:35,680 but with Leo that dropped to an astonishing one and a half seconds. 289 00:18:39,320 --> 00:18:41,160 It was all so exciting 290 00:18:41,160 --> 00:18:44,080 because we were breaking new ground the whole time. 291 00:18:44,080 --> 00:18:48,520 Absolutely everything which we did had never been done before. 292 00:18:49,680 --> 00:18:51,560 By anybody anywhere. 293 00:18:51,560 --> 00:18:56,040 I don't think we realised the kind of transformation we were part of. 294 00:19:00,120 --> 00:19:04,560 The post-war years saw a boom in the application of this new computing 295 00:19:04,560 --> 00:19:05,560 technology. 296 00:19:06,680 --> 00:19:09,560 Leo ran on paper tape and cards, 297 00:19:09,560 --> 00:19:13,680 but soon machines with magnetic tape and disks were developed, 298 00:19:13,680 --> 00:19:17,560 allowing for greater data storage and faster calculations. 299 00:19:19,560 --> 00:19:23,760 As more businesses and institutions adopted these new machines, 300 00:19:23,760 --> 00:19:27,480 the application of mathematics to a whole host of new, 301 00:19:27,480 --> 00:19:30,000 real-world challenges took off. 302 00:19:31,600 --> 00:19:36,520 And the word "data" went from relatively obscure to ubiquitous. 303 00:19:45,320 --> 00:19:48,720 "Data" has become almost a magic word for anything. 304 00:19:48,720 --> 00:19:51,880 The truth is that it is a kind of interface 305 00:19:51,880 --> 00:19:55,120 today between us and the rest of the world. 306 00:19:55,120 --> 00:19:57,840 In fact, between us and ourselves, 307 00:19:57,840 --> 00:20:01,280 we understand our bodies in terms of data, 308 00:20:01,280 --> 00:20:03,560 we understand society in terms of data, 309 00:20:03,560 --> 00:20:06,680 we understand the physics of the universe in terms of data. 
310 00:20:06,680 --> 00:20:09,800 The economy, social science, we play with data, 311 00:20:09,800 --> 00:20:12,920 so essentially it is what we interact with 312 00:20:12,920 --> 00:20:14,440 most regularly every day. 313 00:20:17,480 --> 00:20:21,480 Data underpins all human communication, 314 00:20:21,480 --> 00:20:23,840 regardless of the format. 315 00:20:23,840 --> 00:20:28,840 And it was the desire to communicate effectively and efficiently that led 316 00:20:28,840 --> 00:20:33,520 to one of the most important academic papers of the 20th century. 317 00:20:36,160 --> 00:20:39,320 "A Mathematical Theory of Communication" 318 00:20:39,320 --> 00:20:45,520 has justifiably been called the Magna Carta for the information age. 319 00:20:45,520 --> 00:20:50,560 It was written by a very young and bright employee of Bell Laboratories, 320 00:20:50,560 --> 00:20:54,640 the American centre for telecoms research that was founded by one of 321 00:20:54,640 --> 00:20:58,560 the inventors of the telephone, Alexander Graham Bell. 322 00:20:58,560 --> 00:21:04,360 Now, this paper was written by Claude Shannon in 1948 and it would 323 00:21:04,360 --> 00:21:10,080 effectively lay out the theoretical framework for the data revolution 324 00:21:10,080 --> 00:21:12,840 that was just beginning. 325 00:21:12,840 --> 00:21:15,000 Those who knew him described 326 00:21:15,000 --> 00:21:18,640 Shannon as a lifelong puzzle solver and inventor. 327 00:21:18,640 --> 00:21:23,920 To define the correct path it registers the information in its memory. 328 00:21:23,920 --> 00:21:27,360 Later, I can put him down in any part of the maze that he has already explored 329 00:21:27,360 --> 00:21:30,280 and he will be able to go directly to the goal without making a single 330 00:21:30,280 --> 00:21:32,440 false turn. 331 00:21:32,440 --> 00:21:36,920 During World War II he worked on data-encryption systems, 332 00:21:36,920 --> 00:21:40,240 including one used by Churchill and Roosevelt. 
333 00:21:42,240 --> 00:21:43,480 But at Bell Labs, 334 00:21:43,480 --> 00:21:48,400 Claude Shannon was trying to solve the very civilian problem of noisy 335 00:21:48,400 --> 00:21:49,680 telephone lines. 336 00:21:49,680 --> 00:21:51,320 # There's a call, there's a call 337 00:21:51,320 --> 00:21:53,760 # There's a call for you 338 00:21:53,760 --> 00:21:56,840 # There's a call on the phone for you. # 339 00:21:56,840 --> 00:22:00,440 In that analogue world of 20th-century phones, 340 00:22:00,440 --> 00:22:04,720 your speech was converted into an electrical signal using a handset 341 00:22:04,720 --> 00:22:09,280 like this and then transmitted down a series of wires. 342 00:22:09,280 --> 00:22:12,480 The voice signals would travel along the wire, 343 00:22:12,480 --> 00:22:17,120 be detected by the receiver at the other end and then be converted back 344 00:22:17,120 --> 00:22:20,760 into sound waves to reach the ear of whoever had picked up. 345 00:22:20,760 --> 00:22:22,440 The problem was, 346 00:22:22,440 --> 00:22:25,200 the further the electrical signal travelled down the line, 347 00:22:25,200 --> 00:22:26,680 the weaker it would get. 348 00:22:26,680 --> 00:22:29,080 PHONE LINE CRACKLES Eventually you couldn't 349 00:22:29,080 --> 00:22:33,040 even hear the conversation for the amount of noise on the line. 350 00:22:33,040 --> 00:22:36,520 And you could boost the signal but it would mean boosting the noise, too. 351 00:22:38,000 --> 00:22:42,840 Shannon's genius idea was just as simple as it was beautiful. 352 00:22:44,280 --> 00:22:47,000 The breakthrough was converting speech 353 00:22:47,000 --> 00:22:49,200 into an incredibly simple code. 354 00:22:50,680 --> 00:22:52,160 ON PHONE: Hello? 355 00:22:52,160 --> 00:22:56,160 First the audio wave is detected, then sampled. 
356 00:22:56,160 --> 00:23:00,720 Each point is assigned a code of ones and zeros 357 00:23:00,720 --> 00:23:04,000 and the resulting long string of digits can then be sent down 358 00:23:04,000 --> 00:23:07,720 the wire with the zeros as brief low-voltage signals 359 00:23:07,720 --> 00:23:11,840 and ones as brief bursts of high voltage. 360 00:23:11,840 --> 00:23:16,720 From this code, the original audio can be cleanly reconstructed and 361 00:23:16,720 --> 00:23:19,160 regenerated at the other end. 362 00:23:19,160 --> 00:23:20,880 ON PHONE: Hello? 363 00:23:20,880 --> 00:23:23,640 Shannon was the first person to publish the name 364 00:23:23,640 --> 00:23:25,240 for these ones and zeros, 365 00:23:25,240 --> 00:23:28,200 the smallest possible pieces of information, 366 00:23:28,200 --> 00:23:31,120 and they are called bits or binary digits, 367 00:23:31,120 --> 00:23:35,400 and the real power of the bit and the mathematics behind it 368 00:23:35,400 --> 00:23:37,560 applies way beyond telephones. 369 00:23:39,200 --> 00:23:42,320 They offered a new way for everything, 370 00:23:42,320 --> 00:23:48,360 including text and pictures, to be encoded as ones and zeros. 371 00:23:51,800 --> 00:23:57,560 The possibility to store and share data digitally in the form of bits 372 00:23:57,560 --> 00:24:00,720 was clearly going to transform the world. 373 00:24:02,320 --> 00:24:05,560 If anyone has to be identified as 374 00:24:05,560 --> 00:24:08,440 the genius who developed the 375 00:24:08,440 --> 00:24:11,680 foundational science of mathematics 376 00:24:11,680 --> 00:24:15,360 for our age, that is certainly Claude Shannon. 377 00:24:15,360 --> 00:24:19,200 Now, one thing has to be clarified, 378 00:24:19,200 --> 00:24:24,680 the theory developed by Shannon is about data transmission and it has 379 00:24:24,680 --> 00:24:27,920 nothing to do with meaning, truth, relevance, 380 00:24:27,920 --> 00:24:30,560 importance of the data transmitted. 
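The sample-then-encode scheme described above can be sketched in a few lines: measure the wave at regular intervals, round each measurement to one of a fixed set of levels, and write each level as a fixed-width string of ones and zeros. The sample rate and the tiny 4-bit depth below are illustrative choices, not Bell Labs specifications:

```python
import math

# Minimal sketch of sampling speech and encoding it as bits.
SAMPLE_RATE = 8000      # samples per second (telephone-grade, assumed)
BITS = 4                # bits per sample (tiny, for readability)
LEVELS = 2 ** BITS      # 16 quantisation levels

def encode(samples):
    """Map each sample in [-1, 1] to a BITS-wide string of ones and zeros."""
    codes = []
    for s in samples:
        level = min(LEVELS - 1, int((s + 1) / 2 * LEVELS))
        codes.append(format(level, f"0{BITS}b"))
    return codes

def decode(codes):
    """Reconstruct approximate samples from the binary codes."""
    return [(int(c, 2) + 0.5) / LEVELS * 2 - 1 for c in codes]

# one cycle of a 1 kHz tone, sampled at 8 kHz
wave = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(8)]
bits = encode(wave)
print("bitstream:    ", "".join(bits))
print("reconstructed:", [round(x, 2) for x in decode(bits)])
```

Because the receiver only has to tell a high voltage from a low one, the bitstream survives a noisy line, and the wave is regenerated cleanly at the far end, which is exactly the advantage over boosting an analogue signal.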
381 00:24:30,560 --> 00:24:35,440 So it doesn't matter whether the zero and one represent 382 00:24:35,440 --> 00:24:40,600 an answer to, "Heads or tails?", or to the question, "Will you marry me?", 383 00:24:40,600 --> 00:24:47,280 for a theory of information it is data anyway, and if it is a 50-50 chance 384 00:24:47,280 --> 00:24:51,480 that you will or will not marry me, or that it is heads or tails, 385 00:24:51,480 --> 00:24:54,400 the amount of information, the Shannon information, 386 00:24:54,400 --> 00:24:56,520 communicated is the same. 387 00:24:58,440 --> 00:25:04,520 Shannon information is not information like you or I might think about it. 388 00:25:04,520 --> 00:25:09,120 Encoding any and every signal using just ones and zeros is a pretty 389 00:25:09,120 --> 00:25:10,920 remarkable breakthrough. 390 00:25:10,920 --> 00:25:16,480 However, Shannon also came up with a revolutionary bit of mathematics. 391 00:25:16,480 --> 00:25:21,520 That equation there is the reason you can fit an entire HD movie on a 392 00:25:21,520 --> 00:25:27,080 flimsy bit of plastic or the reason why you can stream films online. 393 00:25:27,080 --> 00:25:30,640 I'll admit, it might not look too nice, but... 394 00:25:32,640 --> 00:25:37,200 don't get put off yet, because I'm going to explain how this equation 395 00:25:37,200 --> 00:25:38,680 works using Scrabble. 396 00:25:40,600 --> 00:25:43,320 Imagine that I created a new alphabet 397 00:25:43,320 --> 00:25:45,760 containing only the letter A. 398 00:25:45,760 --> 00:25:49,120 This bag would only have A tiles inside it 399 00:25:49,120 --> 00:25:52,640 and my chances of pulling out an A tile would be one. 400 00:25:52,640 --> 00:25:55,040 You'd be completely certain of what was going to happen. 401 00:25:55,040 --> 00:25:56,680 Using Shannon's maths, 402 00:25:56,680 --> 00:26:01,680 the letter A contains zero bits of what's called Shannon information. 
403 00:26:03,400 --> 00:26:06,200 Let's say then I got a little bit more creative, but not much, 404 00:26:06,200 --> 00:26:09,360 and had an alphabet with two letters, A and B, 405 00:26:09,360 --> 00:26:11,840 and equal numbers of both in this bag. 406 00:26:11,840 --> 00:26:16,280 Now my chances of pulling out an A are going to be a half 407 00:26:16,280 --> 00:26:20,880 and each letter contains one bit of Shannon information. 408 00:26:22,800 --> 00:26:25,240 Of course, when transmitting real messages, 409 00:26:25,240 --> 00:26:27,080 you'll use the full alphabet. 410 00:26:28,600 --> 00:26:30,440 But English, 411 00:26:30,440 --> 00:26:34,080 as with every other language, has some letters that are used 412 00:26:34,080 --> 00:26:36,320 more frequently than others. 413 00:26:36,320 --> 00:26:38,760 If you take a quite common letter like H, 414 00:26:38,760 --> 00:26:42,600 which appears about 5.9% of the time, 415 00:26:42,600 --> 00:26:47,120 this will have a Shannon information of 4.1 bits. 416 00:26:49,120 --> 00:26:52,560 And incidentally, a Scrabble score of four. 417 00:26:52,560 --> 00:26:57,080 Of course, there are some much more exotic and rare letters, 418 00:26:57,080 --> 00:27:02,760 like Z, for instance, which appears about 0.07% of the time. 419 00:27:02,760 --> 00:27:06,200 That gives it 10.5 bits 420 00:27:06,200 --> 00:27:07,720 and a Scrabble score of ten. 421 00:27:10,000 --> 00:27:13,640 Bits measure our uncertainty. 422 00:27:13,640 --> 00:27:17,760 If you're guessing a three-letter word and you know this letter is Z, 423 00:27:17,760 --> 00:27:21,400 it gives you a lot of information about what the word could be. 424 00:27:23,280 --> 00:27:25,080 But if you know it's H, 425 00:27:25,080 --> 00:27:28,280 because it is a more common letter with less information, 426 00:27:28,280 --> 00:27:30,920 you're more uncertain about the answer. 
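The figures quoted here follow from Shannon's formula: a symbol that turns up with probability p carries log2(1/p) bits. A minimal sketch, using the letter frequencies given in the narration (everything else is illustrative):

```python
# Shannon information (self-information) of a single symbol:
# a symbol seen with probability p carries log2(1/p) bits.
import math

def shannon_bits(p):
    """Bits of Shannon information in a symbol of probability p."""
    return math.log2(1 / p)

print(round(shannon_bits(1.0), 1))     # the all-A bag: 0.0 bits
print(round(shannon_bits(0.5), 1))     # a 50-50 A/B bag: 1.0 bit
print(round(shannon_bits(0.059), 1))   # 'H' at 5.9%: 4.1 bits
print(round(shannon_bits(0.0007), 1))  # 'Z' at 0.07%: 10.5 bits
```

The rarer the letter, the more bits it carries, which is why Z tells you so much more than H.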
427 00:27:33,280 --> 00:27:36,160 Now if you wrap up all that uncertainty together, 428 00:27:36,160 --> 00:27:38,920 you end up with this, the Shannon entropy. 429 00:27:40,480 --> 00:27:42,720 It's the sum of the probability of 430 00:27:42,720 --> 00:27:44,840 each symbol turning up times the 431 00:27:44,840 --> 00:27:47,880 number of bits in each symbol. 432 00:27:47,880 --> 00:27:50,240 And this very clever bit of insight 433 00:27:50,240 --> 00:27:52,920 and mathematics means that the code 434 00:27:52,920 --> 00:27:56,160 for any message can be quantified. 435 00:27:56,160 --> 00:27:59,200 Not every letter, or any other signal for that matter, 436 00:27:59,200 --> 00:28:01,560 needs to be encoded equally. 437 00:28:03,880 --> 00:28:07,920 The digital code behind a movie like this one of my dog, Molly, 438 00:28:07,920 --> 00:28:09,160 for example, 439 00:28:09,160 --> 00:28:12,600 can usually be compressed by up to 50% 440 00:28:12,600 --> 00:28:16,360 without losing any information. 441 00:28:16,360 --> 00:28:17,520 But there's a limit. 442 00:28:20,000 --> 00:28:24,680 Compressing more might make it easier to share or download, 443 00:28:24,680 --> 00:28:28,240 but the quality can never be the same as the original. 444 00:28:28,240 --> 00:28:30,880 DOG BARKS 445 00:28:32,560 --> 00:28:36,280 You can't really overstate the impact that Shannon's work 446 00:28:36,280 --> 00:28:40,720 has had, because without it we wouldn't have JPEGs or Zip files 447 00:28:40,720 --> 00:28:44,080 or HD movies or digital communications. 
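The entropy sum described above, the probability of each symbol turning up times the number of bits in each symbol, can be sketched in a few lines. The lopsided two-letter bag at the end is an illustrative addition, hinting at why predictable data compresses:

```python
# Shannon entropy: average bits per symbol for a whole alphabet,
# i.e. the sum of p * log2(1/p) over every symbol.
import math

def entropy(probabilities):
    """Average information, in bits per symbol."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

print(entropy([1.0]))                  # the all-A bag: 0.0 bits
print(entropy([0.5, 0.5]))             # the equal A/B bag: 1.0 bit
print(round(entropy([0.9, 0.1]), 2))   # a lopsided bag: 0.47 bits
# A lopsided alphabet needs under half a bit per letter on average,
# and that gap is the room a lossless compressor can exploit.
```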
448 00:28:44,080 --> 00:28:47,880 But it doesn't just stop there, because while the mathematics of 449 00:28:47,880 --> 00:28:51,680 information theory doesn't tell you anything about the meaning of data, 450 00:28:51,680 --> 00:28:54,560 it does begin to open up a possibility 451 00:28:54,560 --> 00:28:57,360 of how we can understand ourselves 452 00:28:57,360 --> 00:29:01,920 and our society, because pretty much anything and everything can be 453 00:29:01,920 --> 00:29:04,600 measured and encoded as data. 454 00:29:11,320 --> 00:29:14,000 We say that signals flow through human society, 455 00:29:14,000 --> 00:29:16,760 that people use signals to get things done, 456 00:29:16,760 --> 00:29:18,880 that our social life is, in many ways, 457 00:29:18,880 --> 00:29:21,280 the sending back and forth of signals. 458 00:29:21,280 --> 00:29:22,600 So what is a signal? 459 00:29:22,600 --> 00:29:26,480 It's, in one sense, just the reduction in uncertainty. 460 00:29:34,880 --> 00:29:39,720 What it means to receive a signal is to be less uncertain than you were 461 00:29:39,720 --> 00:29:43,640 before and so, another way to think of measuring or quantifying signal 462 00:29:43,640 --> 00:29:46,680 is in that change in uncertainty. 463 00:29:47,880 --> 00:29:51,200 Using Shannon's mathematics to quantify signals 464 00:29:51,200 --> 00:29:54,280 is common in the world of complexity science. 465 00:29:54,280 --> 00:29:57,840 It's rather less familiar to historians. 466 00:29:57,840 --> 00:30:01,000 I love maths, I love its precision, I love its beauty. 467 00:30:08,680 --> 00:30:11,640 I absolutely love 468 00:30:11,640 --> 00:30:19,680 its certainty, and that Simon can bring that mathematical worldview, 469 00:30:19,680 --> 00:30:23,200 that mathematical certainty to what I work with. 
470 00:30:24,640 --> 00:30:28,920 The reason behind this remarkable marriage between history and science 471 00:30:28,920 --> 00:30:33,080 is the analysis of the largest single body of digital text 472 00:30:33,080 --> 00:30:35,440 ever collated about ordinary people. 473 00:30:36,800 --> 00:30:39,800 It's the Proceedings of London's Old Bailey, 474 00:30:39,800 --> 00:30:42,320 the central criminal court of England and Wales, 475 00:30:42,320 --> 00:30:50,360 which hosted close to 200,000 trials between 1674 and 1913. 476 00:30:50,360 --> 00:30:54,440 There are 127 million words of everyday speech 477 00:30:54,440 --> 00:31:00,880 in the mouths of orphans and women and servants and ne'er-do-wells, 478 00:31:00,880 --> 00:31:02,800 of criminals, certainly, 479 00:31:02,800 --> 00:31:07,160 but also people from every rank and station in society. 480 00:31:07,160 --> 00:31:09,680 And that made them unique. 481 00:31:09,680 --> 00:31:14,040 What's exciting about the Old Bailey and the size of the dataset, 482 00:31:14,040 --> 00:31:16,120 the length and magnitude of it, 483 00:31:16,120 --> 00:31:18,760 is that not only can we detect a signal, 484 00:31:18,760 --> 00:31:23,280 but we are able to look at that signal's emergence over time. 485 00:31:24,560 --> 00:31:29,080 Shannon's mathematics can be used to capture the amount of information in 486 00:31:29,080 --> 00:31:30,600 every single word, 487 00:31:30,600 --> 00:31:34,520 and like the alphabet, the less you expect a word, 488 00:31:34,520 --> 00:31:37,920 the more bits of information it carries. 489 00:31:37,920 --> 00:31:41,040 Imagine that you walk into a courtroom 490 00:31:41,040 --> 00:31:43,440 at the time and you hear a single word, 491 00:31:43,440 --> 00:31:48,120 the question we ask is how much information does that word carry 492 00:31:48,120 --> 00:31:51,160 about the nature of the crime being tried? 493 00:31:52,720 --> 00:31:55,440 You hear the word "the". 
494 00:31:55,440 --> 00:32:00,920 It's common across all trials and so gives you no bits of information. 495 00:32:00,920 --> 00:32:04,320 Most words you hear are poor signals of what's going on. 496 00:32:06,520 --> 00:32:09,440 But then you hear "purse". 497 00:32:09,440 --> 00:32:11,480 It conveys real information. 498 00:32:13,200 --> 00:32:15,520 Then comes "coin", 499 00:32:15,520 --> 00:32:18,160 "grab" and "struck". 500 00:32:18,160 --> 00:32:23,000 The more rare a word, the more bits of information it carries, 501 00:32:23,000 --> 00:32:25,080 the stronger the signal becomes. 502 00:32:26,960 --> 00:32:29,880 One of the clearest signals that we see in the Old Bailey, 503 00:32:29,880 --> 00:32:32,240 one of the clearest processes that comes out, 504 00:32:32,240 --> 00:32:35,440 is something that is known as the civilising process. 505 00:32:35,440 --> 00:32:42,400 It's an increasing sensitivity to, and attention to, the 506 00:32:42,400 --> 00:32:47,200 distinction between violent and nonviolent crime. 507 00:32:47,200 --> 00:32:52,440 If, for example, somebody hit you and stole your handkerchief, 508 00:32:52,440 --> 00:32:54,880 in the 18th-century context, in 1780, 509 00:32:54,880 --> 00:32:57,800 you would concentrate on the handkerchief. 510 00:32:57,800 --> 00:33:02,360 More worried about a few pence worth of dirty linen than the fact that 511 00:33:02,360 --> 00:33:05,480 somebody just broke your nose or cracked a rib. 512 00:33:05,480 --> 00:33:10,240 The fact that 100 years later, by 1880, 513 00:33:10,240 --> 00:33:15,080 every concern, every focus, both in terms of the words used in court, 514 00:33:15,080 --> 00:33:18,480 but also in terms of what people were brought to court for, 515 00:33:18,480 --> 00:33:21,480 focus on that broken nose and that cracked rib, 516 00:33:21,480 --> 00:33:26,040 speaks to a fundamental change in how we think about the world 517 00:33:26,040 --> 00:33:28,840 and how we think about how social relations work. 
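The courtroom calculation just described, where "the" gives no bits about the crime while "purse" gives a strong signal, can be illustrated with Bayes' rule over trial categories. The per-category word frequencies and the 50-50 split below are invented for illustration; the real analysis runs over the 127 million words of the Proceedings.

```python
# Toy version of the word-signal calculation: how many bits does
# hearing one word give you about the kind of trial under way?
# gained = H(category before) - H(category after hearing the word)
import math

def entropy(ps):
    """Average information of a distribution, in bits."""
    return sum(p * math.log2(1 / p) for p in ps if p > 0)

# hypothetical per-category word frequencies:
# (P(word | violent trial), P(word | theft trial))
freq = {
    "the":   (0.060, 0.060),    # equally common in every trial
    "purse": (0.0001, 0.0020),  # far likelier in a theft trial
}

prior = [0.5, 0.5]              # pretend trials split evenly
bits_about_crime = {}
for word, likelihoods in freq.items():
    joint = [p * q for p, q in zip(prior, likelihoods)]
    posterior = [j / sum(joint) for j in joint]   # Bayes' rule
    bits_about_crime[word] = entropy(prior) - entropy(posterior)

print(bits_about_crime)   # 'the' gives 0 bits; 'purse' about 0.72
```

A word common across all trials leaves the posterior equal to the prior, so it reduces no uncertainty; a word skewed toward one category shifts the posterior and carries real information.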
518 00:33:30,600 --> 00:33:36,400 Look at the strongest word signals for violent crime across the period. 519 00:33:36,400 --> 00:33:39,000 In the 18th century, the age of highwaymen, 520 00:33:39,000 --> 00:33:41,680 words relating to property theft dominate. 521 00:33:43,880 --> 00:33:45,680 But by the 20th century, 522 00:33:45,680 --> 00:33:49,520 it's physical violence itself and the impact on the victim 523 00:33:49,520 --> 00:33:51,440 that carry the most weight. 524 00:33:54,600 --> 00:33:56,800 That notion that one can trace change over time 525 00:33:56,800 --> 00:33:59,200 by looking at language and how it's used, 526 00:33:59,200 --> 00:34:01,160 who deploys it in what context, 527 00:34:01,160 --> 00:34:05,040 that I think gives this kind of work its real power. 528 00:34:05,040 --> 00:34:07,760 There are billions of words, there's all of Google Books, 529 00:34:07,760 --> 00:34:10,560 there's every printed newspaper, 530 00:34:10,560 --> 00:34:13,360 there is every speech made in Parliament, 531 00:34:13,360 --> 00:34:15,920 every sermon given at most churches. 532 00:34:15,920 --> 00:34:19,640 All of it is suddenly data and capable of being analysed. 533 00:34:24,480 --> 00:34:27,640 The rapid development of computers in the mid 20th century 534 00:34:27,640 --> 00:34:32,120 transformed our ability to encode, store and analyse data. 535 00:34:34,400 --> 00:34:38,000 It took a little longer for us to work out how to share it. 536 00:34:40,400 --> 00:34:43,160 This place is home to one of the most 537 00:34:43,160 --> 00:34:46,400 important UK scientific institutions, 538 00:34:46,400 --> 00:34:49,840 although it's one you've probably never heard of before. 539 00:34:49,840 --> 00:34:55,160 But since the 1900s, this place has advanced all areas of physics, 540 00:34:55,160 --> 00:34:59,840 radio communications, engineering, materials science, aeronautics, 541 00:34:59,840 --> 00:35:02,520 even ship design. 
542 00:35:02,520 --> 00:35:05,160 NPL, the National Physical Laboratory, 543 00:35:05,160 --> 00:35:08,920 in south-west London is where the first atomic clock was built 544 00:35:08,920 --> 00:35:13,680 and where radar and the Automatic Computing Engine, or ACE, were invented. 545 00:35:15,000 --> 00:35:19,360 The ACE computer was the brainchild of Alan Turing, 546 00:35:19,360 --> 00:35:22,640 who came to work here right after the Second World War. 547 00:35:22,640 --> 00:35:27,280 Now, Turing's contributions to the story of data are undoubtedly vast, 548 00:35:27,280 --> 00:35:31,560 but more important for our story is another person who worked here with 549 00:35:31,560 --> 00:35:36,760 Turing, someone who arguably is even less well known than this place, 550 00:35:36,760 --> 00:35:38,560 Donald Davies. 551 00:35:39,880 --> 00:35:44,160 Davies worked on secret British nuclear weapons research during the war... 552 00:35:45,560 --> 00:35:48,960 ..later joining Turing at NPL, 553 00:35:48,960 --> 00:35:52,880 climbing the ranks to be put in charge of computing in 1966. 554 00:35:55,040 --> 00:35:57,120 As well as the new digital computers, 555 00:35:57,120 --> 00:36:03,080 Davies had a lifelong fascination with telephones and communication. 556 00:36:03,080 --> 00:36:05,960 His mother had worked in the Post Office telephone exchange, 557 00:36:05,960 --> 00:36:07,120 so even when he was a kid, 558 00:36:07,120 --> 00:36:10,400 he had a real understanding of how these phone calls were routed and 559 00:36:10,400 --> 00:36:13,120 rerouted through this growing network, 560 00:36:13,120 --> 00:36:16,360 and that was the perfect training for what was to follow. 561 00:36:27,520 --> 00:36:29,320 What was Donald Davies like, then? 562 00:36:29,320 --> 00:36:32,280 He was a super boss because he was very approachable. 563 00:36:33,720 --> 00:36:39,440 Everybody realised he'd got huge intellect but not difficult with it. 564 00:36:39,440 --> 00:36:41,240 Very nice guy. 
565 00:36:41,240 --> 00:36:44,480 Davies' innovation was to develop, with his team, 566 00:36:44,480 --> 00:36:50,000 a way of sharing data between computers, a prototype network. 567 00:36:50,000 --> 00:36:53,600 Donald had spotted that there was a need to connect computers together 568 00:36:53,600 --> 00:36:57,440 and to connect people to computers, not by punch cards or 569 00:36:57,440 --> 00:37:01,120 paper tape or on a motorcycle, but over the wires, 570 00:37:01,120 --> 00:37:03,600 where you can move files or programs, or 571 00:37:03,600 --> 00:37:06,120 run a program remotely on another computer, 572 00:37:06,120 --> 00:37:09,480 and the telephone network is not really suited for that. 573 00:37:11,160 --> 00:37:12,720 In the pre-digital era, 574 00:37:12,720 --> 00:37:16,760 sending an encoded file along a telephone line meant that the line 575 00:37:16,760 --> 00:37:20,840 was engaged for as long as the transmission took. 576 00:37:20,840 --> 00:37:24,280 So the opportunity here was because we owned the site, 577 00:37:24,280 --> 00:37:28,400 78 acres with some 50 buildings, we could build a network. 578 00:37:28,400 --> 00:37:31,560 Davies' team sidestepped the telephone problem 579 00:37:31,560 --> 00:37:37,000 by laying high-bandwidth data cables before instituting a new way of 580 00:37:37,000 --> 00:37:39,800 moving data around the network. 581 00:37:41,960 --> 00:37:45,000 The technique he came up with was packet switching, 582 00:37:45,000 --> 00:37:48,080 the idea being that you take whatever it is you're going to send, 583 00:37:48,080 --> 00:37:53,000 you chop it up into uniform pieces, like having a standard envelope, 584 00:37:53,000 --> 00:37:56,400 and you put the pieces into the envelope and you post them off and they go 585 00:37:56,400 --> 00:38:00,320 separately through the network and get reassembled at the far end. 
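The envelope scheme just described can be sketched in a few lines: chop the message into uniform, numbered pieces, let them travel in any order, and reassemble them at the far end. The packet size and message are illustrative; real packets also carry source and destination addresses.

```python
# Packet switching in miniature: split, scatter, reassemble.
import random

def to_packets(message, size=4):
    """Chop a message into numbered, fixed-size packets."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets):
    """Rebuild the message whatever order the packets arrived in."""
    return "".join(chunk for _, chunk in sorted(packets))

packets = to_packets("HELLO FROM NPL")
random.shuffle(packets)        # packets may take different routes
print(reassemble(packets))     # HELLO FROM NPL
```

Because each packet carries its own sequence number, no single route has to stay open for the whole transmission, which is exactly the advantage over a held telephone line.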
586 00:38:00,320 --> 00:38:01,720 To demonstrate this idea, 587 00:38:01,720 --> 00:38:04,280 Roger and I are convening NPL's 588 00:38:04,280 --> 00:38:07,560 first-ever packet-switching data-dash... 589 00:38:09,040 --> 00:38:12,000 ..which is a bit more complicated than your average sports-day event. 590 00:38:13,440 --> 00:38:16,520 The course is a data network. 591 00:38:16,520 --> 00:38:19,480 There are two computers, represented 592 00:38:19,480 --> 00:38:22,120 here as the start and finish signs. 593 00:38:22,120 --> 00:38:24,720 Those computers are connected by a 594 00:38:24,720 --> 00:38:27,480 series of network cables and nodes. 595 00:38:27,480 --> 00:38:29,760 In our case, cables are lines of 596 00:38:29,760 --> 00:38:33,600 cones and the connecting nodes are Hula Hoops. 597 00:38:35,960 --> 00:38:40,440 Having built it, all we need now are some willing volunteers. 598 00:38:40,440 --> 00:38:41,920 And here they are. 599 00:38:41,920 --> 00:38:44,560 NPL's very own apprentices. 600 00:38:47,760 --> 00:38:50,360 So welcome to our packet-switching sports day. 601 00:38:50,360 --> 00:38:53,240 We've got two teams, red and blue. 602 00:38:53,240 --> 00:38:55,840 'Both teams are pretending to be data 603 00:38:55,840 --> 00:38:58,320 'and they're going to have to race.' 604 00:38:58,320 --> 00:38:59,920 You're going to start over there 605 00:38:59,920 --> 00:39:02,160 where it says "start", kind of obvious, 606 00:39:02,160 --> 00:39:06,000 and you're trying to get through to the end as quickly as you possibly 607 00:39:06,000 --> 00:39:07,760 can. You can't just go anywhere, 608 00:39:07,760 --> 00:39:12,360 you have to go through these hoops to get to the finish line, 609 00:39:12,360 --> 00:39:14,000 these little nodes in our network. 610 00:39:14,000 --> 00:39:17,440 You're only allowed to travel along the lines of the cones, 611 00:39:17,440 --> 00:39:20,840 but only if there's nobody else along that line. 
612 00:39:20,840 --> 00:39:24,120 All clear? OK, there is one catch. 613 00:39:24,120 --> 00:39:26,200 All of you who are in the red team, 614 00:39:26,200 --> 00:39:28,320 we are going to tie your feet together. 615 00:39:29,960 --> 00:39:34,800 So you've got to travel round our network as one big chunk of data. 616 00:39:34,800 --> 00:39:38,760 Those of you who are in blue, you are allowed to travel on your own, 617 00:39:38,760 --> 00:39:40,280 so it's slightly easier. 618 00:39:40,280 --> 00:39:43,760 'The objective is for both teams to deposit their beanbags 619 00:39:43,760 --> 00:39:47,360 'in the goal in the right order, one to five.' 620 00:39:47,360 --> 00:39:52,120 EXCITED CHATTER 621 00:39:52,120 --> 00:39:53,760 Get in the hoop! Get in the hoop! 622 00:39:53,760 --> 00:39:55,840 Bring out your competitive spirit here. 623 00:39:55,840 --> 00:39:58,840 We've got packets versus big chunks of data. 624 00:39:58,840 --> 00:40:01,520 I'm going to time you. Everyone ready? 625 00:40:01,520 --> 00:40:03,160 OK, over to you, Roger. 626 00:40:03,160 --> 00:40:05,560 TOOT! 627 00:40:07,760 --> 00:40:10,640 Remember, you can't go down the route until it's clear. 628 00:40:10,640 --> 00:40:13,280 'The red and blue teams are exactly the same size, 629 00:40:13,280 --> 00:40:15,600 'let's say five megabytes each. 630 00:40:15,600 --> 00:40:21,400 'But their progress through the network is clearly very different.' 631 00:40:21,400 --> 00:40:25,120 THEY LAUGH 632 00:40:33,360 --> 00:40:36,720 OK, blues, you took 13 seconds, pretty impressive. 633 00:40:36,720 --> 00:40:38,760 Reds, 20 seconds. 634 00:40:38,760 --> 00:40:40,760 That's a victory for the packet switchers. 635 00:40:40,760 --> 00:40:44,000 Well done, you guys! Well done, you guys. 636 00:40:44,000 --> 00:40:48,000 The impact that packet switching has had on the world, I mean, 637 00:40:48,000 --> 00:40:51,320 it sort of came from here and then spread out elsewhere. 
638 00:40:51,320 --> 00:40:54,760 It did indeed, we gave the world packet switching, and the world, 639 00:40:54,760 --> 00:40:59,040 of course, being America, they took it on and ran with it. 640 00:41:01,720 --> 00:41:05,520 This little race, Donald Davies' packet switching, 641 00:41:05,520 --> 00:41:09,560 was adopted by the people that would go on to build the internet, 642 00:41:09,560 --> 00:41:13,400 and today, the whole thing still runs on this idea. 643 00:41:16,720 --> 00:41:19,720 Let's say I want to e-mail you a picture of Molly. 644 00:41:19,720 --> 00:41:24,360 First, it will be broken up into over 1,000 data packets. 645 00:41:24,360 --> 00:41:27,680 Each one is stamped with the address of where it's from and where it's 646 00:41:27,680 --> 00:41:33,280 going to, which routers check to keep the packets moving. 647 00:41:33,280 --> 00:41:35,280 Regardless of the order they arrive, 648 00:41:35,280 --> 00:41:38,920 the image is reassembled, and there she is. 649 00:41:41,480 --> 00:41:43,000 This is quite a cool thing, right, 650 00:41:43,000 --> 00:41:45,320 that you've got one of the original creators 651 00:41:45,320 --> 00:41:48,680 of packet switching right here and you can ask him... 652 00:41:48,680 --> 00:41:51,920 Every time you're like... Well, do anything, really. 653 00:41:51,920 --> 00:41:54,680 "Why is my internet running so slowly?" 654 00:41:54,680 --> 00:41:57,920 THEY LAUGH Don't ask me! 655 00:42:01,840 --> 00:42:06,120 We've come a very long way in just a few decades. 656 00:42:06,120 --> 00:42:10,840 Around 3.4 billion people now have access to the internet at home 657 00:42:10,840 --> 00:42:13,880 and there are around four times the number of phones 658 00:42:13,880 --> 00:42:17,240 and other data-sharing devices online, 659 00:42:17,240 --> 00:42:19,880 the so-called Internet of Things. 
660 00:42:22,080 --> 00:42:24,560 Just by being alive in the 21st century 661 00:42:24,560 --> 00:42:29,240 with our phones, our tablets, our smart devices, all of us are 662 00:42:29,240 --> 00:42:30,920 familiar with data. 663 00:42:30,920 --> 00:42:34,720 Really embrace your inner nerd here, because every time you wander around 664 00:42:34,720 --> 00:42:38,520 looking at your screen, you are gobbling up and churning out 665 00:42:38,520 --> 00:42:41,160 absolutely tons of the stuff. 666 00:42:41,160 --> 00:42:43,840 Our relationship with data has really changed - 667 00:42:43,840 --> 00:42:47,480 it's no longer just for specialists, it's for everyone. 668 00:42:49,120 --> 00:42:52,960 There's one city in the UK that's putting the sharing and real-time 669 00:42:52,960 --> 00:42:57,320 analysis of data at the heart of everything it does - 670 00:42:57,320 --> 00:42:58,600 Bristol. 671 00:43:00,080 --> 00:43:03,720 Using digital technology, we take the city's pulse. 672 00:43:04,760 --> 00:43:10,760 This data is the route to an open, smart, liveable city, 673 00:43:10,760 --> 00:43:15,440 a city where optical, wireless and mesh networks 674 00:43:15,440 --> 00:43:21,280 combine to create an open, urban canopy of connectivity. 675 00:43:21,280 --> 00:43:26,240 Taking the pulse of the city under a canopy of connectivity 676 00:43:26,240 --> 00:43:31,120 might sound a bit sci-fi, or like something from a broadband advert. 677 00:43:31,120 --> 00:43:34,040 But if you just hold on to your cynicism for a second, 678 00:43:34,040 --> 00:43:39,840 because Bristol are trying to build a new type of data-sharing network for its citizens. 679 00:43:41,680 --> 00:43:45,200 There's a city-centre area which now has next-generation 680 00:43:45,200 --> 00:43:47,560 or maybe the generation after next 681 00:43:47,560 --> 00:43:52,120 of superfast broadband and then that's coupled to a Wi-Fi network, as well. 
682 00:43:52,120 --> 00:43:54,320 The question is, what can you do with it? 683 00:44:01,640 --> 00:44:06,880 We would have a wide area network of very simple Internet of Things 684 00:44:06,880 --> 00:44:10,360 sensing devices that just monitor a simple signal like air quality 685 00:44:10,360 --> 00:44:12,280 or traffic queued in a traffic jam. 686 00:44:12,280 --> 00:44:14,640 Once you've got all this network infrastructure, 687 00:44:14,640 --> 00:44:18,280 you can get an awful lot, a really huge amount of data 688 00:44:18,280 --> 00:44:20,640 arriving to you in real time. 689 00:44:23,280 --> 00:44:26,720 What's happening here is a city-scale experiment 690 00:44:26,720 --> 00:44:28,440 to try and develop and test 691 00:44:28,440 --> 00:44:32,360 what's going to be called the programmable city of the future. 692 00:44:33,880 --> 00:44:37,200 It relies on Bristol's futuristic network, 693 00:44:37,200 --> 00:44:41,200 vast amounts of data from as many sensors as possible 694 00:44:41,200 --> 00:44:43,640 and a computer system that can simulate 695 00:44:43,640 --> 00:44:47,320 and effectively reprogram the city. 696 00:44:47,320 --> 00:44:49,760 The computer system can intervene. 697 00:44:49,760 --> 00:44:53,640 It could reroute traffic and we can actually radio out to individuals, 698 00:44:53,640 --> 00:44:55,880 so maybe they get a message on their smartphone 699 00:44:55,880 --> 00:44:57,520 or perhaps a wrist-mounted device, 700 00:44:57,520 --> 00:45:00,200 saying, "If you have asthma, perhaps you should get indoors." 701 00:45:01,840 --> 00:45:06,720 Once you create that capacity for anything and everything in the city 702 00:45:06,720 --> 00:45:08,520 to be connected together, 703 00:45:08,520 --> 00:45:12,000 you can really start to re-imagine how a city might operate. 
704 00:45:12,000 --> 00:45:16,400 We are starting to experiment with driverless cars and, in order for 705 00:45:16,400 --> 00:45:18,320 driverless cars to work, 706 00:45:18,320 --> 00:45:21,680 they have to be able to communicate with the city infrastructure. 707 00:45:21,680 --> 00:45:24,320 So, your car needs to speak to the traffic lights, 708 00:45:24,320 --> 00:45:28,320 the traffic lights need to speak to the car, the cars to speak to each other. 709 00:45:28,320 --> 00:45:32,240 All of that requires a completely different set of infrastructure. 710 00:45:34,240 --> 00:45:38,920 Of course, as the amount of data a city can share grows, 711 00:45:38,920 --> 00:45:43,160 the computing power needed to do something useful with it must grow, too. 712 00:45:45,800 --> 00:45:48,400 And for that, we have the cloud. 713 00:45:50,320 --> 00:45:54,720 For example, imagine trying to analyse all of Bristol's traffic data, 714 00:45:54,720 --> 00:45:57,560 weather and pollution data on your home computer. 715 00:45:57,560 --> 00:45:58,960 It could take a year. 716 00:46:01,480 --> 00:46:07,320 Well, you could reduce that to a day by getting 364 more computers, 717 00:46:07,320 --> 00:46:09,680 but that's expensive. 718 00:46:09,680 --> 00:46:14,320 A cheaper option is sharing the analysis with other computers over the internet, 719 00:46:14,320 --> 00:46:19,600 which Google worked out first, but they published the basics 720 00:46:19,600 --> 00:46:24,240 and now free software exists to help anyone do the same. 721 00:46:24,240 --> 00:46:27,280 Big online companies rent their spare computers 722 00:46:27,280 --> 00:46:28,680 for a few pence an hour. 723 00:46:30,080 --> 00:46:32,960 So, now anyone like me or you 724 00:46:32,960 --> 00:46:37,240 can do big data analytics quickly for a few quid. 
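The divide-the-work idea behind those rented cloud machines follows the map/reduce pattern Google published: each machine tallies its own shard of the data, and the partial results are merged. A toy single-machine sketch, with an invented word list standing in for the city's data:

```python
# Map/reduce in miniature: shard the data, count each shard
# independently (the "map"), then merge the counts (the "reduce").
from collections import Counter

def mapper(shard):
    """One machine's job: tally its own slice of the data."""
    return Counter(shard)

def reducer(partials):
    """Merge every machine's partial tally into one answer."""
    total = Counter()
    for partial in partials:
        total += partial
    return total

words = ["data", "bristol", "data", "traffic", "data", "bristol"]
shards = [words[:3], words[3:]]        # pretend: one slice per machine
counts = reducer(mapper(s) for s in shards)
print(counts["data"])                  # 3
```

Because each shard is counted independently, adding more machines shortens the job almost in proportion, which is how a year of analysis shrinks to a day.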
725 00:46:41,800 --> 00:46:46,080 Such computing power is something we could never have dreamt of 726 00:46:46,080 --> 00:46:50,160 just a few years ago, but it will only fulfil its potential 727 00:46:50,160 --> 00:46:54,560 if we can share our own data in a safe and transparent way. 728 00:46:55,680 --> 00:47:01,800 If Bristol Council wanted to know where your car was at all times 729 00:47:01,800 --> 00:47:04,520 but could use that information to sort of minimise traffic jams, 730 00:47:04,520 --> 00:47:06,720 how would you feel about something like that? 731 00:47:06,720 --> 00:47:09,320 Er, I'm not sure if I'd particularly like it. 732 00:47:09,320 --> 00:47:12,480 I think it is up to me where I leave my car. 733 00:47:12,480 --> 00:47:16,040 I understand the idea of justifying it with all these great other ideas, 734 00:47:16,040 --> 00:47:18,360 but I still probably wouldn't like it very much. 735 00:47:18,360 --> 00:47:21,160 If they are using it for a better purpose, then yeah, 736 00:47:21,160 --> 00:47:24,880 but one should know how they are using it and why they'll be using it, for what purpose. 737 00:47:24,880 --> 00:47:30,080 I'd like to imagine a world in which all the data that was retained 738 00:47:30,080 --> 00:47:32,160 was used for the greater good of mankind, 739 00:47:32,160 --> 00:47:35,560 but I can't imagine a circumstance like that 740 00:47:35,560 --> 00:47:37,160 in the world that we have today. 741 00:47:37,160 --> 00:47:40,600 We live in a modern society, where if you don't 742 00:47:40,600 --> 00:47:43,440 let your data out there, not in the public domain, 743 00:47:43,440 --> 00:47:46,240 but in a secure business domain, 744 00:47:46,240 --> 00:47:48,080 then you can't take part in society, really. 745 00:47:49,640 --> 00:47:54,840 Unsurprisingly, people are pretty wary about what happens to their data. 
746 00:47:56,360 --> 00:47:59,240 We need to be careful that civil liberties are not eroded, 747 00:47:59,240 --> 00:48:03,080 because otherwise the technology is likely to be rejected. 748 00:48:03,080 --> 00:48:07,000 I think it's an area where us as a society have yet to sort of fully 749 00:48:07,000 --> 00:48:11,720 understand what the correct way forward is 750 00:48:11,720 --> 00:48:14,120 and therefore it is very much a discussion. 751 00:48:14,120 --> 00:48:16,600 It's not a lecture, it's not a code, 752 00:48:16,600 --> 00:48:20,600 it's one where we are co-producing and co-forming these sorts of rules 753 00:48:20,600 --> 00:48:24,120 with people in the city, in order to sort of help us work out what the 754 00:48:24,120 --> 00:48:26,800 right and wrong things to do are. 755 00:48:26,800 --> 00:48:31,600 It will be intriguing to watch Bristol grapple with the technological 756 00:48:31,600 --> 00:48:36,240 and ethical challenges of being our first data-centric city. 757 00:48:38,280 --> 00:48:40,640 In all these contexts, Internet of Things... 758 00:48:41,880 --> 00:48:45,800 ..new forms of health care, smart cities, 759 00:48:45,800 --> 00:48:48,640 what we're seeing is an increase in transparency. 760 00:48:49,800 --> 00:48:52,400 You can see through the body, you can see through the house, 761 00:48:52,400 --> 00:48:56,560 you can see through the city and the square, you can see through society. 762 00:48:56,560 --> 00:48:59,320 Now, transparency may be good. 763 00:48:59,320 --> 00:49:03,680 It's something that we may need to handle carefully in order to extract 764 00:49:03,680 --> 00:49:06,960 the value from those data to improve 765 00:49:06,960 --> 00:49:10,600 your lifestyle, your social interactions, 766 00:49:10,600 --> 00:49:13,320 the way in which your city works and so on. 
767 00:49:13,320 --> 00:49:17,160 But it also needs to be carefully handled, because it's touching 768 00:49:17,160 --> 00:49:19,880 the ultimate nerve of what it means to be human. 769 00:49:22,480 --> 00:49:25,320 So how much data should you give away? 770 00:49:25,320 --> 00:49:29,480 Traffic management is one thing but when it comes to health care, 771 00:49:29,480 --> 00:49:33,720 the stakes, the risks and benefits are even higher. 772 00:49:35,720 --> 00:49:38,720 And in Bristol, with a project called Sphere, 773 00:49:38,720 --> 00:49:40,920 they're pushing the boundaries here, too. 774 00:49:42,480 --> 00:49:46,360 The population is getting older, and an ageing population needs 775 00:49:46,360 --> 00:49:51,160 more intense health care, but it's very difficult to pay for that health care 776 00:49:51,160 --> 00:49:54,240 in institutions, paying for nurses and doctors. 777 00:49:54,240 --> 00:49:57,800 So, the key insight of the Sphere team was that it's now possible 778 00:49:57,800 --> 00:50:00,560 to arrange, in a house, lots of small devices 779 00:50:00,560 --> 00:50:04,000 where each device is monitoring a simple set of signals 780 00:50:04,000 --> 00:50:06,040 about what's going on in that house. 781 00:50:06,040 --> 00:50:09,080 There might be monitors for your heart rate or your temperature, 782 00:50:09,080 --> 00:50:12,800 but there might also be monitors that notice, as you're going up and down stairs, 783 00:50:12,800 --> 00:50:14,560 whether you're limping or not. 784 00:50:16,000 --> 00:50:19,520 They've invited me to go and spend a night in this 785 00:50:19,520 --> 00:50:24,480 very experimental house, but unfortunately, I'm not allowed to tell you where it is. 786 00:50:24,480 --> 00:50:28,520 The project is a live-in experiment and will soon roll out 787 00:50:28,520 --> 00:50:30,840 to 100 homes across Bristol. 788 00:50:30,840 --> 00:50:35,600 It's a gigantic data challenge, overseen by Professor Ian Craddock. 
789 00:50:35,600 --> 00:50:37,040 So, that's one up there, then? 790 00:50:37,040 --> 00:50:39,280 Yes, that's one of the video sensors 791 00:50:39,280 --> 00:50:41,440 and we have more sensors in the kitchen. 792 00:50:41,440 --> 00:50:44,800 We have another video camera in the hall and some environmental sensors, 793 00:50:44,800 --> 00:50:46,360 and a few more in here. 794 00:50:48,240 --> 00:50:51,800 The house can generate 3-D video, 795 00:50:51,800 --> 00:50:55,800 body position, location and movement data 796 00:50:55,800 --> 00:50:57,200 from a special wearable. 797 00:50:58,440 --> 00:51:00,200 How much data are you collecting, then? 798 00:51:00,200 --> 00:51:03,440 So, when we scale from this house to 100 houses in Bristol, 799 00:51:03,440 --> 00:51:07,400 in total we'll be storing over two petabytes of data for the project. 800 00:51:07,400 --> 00:51:12,440 Lord. So, on my computer at home, I don't even have a terabyte hard drive 801 00:51:12,440 --> 00:51:15,160 and you're talking about 20,000 of those. 802 00:51:15,160 --> 00:51:18,120 Yes. I mean, you know, the interaction of people with their environment 803 00:51:18,120 --> 00:51:21,440 and with each other is a very complicated and very variable thing 804 00:51:21,440 --> 00:51:24,320 and that's why it is a very challenging area, 805 00:51:24,320 --> 00:51:26,720 especially for data analysts, 806 00:51:26,720 --> 00:51:29,520 machine learners, to make sense of this big mass of data. 807 00:51:31,520 --> 00:51:34,800 I'm happy to find out that the research doesn't call 808 00:51:34,800 --> 00:51:37,760 for cameras in the bedroom or bathroom, 809 00:51:37,760 --> 00:51:40,920 but I do have to be left entirely on my own for the night. 810 00:51:43,240 --> 00:51:48,440 The very first thing I'm going to do is pour myself a nice bloody big 811 00:51:48,440 --> 00:51:50,080 glass of wine. There we go. 
812 00:51:52,640 --> 00:51:56,120 So, that nice glass of wine that I'm enjoying isn't completely guilt-free, 813 00:51:56,120 --> 00:51:59,960 because I've got to admit to it to the University of Bristol. 814 00:52:01,200 --> 00:52:04,200 I have to keep a log of everything I do, 815 00:52:04,200 --> 00:52:08,720 so that the data from my stay can be labelled with what I actually got up to. 816 00:52:08,720 --> 00:52:12,600 In this way, I'll be helping the process of machine learning, 817 00:52:12,600 --> 00:52:16,480 teaching the team's computers how to automatically monitor things like 818 00:52:16,480 --> 00:52:19,520 cooking, washing and sleeping, 819 00:52:19,520 --> 00:52:22,360 signals in the data of normal behaviour. 820 00:52:27,800 --> 00:52:29,840 In the interests of science. 821 00:52:29,840 --> 00:52:34,680 'I was also asked to do some things that are less expected.' 822 00:52:34,680 --> 00:52:37,880 Oh! I spilled my drink. 823 00:52:37,880 --> 00:52:41,120 'The team need to learn to detect out-of-the-ordinary behaviour, too, 824 00:52:41,120 --> 00:52:45,040 'if they want to, one day, spot specific signs of ill health.' 825 00:52:46,720 --> 00:52:51,000 Right, I'm going to run this back to the kitchen now. 826 00:52:52,320 --> 00:52:55,640 It's a fairly strange experience. 827 00:52:55,640 --> 00:52:58,600 I think the temperature sensors, the humidity sensors, 828 00:52:58,600 --> 00:53:01,520 the motion sensors, even the wearable 829 00:53:01,520 --> 00:53:04,320 I don't have a problem with at all. 830 00:53:04,320 --> 00:53:08,960 For some reason the body position is the one that's getting me. 831 00:53:08,960 --> 00:53:13,480 On the flipside, though, I would go absolutely crazy to have this data. 832 00:53:13,480 --> 00:53:16,760 This is the most wonderful... My goodness me. 833 00:53:16,760 --> 00:53:20,520 Everything you could learn about humans. It would be so brilliant. 
834 00:53:26,360 --> 00:53:30,640 One thing I wanted to do was to do something completely crazy 835 00:53:30,640 --> 00:53:33,840 just to see if they can spot it in the data. Just to kind of test them. 836 00:53:36,520 --> 00:53:38,280 OK, ready? 837 00:53:44,680 --> 00:53:46,640 I can't believe this is my life now. 838 00:53:48,520 --> 00:53:53,320 'Anyone can get the data from my stay online if they fancy trying to find 839 00:53:53,320 --> 00:53:56,320 'my below-the-radar escape. 840 00:53:56,320 --> 00:53:59,760 'The man in charge of machine learning, Professor Peter Flach, 841 00:53:59,760 --> 00:54:00,840 'has the first look.' 842 00:54:02,440 --> 00:54:04,560 Between nine and ten, you were cooking. 843 00:54:04,560 --> 00:54:05,680 Correct. 844 00:54:05,680 --> 00:54:09,760 Then you went into the lounge. You had your meal in the lounge. 845 00:54:09,760 --> 00:54:11,400 You know what? I ate on the sofa. 846 00:54:11,400 --> 00:54:13,240 And you were watching crap television. 847 00:54:13,240 --> 00:54:14,640 I was watching crap television? 848 00:54:14,640 --> 00:54:16,080 I've been found out. 849 00:54:16,080 --> 00:54:20,920 We didn't switch the crap-television sensor on. That's not on here, but OK. 850 00:54:20,920 --> 00:54:25,960 So, you were in the lounge sort of until 11:30. Correct. 851 00:54:27,640 --> 00:54:30,760 Then you went upstairs, there's a very clear signal here. 852 00:54:30,760 --> 00:54:35,120 And then, from then on, there isn't a lot of movement. I was in bed. 853 00:54:35,120 --> 00:54:36,640 So, I guess you were in bed. 854 00:54:36,640 --> 00:54:38,120 Sleeping. 855 00:54:38,120 --> 00:54:41,920 Normal activities, like cooking or being in bed, are relatively 856 00:54:41,920 --> 00:54:43,600 straightforward to spot. 857 00:54:43,600 --> 00:54:45,440 But what about the weird stuff? 858 00:54:47,000 --> 00:54:49,320 This is yesterday, again. 859 00:54:49,320 --> 00:54:51,440 I can see it. I can see the moment. 
860 00:54:51,440 --> 00:54:54,440 You can see the moment? I can see it, yeah. 861 00:54:54,440 --> 00:54:59,320 There's something happening here which is sort of rather quick. 862 00:54:59,320 --> 00:55:02,680 You've been in the lounge for quite a while and then, suddenly, 863 00:55:02,680 --> 00:55:06,640 there's a brief move to the kitchen here 864 00:55:06,640 --> 00:55:11,280 and then very quick cleaning up in the lounge. 865 00:55:11,280 --> 00:55:15,160 I wasted good wine on this experiment. Good wine? 866 00:55:15,160 --> 00:55:19,400 Humans are extraordinarily good at spotting most patterns. 867 00:55:21,000 --> 00:55:23,960 For machines, the task is much more challenging, 868 00:55:23,960 --> 00:55:26,800 but, once they've learned what to look for, 869 00:55:26,800 --> 00:55:28,600 they can do it tirelessly. 870 00:55:30,640 --> 00:55:32,240 I suppose, in the long run, 871 00:55:32,240 --> 00:55:35,600 if you are going to scale this up to more houses, 872 00:55:35,600 --> 00:55:40,400 you can't have people sifting through these graphs trying to find... 873 00:55:40,400 --> 00:55:42,440 I mean, you have to train computers to do them. 874 00:55:42,440 --> 00:55:47,880 You have to train computers to do them. One challenge that we are facing is that our models, 875 00:55:47,880 --> 00:55:51,720 our machine learning classifiers and models, need to be robust 876 00:55:51,720 --> 00:55:56,320 against changes in layout, changes in personal behaviour, 877 00:55:56,320 --> 00:55:59,080 changes in the number of people that are in a house. 878 00:55:59,080 --> 00:56:02,760 And maybe we are wildly optimistic about what it can do, 879 00:56:02,760 --> 00:56:06,840 but we are in the process of trying to find out what it can do, 880 00:56:06,840 --> 00:56:08,480 at what cost, at what... 881 00:56:09,600 --> 00:56:14,920 ..invasion into privacy, and then we can have a discussion about whether, 882 00:56:14,920 --> 00:56:16,720 as a society, we want this or not. 
883 00:56:18,400 --> 00:56:21,160 If this type of technology rolls out, 884 00:56:21,160 --> 00:56:24,440 machines will be modelling us in mathematical terms 885 00:56:24,440 --> 00:56:28,800 and intervening to help keep us healthy in real time - 886 00:56:28,800 --> 00:56:30,680 and that's completely new. 887 00:56:32,520 --> 00:56:38,280 It's true that our fascination with machine, or artificial, intelligence 888 00:56:38,280 --> 00:56:40,760 is as old as computers themselves. 889 00:56:40,760 --> 00:56:44,400 Claude Shannon and Alan Turing both explored the possibilities 890 00:56:44,400 --> 00:56:46,080 of machines that could learn. 891 00:56:48,120 --> 00:56:50,040 But it's only today, 892 00:56:50,040 --> 00:56:53,320 with torrents of data and pattern-finding algorithms, 893 00:56:53,320 --> 00:56:57,240 that intelligent machines will realise their potential. 894 00:57:00,640 --> 00:57:03,720 You'll hear a lot of heady stuff about what's going to happen when we 895 00:57:03,720 --> 00:57:06,520 mix big data with artificial intelligence. 896 00:57:06,520 --> 00:57:10,920 A lot of people, understandably, are very anxious about it. 897 00:57:10,920 --> 00:57:13,160 But, for me, despite how much the world has changed, 898 00:57:13,160 --> 00:57:16,880 the core challenge is the same as it always was. 899 00:57:16,880 --> 00:57:20,880 It doesn't matter if you are William Farr in Victorian London trying to 900 00:57:20,880 --> 00:57:25,600 understand cholera or in one of Bristol's wired-up houses, 901 00:57:25,600 --> 00:57:29,640 all you're trying to do is to understand patterns in the data 902 00:57:29,640 --> 00:57:32,680 using the language of mathematics. 903 00:57:32,680 --> 00:57:36,400 And machines can certainly help us to find those patterns, 904 00:57:36,400 --> 00:57:39,160 but it takes us to find the meaning in them. 
905 00:57:40,680 --> 00:57:44,240 We should be worried about what we're going to do with these smart technologies, 906 00:57:44,240 --> 00:57:47,560 not about the smart technologies in themselves. 907 00:57:47,560 --> 00:57:50,280 They are in our hands to shape our future. 908 00:57:51,720 --> 00:57:53,800 They will not shape our futures for us. 909 00:58:00,680 --> 00:58:04,840 In the blink of an eye, we have gone from a world where data, 910 00:58:04,840 --> 00:58:09,560 information and knowledge belonged only to the privileged few, 911 00:58:09,560 --> 00:58:12,720 to what we have now, where it doesn't matter if you're trying to work out 912 00:58:12,720 --> 00:58:17,400 where to go on holiday next or researching the best cancer treatments. 913 00:58:17,400 --> 00:58:20,680 Data has really empowered all of us. 914 00:58:20,680 --> 00:58:23,960 Now, of course, there are some concerns about big corporations 915 00:58:23,960 --> 00:58:29,040 hoovering up the data traces that we all leave behind in our everyday lives, 916 00:58:29,040 --> 00:58:33,840 but I, for one, am an optimist as well as a rationalist 917 00:58:33,840 --> 00:58:37,920 and I think that if we can marshal together the power of data, 918 00:58:37,920 --> 00:58:43,560 then the future lies in the hands of the many and not just the few. 919 00:58:43,560 --> 00:58:47,400 And that, for me, is the real joy of data. 920 00:58:47,400 --> 00:58:49,160 MUSIC: Good Vibrations by The Beach Boys