Half of the internet uploads photos these days, it seems

The creative class keeps on rising, and expanding.

For the first time, over half of the US adult internet population has posted a photo or video, according to the latest data from the Pew Internet & American Life Project.

Pew surveyed 1,000 adults and found that 54% of adult internet users have posted some of their own photos or video to the web. Last year, that figure was 46%, meaning the share of adult internet users who have posted their own photos or video grew by 8 percentage points within the last year.

Almost as many are content curators—47% have shared a photo or video that someone else uploaded. I checked the crosstabs for how many of those were cat macros, but sadly, Pew didn’t ask that question.

“Pictures document life from a special angle,” stated Maeve Duggan, one of the authors of the study, with no apparent trace of irony.

For the first time, Pew did ask adult internet users whether they used certain apps. Turns out 18% use Instagram and 9% use Snapchat. There’s no data on teens, though, for whom Snapchat use ought to be substantially higher.

Another caveat to those numbers: Pew didn’t ask whether the people who are “using” Instagram and Snapchat were also posting pictures to the service or just looking at the feeds of the people they follow.

With the rise of smartphones (only 4% in the survey didn’t know what the term meant), it’s no surprise to find that everyone is a photographer these days. Even on pro-leaning Flickr, the most popular camera is an iPhone—and so is the second most popular, and the third.

The margins of error of Pew’s results ranged from ±3.7 to ±4.0 percentage points.

This is my homework assignment.

The story of my life, more or less.

Yes, this plot of Skittles is my homework—for an online class I’m taking through Coursera called Social Network Analysis. This was our first assignment: using a couple of free online tools to download your Facebook friends data and visualize your network of friends.

The ease with which I was able to complete the assignment made me thankful for how (relatively) accessible network science is as a field. Thanks to social network APIs and open-source software, the tools you need to analyze your own social data are easily available. Consider it the social networking version of 23andme—personal memomics, if you will. (And thanks to the NSA, awareness has gone up, too!)

Taking the graph of my network above, each circle is a friend of mine (known as a “node” in the parlance of graph theory) and each link (or “edge”) between nodes indicates they’re Facebook friends. Not all my friends are connected to all my other friends—there were some free-floating clusters. But for the sake of clarity, I’ve shown only the largest connected component (LCC) of my graph.
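To make the idea of a largest connected component concrete, here is a minimal Python sketch. It is not how Gephi does it internally; the friend names and the `largest_connected_component` helper are made up for illustration. It groups people who can reach each other through any chain of friendships and keeps the biggest group.

```python
from collections import deque

def largest_connected_component(edges):
    """Return the set of nodes in the biggest group reachable via friendships."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Hypothetical friendships: one triangle of friends plus a free-floating pair.
friendships = [("Ana", "Ben"), ("Ben", "Cal"), ("Cal", "Ana"), ("Dee", "Eli")]
lcc = largest_connected_component(friendships)
print(sorted(lcc))
```

In this toy data, Dee and Eli form one of the free-floating clusters and are left out of the result.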

The spacing of the nodes is determined by an algorithm based on simple physics. In it, every node repels every other node, like magnets. Each link is like a spring, tugging groups of people together based on their common friendships. Another algorithm detects these clusters, draws boundaries, and then assigns them colors. (I went through and annotated some of them.)
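The magnets-and-springs picture can be sketched in a few lines of Python. This is a generic force-directed layout under assumed constants (`repulsion`, `spring`, `max_step` are all made-up values), not the specific layout algorithm Gephi ships.

```python
import math
import random

def force_layout(nodes, edges, steps=200, repulsion=0.01, spring=0.1,
                 max_step=0.05, seed=0):
    """Place nodes in 2D: all pairs repel, linked pairs attract."""
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart (inverse-square).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = repulsion / dist ** 2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction: each friendship is a spring pulling its two ends together.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += spring * dx
            force[a][1] += spring * dy
            force[b][0] -= spring * dx
            force[b][1] -= spring * dy
        # Move each node a small, capped step along its net force.
        for n in nodes:
            for axis in (0, 1):
                pos[n][axis] += max(-max_step, min(max_step, force[n][axis]))
    return pos

# Hypothetical network: a tight trio plus one friend hanging off it.
people = ["Ana", "Ben", "Cal", "Dee"]
links = [("Ana", "Ben"), ("Ben", "Cal"), ("Cal", "Ana"), ("Cal", "Dee")]
layout = force_layout(people, links)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Two linked nodes settle where the spring's pull balances the repulsion; groups with many mutual links end up bunched together, which is what makes the clusters visible.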

I want to emphasize that this is not a so-called ego network, with me at the center. I’m not in the graph at all! Each link between nodes is a direct friendship between those individuals. It shows how my friends are connected to each other, not how connected I am to them. In other words, this is a disclaimer to all my friends out there: You’re all awesome, and how central you are in the chart has nothing to do with how important you are to me!

So what can network science tell me? The analysis tool, called Gephi, calculates several standard network metrics. For example, the “average path length” across the network is 5.2. In other words, there are, on average, just over 4 degrees of separation between any two people in the network—indicating a “smaller” network than the famous “six degrees of separation” maxim. This pattern is replicated over Facebook as a whole—the company’s data team reported in 2011 that the entire network had an average of just 3.74 degrees of separation, and that it was decreasing each year. The world is shrinking.
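For the curious, the average path length can be computed by hand with a breadth-first search from every person. The sketch below is an illustration on a made-up chain of four friends, not Gephi's implementation. Note that reading a path length of 5.2 as "just over 4 degrees" appears to count the people in between rather than the links; that interpretation is an assumption here.

```python
from collections import deque

def average_path_length(adjacency):
    """Mean number of links on the shortest path between reachable pairs."""
    total = 0
    pairs = 0
    for source in adjacency:
        # Breadth-first search gives the shortest hop count to everyone else.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs

# Hypothetical chain of friends: Ana - Ben - Cal - Dee.
friends = {
    "Ana": {"Ben"},
    "Ben": {"Ana", "Cal"},
    "Cal": {"Ben", "Dee"},
    "Dee": {"Cal"},
}
links = average_path_length(friends)
print(f"average path length: {links:.2f} links")
print(f"people in between, on average: {links - 1:.2f}")
```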

But for me, the most fascinating aspect wasn’t the numbers, but simply zooming in and browsing through my graph. As a record of my social life, it’s a strange thing. It’s all there—my friendships, my relationships, my bridges I’ve burned. By tracing edges to their nodes, I can remember the moments that linked them together, the chance friendships that I intend to keep for a lifetime, forming structures like filaments of galaxies fanning out across the night sky.

Some of the most interesting connections are the ones that unexpectedly link clusters. A kid from high school who’s now a b-boy in Seoul. Or the person who subletted my room one summer in college and then ran into a classmate from grad school while they were visiting physics grad schools. The people who are connected that you had no idea knew each other.

And isn’t it funny how some of your best friends can be the smallest nodes?

The Error Cone and Visualizing Uncertainty

Tropical Storm Karen Advisory 3

The National Hurricane Center’s 3rd advisory issued for Tropical Storm Karen.

When we’re kids, one of the first subjects in which we learn the concepts of probability and uncertainty is the weather. It’s perhaps the only area of our life in which we all use probabilistic models on a daily basis to guide our decisions, decisions that can come back to bite us. It’s one thing when Nature decides to deliver on that 10% chance of rain; it can be catastrophic when a hurricane makes good on a 10% chance of landfall.

In a post last week, I wrote about conveying uncertainty in exoplanet detection—a matter of curiosity. But conveying uncertainty in a hurricane’s predicted track is a matter of public safety. So it would make sense for the National Hurricane Center to take great pains in communicating uncertainty to the public. Its method of visualizing it is known as the “error cone.”

Originating at the current location of the hurricane’s center, it expands along the predicted path to show how the forecasted path becomes more uncertain in the longer term. To be specific, the edge of the cone represents a 67% chance that the hurricane remains inside the cone based on the accuracy of the past five years of forecasts.

But there are some well-known issues with the error cone. For starters, it can give the false impression that it represents the extent of the storm itself, not the extent of its predicted track. Interpreted that way, it seems that the storm expands over time. Another is that by drawing a hard line in the sand at the 67% contour, it gives people just outside the cone a false sense of security, despite the fact that there’s a 1-in-6 chance the hurricane will deviate outside of the cone towards them. (If you’re wondering why it’s not 1-in-3, it’s because there’s also a 1-in-6 chance it goes outside the cone on the other side.)

The issue is that a hurricane’s predicted path isn’t a probability—it’s a probability distribution. Some places are more probable than others to lie along the path, but there’s no clear-cut boundary. Choosing an arbitrary 67% contour is convenient, but it’s an awful way to convey the full distribution of possible tracks.

A team of scientists led by Jonathan Cox of Clemson University recently published an alternative method of visualizing a hurricane’s predicted path that looks like this:

What they’ve done is simulate the hurricane’s path hundreds of times, but rigged the simulation’s settings so that it should have the same statistical distribution as the error cone. It’s a bit like loading dice. There’s an element of randomness in each track, but after generating hundreds of tracks, they cluster around the original, predicted track. They also check after each track to make sure the overall set is similar to the error cone. If they’re making too many tracks outside the error cone, they reset the simulation so that it generates more tracks inside the cone. It’s another application of Monte Carlo models.
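Here is a heavily simplified Python sketch of the "loaded dice" idea. It is not the researchers' method: instead of checking and readjusting after each track, it calibrates the randomness up front so that about 67% of tracks stay inside the cone. The cone radii are hypothetical numbers, not National Hurricane Center values, and each track is reduced to a sideways offset from the predicted path.

```python
import random
from statistics import NormalDist

CONE_COVERAGE = 0.67  # from the post: chance the storm stays inside the cone
# Hypothetical cone radii (nautical miles) by forecast hour; not NHC values.
CONE_RADIUS = {12: 30.0, 24: 50.0, 48: 90.0, 72: 130.0}

# z-score such that a normal deviate lands within +/- z with 67% probability.
Z_EDGE = NormalDist().inv_cdf(0.5 + CONE_COVERAGE / 2)

def simulate_tracks(n_tracks, seed=0):
    """Each track is a dict of sideways offsets from the predicted path."""
    rng = random.Random(seed)
    tracks = []
    for _ in range(n_tracks):
        z = rng.gauss(0.0, 1.0)  # one draw decides how far this track strays
        tracks.append({hour: z * radius / Z_EDGE
                       for hour, radius in CONE_RADIUS.items()})
    return tracks

def fraction_inside(tracks):
    """Share of tracks that stay within the cone at every forecast hour."""
    inside = sum(
        all(abs(offset) <= CONE_RADIUS[hour] for hour, offset in track.items())
        for track in tracks
    )
    return inside / len(tracks)

tracks = simulate_tracks(2000)
share_inside = fraction_inside(tracks)
share_left = sum(t[72] < -CONE_RADIUS[72] for t in tracks) / len(tracks)
print(f"inside the cone: {share_inside:.0%}")
print(f"outside on one side at 72 h: {share_left:.0%}")
```

The second number is the 1-in-6 chance from earlier: roughly a third of tracks leave the cone, split between its two sides.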

The authors don’t claim to have evidence yet that this method leads to a more accurate public perception. (I can think of one possible objection: since the tracks must necessarily diverge, the decreased density makes the tracks appear fainter, which could give a false impression that the storm will get weaker.) But they do report results from a small focus group in their study, in which almost all participants preferred the new method: in addition to giving a better sense of the dynamic nature of hurricane tracks, it was also simply more visually interesting.

Why auto racing is a geek’s dream sport

Hello, geek.

Hello, you science nerd, you technology aficionado, you analytical thinker, you.

Do you like watching sports?

I ask because there is a sport that will appeal to every aforementioned aspect of your personality, although judging from American TV viewing figures, you are probably not paying attention to it—even though its competitors are geeks, just like you. It is the pinnacle of automobile racing, the league known as Formula 1.

A Ferrari and Red Bull scream around the streets of Singapore in 2011. Photo: Chuljae Lee / CC

When it comes to adrenaline, these cars have no match. They’re screaming, winged rockets of carbon fiber cradling a driver with no roof over his head at top speeds exceeding 200 mph. There are no fenders to protect the wheels and suspension as they strain under the 5 Gs of stress that these cars exert as they scream around corners.

But despite that, forget the notion that modern racing is an exercise in pure sensation and blind bravery. Nor is it the gentlemanly pastime of European princes, hobbyist mechanics, and thrill-seeking rascals that it once was many decades ago. Today, more than any other sport, F1 is driven by design and data. It’s engineering. It’s technology. It’s physics soup for the scientific soul.

It’s no wonder that when Ron Howard began production on his 1970s-era F1 pic Rush, he described the world he found as a “combo of engineering brilliance and fearless courage [that] reminded me of people I met at NASA while directing Apollo 13.”

The workings of an F1 team are relentless, iterative, like a computer algorithm designed to obtain a minimum value: for a race distance of 305 km, solve for the shortest time possible.

Watching a race on TV, it’s almost startling to hear the quantitative way in which the most competent commentators analyze the race as it unfolds—the cars are going over 200 mph and the guys on TV are calculating fuel loads and tire wear. It’s a bit like that epic moment in Apollo 13 when astronaut Jim Lovell is struggling to convert the gimbal angles from the stricken command module to the lifeboat lunar module and everyone in Mission Control whips out their slide rule.

To see a bit of this strategy and how F1’s geeks solve it, consider the quandary teams face when planning pit stops to change tires. A typical race might last between 50 and 80 laps, but the tires on an F1 car wear quickly, and each successive lap takes a tenth of a second longer on average, or more. Changing to fresh rubber means the drivers regain their speed, but a total of about 20 seconds is lost as the team swaps tires and the driver obeys a 100 km/h speed limit on pit lane. (This is called the “bogey time” and is measured by the teams at each track.) So how often should a driver sacrifice those 20 seconds to gain back the most time on fresh rubber?

The math works out to be 1 to 3 times during a race, depending on the rate of wear, trading 20 to 60 seconds in the pits for the consistently quicker lap times on fresh tires.
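That arithmetic can be checked with a short Python sketch. It assumes tire wear is perfectly linear (each lap on a set of tires is 0.1 seconds slower than the one before), that stints are split as evenly as possible, and a hypothetical 60-lap race; real teams use far richer models.

```python
def time_lost(laps, stops, wear_per_lap=0.1, pit_loss=20.0):
    """Seconds lost to tire wear plus pit stops, relative to always-fresh tires."""
    stints = stops + 1
    base, extra = divmod(laps, stints)
    # Split the race into stints that are as equal in length as possible.
    lengths = [base + 1] * extra + [base] * (stints - extra)
    # On an n-lap stint, the laps lose 0, 1, 2, ... (n - 1) times the wear rate.
    wear = sum(wear_per_lap * n * (n - 1) / 2 for n in lengths)
    return stops * pit_loss + wear

def best_stop_count(laps, max_stops=6, **kwargs):
    """Number of stops that minimizes total time lost."""
    return min(range(max_stops + 1), key=lambda s: time_lost(laps, s, **kwargs))

for wear in (0.05, 0.1, 0.2):
    stops = best_stop_count(60, wear_per_lap=wear)
    lost = time_lost(60, stops, wear_per_lap=wear)
    print(f"wear {wear} s/lap: {stops} stop(s), {lost:.1f} s lost")
```

Under these assumptions, halving or doubling the wear rate moves the optimum between one and three stops, which matches the range above.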

But when? Imagine you’re the leader of the race. If you time it too early, you may emerge from the pits in the middle of the swarming peloton of cars, fighting with them for position. That would cost you precious time. Perhaps you should wait a handful of laps and let the cars behind you pit first.

But wait. If they pit first, they will have fresh tires while you are running around on worn rubber, bleeding time each lap. By the time you pit, the other cars may have leapfrogged you as you sit in pit lane. (This tactic is called the “undercut”.)

Now perhaps, my geeky race strategist, you have determined the perfect laps on which to pit to minimize your time (and made sure that your team is free of moles who might leak your strategy, a very real danger). But here’s the thing: the other teams can calculate their numbers just as well as you can. What are they likely to do? Well, it depends. Does that change what you should do? Maybe.

No computer could find a single perfect solution for this kind of problem. It’s mathematically impossible; there are simply too many variables. Instead, the best method is to simulate tens of thousands of races, randomly trying as many different strategies as you can to see which ones result in you winning the race the most times.

This kind of technique is called a Monte Carlo method, so named because every simulation is like a gambler’s roll of the dice. It was enabled by the rise of computers and pioneered on the primitive ENIAC. Today, it’s ubiquitous. It’s the same probabilistic math that Nate Silver uses to predict elections and that scientists use to forecast the paths of hurricanes: the rolling of multitudes of virtual dice to see which outcomes are most likely to come true, down which branches of reality the river of time will meet the least resistance. And it’s why the top F1 teams have squads of statisticians and data analysts working in Mission Control-style computer rooms back in their factories during a race, conducting their simulations, feeding their teams the latest model runs and dictating race strategy.
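A toy version of such a simulation fits in a few lines of Python. Everything here is an assumption for illustration: the 90-second base lap, the range of possible wear rates, the lap-to-lap noise, and the three candidate strategies. It ignores traffic and rival tactics entirely, which is exactly what the real simulations have to model.

```python
import random

def race_time(pit_laps, laps, wear, rng, pit_loss=20.0):
    """Total race time for one simulated race with the given pit laps."""
    total = 0.0
    tire_age = 0
    for lap in range(1, laps + 1):
        # Base lap + wear penalty + a little random lap-to-lap variation.
        total += 90.0 + wear * tire_age + rng.gauss(0.0, 0.3)
        tire_age += 1
        if lap in pit_laps:
            total += pit_loss
            tire_age = 0
    return total

def win_rates(strategies, laps=60, races=2000, seed=1):
    """Fraction of simulated races in which each strategy is quickest."""
    rng = random.Random(seed)
    wins = {name: 0 for name in strategies}
    for _ in range(races):
        # Roll the dice: nobody knows the true wear rate before the race.
        wear = rng.uniform(0.05, 0.2)
        times = {name: race_time(pits, laps, wear, rng)
                 for name, pits in strategies.items()}
        wins[min(times, key=times.get)] += 1
    return {name: count / races for name, count in wins.items()}

# Hypothetical strategies, given as the laps on which the car pits.
STRATEGIES = {
    "one-stop": {30},
    "two-stop": {20, 40},
    "three-stop": {15, 30, 45},
}
rates = win_rates(STRATEGIES)
for name, rate in rates.items():
    print(f"{name}: quickest in {rate:.0%} of simulated races")
```

The output is not a single answer but a set of odds, which is the point: the strategist picks the option that wins most often across the many ways the race could unfold.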

So what does this mean for you, dear geek? For one, the raw timing data is available to view at Formula1.com during races. Observing the lap times and the gaps between cars will allow you to see strategies unfold faster than the TV announcers can comment on them. If you want to go even further, there is an open source API project to intercept the data, allowing you to write your own code and make your own predictions.

F1 isn’t just about watching a competition—it also gives fans the chance to experience the joy of watching an outcome emerge from a sea of data. That’s something every geek can appreciate.

The FAP trap

rendering of Alpha Centauri system

In this artist’s rendering, the exoplanet Alpha Centauri Bb looms in the foreground, with the Alpha Centauri binary system in the background.

Almost one year ago, a team of astronomers announced a detection of a rocky exoplanet right next door in the star system Alpha Centauri, the closest to our own solar system. Yes, Alpha Centauri—that near-mythical system that has such a hold on our imagination, its fictional appearances have their own Wikipedia article.

Ok, ok, so this planet, named Alpha Centauri Bb, wasn’t actually habitable. It was too close to its star, more like a scorched, oversized Mercury than Earth. But the fact that a small rocky planet was right next door boded well for the likelihood that rocky planets were everywhere. Debra Fischer, a Yale exoplanet researcher, told the New York Times it was the “story of the century.” If Joe Biden were an astronomer, he’d have called it a big fucking deal.

Except…the detection wasn’t quite a slam dunk. The team, based in Geneva and led by astronomer Xavier Dumusque, found the planet by detecting the wobble that its gravity exerts on its star. But that wobble was so small that its signal was buried deep, deep within the noise of the data. They had to attempt to control for 23 different effects that could have thrown off their measurements—things like the star’s pulsations and magnetic spots. It was only after stripping them away, one by one, that a signal started to emerge. Here’s what it looked like:

Dumusque et al. (2012), Figure 5

All those scattered little dots that seem almost random—that’s the post-analysis data. But the red dots are what you get when you group data points that are close together and average them. That’s how the team was able to recover their signal. They reported that the odds that the data in the plot could have been a fluke of nature (a statistic called the False Alarm Probability, or FAP) were pretty slim: one in a thousand.
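Both ideas, averaging nearby points and asking how often chance alone would do as well, can be sketched in Python on synthetic data. The signal size, noise level, and shuffle-based test below are assumptions chosen for illustration; this is not the Geneva team's pipeline, and 200 shuffles are far too few to resolve odds as small as one in a thousand.

```python
import math
import random
from statistics import mean

def bin_average(phases, values, n_bins=10):
    """Group points by orbital phase (0 to 1) and average each group."""
    bins = [[] for _ in range(n_bins)]
    for p, v in zip(phases, values):
        bins[min(int(p * n_bins), n_bins - 1)].append(v)
    return [((i + 0.5) / n_bins, mean(b)) for i, b in enumerate(bins) if b]

def amplitude(phases, values):
    """Strength of a once-per-orbit sine wave in the data."""
    n = len(values)
    a = 2 / n * sum(v * math.sin(2 * math.pi * p) for p, v in zip(phases, values))
    b = 2 / n * sum(v * math.cos(2 * math.pi * p) for p, v in zip(phases, values))
    return math.hypot(a, b)

def false_alarm_probability(phases, values, shuffles=200, seed=1):
    """Fraction of shuffled datasets whose signal is at least as strong."""
    rng = random.Random(seed)
    observed = amplitude(phases, values)
    shuffled = list(values)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(shuffled)  # scrambling destroys any real periodic signal
        if amplitude(phases, shuffled) >= observed:
            hits += 1
    return hits / shuffles

# Synthetic wobble: a small sine wave buried in noise three times larger.
rng = random.Random(0)
phases = [rng.random() for _ in range(1000)]
values = [0.5 * math.sin(2 * math.pi * p) + rng.gauss(0.0, 1.5) for p in phases]

binned = bin_average(phases, values)
fap = false_alarm_probability(phases, values)
for center, avg in binned:
    print(f"phase {center:.2f}: {avg:+.2f}")
print(f"estimated false alarm probability: {fap}")
```

Note what the shuffle test takes as given: the `values` it starts from. If those numbers were produced by an imperfect cleanup step, the test has no way of knowing.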

This was a key point that many journalists picked up on, quoting the authors, who repeated it in a press conference to bolster the case for the planet. To wit:

Mike Wall at Space.com: “Udry, however, said that the team’s statistical analyses show a ‘false alarm probability’ of just one in 1,000 — meaning there’s a 99.9 percent chance that the planet exists.”

Ian Sample in The Guardian: “The astronomers told a press briefing that the chance of their discovery being false was about one in 1,000…”

And Camille Carlisle in Sky & Telescope: “Study coauthor Stéphane Udry (Geneva Observatory) noted in a press conference earlier this week that there is one chance in 1,000 that the signal his team sees is a fluke.”

Well, that sounded like pretty good odds to me. That is, until early this summer when exoplanet astronomer Artie Hatzes published a paper in which he did his own analysis of the same data, and found nothing. In fact, he concluded that if you assumed the planet was there, he should have found it with a confidence of 99%.

So hang on a second. According to the Geneva team, they have only a 1/1000 chance of being wrong. But Hatzes finds the opposite, and says there’s only a 1/100 chance that Geneva are right. So who’s “correct”? What do those numbers even mean?

So I asked Debra Fischer. Her answer confirmed my thinking. That False Alarm Probability of 1/1000? That’s the probability that the data in that plot is a fluke—but remember, that’s the data after all of their analysis. In other words, the 1/1000 figure holds only if you assume that their analysis of those 23 parameters is absolutely perfect. It’s a comparison of the signal against the flukey nature of reality, but says nothing about the confidence in the analysis that led to that signal in the first place!

Yikes. That’s a distinction with a big difference, and one that got very little play in the media. (And it’s a point I didn’t call out when I wrote about Hatzes’s paper for Sky & Telescope.)

Now, that doesn’t mean the analysis is junk. Dumusque and his team weren’t trying to hide anything about their analysis; quite the opposite, in fact. They released their data publicly, inviting scrutiny; that’s what enabled Hatzes to do his independent analysis. And Dumusque’s team did a check of their analysis as part of their original study to see if it might introduce a false signal and concluded it did not. So Alpha Centauri Bb is not dead, not by a long shot. Both Dumusque and Fischer are currently analyzing fresh observations to try to get that slam-dunk confirmation. (Peter Edmonds has written an excellent blog post taking a look at the whole saga.)

But it does mean that it’s difficult to quantify how convincing the data are as they stand, and that the FAP is not the entire story. For a journalist, that is difficult to explain to the public. It’s yet another example of how tricky it can be to communicate probability and uncertainty—both from scientists to journalists, and from journalists to the public. That False Alarm Probability might be alluringly small, but we better make sure we know what it means.

Now, this may seem like an esoteric case. Alpha Centauri Bb winking out of existence would be a big disappointment, but not, say, hazardous to anyone’s health. But it’s not hard to see how the latter case is problematic. Perhaps the biggest shift wrought by our era of Big Data isn’t the sheer amount of data but that the nature of reality and our predictions of the future are increasingly described in probabilistic terms—in everything from election results to climate change. When we communicate this, we all have to work hard to get it right.

Online gameworlds less linked than other social networks

Massively multiplayer online games (MMOs) are, by their very nature, social games. Players join guilds, team up to raid others, and participate in a virtual economy that has real-life value. But the social networks forged in online gameworlds may not be as connected as their real world counterparts, according to research presented at the annual meeting of the American Association for the Advancement of Science in Boston earlier this month.

Scientists at the University of Minnesota performed an analysis of data from servers for Everquest II and Eve Online that demonstrates that the networks of social links formed by the players through virtual teamwork, trading, and mentoring don’t become as tightly woven as normal social networks, such as Facebook or those in real life. In fact, instead of becoming more connected as the network grows, players tend, on average, to grow more distant from one another. While networks in the real world shrink, tying people closer to one another as their population expands, an MMO pulls them apart, its population full of players with few or no social ties to other players.

To conduct their analysis, the researchers broke down the vast web of networks on game servers into their individual cliques. As is typical in social networks, a strong core of connected players tends to form, and most people are linked to it through at least one tie to another player. This main core is referred to as the Largest Connected Component (LCC). But some people belong to smaller splinter groups that remain outside and unconnected to the LCC.

One metric for measuring the connectedness of a social network is to consider the maximum number of links it takes to get from one person to another in the LCC—the common notion of “degrees of separation”. The maximum distance across the LCC of a social network is known as the network’s “diameter”. In a typical social network, said Muhammad Aurangzeb Ahmad, lead author on the study, the network diameter starts out large—meaning that the early adopters aren’t really very well connected—and then consistently decreases over time, becoming more linked. Thus, somewhat counterintuitively, even as the overall population of the network increases, the interaction of members tends to shrink the diameter and the members of the network become socially closer to one another.
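As a rough illustration of the diameter and of how a single new tie can shrink it, here is a small Python sketch. The six-player network is invented, and the function assumes every player listed is already part of one connected component.

```python
from collections import deque

def diameter(edges):
    """Longest shortest path (in links) between any two connected players."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    longest = 0
    for source in adjacency:
        # Breadth-first search finds the hop count from `source` to everyone.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        longest = max(longest, max(dist.values()))
    return longest

# Hypothetical early network: six players strung out in a chain.
early = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
# Later, the two players at the ends meet, closing the chain into a ring.
later = early + [(6, 1)]

print("early diameter:", diameter(early))
print("later diameter:", diameter(later))
```

No one joined or left between the two snapshots; one extra connection was enough to bring the most distant players closer.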

But what surprised the researchers was that in an MMO, the network diameter doesn’t shrink. For instance, when one Everquest server was first initialized, the researchers found that in its early stages of growth, the diameter was 17 degrees of separation. The researchers’ baseline model simulating standard behavior showed that the server’s diameter ought to quickly shrink until it reached six degrees of separation, a typical real-world value. But instead, what their data showed was that the diameter oscillated around 17 degrees of separation and then grew, indicating that, on average, a given person would need to make more connections to find someone else.

Another way of measuring the amount of interaction is to measure the total population of the LCC over time. As most social networks grow more interconnected, there usually comes a point where the two largest components merge, causing a spike in the size of the new, combined LCC. This is what Ahmad calls the “gelling point,” after which the behavior of the network changes and the LCC quickly amasses the vast majority of people and continues to grow. The second- and third-largest components tend to stay at about the same size, acting only as temporary cliques: people begin by joining a fringe group, but then move on to the main social club. But the social gelling point of MMO networks tends to be strangely mild. Ahmad and his colleagues found that even eight months after the gelling point on a typical Everquest II server, a full 41% of the gameworld’s players were still moving in outsider social circles, not connected to the LCC.
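Tracking the LCC's share of the population as ties form can be sketched with a union-find structure in Python. The eight players and the order of their ties are invented for illustration, and this is not the researchers' code.

```python
class Components:
    """Union-find structure that tracks which players are socially connected."""

    def __init__(self, players):
        self.parent = {p: p for p in players}
        self.size = {p: 1 for p in players}  # sizes are kept for roots only

    def find(self, player):
        # Walk up to the component's root, shortening the path as we go.
        while self.parent[player] != player:
            self.parent[player] = self.parent[self.parent[player]]
            player = self.parent[player]
        return player

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        # Merge the smaller component into the larger one.
        self.parent[root_b] = root_a
        self.size[root_a] += self.size.pop(root_b)

    def largest_share(self):
        """Fraction of all players who belong to the largest component."""
        return max(self.size.values()) / len(self.parent)

# Hypothetical server: eight players, ties listed in the order they form.
players = range(1, 9)
ties = [(1, 2), (3, 4), (5, 6), (2, 3), (4, 5)]

network = Components(players)
shares = []
for a, b in ties:
    network.union(a, b)
    shares.append(network.largest_share())
print(shares)
```

The jump when two pairs merge is a miniature gelling point, and players 7 and 8, who never form a tie into the main group, are the outsiders left out of the LCC.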

Ahmad gives a number of possible reasons for why MMOs, despite the mechanics of teamwork built into the games, seem to create more social distance between players. Some may be joining a guild and then not spending any time interacting outside of their small component. Some may be dropouts who become inactive and do not play the game again. Others might be true solo players who eschew the social mechanics of the online world and try to complete the games’ tasks by themselves. Either way, he cautions against applying currently accepted social network models to MMOs. The behavior is fundamentally different than in the real world, which he acknowledges may be due to the goal-oriented nature of a gameworld.

Screenshot: MMOGames.com