As a child, Danica McKellar played Winnie Cooper on The Wonder Years. After the show was over, McKellar had difficulty breaking away from other people's perceptions of her. But in college, she discovered an aptitude for mathematics, went on to have a theorem named after her -- not because she was famous but because she'd helped prove it -- and forged a new identity. (via @stevenstrogatz)

When Grade-A nerds get together and talk about programming and math, a popular topic is P vs NP complexity. There's a lot to P vs NP, but boiled down to its essence, according to the video:

Does being able to quickly recognize correct answers [to problems] mean there's also a quick way to find [correct answers]?

Most people suspect that the answer to that question is "no", but it remains famously unproven.

In fact, one of the outstanding problems in computer science is determining whether questions exist whose answer can be quickly checked, but which require an impossibly long time to solve by any direct procedure. Problems like the one listed above certainly seem to be of this kind, but so far no one has managed to prove that any of them really are so hard as they appear, i.e., that there really is no feasible way to generate an answer with the help of a computer.
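To make the check-versus-find gap concrete, here is a minimal sketch using subset sum as a stand-in problem. That choice is an assumption for illustration, not the problem the video or the quote has in mind, and the names `check` and `search` are hypothetical. Checking a proposed answer takes one pass over it. The naive search below tries subsets one by one, and the number of subsets doubles with every number added to the list.

```python
from itertools import combinations

def check(numbers, target, candidate):
    """Quickly verify a proposed answer: distinct, valid indices summing to target."""
    indices = list(candidate)
    if len(set(indices)) != len(indices):
        return False
    if any(i < 0 or i >= len(numbers) for i in indices):
        return False
    return sum(numbers[i] for i in indices) == target

def search(numbers, target):
    """Brute-force search: try every non-empty subset, smallest subsets first."""
    for size in range(1, len(numbers) + 1):
        for combo in combinations(range(len(numbers)), size):
            if check(numbers, target, combo):
                return combo
    return None  # no subset adds up to the target

numbers = [3, 34, 4, 12, 5, 2]
answer = search(numbers, 9)
print(answer, check(numbers, 9, answer))
```

Nothing here shows that a faster search is impossible, which is exactly the open question. It only shows how lopsided the two jobs look when you write them down.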

The curve shaped by the CDC's available statistics, however, does allow one to estimate the number of American men between the ages of 20 and 40 who are 7 feet or taller: fewer than 70 in all. Which indicates, by further extrapolation, that while the probability of, say, an American between 6'6" and 6'8" being an NBA player today stands at a mere 0.07%, it's a staggering 17% for someone 7 feet or taller.

Being seven feet tall is absurdly tall and comes with a whole host of challenges, from bumping one's head on door frames to difficulty finding clothes to health issues. Some of these difficulties arise out of simple geometry: as height and width increase, volume increases more quickly.

Mathematicians have calculated pi out to more than 13 trillion decimal places, a calculation that took 208 days. NASA's Marc Rayman explains that in order to send out probes and slingshot them accurately throughout the solar system, NASA needs to use only 15 decimal places, or 3.141592653589793. How precise are calculations with that number? This precise:

The most distant spacecraft from Earth is Voyager 1. It is about 12.5 billion miles away. Let's say we have a circle with a radius of exactly that size (or 25 billion miles in diameter) and we want to calculate the circumference, which is pi times the radius times 2. Using pi rounded to the 15th decimal, as I gave above, that comes out to a little more than 78 billion miles. We don't need to be concerned here with exactly what the value is (you can multiply it out if you like) but rather what the error in the value is by not using more digits of pi. In other words, by cutting pi off at the 15th decimal point, we would calculate a circumference for that circle that is very slightly off. It turns out that our calculated circumference of the 25 billion mile diameter circle would be wrong by 1.5 inches. Think about that. We have a circle more than 78 billion miles around, and our calculation of that distance would be off by perhaps less than the length of your little finger.
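If you do feel like multiplying it out, here is a rough sketch of the check, not NASA's own calculation. It compares the circumference computed with the 15-decimal value against one computed with 30 decimals of pi, hardcoded as a stand-in for the exact value. The 63,360 inches-per-mile conversion and the function name are assumptions of this sketch. By this arithmetic the gap comes out to a fraction of an inch, smaller than the 1.5 inches quoted, which only strengthens Rayman's point.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # plenty of working precision for this comparison

PI_30 = Decimal("3.141592653589793238462643383279")  # pi to 30 decimals
PI_15 = Decimal("3.141592653589793")                  # the 15-decimal value from the post
INCHES_PER_MILE = Decimal(63360)

def circumference_error_inches(radius_miles):
    """Difference between the two circumferences, converted to inches."""
    radius = Decimal(radius_miles)
    better = 2 * PI_30 * radius
    rounded = 2 * PI_15 * radius
    return (better - rounded) * INCHES_PER_MILE

error = circumference_error_inches("12.5e9")  # Voyager 1's distance, per the quote
print(f"{float(error):.3f} inches")
```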

When was humanity's calculation of pi accurate enough for NASA? In 1424, Persian astronomer and mathematician Jamshid al-Kashi calculated pi to 17 digits.

In the book In Pursuit of the Unknown, Ian Stewart discusses how equations from the likes of Pythagoras, Euler, Newton, Fourier, Maxwell, and Einstein have been used to build the modern world.

I love how as time progresses, the equations get more complicated and difficult for the layperson to read (much less understand) and then Boltzmann and Einstein are like, boom!, entropy is increasing and energy is proportional to mass, suckas!

The "hidden" mathematics and order behind everyday objects & phenomena like spinning tops, dice, magnifying glasses, and airplanes. (via @stevenstrogatz)

Steven Strogatz walks us through the first mathematical proof Albert Einstein did when he was a boy: a proof of the Pythagorean theorem.

Einstein, unfortunately, left no such record of his childhood proof. In his Saturday Review essay, he described it in general terms, mentioning only that it relied on "the similarity of triangles." The consensus among Einstein's biographers is that he probably discovered, on his own, a standard textbook proof in which similar triangles (meaning triangles that are like photographic reductions or enlargements of one another) do indeed play a starring role. Walter Isaacson, Jeremy Bernstein, and Banesh Hoffman all come to this deflating conclusion, and each of them describes the steps that Einstein would have followed as he unwittingly reinvented a well-known proof.

Twenty-four years ago, however, an alternative contender for the lost proof emerged. In his book "Fractals, Chaos, Power Laws," the physicist Manfred Schroeder presented a breathtakingly simple proof of the Pythagorean theorem whose provenance he traced to Einstein.

Everyone knows that the square of the hypotenuse of a right triangle is equal to the sum of the squares of the other two sides. What this video presupposes is, fuck yeah math!

Some iOS apps still seem like magic. Case in point: PhotoMath. Here's how it works. You point your camera at a math problem and PhotoMath shows the answer. It'll even give you a step-by-step explanation and solution.

Pascal's triangle^{1} is a simple arrangement of numbers in a triangle...rows are formed by the successive addition of numbers in previous rows. But out of those simple rows comes deep and useful mathematical relationships related to probability, fractals, squares, and binomial expansions. (via digg)

As the video says, Pascal was nowhere near the discoverer of this particular mathematical tool. By the time he came along in 1653, the triangle had already been described in India (possibly as early as the 2nd century B.C.) and later in Persia and China.

The Fibonacci Shelf by designer Peng Wang might not be the most functional piece of furniture, but I still want one.

The design of the shelf is based on the Fibonacci sequence of numbers (0, 1, 1, 2, 3, 5, 8, 13, 21, ...), which is related to the Golden Rectangle. When assembled, the Fibonacci Shelf resembles a series of Golden Rectangles partitioned into squares. (via ignant)

If you divide 1 by 999,999,999,999,999,999,999,998,999,999,999,999,999,999,999,999 (that's 999 quattuordecillion btw), the Fibonacci sequence neatly pops out. MATH FTW!

At the end of Carl Sagan's Contact (spoilers!), the aliens give Ellie a hint about something hidden deep in the digits of π. After a long search, a circle made from a sequence of 1s and 0s is found, providing evidence that intelligence was built into the fabric of the Universe. I don't know if this Fibonacci division thing is on quite the same level, but it might bake your noodle if you think about it too hard. (via @stevenstrogatz)

Update: From svat at Hacker News, an explanation of the magic behind the math.

It's actually easier to understand if you work backwards and arrive at the expression yourself, by asking yourself: "If I wanted the number that starts like 0.0...000 0...001 0...001 0...002 0...003 0...005 0...008 ... (with each block being 24 digits long), how would I express that number?"
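You can watch the sequence pop out with exact integer arithmetic. This is a minimal sketch, with `fibonacci_blocks` as a made-up name: it builds the denominator as 10^48 - 10^24 - 1 (the number in the post), does the long division with Python's big integers, and chops the decimal expansion into 24-digit blocks.

```python
def fibonacci_blocks(block=24, count=10):
    """Read the first `count` blocks of `block` digits from 1 / (10^(2*block) - 10^block - 1)."""
    denominator = 10 ** (2 * block) - 10 ** block - 1
    digits_needed = block * count
    # Integer division yields the first `digits_needed` decimal digits of 1/denominator.
    scaled = 10 ** digits_needed // denominator
    digits = str(scaled).zfill(digits_needed)  # restore the leading zeros
    return [int(digits[i:i + block]) for i in range(0, digits_needed, block)]

print(fibonacci_blocks())
```

The blocks stay clean only while each Fibonacci number fits in 24 digits. Past that point the numbers spill into their neighbors and the pattern blurs.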

This web app allows you to explore the Mandelbrot set interactively...just click and zoom. I had an application like this on my computer in college, but it only went a few zooms deep before crashing. There was nothing quite like zooming in a bunch of times on something that looked like a satellite photo of a river delta and seeing something that looked exactly like where you started. (via @stevenstrogatz)

Finding Zero is an adventure filled saga of Amir Aczel's lifelong obsession: to find the original sources of our numerals. Aczel has doggedly crisscrossed the ancient world, scouring dusty, moldy texts, cross examining so-called scholars who offered wildly differing sets of facts, and ultimately penetrating deep into a Cambodian jungle to find a definitive proof.

I'm dreading it. No hope of solving any equations that day, what with the pie-eating contests, the bickering over the merits of pi versus tau (pi times two), and the throwdowns over who can recite more digits of pi. Just stay off the streets at 9:26:53, when the time will approximate pi to ten places: 3.141592653.

In a lecture given in 1924, German mathematician David Hilbert introduced the idea of the paradox of the Grand Hotel, which might help you wrap your head around the concept of infinity. (Spoiler alert: it probably won't help...that's the paradox.) In his book One Two Three... Infinity, George Gamow describes Hilbert's paradox:

Let us imagine a hotel with a finite number of rooms, and assume that all the rooms are occupied. A new guest arrives and asks for a room. "Sorry," says the proprietor, "but all the rooms are occupied." Now let us imagine a hotel with an infinite number of rooms, and all the rooms are occupied. To this hotel, too, comes a new guest and asks for a room.

"But of course!" exclaims the proprietor, and he moves the person previously occupying room N1 into room N2, the person from room N2 into room N3, the person from room N3 into room N4, and so on.... And the new customer receives room N1, which became free as the result of these transpositions.

Let us imagine now a hotel with an infinite number of rooms, all taken up, and an infinite number of new guests who come in and ask for rooms.

"Certainly, gentlemen," says the proprietor, "just wait a minute."

He moves the occupant of N1 into N2, the occupant of N2 into N4, the occupant of N3 into N6, and so on, and so on...

Now all odd-numbered rooms became free and the infinity of new guests can easily be accommodated in them.
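The proprietor's two tricks are just two room-numbering rules. Here is a small sketch of them, with hypothetical function names, tried on the first few rooms since a computer can't list them all: shifting everyone up by one frees room 1, and doubling every room number frees all the odd rooms.

```python
def shift_for_one_guest(room):
    """One new guest: the occupant of room n moves to room n + 1."""
    return room + 1

def shift_for_infinite_guests(room):
    """Infinitely many new guests: the occupant of room n moves to room 2n."""
    return 2 * room

def room_for_new_guest(k):
    """The k-th new guest (k = 1, 2, 3, ...) takes the k-th odd-numbered room."""
    return 2 * k - 1

first_rooms = range(1, 6)
print([shift_for_one_guest(n) for n in first_rooms])        # old guests after one arrival
print([shift_for_infinite_guests(n) for n in first_rooms])  # old guests after infinitely many arrivals
print([room_for_new_guest(k) for k in first_rooms])         # where the newcomers go
```

Old guests only ever land in even rooms under the doubling rule and newcomers only in odd ones, so nobody has to share.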

This TED video created by Jeff Dekofsky explains that there are similar strategies for finding space in such a hotel for infinite numbers of infinite groups of people and even infinite amounts of infinite numbers of infinite groups of people (and so on, and so on...) and is very much worth watching:

Given that there's so much mathematicians don't know about prime numbers, you might be surprised to learn that there's a very simple regular expression for detecting prime numbers:

/^1?$|^(11+?)\1+$/

If you've got access to Perl on the command line, try it out with some of these (just replace [number] with any integer):

perl -wle 'print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/' [number]

However while cute, it is very slow. It tries every possible factorization as a pattern match. When it succeeds, on a string of length n that means that n times it tries to match a string of length n against a specific pattern. This is O(n^2). Try it on primes like 35509, 195341, 526049 and 1030793 and you can observe the slowdown.
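If Perl isn't handy, the same trick carries over to Python's `re` module. This is a sketch of the idea rather than a practical primality test, and `is_prime_unary` is a made-up name. The number is written in unary as a string of 1s. The first alternative catches 0 and 1. The second catches any string that splits into two or more equal chunks of at least two 1s, which is to say any composite number.

```python
import re

# Matches unary strings whose length is NOT prime.
NOT_PRIME = re.compile(r"^1?$|^(11+?)\1+$")

def is_prime_unary(n):
    """Return True if n is prime, judged by the regex failing to match n ones."""
    return NOT_PRIME.match("1" * n) is None

print([n for n in range(30) if is_prime_unary(n)])
```

As with the Perl version, the backtracking gets slow quickly, so keep the inputs small.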

Using only squares, triangles, and the condition that each shape wants to move if less than 1/3 of its neighbors are like it, watch how extreme segregation appears in even the most random mixing of shapes.

These little cuties are 50% Triangles, 50% Squares, and 100% slightly shapist. But only slightly! In fact, every polygon prefers being in a diverse crowd. You can only move them if they're unhappy with their immediate neighborhood. Once they're OK where they are, you can't move them until they're unhappy with their neighbors again. They've got one, simple rule: "I wanna move if less than 1/3 of my neighbors are like me."

Harmless, right? Every polygon would be happy with a mixed neighborhood. Surely their small bias can't affect the larger shape society that much? Well... And... our shape society becomes super segregated. Daaaaang. Sometimes a neighborhood just becomes square, and it's not their fault if no triangles wanna stick around. And a triangular neighborhood would welcome a square, but they can't help it if squares ain't interested.

Super super fascinating. Take your time and go through and play with all the interactive widgets. (via @ftrain)
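If you'd rather poke at the one-third rule in code, here is a rough sketch of that kind of model. It is not the code behind the interactive widgets. The grid size, the share of empty cells, the seed, and the choice to send each unhappy shape to a random empty cell are all assumptions made for illustration.

```python
import random

def neighbors(grid, r, c):
    """Shapes in the up-to-8 cells around (r, c), skipping empty cells."""
    size = len(grid)
    found = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < size and 0 <= cc < size:
                if grid[rr][cc] is not None:
                    found.append(grid[rr][cc])
    return found

def similarity(grid, r, c):
    """Fraction of a shape's neighbors that are like it (1.0 if it has none)."""
    near = neighbors(grid, r, c)
    if not near:
        return 1.0
    return sum(1 for shape in near if shape == grid[r][c]) / len(near)

def average_similarity(grid):
    """Mean similarity over every occupied cell."""
    size = len(grid)
    cells = [(r, c) for r in range(size) for c in range(size)
             if grid[r][c] is not None]
    return sum(similarity(grid, r, c) for r, c in cells) / len(cells)

def simulate(size=20, empty_fraction=0.1, threshold=1 / 3, max_rounds=200, seed=0):
    """Run the model; return average similarity before and after."""
    rng = random.Random(seed)
    n_cells = size * size
    n_each = (n_cells - int(n_cells * empty_fraction)) // 2
    flat = ["T"] * n_each + ["S"] * n_each + [None] * (n_cells - 2 * n_each)
    rng.shuffle(flat)
    grid = [flat[i * size:(i + 1) * size] for i in range(size)]
    before = average_similarity(grid)
    for _ in range(max_rounds):
        # Shapes that were unhappy at the start of this round get to move.
        unhappy = [(r, c) for r in range(size) for c in range(size)
                   if grid[r][c] is not None
                   and similarity(grid, r, c) < threshold]
        if not unhappy:
            break
        rng.shuffle(unhappy)
        for r, c in unhappy:
            empties = [(rr, cc) for rr in range(size) for cc in range(size)
                       if grid[rr][cc] is None]
            rr, cc = rng.choice(empties)
            grid[rr][cc], grid[r][c] = grid[r][c], None
    return before, average_similarity(grid)

before, after = simulate()
print(f"average share of like neighbors: {before:.2f} -> {after:.2f}")
```

Try nudging `threshold` up or down and watch how much the final number moves.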

What's a large number? A billion? A billion times a billion? A billion to the billionth power? A googol? A googolplex? A googolplex is 10^googol, BTW:

So a googol is 1 with just 100 zeros after it, which is a number 10 billion times bigger than the grains of sand that would fill the universe. Can you possibly imagine what kind of number is produced when you put a googol zeros after the 1?

That's pretty big, right? Not. Even. It turns out you can construct numbers that are so much larger than a googolplex, that it's gonna light your head on fire just to read about them. Put on your asbestos hat and feast your eyes on Graham's Number.

Moving up another level, exponentiation is iterated multiplication. Instead of saying 3 x 3 x 3 x 3, exponentiation allows me to bundle that string into the more concise 3^4.

Now, the thing is, this is where most people stop. In the real world, exponentiation is the highest operation we tend to ever use in the hyperoperation sequence. And when I was envisioning my huge googolplex^googolplex number, I was doing the very best I could using the highest level I knew -- exponentiation. On Level 3, the way to go as huge as possible is to make the base number massive and the exponent number massive. Once I had done that, I had maxed out.

The key to breaking through the ceiling to the really big numbers is understanding that you can go up more levels of operations -- you can keep iterating up infinitely. That's the way numbers get truly huge.
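The "levels" here can be sketched as one recursive function. This is a toy with a hypothetical name, `hyper`, following the numbering in the excerpt (Level 3 is exponentiation), and it only works for tiny inputs because the values explode almost immediately. Each level past exponentiation just repeats the level below it.

```python
def hyper(level, a, b):
    """Apply the hyperoperation at `level` to a and b (b must be at least 1)."""
    if b < 1:
        raise ValueError("this sketch only handles b >= 1")
    if level == 1:
        return a + b        # Level 1: addition
    if level == 2:
        return a * b        # Level 2: iterated addition
    if level == 3:
        return a ** b       # Level 3: iterated multiplication
    # Level 4 and up: apply the level below b - 1 times, building from the top down.
    result = a
    for _ in range(b - 1):
        result = hyper(level - 1, a, result)
    return result

print(hyper(3, 3, 4))  # 3 x 3 x 3 x 3
print(hyper(4, 3, 3))  # a power tower of three 3s
print(hyper(5, 2, 3))  # one level higher, still just barely printable
```

Don't try `hyper(5, 3, 3)`. It is a power tower of 3s that is 7,625,597,484,987 levels tall, and Graham's number starts well above that.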

You might get lost around the "power tower feeding frenzy" bit or the "power tower feeding frenzies psycho festival" bit, but persist...the end result is really just beyond superlatives. (via @daveg)

Update: In this video, you can listen to the inventor of Graham's number, Ron Graham, explain all about it.

The principles of rubber sheet geometry can be extended into three dimensions, which explains the quip that a topologist is someone who cannot tell the difference between a doughnut and a coffee cup. In other words, a coffee cup has just one hole, created by the handle, and a doughnut has just one hole, in its middle. Hence, a coffee cup made of a rubbery clay could be stretched and twisted into the shape of a doughnut. This makes them homeomorphic.

By contrast, a doughnut cannot be transformed into a sphere, because a sphere lacks any holes, and no amount of stretching, squeezing, and twisting can remove the hole that is integral to a doughnut. Indeed, it is a proven mathematical theorem that a doughnut is topologically distinct from a sphere. Nevertheless, Homer's blackboard scribbling seems to achieve the impossible, because the diagrams show the successful transformation of a doughnut into a sphere. How?

Although cutting is forbidden in topology, Homer has decided that nibbling and biting are acceptable. After all, the initial object is a doughnut, so who could resist nibbling? Taking enough nibbles out of the doughnut turns it into a banana shape, which can then be reshaped into a sphere by standard stretching, squeezing, and twisting. Mainstream topologists might not be thrilled to see one of their cherished theorems going up in smoke, but a doughnut and a sphere are identical according to Homer's personal rules of topology. Perhaps the correct term is not homeomorphic, but rather Homermorphic.

The Fields Medal is viewed as the greatest honor in mathematics, the Nobel of math. Today, Iranian mathematician Maryam Mirzakhani became the first woman (and Iranian) to win a Fields Medal.

Maryam Mirzakhani has made stunning advances in the theory of Riemann surfaces and their moduli spaces, and led the way to new frontiers in this area. Her insights have integrated methods from diverse fields, such as algebraic geometry, topology and probability theory.

In hyperbolic geometry, Mirzakhani established asymptotic formulas and statistics for the number of simple closed geodesics on a Riemann surface of genus g. She next used these results to give a new and completely unexpected proof of Witten's conjecture, a formula for characteristic classes for the moduli spaces of Riemann surfaces with marked points.

In dynamics, she found a remarkable new construction that bridges the holomorphic and symplectic aspects of moduli space, and used it to show that Thurston's earthquake flow is ergodic and mixing.

Most recently, in the complex realm, Mirzakhani and her coworkers produced the long sought-after proof of the conjecture that - while the closure of a real geodesic in moduli space can be a fractal cobweb, defying classification - the closure of a complex geodesic is always an algebraic subvariety.

George Hart, who cut this bagel and made this video, is an engineering professor at SUNY-Stony Brook and "mathematical sculptor." On his web site, he offers two bagel-derived math problems: What is the ratio of the surface area of this linked cut to the surface area of the usual planar bagel slice? and Modify the cut so the cutting surface is a one-twist Mobius strip.

The book cover for Naive Set Theory by Paul Halmos is so so good:

The cover is a riff on, I think, Russell's Paradox, a problem with naive set theory described by Bertrand Russell in 1901 about whether sets can contain themselves.

Russell's paradox is based on examples like this: Consider a group of barbers who shave only those men who do not shave themselves. Suppose there is a barber in this collection who does not shave himself; then by the definition of the collection, he must shave himself. But no barber in the collection can shave himself. (If so, he would be a man who does shave men who shave themselves.)

In France, pie charts are called "le camembert" after the cheese. Or sometimes "un diagramme en fromage" (cheese diagram). In Brazil, they are pizza charts. (via numberphile & reddit)

It's possible to make a .zip file that contains itself infinitely many times. So a 440 byte file could conceivably be expanded into eleventy dickety two zootayunafliptobytes of data and beyond. Here's the full explanation.

"My kids used to love math! Now it makes them cry." So tweeted Louis C.K. earlier this week. His opinion of the new math and standardized tests is echoed by a lot of parents who "have found themselves puzzled by the manner in which math concepts are being presented to this generation of learners as well as perplexed as to how to offer the most basic assistance when their children are struggling with homework." Rebecca Mead in The New Yorker: Louis C.K. Against the Common Core.

The math of why bigger pizzas are such a good deal is simple. A pizza is a circle, and the area of a circle increases with the square of the radius.

So, for example, a 16-inch pizza is actually four times as big as an 8-inch pizza.

And when you look at thousands of pizza prices from around the U.S., you see that you almost always get a much, much better deal when you buy a bigger pizza.
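The arithmetic is quick to check. In this small sketch the function names are made up, and any prices you feed in are your own, not data from the article.

```python
import math

def pizza_area(diameter_inches):
    """Area of a round pizza in square inches, given its diameter."""
    radius = diameter_inches / 2
    return math.pi * radius ** 2

def price_per_square_inch(price, diameter_inches):
    """How much each square inch costs at a given menu price."""
    return price / pizza_area(diameter_inches)

# Doubling the diameter quadruples the area.
print(pizza_area(16) / pizza_area(8))
```

So a 16-inch pizza is the better deal whenever it costs less than four times the price of the 8-inch one.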

What do you think you get if you add 1+2+3+4+5+... all the way on up to infinity? Probably a massively huge number, right? Nope. You get a small negative number:

This is, by a wide margin, the most noodle-bending counterintuitive thing I have ever seen. Mathematician Leonhard Euler actually proved this result in 1735, but the result was only made rigorous later and now physicists have been seeing this result actually show up in nature. Amazing. (thx, chris)

Update: Of course (of course!) the actual truth seems more complicated, hinging on what "sum" means mathematically, etc. (via @cenedella)

A short time before his death, Benoît B. Mandelbrot filmed an interview with Errol Morris. Morris charmingly starts off by asking Mandelbrot where "the fractal stuff" came from.

Note: as always, the "B." in "Benoît B. Mandelbrot" stands for "Benoît B. Mandelbrot". (via @sampotts)

In one instance, when Margie was the last contestant to bid, she guessed the retail price of an oven was $1,150. There had already been one bid for $1,200 and another for $1,050. She therefore could only win if the actual price was between $1,150 and $1,200. Since she was the last to bid, she could have guessed $1051, expanding her range by almost $100 (any price from $1051 to $1199 would have made her a winner), with no downside. What she really should have done, however, is bid $1,201. Game theory says that when you are last to bid, you should bid one dollar more than the highest bidder. You obviously won't win every time, but in the last 1,500 Contestants' Rows to have aired, had final bidders committed to this strategy, they would have won 54 percent of the time.

Throughout my years playing around with fractals, the Sierpinski triangle has been a consistent staple. The triangle is named after Wacław Sierpiński and as fractals are wont the pattern appears in many places, so there are many different ways of constructing the triangle on a computer.

All of the methods are fundamentally iterative. The most obvious method is probably the triangle-in-triangle approach. We start with one triangle, and at every step we replace each triangle with 3 subtriangles:
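The triangle-in-triangle step can be sketched in a few lines. This is one possible take, not the article's own code, and the names `subdivide` and `sierpinski` are hypothetical. Each triangle is replaced by the three corner triangles formed with the midpoints of its sides, leaving the middle one out.

```python
def midpoint(p, q):
    """Point halfway between p and q."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def subdivide(triangle):
    """Replace one triangle with its three corner subtriangles."""
    a, b, c = triangle
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c)]

def sierpinski(triangle, depth):
    """Apply the subdivision step `depth` times, returning all the small triangles."""
    triangles = [triangle]
    for _ in range(depth):
        triangles = [small for t in triangles for small in subdivide(t)]
    return triangles

start = ((0.0, 0.0), (1.0, 0.0), (0.5, 1.0))
print(len(sierpinski(start, 4)))  # the count triples at every step
```

Hand the resulting list to any plotting tool that can fill polygons and the familiar pattern shows up after five or six steps.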

The discussion even veers into cows at some point...but zero mentions of the Menger sponge? (via hacker news)

People are also emotional, and it turns out an unhappy truck driver can be trouble. Modern routing models incorporate whether a truck driver is happy or not -- something he may not know about himself. For example, one major trucking company that declined to be named does "predictive analysis" on when drivers are at greater risk of being involved in a crash. Not only does the company have information on how the truck is being driven -- speeding, hard-braking events, rapid lane changes -- but on the life of the driver. "We actually have built into the model a number of indicators that could be surrogates for dissatisfaction," said one employee familiar with the program.

This could be a change in a driver's take-home pay, a life event like a death in the family or divorce, or something as subtle as a driver whose morning start time has been suddenly changed. The analysis takes into account everything the company's engineers can think of, and then teases out which factors seem correlated to accident risk. Drivers who appear to be at highest risk are flagged. Then there are programs in place to ensure the driver's manager will talk to a flagged driver.

In other words, the traveling salesman problem grows considerably more complex when you actually have to think about the happiness of the salesman. And, not only do you have to know when he's unhappy, you have to know if your model might make him unhappy. Warren Powell, director of the Castle Laboratory at Princeton University's Department of Operations Research and Financial Engineering, has optimized transportation companies from Netjets to Burlington Northern. He recalls how, at Yellow Freight company, "we were doing things with drivers -- they said, you just can't do that." There were union rules, there was industry practice. Tractors can be stored anywhere, humans like to go home at night. "I said we're going to need a file with 2,000 rules. Trucks are simple; drivers are complicated."

From a new site called Stupid Calculations, here's what an iPhone consisting of all the iPhone displays ever built would look like plopped down in the midst of Manhattan. Behold the Monophone:

What if Marissa preferred instead to thumb off hundred-dollar bills into an ecstatic crowd of Tumblr owners? Using the stack of hundreds kept handy around the house, I conducted a test that worked out to a rate of 90 bills per minute. It could certainly go faster, but it's important to make a little flourish with each flick, a self-satisfied grin spread across the face. 90 bills per minute x $100= $9000. $1.1 billion / $9000 per minute = 122,222 minutes or 2037 hours or 84.87 continuous, no-bathroom, no-sleep days.

And what will she be getting for all this generosity? In addition to the office, it buys 175 Six Million Dollar Men; with 175 employees as of May, the acquisition works out to $6,285,714 per employee. That's $41,904 per pound in livestock terms (175 employees @ an average of 150 lbs= 26,250 lbs total).

Editors of prominent mathematics journals are used to fielding grandiose claims from obscure authors, but this paper was different. Written with crystalline clarity and a total command of the topic's current state of the art, it was evidently a serious piece of work, and the Annals editors decided to put it on the fast track.

Just three weeks later -- a blink of an eye compared to the usual pace of mathematics journals -- Zhang received the referee report on his paper.

"The main results are of the first rank," one of the referees wrote. The author had proved "a landmark theorem in the distribution of prime numbers."

Rumors swept through the mathematics community that a great advance had been made by a researcher no one seemed to know -- someone whose talents had been so overlooked after he earned his doctorate in 1992 that he had found it difficult to get an academic job, working for several years as an accountant and even in a Subway sandwich shop.

"Basically, no one knows him," said Andrew Granville, a number theorist at the Universite de Montreal. "Now, suddenly, he has proved one of the great results in the history of number theory."

Reminds me of a certain patent clerk and his theories about time and space. History doesn't repeat itself, but it does rhyme. (via @daveg)

Erica Klarreich, a Berkeley-based science writer who has a Ph.D. in mathematics and has written about Zhang, says his proof demonstrates the remarkable balance between order and randomness within the prime numbers. "Prime numbers are anything but random -- they are completely determined," Klarreich says. "Nevertheless, they seem to behave in many respects like randomly-sprinkled numbers that eventually display all possible clumps and clusters. Zhang's work helps to put this conjectured picture of the primes on a solid footing."

Update: Alec Wilkinson has a profile of Zhang in the Feb 2, 2015 issue of the New Yorker: The Pursuit of Beauty.

Zhang, who also calls himself Tom, had published only one paper, to quiet acclaim, in 2001. In 2010, he was fifty-five. "No mathematician should ever allow himself to forget that mathematics, more than any other art or science, is a young man's game," Hardy wrote. He also wrote, "I do not know of an instance of a major mathematical advance initiated by a man past fifty." Zhang had received a Ph.D. in algebraic geometry from Purdue in 1991. His adviser, T. T. Moh, with whom he parted unhappily, recently wrote a description on his Web site of Zhang as a graduate student: "When I looked into his eyes, I found a disturbing soul, a burning bush, an explorer who wanted to reach the North Pole." Zhang left Purdue without Moh's support, and, having published no papers, was unable to find an academic job. He lived, sometimes with friends, in Lexington, Kentucky, where he had occasional work, and in New York City, where he also had friends and occasional work. In Kentucky, he became involved with a group interested in Chinese democracy. Its slogan was "Freedom, Democracy, Rule of Law, and Pluralism." A member of the group, a chemist in a lab, opened a Subway franchise as a means of raising money. "Since Tom was a genius at numbers," another member of the group told me, "he was invited to help him." Zhang kept the books. "Sometimes, if it was busy at the store, I helped with the cash register," Zhang told me recently. "Even I knew how to make the sandwiches, but I didn't do it so much." When Zhang wasn't working, he would go to the library at the University of Kentucky and read journals in algebraic geometry and number theory. "For years, I didn't really keep up my dream in mathematics," he said.

"You must have been unhappy."

He shrugged. "My life is not always easy," he said.

In August of 2012, mathematician Shinichi Mochizuki posted a series of four papers online that purported to prove the ABC Conjecture, "a famed, beguilingly simple number theory problem that had stumped mathematicians for decades". Then, nothing. Or nearly nothing.

The problem, as many mathematicians were discovering when they flocked to Mochizuki's website, was that the proof was impossible to read. The first paper, entitled "Inter-universal Teichmuller Theory I: Construction of Hodge Theaters," starts out by stating that the goal is "to establish an arithmetic version of Teichmuller theory for number fields equipped with an elliptic curve...by applying the theory of semi-graphs of anabelioids, Frobenioids, the etale theta function, and log-shells."

This is not just gibberish to the average layman. It was gibberish to the math community as well.

"Looking at it, you feel a bit like you might be reading a paper from the future, or from outer space," wrote Ellenberg on his blog.

Seeming gibberish by a genius might just be solid mathematics, but Mochizuki isn't doing much to help other mathematicians confirm or refute his assertions. Which raises an interesting point: mathematics isn't all just logic and truth...there's a social element to it as well.

"You don't get to say you've proved something if you haven't explained it," she says. "A proof is a social construct. If the community doesn't understand it, you haven't done your job."

N Is a Number is an hour-long documentary about Hungarian mathematician Paul Erdős.

Erdős was famously a prolific mathematician who collaborated widely...he coauthored over 1500 papers with 500 different collaborators. He was also a homeless amphetamine user.

In 2006, Garth Sundem and John Tierney published an equation in the NY Times that attempted to predict celebrity marriage crackups using a few metrics: age, fame, sexiness, etc. The pair recently modified the equation based on the evidence of the last five years and surprisingly, the equation is simpler.

What went right with them -- and wrong with our equation? Garth, a self-professed "uber-geek," has crunched the numbers and discovered a better way to gauge the toxic effects of celebrity. Whereas the old equation measured fame by counting the millions of Google hits, the new equation uses a ratio of two other measures: the number of mentions in The Times divided by mentions in The National Enquirer.

"This is a major improvement in the equation," Garth says. "It turns out that overall fame doesn't matter as much as the flavor of the fame. It's tabloid fame that dooms you. Sure, Katie Holmes had about 160 Enquirer hits, but she had more than twice as many NYT hits. A high NYT/ENQ ratio also explains why Chelsea Clinton and Kate Middleton have better chances than the Kardashian sisters."

Garth's new analysis shows that it's the wife's fame that really matters. While the husband's NYT/ENQ ratio is mildly predictive, the effect is so much weaker than the wife's that it's not included in the new equation. Nor are some variables from the old equation, like the number of previous marriages and the age gap between husband and wife.

Now available in its entirety on YouTube, a 95-minute documentary on physicist Richard Feynman called No Ordinary Genius.

The excellent film on Andrew Wiles' search for the solution to Fermat's Last Theorem is available as well (watch the first two minutes and you'll be hooked).

You are comfortable with feeling like you have no deep understanding of the problem you are studying. Indeed, when you do have a deep understanding, you have solved the problem and it is time to do something else. This makes the total time you spend in life reveling in your mastery of something quite brief. One of the main skills of research scientists of any type is knowing how to work comfortably and productively in a state of confusion.

The distance between the metal bands holding the cylindrical structure together decreases from top to bottom because the pressure the water exerts increases with depth. The top band only needs to fight against the water at the very top of the tower but the bottom bands have to hold the entire volume from bursting out.

And in celebration, this is my new favorite fact about pi: we have calculated pi out to over 6.4 billion digits but only 39 of them are needed to calculate the circumference of a circle as big as the universe "with a precision comparable to the radius of a hydrogen atom". (via @santheo)
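A rough sanity check on that 39-digit claim, as a sketch only: the two physical sizes below are round figures I'm assuming (they are not from the quoted source), and the error bound is simply the diameter times the size of the truncated tail of pi.

```python
# Assumed round figures, not taken from the quoted source:
UNIVERSE_DIAMETER_M = 8.8e26   # observable universe, roughly 93 billion light-years across
HYDROGEN_RADIUS_M = 5.3e-11    # Bohr radius of a hydrogen atom

def circumference_error(decimal_places, diameter_m):
    """Upper bound on the circumference error when pi is cut off after `decimal_places` digits."""
    return diameter_m * 10.0 ** -decimal_places

error_m = circumference_error(39, UNIVERSE_DIAMETER_M)
print(error_m < HYDROGEN_RADIUS_M)  # True
```

With these assumed sizes, the worst-case error is well under the width of a single hydrogen atom.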

This equation's initial purpose, he wrote, was to put meaningful prices on the terrestrial exoplanets that Kepler was bound to discover. But he soon found it could be used equally well to place any planet -- even our own -- in a context that was simultaneously cosmic and commercial. In essence, you feed Laughlin's equation some key parameters -- a planet's mass, its estimated temperature, and the age, type, and apparent brightness of its star -- and out pops a number that should, Laughlin says, equate to cold, hard cash.

At the time, the exoplanet Gliese 581 c was thought to be the most Earth-like world known beyond our solar system. The equation said it was worth a measly $160. Mars fared better, priced at $14,000. And Earth? Our planet's value emerged as nearly 5 quadrillion dollars. That's about 100 times Earth's yearly GDP, and perhaps, Laughlin thought, not a bad ballpark estimate for the total economic value of our world and the technological civilization it supports.

Nothing in the news media yet, but many folks on Twitter and colleague Nassim Taleb are reporting that the father of fractal geometry is dead at age 85. We're not there yet, but someday Mandelbrot's name will be mentioned in the same breath as Einstein's as a genius who fundamentally shifted our perception of how the world works.

This 45-minute documentary on Andrew Wiles' proof of Fermat's Last Theorem is surprisingly powerful and emotional. Give it until 1:45 or so and you'll want to watch the whole thing. The film is not really about math; it's about all of those movie trailer cliches -- "one man!", "finds the truth!", "fights the odds!", etc. -- except that this is actually true and poignant.

With Friedman's work, it seems Gödel's delayed triumph has arrived: the final proof that if there is a universal grammar of numbers in which all facets of their behaviour can be expressed, it lies beyond our ken.

But don't worry..."the most severe implications are philosophical". Phew?

Here's the entire text of a talk given at a math, magic, and puzzle gathering (attendees included Stephen Wolfram and John Horton "Game of Life" Conway) by Gary Foshee:

I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?

The first thing you think is "What has Tuesday got to do with it?" Well, it has everything to do with it.

The key word in the puzzle is "probability", which is not a very well understood term outside of the mathematics community. The full answer is at the end of the article.
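If you'd rather count than argue, here's a minimal brute-force sketch. It assumes one particular reading of the puzzle: every combination of sex and weekday is equally likely for each child, and all we've been told is that at least one child is a boy born on a Tuesday. Other readings give other answers. The function name is just for illustration.

```python
from fractions import Fraction
from itertools import product

def two_boys_given_tuesday_boy():
    """P(two boys | at least one child is a boy born on a Tuesday), by enumeration."""
    tuesday = 1  # arbitrary label for one of the seven weekdays
    children = list(product("BG", range(7)))       # 14 equally likely (sex, weekday) pairs
    families = list(product(children, repeat=2))   # 196 equally likely two-child families
    matching = [f for f in families if ("B", tuesday) in f]
    both_boys = [f for f in matching if all(sex == "B" for sex, _ in f)]
    return Fraction(len(both_boys), len(matching))

print(two_boys_given_tuesday_boy())  # 13/27
```

Under that reading, 27 of the 196 families include a Tuesday boy and 13 of those have two boys, so Tuesday really does move the answer away from 1/3.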

After analyzing dozens of Hollywood films, a team of researchers has found evidence that the visual rhythm of movies at the shot level matches a pattern called the 1/f fluctuation, the same pattern that is found in dozens of naturally occurring phenomena, including the length of the human attention span.

These results suggest that Hollywood film has become increasingly clustered in packets of shots of similar length. For example, action sequences are typically a cluster of relatively short shots, whereas dialogue sequences (with alternating shots and reverse-shots focused sequentially on the speakers) are likely to be a cluster of longer shots. In this manner and others, film editors and directors have incrementally increased their control over the visual momentum of their narratives, making the relations among shot lengths more coherent over a 70-year span.

Modern action movies are particularly adept at matching the audience's attention span in this manner. The full paper is available here.

I was really into fractals in college (I know...) when I was making rave flyers (I know!) for a friend's parties in Iowa (I know! I know! Shut up already!). Anyway, the thing that I really used to love doing with this fractal application that I had on my computer was zooming in to different parts of the familiar Mandelbrot set as far as I could. I never got very far...after 5 or 6 zooms in, my Packard Bell 486/66 (running Windows 3.11) would buckle under the computational pressure and hang. Therefore, I absolutely love this extremely deep HD zoom into the Mandelbrot set:

Just how deep is this computational rabbit hole?

The final magnification is e.214. Want some perspective? a magnification of e.12 would increase the size of a particle to the same as the earths orbit! e.21 would make a particle look the same size as the milky way and e.42 would be equal to the universe. This zoom smashes all of them all away. If you were "actually" traveling into the fractal your speed would be faster than the speed of light.

After a while, the self-similarity of the thing is almost too much to bear; I think I went into a coma around 5:00 but snapped to in time for the exciting (but not unexpected) conclusion. Full-screen in a dark room is recommended.

Update: This 46-minute video seems to be the deepest fractal zoom out there right now, with a zoom level of 10^10000.

The magnification factor is so much less in the video above but that one's more fun/artistic. And 10^10000 is such an absurdly large number^{1} that there's no way to think about it in physical terms...the zoom factor from the size of the universe to the smallest measurable distance (the Planck length) is only about 10^60.

I'll be writing about the elements of mathematics, from pre-school to grad school, for anyone out there who'd like to have a second chance at the subject -- but this time from an adult perspective. It's not intended to be remedial. The goal is to give you a better feeling for what math is all about and why it's so enthralling to those who get it.

More subject blogs like this, please. There are lots of art, politics, technology, fashion, economics, typography, photography, and physics blogs out there, but almost none of them appeal to the beginner or interested non-expert. (thx, steve)

The ham sandwich theorem is also sometimes referred to as the "ham and cheese sandwich theorem", again referring to the special case when n = 3 and the three objects are

1. a chunk of ham, 2. a slice of cheese, and 3. two slices of bread (treated as a single disconnected object).

The theorem then states that it is possible to slice the ham and cheese sandwich in half such that each half contains the same amount of bread, cheese, and ham. It is possible to treat the two slices of bread as a single object, because the theorem only requires that the portion on each side of the plane vary continuously as the plane moves through 3-space.

No idea how this is related to the I Cut You Choose conundrum.

A round pizza with radius 'z' and thickness 'a' has the volume pi*z*z*a. That and other math jokes are available on Wikipedia. Don't you love it when people explain jokes:

In this case, DEAD refers to a hexadecimal number (57005 base 10), not the state of being no longer alive.
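And if you want to explain the explanation, the conversion is a one-liner (a trivial sketch using Python's built-in `int`):

```python
# Interpret the string "DEAD" as a base-16 number.
value = int("DEAD", 16)
print(value)  # 57005
```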

If you and someone else hate the same third person, but like each other, balance theory says you're golden -- all three can persist without changing their opinions. On the other hand, if all three of you despise the others, it's an unstable triad, as well as a wildly common plot point for crime movies. While there are numerous resolutions -- one person changes his preference toward another, a relationship tie is cut -- another route back to stability, albeit a messy one, is the gunning down of at least one person.

Arbesman has some videos and stills on his web site from the movies mentioned in the article as well as the relevant mathematical materials.
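The rule in that excerpt can be sketched in a few lines. This assumes the usual sign convention from balance theory (+1 for liking, -1 for hating, with a triad balanced when the product of its three signs is positive); the function name is mine, not Arbesman's.

```python
def is_balanced(a_b, b_c, c_a):
    """True when a triad of relationships is stable: the product of the signs is positive."""
    return a_b * b_c * c_a > 0

# Two friends who both hate the same third person: stable.
print(is_balanced(+1, -1, -1))  # True
# All three despise each other: unstable, cue the crime movie.
print(is_balanced(-1, -1, -1))  # False
```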

Let's say, for example, you want to bet on one of the highlights of the British sporting calendar, the annual university boat race between old rivals Oxford and Cambridge. One bookie is offering 3 to 1 on Cambridge to win and 1 to 4 on Oxford. But a second bookie disagrees and has Cambridge evens (1 to 1) and Oxford at 1 to 2.

Each bookie has looked after his own back, ensuring that it is impossible for you to bet on both Oxford and Cambridge with him and make a profit regardless of the result. However, if you spread your bets between the two bookies, it is possible to guarantee success (see diagram, for details). Having done the calculations, you place £37.50 on Cambridge with bookie 1 and £100 on Oxford with bookie 2. Whatever the result you make a profit of £12.50.
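The arithmetic in that example checks out. Here's a small sketch that recomputes it, assuming odds of "a to b" pay a/b in winnings per pound staked (plus the stake back); the helper name is just for illustration.

```python
from fractions import Fraction

def net_profit(winning_stake, odds, losing_stake):
    """Winnings on the bet that came in, minus the stake lost on the other bet."""
    return winning_stake * odds - losing_stake

cambridge_stake = Fraction(75, 2)   # £37.50 with bookie 1 at 3 to 1
oxford_stake = Fraction(100)        # £100 with bookie 2 at 1 to 2

if_cambridge_wins = net_profit(cambridge_stake, Fraction(3, 1), oxford_stake)
if_oxford_wins = net_profit(oxford_stake, Fraction(1, 2), cambridge_stake)
print(float(if_cambridge_wins), float(if_oxford_wins))  # 12.5 12.5
```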

I say relatively because there are literally millions of pages on the web just about blackjack statistics. For instance, it's easy to see how you'll lose money playing blackjack in the long run -- card counting aside -- by looking at this house edge calculator. The only real advantage to the player occurs with a one-deck shoe and a bunch of other pro-player rules, which I imagine are difficult to find at the casinos. (via big contrarian)

Now, here's the part that really boggled me: the Consumption/Waste idea is a 1:1 correspondence (something in yields something out), what mathematicians call a linear function. The Parabola idea connects, pretty obviously, with parabolas -- now we're looking at x raised to the power of two. Annular Systems are modeled by circles which are given in analytic geometry by equations with both x^2 and y^2. Limits and Infinity, of course, become necessary in order to find the area of shapes under curves like parabolas and three-dimensional projections of circles.

Whoa. That is a tiny bit mind-blowing...do I really have time for a reread right now? (thx, nick)

Watch as David Attenborough signals his interest in mating with a male cicada. Scientists think that cicadas have 13- or 17-year mating cycles because, being prime numbers, those periods are not divisible by the life-cycle lengths of potential predators. From Stephen J. Gould:

Many potential predators have 2-5-year life cycles. Such cycles are not set by the availability of cicadas (for they peak too often in years of nonemergence), but cicadas might be eagerly harvested when the cycles coincide. Consider a predator with a life-cycle of five years: if cicadas emerged every 15 years, each bloom would be hit by the predator. By cycling at a large prime number, cicadas minimize the number of coincidences (every 5 x 17, or 85 years, in this case). Thirteen- and 17-year cycles cannot be tracked by any smaller number.
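Gould's numbers are just least common multiples. A quick sketch, assuming the cicada and predator cycles start out in sync:

```python
from math import gcd

def years_between_meetings(cicada_cycle, predator_cycle):
    """Years between emergences that coincide with a predator peak (the least common multiple)."""
    return cicada_cycle * predator_cycle // gcd(cicada_cycle, predator_cycle)

for cicada_cycle in (15, 17):
    print(cicada_cycle, years_between_meetings(cicada_cycle, 5))
# prints "15 15" then "17 85"
```

A 15-year cicada meets the 5-year predator at every single emergence; a 17-year cicada meets it only once every 85 years.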

Newish episode of Radiolab about randomness: Stochasticity.

How big a role does randomness play in our lives? Do we live in a world of magic and meaning or ... is it all just chance and happenstance? To tackle this question, we look at the role chance and randomness play in sports, lottery tickets, and even the cells in our own body. Along the way, we talk to a woman suddenly consumed by a frenzied gambling addiction, two friends whose meeting seems purely providential, and some very noisy bacteria.

Regarding the game of Who Can Name the Bigger Number?, Scott Aaronson shows that while 9^9^9^9 might cut the mustard in the first couple of rounds, the numbers and the notation used to express them get much more complicated.

Exponentials are familiar, relevant, intimately connected to the physical world and to human hopes and fears. Using the notational systems I'll discuss next, we can concisely name numbers that make exponentials picayune by comparison, that subjectively speaking exceed 9^9^9^9 as much as the latter exceeds 9.

Geoffrey West of the Santa Fe Institute and his colleagues Jim Brown and Brian Enquist have argued that a 3/4-power law is exactly what you'd expect if natural selection has evolved a transport system for conveying energy and nutrients as efficiently and rapidly as possible to all points of a three-dimensional body, using a fractal network built from a series of branching tubes -- precisely the architecture seen in the circulatory system and the airways of the lung, and not too different from the roads and cables and pipes that keep a city alive.

Joe liked the idea of measuring how long this number would be if it were set in type, which immediately called into question the choice of font. The number's length would depend chiefly on the width of the font selected, and even listener-friendly choices like Times Roman and Helvetica would produce dramatically different outcomes. Small eccentricities in the design of a particular number, such as Times Roman's inexplicably scrawny figure one, would have huge consequences when multiplied out to this length. But even this isn't the hairy part. Where things get difficult, as always, is in the kerning.

In some cases, properly kerning the number resulted in a difference of more than 1000 feet for 12 pt. text.

In the complex formula L represents the number of lumps in the batter and C equals its consistency. The letter F stands for the flipping score, k is the ideal consistency and T is the temperature of the pan. Ideal temp of pan is represented by m, S is the length of time the batter stands before cooking and E is the length of time the cooked pancake sits before being eaten. The closer to 100 the result is -- the better the pancake.

However, a commenter notes:

According to that formula, if you left the pancake batter standing for ten years, (s-e) would be large, and so the pancake would be near perfect. If you let it stand for the same time as you left the pancake to cool, (s-e) would be zero and the pancake would be infinitely bad.

In The Method, Archimedes was working out a way to compute the areas and volumes of objects with curved surfaces, which was also one of the problems that motivated Newton and Leibniz. Ancient mathematicians had long struggled to "square the circle" by calculating its exact area. That problem turned out to be impossible using only a straightedge and compass, the only tools the ancient Greeks allowed themselves. Nevertheless, Archimedes worked out ways of computing the areas of many other curved regions.

The same thing happened: something would look good at first and then turn out to be horrifying. For example, there was a book that started out with four pictures: first there was a windup toy; then there was an automobile; then there was a boy riding a bicycle; then there was something else. And underneath each picture it said, "What makes it go?"

I thought, "I know what it is: They're going to talk about mechanics, how the springs work inside the toy; about chemistry, how the engine of the automobile works; and biology, about how the muscles work."

It was the kind of thing my father would have talked about: "What makes it go? Everything goes because the sun is shining." And then we would have fun discussing it:

"No, the toy goes because the spring is wound up," I would say. "How did the spring get wound up?" he would ask.

"I wound it up."

"And how did you get moving?"

"From eating."

"And food grows only because the sun is shining. So it's because the sun is shining that all these things are moving." That would get the concept across that motion is simply the transformation of the sun's power.

I know I've posted this one before but I'm probably gonna post it each time I run across it.

That's chef Kin Jing Mark stretching and dividing dough into super-thin noodles. Seeing this when I was a kid made a great impression on me about the wonder of mathematics.

DARPA is soliciting research proposals from people wishing to solve one of twenty-three mathematical challenges, many of which deal with attempting to find a mathematical basis underlying biology.

What are the Fundamental Laws of Biology?: This question will remain front and center for the next 100 years. DARPA places this challenge last as finding these laws will undoubtedly require the mathematics developed in answering several of the questions listed above.

- The air in the Empire State Building weighs about 4 million pounds.
- The energy consumption of the world's population will be greater than the energy coming from the sun in less than 500 years. (Peak photons?)

What's surprising about such estimates is how often they are very close to the reality. This is especially true in a multi-step approximation, where over- and underestimates at various steps tend to cancel each other out, usually resulting in something not too far off from the truth.

Both Microsoft and Google use questions like these as part of their job interview process. We did a bunch of them in my freshman physics class; I loved them.

Banknote patterns fascinate me. I can get lost for hours in all the details, seeing how the patterns fit together, how the lettering works, the tiny security 'flaws' -- they're amazing. Central to banknote designs are Guilloche patterns, which can be created mechanically with a geometric lathe, or more likely these days, mathematically. The mathematical process attracted me immediately as I don't have a geometric lathe and nor do I have anywhere to put one. I do, however, have a computer, and at the point I first started playing with the designs (mid-2004) Illustrator and Photoshop had gained the ability to be scripted.

In case you're wondering, the most densely populated block group is one in New York County, New York -- 3,240 people in 0.0097 square miles, for about 330,000 per square mile. The least dense is in the North Slope Borough of Alaska -- 3 people in 3,246 square miles, or one per 1,082 square miles. The Manhattan block group I mention here is 360 million times more dense than the Alaska one; population densities vary over a huge range.

That's approximately the same range from the height of an iPod to the diameter of the Earth. (via fakeisthenewreal)
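For the curious, the 360 million figure falls straight out of the numbers quoted above (a quick check, nothing more):

```python
manhattan_density = 3240 / 0.0097   # people per square mile, about 334,000
alaska_density = 3 / 3246           # people per square mile, about 0.0009
ratio = manhattan_density / alaska_density
print(f"{ratio:,.0f}")
```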

Benoit Mandelbrot and Paola Antonelli talk about, among other things, fractals, self-similarity in architecture, algorithms that could specify the creation of entire cities, visual mathematics, and generalists.

This has been for me an extraordinary pleasure because it means a certain misuse of Euclid is dead. Now, of course, I think that Euclid is marvelous, he produced one of the masterpieces of the human mind. But it was not meant to be used as a textbook by millions of students century after century. It was meant for a very small community of mathematicians who were describing their works to one another. It's a very complicated, very interesting book which I admire greatly. But to force beginners into a mathematics in this particular style was a decision taken by teachers and forced upon society. I don't feel that Euclid is the way to start learning mathematics. Learning mathematics should begin by learning the geometry of mountains, of humans. In a certain sense, the geometry of...well, of Mother Nature, and also of buildings, of great architecture.

Speaking of the Yankees, Derek Jeter always seems to get a lot of credit for those four World Series victories in five years but a quick look at the OBP stats for those years shows that Bernie Williams was the engine driving that offense. Jeter's a little overrated maybe?

Called "Hilbert" after the influential German mathematician, David Hilbert, the newly licensed software will be browser accessible and, utilizing AJAX technologies, will emulate the desktop version of the software with remarkable fidelity. "The magic of AJAX will allow OST to combine or 'mash-up' Mathematica with other web-based technologies to deliver and support high quality science and mathematics courses online such as the Calculus&Mathematica courses currently taught through NetMath at the University of Illinois and other universities," explains Scott Gray, Director of the O'Reilly School of Technology.

Hilbert should be available before the end of the year.

Infinite Jest once again proved finite, although it's taken me since August to get through it. This book was such a revelation the first time through that I was afraid of a reread letdown but I enjoyed it even more this time around...and got much more out of the experience too.

Right as I was finishing the book, I read a transcription of an interview with Wallace in which interviewer Michael Silverblatt asked him about the fractal-like structure of the novel:

MICHAEL SILVERBLATT: I don't know how, exactly, to talk about this book, so I'm going to be reliant upon you to kind of guide me. But something came into my head that may be entirely imaginary, which seemed to be that the book was written in fractals.

DAVID FOSTER WALLACE: Expand on that.

MS: It occurred to me that the way in which the material is presented allows for a subject to be announced in a small form, then there seems to be a fan of subject matter, other subjects, and then it comes back in a second form containing the other subjects in small, and then comes back again as if what were being described were -- and I don't know this kind of science, but it just -- I said to myself this must be fractals.

DFW: It's -- I've heard you were an acute reader. That's one of the things, structurally, that's going on. It's actually structured like something called a Sierpinski Gasket, which is a very primitive kind of pyramidical fractal, although what was structured as a Sierpinski Gasket was the first- was the draft that I delivered to Michael in '94, and it went through some I think 'mercy cuts', so it's probably kind of a lopsided Sierpinski Gasket now. But it's interesting, that's one of the structural ways that it's supposed to kind of come together.

MS: "Michael" is Michael Pietsche, the editor at Little, Brown. What is a Sierpinski Gasket?

DFW: It would be almost im- ... I would almost have to show you. It's kind of a design that a man named Sierpinski I believe developed -- it was quite a bit before the introduction of fractals and before any of the kind of technologies that fractals are a really useful metaphor for. But it looks basically like a pyramid on acid --

To answer Silverblatt's question, a Sierpinski Gasket is constructed by taking a triangle, removing a triangle-shaped piece out of the middle, then doing the same for the remaining pieces, and so on and so forth, like so:

The result is an object of infinite boundary and zero area -- almost literally everything and nothing at the same time. A Sierpinski Gasket is also self-similar...any smaller triangular portion is an exact replica of the whole gasket. You can see why Wallace would have wanted to structure his novel in this fashion.
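The "infinite boundary, zero area" bit is easy to watch happen. In this sketch (my own bookkeeping, not anything from the interview), each step replaces every remaining triangle with three half-size copies, so the remaining area is multiplied by 3/4 and the total boundary by 3/2.

```python
from fractions import Fraction

def gasket_stats(steps):
    """Return (triangle count, area remaining, total boundary), the last two relative to the starting triangle."""
    count = 3 ** steps
    area = Fraction(3, 4) ** steps
    boundary = Fraction(3, 2) ** steps
    return count, area, boundary

for steps in (0, 1, 2, 10, 50):
    count, area, boundary = gasket_stats(steps)
    print(steps, count, float(area), float(boundary))
```

Run it for more steps and the area column heads to zero while the boundary column grows without limit.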

What's sort of great about it is that it will happen to everybody if you live long enough. If you were born in 2000, it happens instantaneously. The people who were born at the end of the century have to take care of themselves.

Basically, as the leaf grows it is constrained to a 2-d surface, but the cells of some leaves reproduce fast enough to require more surface area than a pi-r-squared plane surface can provide. Its only recourse is to buckle out-of-plane, giving the wrinkles. Since the exuberant growth continues as the leaf grows outward, the buckling process repeats and you get the multi-scale (ripples on ripples on ripples) shape that you see in kale and daffodils.

Cadaeic Cadenza is a 3834-word story by Mike Keith where each word in sequence has the same number of letters as the corresponding digit in pi. (thx, mark, who has more info on constrained writing) Related: The Feynman point is the sequence of six 9s which begins 762 digits into pi. "[Feynman] once stated during a lecture he would like to memorize the digits of pi until that point, so he could recite them and quip 'nine nine nine nine nine nine and so on.'"

On March 14, 1998, I made the first post to this little site. And I'm still standin' (yeah yeah yeah). Here's to 9 more years. Actually, I'll settle for making it to 10. Baby steps.

And if that weren't enough excitement for one day, it's also Pi Day. (Whoa, the Pi Day web site uses Silkscreen!) I bet the Pi Dayers are really looking forward to 2015 when they can extend the fun to two additional decimal places.

Rule of thumb to avoid photographing people with their eyes closed: divide the number of people by three (or by two if the light is bad). That means that if you're taking a photo of 12 people, you need to take at least 4 photos to have a good chance of getting a photo with everyone's eyes open. (via photojojo)

Update: Jeff writes: "Way back when we only used film I learned you could tell before looking at the photo whether someone blinked by asking them what color was the flash. If it was white or bluish white, then their eyes were open. If it was orange, then their eyes were closed and they had 'seen' the flash through their eyelids."

A look at Saks Fifth Avenue's new logo and identity. The identity system consists of cutting up the logo into patterns...98,137,610,226,945,526,221,323,127,451,938,506,431,029,735,326,490,840,972,261,848,186,538,906,070,058,088,365,083,852,800,000,000,000 possible patterns.

Last month I covered the hubbub surrounding the still-potential proof of the Poincare conjecture. The best take on the situation was a New Yorker article by Sylvia Nasar and David Gruber, detailing the barest glimpse of the behind-the-scenes workings of the mathematics community, particularly those involving Grigory Perelman, a reclusive Russian mathematician who unveiled his potential Poincare proof in 2002 and Shing-Tung Yau, a Chinese mathematician who, the article suggested, was out for more than his fair share of the credit in this matter.

After declining the Fields Medal, the Nobel Prize of mathematics, Perelman has quit mathematics and lives quietly in his native Russia. Yau, however, is upset at his portrayal (both literally and literary) in the New Yorker article and has written a letter to the New Yorker asking them to make a prominent correction and apologize for an illustration of Yau that accompanied the article. From the letter:

I write in the hope of enlisting your immediate assistance, as well as the assistance of The New Yorker, in undoing, to the extent possible, the literally world-wide damage done to Dr. Yau's reputation as a result of the publication of your article. I also write to outline for you, on a preliminary basis, but in some detail, several of the more egregious and actionable errors which you made in the article, and the demonstrably shoddy "journalism" which resulted in their publication.

The letter, addressed to the two authors as well as the fact-checker on the article and CC'd to David Remnick and the New Yorker's general counsel, runs 12 pages, so you may want to have a look at the press release instead. A webcast discussing all the details of the letter is being held at noon on September 20...information on how to tune in will be available at Dr. Yau's web site. (thx, david)

As I mentioned yesterday, the New Yorker published an article by Sylvia Nasar^{1} and David Gruber about the recent proof of the Poincare Conjecture^{2}. (Previous coverage in the NY Times and the Guardian.) The article, which is unavailable from the New Yorker's web site (they've now made it available), contains the only interview I've seen with Grigory Perelman, the Russian mathematician who published a potential proof of the conjecture in late 2002, gave a series of lectures in the US, and then went back to Russia. Since then, he hasn't communicated with anyone about the proof, has quit mathematics, and recently refused the Fields Medal, the most prestigious award that mathematics has to offer, saying:

It was completely irrelevant for me. Everybody understood that if the proof is correct then no other recognition is needed.

Meanwhile, a Chinese group of mathematicians, led by Shing-Tung Yau^{3}, are claiming that Perelman's proof was too complicated and are offering a reworked proof instead of Perelman's. That is, they're claiming the first complete proof of the conjecture. The active director of Yau's mathematics institute explained the relative contributions thusly:

Hamilton contributed over fifty per cent; the Russian, Perelman, about twenty five per cent; and the Chinese, Yau, Zhu, and Cao et al., about thirty per cent. (Evidently, simple addition can sometimes trip up even a mathematician.)

Clearly the Chinese gave more than 100% in solving this proof, but Yau is regarded by some mathematicians as attempting to grab glory that does not belong to him. John Morgan, a mathematician at Columbia University, says:

Perelman already did it and what he did was complete and correct. I don't see anything that [Yau et al.] did different.

Yau wants to be associated with the proof of the Poincare Conjecture, to have China associated with it, and for his student, Zhu, to be elevated in status by it. The $1 million in prize money for the proof of the conjecture offered by the Clay Mathematics Institute can't be far from Yau's mind as well. For his part, Grigory Perelman won't say whether he'll accept the prize money until it is offered. Stay tuned, I guess.

[2] Poincare (properly written as Poincaré) is pronounced Pwan-cah-RAY, not Poyn-care as I said it up until a few weeks ago. ↩

[3] Yau proved a conjecture by Eugenio Calabi which gave birth to a highly useful mathematical structure called a Calabi-Yau manifold; Yau won the Fields Medal for it. The C-Y manifold is important in string theory and Andrew Wiles used it as part of his proof of Fermat's Last Theorem. In short, Yau is a mathematical stud, no question. ↩

Benford's Law describes a curious phenomenon about the counterintuitive distribution of numbers in sets of non-random data:

A phenomenological law also called the first digit law, first digit phenomenon, or leading digit phenomenon. Benford's law states that in listings, tables of statistics, etc., the digit 1 tends to occur with probability ~30%, much greater than the expected 11.1% (i.e., one digit out of 9). Benford's law can be observed, for instance, by examining tables of logarithms and noting that the first pages are much more worn and smudged than later pages (Newcomb 1881). While Benford's law unquestionably applies to many situations in the real world, a satisfactory explanation has been given only recently through the work of Hill (1996).

I first heard of Benford's Law in connection with the IRS using it to detect tax fraud. If you're cheating on your taxes, you might fill in amounts of money somewhat at random, the distribution of which would not match that of actual financial data. So if the digit "1" shows up on Al Capone's tax return about 15% of the time (as opposed to the expected 30%), the IRS can reasonably assume they should take a closer look at Mr. Capone's return.
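Where does the ~30% come from? The standard statement of Benford's Law (which the excerpt above doesn't spell out) is that leading digit d shows up with probability log10(1 + 1/d). A short sketch:

```python
import math

def benford_share(digit):
    """Expected share of leading digit `digit` (1-9) under Benford's law."""
    return math.log10(1 + 1 / digit)

for digit in range(1, 10):
    print(digit, f"{benford_share(digit):.1%}")
```

The shares fall from about 30.1% for 1 down to about 4.6% for 9, and they add up to exactly 100%.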

Since I installed Movable Type 3.15 back in March 2005, I have been using its "post to the future" option pretty regularly to post my remaindered links...and have been using it almost exclusively for the last few months[1]. That means I'm saving the entries in draft, manually changing the dates and times, and then setting the entries to post at some point in the future. For example, an entry with a timestamp like "2006-02-20 22:19:09" when I wrote the draft might get changed to something like "2006-02-21 08:41:09" for future posting at around 8:41 am the next morning. The point is, I'm choosing basically random numbers for the timestamps of my remaindered links, particularly for the hours and minutes digits. I'm "cheating"...committing post timestamp fraud.

That got me thinking...can I use the distribution of numbers in these post timestamps to detect my cheating? Hoping that I could (or this would be a lot of work wasted), I whipped up a MT template that produced two long strings of numbers: 1) one of all the hours and minutes digits from the post timestamps from May 2005 to the present (i.e. the cheating period), 2) and one of all the hours and minutes digits from Dec 2002 - Jan 2005 (i.e. the control group). Then I used a PHP script to count the numbers in each string, dumped the results into Excel, and graphed the two distributions together. And here's what they look like, followed by a table of the values used to produce the chart:

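| Digit | 5/05-now | 12/02-1/05 | Difference |
|-------|----------|------------|------------|
| 1     | 31.76%   | 33.46%     | 1.70%      |
| 2     | 11.76%   | 14.65%     | 2.89%      |
| 3     | 10.30%   | 9.96%      | 0.34%      |
| 4     | 10.44%   | 9.58%      | 0.86%      |
| 5     | 10.02%   | 10.52%     | 0.51%      |
| 6     | 4.83%    | 5.40%      | 0.57%      |
| 7     | 5.66%    | 4.96%      | 0.70%      |
| 8     | 7.62%    | 4.65%      | 2.97%      |
| 9     | 7.60%    | 6.81%      | 0.79%      |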

As expected, 1 & 2 show up less than they should during the cheating period, but not overly so[2]. The real fingerprint of the crime lies with the 8s. The number 8 shows up during the cheating period ~64% more than expected (7.62% vs. 4.65% in the control group). After thinking about it for a while, I came up with an explanation for the abundance of 8s. I often schedule posts between 8am-9am so that there's stuff on the site for the early-morning browse and I usually finish off the day with something between 6pm-7pm (18:00 - 19:00), and both of those windows put an 8 in the hours digits. Not exactly the glaring evidence I was expecting, but you can still tell.
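The counting step and the ~64% figure can be reproduced in a few lines. This is a rough Python sketch, not the original PHP script. The function names and the sample timestamps are invented, and it skips 0s on the assumption that the percentages in the table are shares of the digits 1 through 9 only (each column adds up to about 100%). It also builds the "every possible time from 00:00 to 23:59" baseline mentioned in footnote [2].

```python
from collections import Counter

def digit_shares(timestamps):
    """Share of each digit 1-9 among the hour and minute digits of 'HH:MM' strings."""
    digits = [ch for ts in timestamps for ch in ts if ch in "123456789"]
    counts = Counter(digits)
    total = len(digits)
    return {d: counts[str(d)] / total for d in range(1, 10)}

def relative_excess(observed, expected):
    """How much larger `observed` is than `expected`, as a fraction of `expected`."""
    return (observed - expected) / expected

# Invented sample of post times, not real kottke.org data.
sample = ["08:41", "08:15", "18:32", "12:07"]
shares = digit_shares(sample)

# Baseline: every possible time of day, as in footnote [2].
all_times = [f"{h:02d}:{m:02d}" for h in range(24) for m in range(60)]
baseline = digit_shares(all_times)

# Shares of the digit 8 from the table: cheating period vs. control group.
print(f"{relative_excess(7.62, 4.65):.0%}")  # prints 64%
```

Comparing `shares` against `baseline` (or against a known-clean period, as in the table) is what makes a surplus of any one digit stand out.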

The obvious next question is, can this technique be utilized for anything useful? How about detecting comment, trackback, or ping spam? I imagine IPs and timestamps from these types of spam are forged to at least some extent. The difficulties are getting enough data to be statistically significant (one forged timestamp isn't enough to tell anything) and having "clean" data to compare it against. In my case, I knew when and where to look for the cheating...it's unclear if someone who didn't know about the timestamp tampering would have been able to detect it. I bet companies with services that deal with huge amounts of spam (Gmail, Yahoo Mail, Hotmail, TypePad, Technorati) could use this technique to filter out the unwanted emails, comments, trackbacks, or pings...although there are probably better methods for doing so.

[1] I've been doing this to achieve a more regular publishing schedule for kottke.org. I typically do a lot of work in the evening and at night and instead of posting all the links in a bunch from 10pm to 1am, I space them out over the course of the next day. Not a big deal because increasingly few of the links I feature are time-sensitive and it's better for readers who check back several times a day for updates...they've always got a little something new to read.

[2] You'll also notice that the distributions don't quite follow Benford's Law either. Because of the constraints on which digits can appear in timestamps (e.g. you can never have a timestamp of 71:95), some digits appear proportionally more or less than they would in statistical data. Here's the distribution of digits of every possible time from 00:00 to 23:59: