Moved Permanently
This blog has now moved permanently. Please update your bookmarks/google reader.
The proposed boycott of Elsevier seems to have them running scared. I just got this email from them:
A letter to the mathematics community.
We are writing to let you know of a series of changes that we are making to how the Elsevier mathematics program will be run. Some of these are new initiatives, and some reflect changes that we have been working on over a longer period.
We have been listening actively to the community and we see a number of issues that we need to address, not least being open to what the community has to say:
Pricing
Mathematics journals published by Elsevier tend to be larger than those of other publishers. On a price-per-article, or price-per-page level, our prices are typically, but not always, lower than those of other mathematics publishers.
Our target is for all of our core mathematics titles to be priced at or below US$11 per article (equivalent to 50-60 cents per normal typeset page) by next year, placing us below most University presses, some societies and other commercial competitors. Where journals are more expensive than this, we will lower our prices, as we already have in recent years for journals such as the Journal of Algebra and Topology and its Applications, among others.
We realize that this is just part of the concerns about pricing - and we will seek to address concerns about the nature and composition of the large discounted agreements, through which most Universities now access journals - but addressing the base line pricing is a necessary first step.
Access and Open Archives
To make clear that we are committed to wider access, we have made the archives of 14 core mathematics journals open, from four years after publication, back to 1995, the year when we started publishing digitally. All current and future papers featured in these journals will become free to read, for subscribers and non subscribers alike. This initiative is part of a number of open access publishing options we have available which give researchers the freedom to choose to open their research beyond the academic community. For more information about Elsevier's open access options, visit www.elsevier.com/openaccess.
We are a founding partner in Research4Life, a public private partnership providing journal content to researchers in the developing world. More than 1600 Elsevier journals, including our mathematics titles, are available in more than 100 developing countries.
Our position on RWA
Elsevier has announced today that we are withdrawing our support for the Research Works Act. In recent weeks, our support for the Act has caused some in the community to question our commitment to serving the global research community and ensuring the best possible access to research publications and data. We have heard concerns from some Elsevier journal authors, editors and reviewers that the Act would be seen as a step backwards for expanding options for free and low cost public access to scholarly literature. That was certainly not the intention of the legislation or our intention in supporting it. Please read our full statement online.
Moving forward
Now that we have explained the steps we have taken so far we want to stress this is just the beginning.
We will create a scientific council for mathematics, to ensure that we are working in tandem with the mathematics community to address feedback and to give greater control and transparency to the community and we will engage actively with leaders in a number of countries to ensure that our mathematics program is meeting the needs of the community, globally and locally.
There are many other issues where we wish to engage with the community, including our efforts to improve digital rendering of mathematics, the use and misuse of citation measures for the discipline and our efforts to ensure high standard across all of our journals.
We welcome your views on these and all our efforts at: mathematics@elsevier.com
Sincerely,
David Clark & Laura Hassink
Senior Vice Presidents, Physical Sciences
A major problem with running Postgres on EC2 is that EBS performance often sucks. In addition to performing poorly, EBS also uses the network connection, which can be undesirable. Ephemeral storage is provided, and tends to have better performance characteristics, but unfortunately it lacks durability.
However, we can still use it to speed up Postgres on complex queries. Postgresql has a concept called tablespaces – a tablespace is basically a location in the filesystem where postgres objects can be stored.
In particular, postgres has the temp_tablespaces configuration setting. This setting determines which tablespace postgres will use to store temporary tables, which are often created during complex queries.
So here is how we speed up postgres – we move the temporary tablespace to ephemeral storage. This is safe, since no data is permanently stored in the temporary tablespace.
psql> CREATE TABLESPACE ephemeral LOCATION '/mnt/postgresql_tmp';
We then modify postgresql.conf to tell postgres to use this tablespace for temporary objects:
temp_tablespaces = 'ephemeral'
Finally, restart postgres.
For some of my complex queries (involving 4 joins), this gives a 30% improvement in speed and cuts network usage by about half.
I've run into a curious error while cloning Ubuntu VM hard drives. After cloning, the network card no longer works. However, if I clone the VM completely (including the mac address), there is no problem.
The reason for this is that Ubuntu, upon installation, stores the mac address of the network card. This ensures the same network card will always be mapped to eth0, and future network cards will be eth1, etc.
However, this behavior is not desirable when cloning a VM.
One solution is to delete the stored mac address upon booting, which can be done with this script (add it to /etc/rc.local ):
This will fail horribly if your VM has multiple network interfaces.
A fairly recent trend in discussions of obesity is to focus on weight stability. Weight stability is a phenomenon by which a human maintains roughly the same bodyweight over a long period of time. Karl Smith recently brought it up, for example:
...its hard to square [theories of obesity based on lack of self control] with the stability of weight, which is probably the biggest or second biggest next to heritability, stylized fact about obesity. That is, if you always over eat you don’t get fat, you get fatter. Presumably, you will get fatter and fatter over time.
However, non-dieting obese folks have a weight stability that is roughly the same as thin people. Its as if they tried to go exactly to a particular too heavy weight and then stop and act like a thin person with 50 extra lbs.
Why does it stop like that?
That’s a puzzle.
It's a puzzle, but it's a puzzle with a solution. It turns out that the Calories In/Calories Out model of human metabolism completely explains the phenomenon of weight stability for the fat and thin alike.
The Calories In/Calories out model predicts that the human body converts calories to bodyfat at a rate of 3500 calories/1 lb of bodyfat:
Change in bodyweight = (Calories in - Calories out) x (1lb bodyfat / 3500 Cal)
Calories in is simple, but calories out is a little tricky. The calories out portion is described by the Harris-Benedict equation. The Harris-Benedict equation (which I'll abbreviate to HBE) states that a man has a Base Metabolic Rate of
BMR = 66 + ( 6.23 x weight in pounds ) + ( 12.7 x height in inches ) – ( 6.76 x age in years )
(the numbers are a bit different for women, but the equation is similar) and that
Calories out = E x BMR
where E is a constant that varies with an individual's exercise rate. For a sedentary individual, E = 1.2, whereas serious athletes might have E as high as 1.9.
The key fact to note about this equation is that as a person's weight increases, his metabolic rate increases. This means that a fat person must eat more calories to maintain bodyweight than a thin person.
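As a minimal sketch of the two formulas above (the function names and the example weights are my own, chosen only for illustration), here is the Harris-Benedict equation for a man and the resulting daily calories out:

```python
def bmr_male(weight_lb, height_in, age_yr):
    """Harris-Benedict Base Metabolic Rate for a man, in Calories/day."""
    return 66 + 6.23 * weight_lb + 12.7 * height_in - 6.76 * age_yr


def calories_out(weight_lb, height_in, age_yr, activity=1.2):
    """Calories out = E x BMR; activity is E (1.2 = sedentary)."""
    return activity * bmr_male(weight_lb, height_in, age_yr)


# Two hypothetical sedentary men, same height and age, different weights.
thin = calories_out(190, 72, 28)
heavy = calories_out(250, 72, 28)

# The heavier man burns more each day just to stay where he is.
extra_needed = heavy - thin
print(round(extra_needed, 1))
```

The difference depends only on E and the weight gap (E x 6.23 x 60 here), which is why a heavier person needs more food to hold their weight steady.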
Now for some math. If you are unfamiliar with ordinary differential equations, you might want to skip to the pictures.
The mathematical part
Based on this, we can derive an equation for a human's bodyweight as a function of time (with t having units of days):
w'(t) = c(t) - a - b w(t) + g t
where
c(t) = Calories consumption as a function of time / 3500
a = E x (66 + 12.7 x height - 6.76 x age at t = 0) / 3500
b = E x 6.23 / 3500
g = E x 6.76 / (365 x 3500)
Here, w'(t) denotes the derivative of w(t) with respect to time. Now, suppose an individual consumes food at a rate of c(t) = x + yt. In that case, their bodyweight obeys the equation:
w'(t) = (x - a) - b w(t) + (g+y) t
This equation has a simple solution:
w(t) = [b(x-a) + b(g+y)t - (g+y)]/b^2 + Const x exp(-bt)
This is the equation of exponential decay - specifically, it shows that if a human eats a constant amount of food, their bodyweight will exponentially approach an asymptote which depends on x (their rate of consumption) as well as their age and activity level (which affect a and b).
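For anyone who wants to check the algebra, here is a rough sketch that compares the closed-form solution of w'(t) = (x - a) - b w(t) + (g + y) t against a brute-force Euler integration. It is not the code behind the graphs. The activity factor of 1.5 is an assumption (the post does not say which E was used), and a is taken to include the age at t = 0.

```python
import math


def coefficients(height_in, age0_yr, activity):
    """Return (a, b, g) in lb/day units, from the Harris-Benedict terms."""
    # Assumption: a includes the age at t = 0, so t counts days from then.
    a = activity * (66 + 12.7 * height_in - 6.76 * age0_yr) / 3500
    b = activity * 6.23 / 3500
    g = activity * 6.76 / (365 * 3500)
    return a, b, g


def weight_closed_form(t, w0, x, y, a, b, g):
    """Closed-form w(t) for consumption c(t) = x + y t and w(0) = w0."""
    k = g + y

    def particular(s):
        return (b * (x - a) + b * k * s - k) / b ** 2

    const = w0 - particular(0.0)
    return particular(t) + const * math.exp(-b * t)


def weight_euler(t_end, w0, x, y, a, b, g, dt=0.01):
    """Integrate w' = (x - a) - b w + (g + y) t with forward Euler steps."""
    w, t = w0, 0.0
    for _ in range(round(t_end / dt)):
        w += dt * ((x - a) - b * w + (g + y) * t)
        t += dt
    return w


a, b, g = coefficients(height_in=72, age0_yr=25, activity=1.5)
x, y = 3000 / 3500, 0.0  # constant 3000 Cal/day
three_years = 3 * 365

exact = weight_closed_form(three_years, 150.0, x, y, a, b, g)
numeric = weight_euler(three_years, 150.0, x, y, a, b, g)
print(round(exact, 2), round(numeric, 2))
```

The two numbers should agree closely. Trying different starting weights shows the exp(-bt) term dying away, which is the whole point of the argument.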
The Results - Mathophobes can start reading
I've graphed the result for a 6' tall 25 year old male who consumes 3000 Cals/day. After 3 years, regardless of his initial weight, his terminal weight lies between 190lb and 193lbs.
Now consider two hypothetical people, both 6' tall, 28 years old, and exactly 193lbs. The first person becomes a fan of Alton Brown and cooks lots of healthy food. He continues eating 3000 Cals/day. He'll gain a few pounds before he reaches 40, but nothing remarkable. His weight will remain stable.
Now consider a second person, who becomes a fan of Paula Deen. By following her delicious recipes, he adds 5 tablespoons of butter y'all to his diet (500 extra calories). Over the next 3-4 years he gains 50lbs and then stops - all additional weight gain is due solely to aging, and is mirrored by the Alton Brown fan.
The puzzle of weight stability is solved - fat people eat more than thin people, but by a constant amount. As they add food to their diet, their weight rapidly rises and then plateaus at a higher level.
The difference between the 190lb man and the 250lb man is the latter eats everything the thin man eats, plus an extra 5 tablespoons of butter.
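The size of the plateau shift can be checked with one line of arithmetic. Setting w'(t) = 0 and ignoring the slow aging term, an extra d Calories/day moves the stable weight by d / (E x 6.23) pounds. A small sketch, again assuming E = 1.5 (my assumption, not a figure from the graphs):

```python
def steady_state_gain_lb(extra_calories_per_day, activity):
    """Shift in stable bodyweight from a permanent change in daily intake."""
    # From b * delta_w = delta_c, where both sides carry the 1/3500 factor.
    return extra_calories_per_day / (activity * 6.23)


# 5 tablespoons of butter, about 500 Calories/day, at the assumed E = 1.5.
gain = steady_state_gain_lb(500, activity=1.5)
print(round(gain, 1))
```

With that assumed activity level the result is in the same range as the roughly 50lb gain described above.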
Here is the source code I used to create the graphs.
Eric Ries over at TechCrunch recently wrote an article discussing racism/sexism in Silicon Valley and the technology industry. The article discusses differences in aptitude between men and women, and attempts to downplay them as the cause of a lack of women in technology (and at YC-funded companies, specifically):
Could this be the result of innate differences between white men and other groups? The math simply doesn’t hold up to support this view. Think about two overlapping populations of people, like men and women. They would naturally be normally distributed in a bell curve around a mean aptitude. So picture those two bell curves. Here in Silicon Valley, we’re looking for the absolute best and brightest, the people far out on the tail end of aptitude. So imagine that region of the curve. How far apart would the two populations have to be to explain YC’s historical admission rate of 4% women? It would have to be really extreme.
It looks like Eric Ries didn't actually do the math.
According to the most recent data, men and women have more or less identical mean aptitude for mathematics. But there is a considerable difference in the variance - men's aptitude has a variance 11-20% higher than women's. I.e., the population distributions look like this (splitting the difference and taking men's variance to be 16% higher):
Doesn't look like a "really extreme difference", right? It's not. But let's zoom in on people who are in the top 2.5% in mathematical ability alone, i.e. people with the capability to be decent programmers:
In this case, men make up 58% of the total. If we zoom in to people who are 1 in 1000, i.e. people with the programming aptitude for YCombinator, we find men make up 67.5% of this population.
Small differences in mathematical ability alone can add up to a lot, at least among smart people.
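Here is a rough sketch of how tail shares like these can be computed. It assumes equal numbers of men and women, normal distributions with identical means, and a men's variance 16% higher. The function names are mine, and the results may differ slightly from the figures above depending on rounding and on how the cutoff is chosen.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def male_share_in_tail(top_fraction, variance_ratio=1.16):
    """Fraction of men among the top `top_fraction` of the pooled population."""
    sd_men = math.sqrt(variance_ratio)  # women's standard deviation is 1

    def pooled_tail(cutoff):
        return 0.5 * (upper_tail(cutoff) + upper_tail(cutoff / sd_men))

    # Bisection for the cutoff that leaves exactly top_fraction above it.
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if pooled_tail(mid) > top_fraction:
            lo = mid
        else:
            hi = mid
    cutoff = (lo + hi) / 2

    men = upper_tail(cutoff / sd_men)
    women = upper_tail(cutoff)
    return men / (men + women)


share_top_2_5_percent = male_share_in_tail(0.025)
share_one_in_1000 = male_share_in_tail(0.001)
print(round(share_top_2_5_percent, 3), round(share_one_in_1000, 3))
```

The further out the cutoff, the larger the share, even though the means are identical.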
Eric Ries further goes on to present some numbers:
We all know there is a huge gender gap in computer science. But that gap means that women receive only about 30% of degrees in CS. But 30% is a lot larger than 4% – and that’s a big math problem for advocates of the pipeline theory.
It's true - if YCombinator looks only for mathematical ability which is 1 in 1000, then women should be closer to 30%. But YCombinator isn't Putnam - they look for more than just raw math ability.
Suppose, hypothetically, that "business aptitude" is also distributed the same way as math ability - identical mean, marginally different variance. Suppose it's also statistically independent of mathematical ability, and assume YCombinator wants a person who is top 0.1% in business sense.
At this point, the pool is up to 81% men purely on the basis of aptitude in two separate traits.
Unlike aptitude, men and women actually display large differences in risk aversion. For example, one study shows that out of a sample of MBA students, 57% of men are willing to pursue a risky career in finance, compared to only 36% of women. Assuming the same probabilities apply to men and women with aptitude sufficient for YCombinator, then the set of people with sufficient aptitude and attitude would be 87% men.
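The chaining is easiest to see in terms of odds. With independent filters and equal starting populations, the male:female odds from each filter simply multiply. A short sketch using only the percentages quoted above (helper names are mine):

```python
def share_to_odds(share):
    """Convert a fraction of men into male:female odds."""
    return share / (1 - share)


def odds_to_share(odds):
    """Convert male:female odds back into a fraction of men."""
    return odds / (1 + odds)


# 67.5% men among the top 0.1% on one trait.
single_trait_odds = share_to_odds(0.675)

# Two independent traits with the same distribution: odds multiply.
two_traits = odds_to_share(single_trait_odds ** 2)

# Risk tolerance: 57% of men vs 36% of women willing to take the risky path.
with_risk = odds_to_share(single_trait_odds ** 2 * (0.57 / 0.36))

print(round(two_traits, 3), round(with_risk, 3))
```

Each factor on its own is modest. The product is what gets large.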
Small differences in the distributions of men and women don't allow you to predict much about any individual trait of a randomly selected individual. But when you limit yourself to the far tails of probability distributions, they do.
At this point, we cannot pin down any specific innate differences that explain precisely why YCombinator has so few female-founded companies. I'm not attempting to claim science currently knows everything about this topic. The only point I'm making is that it does not take "really extreme" differences in aptitude or innate preferences to account for gender differences in high end professions. It just takes a few small differences chained together.
Recently, Bryan Caplan criticized the ZMP theory of unemployment on the following grounds:
Most economists aren't entrepreneurs. The fact that we can't think of a productive job for a low-skilled worker is weak evidence for ZMP. But if even ivory tower economists can think of productive jobs for low-skilled workers, that is strong evidence against ZMP. And thinking of such jobs is easy. My first stab: How about as personal servants for high-skilled workers? In Third World countries, the middle class routinely hires live-in housekeepers, drivers, and so on. In the worst-case scenario, we can learn from them.
I am an entrepreneur who recently started a business with an office in a third world country. I stayed in the home of my business partner, whose wife managed several servants. These included a cook, a maid, a guy who washes the car, and a guy who takes care of the plants. People with children often hire a nanny. In the US, neither my business partner nor I have servants.
I left a comment on Bryan Caplan's blog which I think is worth expanding on.
In India, there are many conscientious low skill workers. If your maid didn't leave school when she was forcibly married off at age 16, she might have worked hard, graduated high school, learned English or Hindi and gotten a decent job. In short, your Indian maid is low skill mainly due to circumstances and she does what she needs to do to keep her family well fed.
And make no mistake, her job is not particularly pleasant. It requires a submissive attitude, getting along with the boss, and probably doesn't provide much of an ego boost. When the boss adopts a stray dog, and keeps her on the porch because she hasn't learned to poo outside, guess who takes on the job of cleanup?
Let's now consider a typical poor, unskilled American and ask whether they are suited to this job. As Bryan Caplan observes,
when leftist social scientists actually talk to and observe the poor, they confirm the stereotypes of the harshest Victorian. Poverty isn't about money; it's a state of mind. That state of mind is low conscientiousness.
He quotes the book Promises I Can Keep, which suggests that
he [a poor, unskilled man] seems unwilling to keep at a job for any length of time, usually because of issues related to respect. Some of the jobs he can get don't pay enough to give him the self-respect he feels he needs, and others require him to get along with unpleasant customers and coworkers, and to maintain a submissive attitude toward the boss.
...[his] criminal behavior, the spells of incarceration that so often follow, a pattern of intimate violence, [...] and an inability to leave drugs and alcohol alone [also cause relationship problems, which is the primary focus of the book]
Does this unemployed, unskilled laborer sound like the sort of person who you would allow into your home at any price?
Certainly, not all unemployed people fit this profile. But many do, which makes the market for unskilled labor a lemon market. At this point, the cost of labor is now wages for the employee + time and effort from a skilled person. But no matter how low wages go, my time and effort is not becoming cheaper. I have a business to build, and I don't have time to hunt for a maid. So I wash my own dishes, and live in a house that isn't as clean as it could be.
Exploiting cheap labor isn't a simple matter. It can certainly be done (my business does it), but there are logistics involved. An additional difficulty of doing it in the US is that it needs to be done fast. If the end of the recession causes workers' wages to rise, your business might have to shut down as soon as the economy improves. This gives you a narrow window to recoup your investment, which further complicates the logistics and makes hiring cheap labor tricky.
Many people have commented on the fact that, after adjusting for chained CPI, median income has not risen significantly since the 1970s. Tyler Cowen points to this as evidence for his theory of the "Great Stagnation", which holds that the economy grew more slowly during the latter part of the 20th century than during the earlier part.
It's important to understand what the figures in the above graph mean. At any point in time, the income distribution of the US population was measured. Then percentiles were calculated, and plotted on the above graph. This is called a cross-sectional study.
A flaw with cross sectional studies is that trends they measure may not exist at the level of individuals - instead, a trend in a cross sectional study may be the result of a change in the composition of the sample.
A simple example: suppose you want to measure whether red apples turn green. You might fill a bucket with 90% red apples and 10% green apples. Later on, you might observe the bucket is now comprised of 15% green apples. Is this evidence that red apples turn green?
Not necessarily - someone might have dropped a few extra green apples into the bucket in between your observations. Since you don't track any individual apple in a cross sectional study, you have no way to know for sure.
In contrast to a cross-sectional study, a longitudinal study repeatedly measures the sample over time. I.e., a longitudinal study would label 100 apples and look at the color of each individual apple before and after (e.g., apple #1 started red and ended red, repeat for apple #2, etc). This differs from a cross sectional study because no new apples are added to the batch, and none are removed.
The hypothesis I'm proposing in this blogpost is this: the "Great Stagnation" is mainly a statistical artifact of cross sectional income measures, caused by immigration.
In 2008 the Brookings Institution did a study, based on PSID data, which attempts to measure income dynamics longitudinally rather than cross sectionally. In particular, they take as a sample a set of people who were American children in 1968 and compare their incomes to those of their parents at similar ages. The result?
Median income rose 29%.
The increase in income is not even across the population - it is concentrated at the bottom. The bottom quintile roughly doubled their income, while the top quintile experienced no income growth at all (relative to their parents at the same age).
The longitudinal data paints a very different picture than the cross sectional data. How can we explain the difference? The best way is to try to figure out which group of people are included in the cross sectional data but excluded by the longitudinal data.
The answer to this question is clear - anyone whose parents did not live in the US in 1968, i.e. immigrants.
Immigrants tend to occupy the lowest rungs of the economic ladder in the US, and these rungs tend to be lower than the average of the parents of US natives. This has caused income averages (both mean and median) to stagnate even though neither immigrants [1] nor natives have experienced income stagnation.
Thus, I propose the hypothesis that immigrants and Simpson's Paradox are the cause of the income stagnation in the US.
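To make the mechanism concrete, here is a toy illustration with entirely made-up numbers (group sizes and incomes are hypothetical, not taken from any dataset). Both groups get richer between the two snapshots, yet the pooled average does not move, because the composition of the population changes:

```python
def mean_income(groups):
    """Pooled mean income; groups maps name -> (headcount, income per head)."""
    total_people = sum(count for count, _ in groups.values())
    total_income = sum(count * income for count, income in groups.values())
    return total_income / total_people


# Hypothetical snapshots, incomes in thousands of dollars.
early = {"natives": (90, 40.0), "immigrants": (10, 10.0)}
late = {"natives": (60, 50.0), "immigrants": (40, 17.5)}

early_mean = mean_income(early)
late_mean = mean_income(late)

# Natives: 40 -> 50. Immigrants: 10 -> 17.5. Pooled mean: unchanged.
print(early_mean, late_mean)
```

A cross sectional measure sees only the pooled number. A longitudinal measure that follows each group sees the gains.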
[1] I have no data to back this up, but it seems intuitively obvious based on the fact that most countries which provide a sizable number of immigrants to the US (Mexico, China, the Philippines and India) are considerably poorer than the US.
Goal: count the number of books in the library.
Map: You count up shelf #1, I count up shelf #2.
(The more people we get, the faster this part goes. )
Reduce: We all get together and add up our individual counts.
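The same idea as a tiny sketch, using Python's built-in map and functools.reduce on one machine (the shelves and titles are invented for the example). A real MapReduce system would hand each shelf to a different worker:

```python
from functools import reduce
from operator import add

# A hypothetical library: each inner list is one shelf.
shelves = [
    ["Dune", "Emma"],
    ["Ulysses"],
    ["Walden", "Beloved", "Hamlet"],
]


def count_shelf(shelf):
    """Map step: one person counts the books on one shelf."""
    return len(shelf)


# Map: every shelf is counted independently of the others.
shelf_counts = list(map(count_shelf, shelves))

# Reduce: add up the individual counts.
total_books = reduce(add, shelf_counts, 0)
print(shelf_counts, total_books)
```

Because no shelf count depends on any other, the map step can be split across as many people as you can find.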
lolz teh html spec h8ts twitter http://bit.ly/fGeB8x
4.10.7.2.8 The maxlength attribute
The maxlength attribute, when it applies, is a form control maxlength attribute controlled by the input element's dirty value flag.
If the input element has a maximum allowed value length, then the code-point length of the value of the element's value attribute must be equal to or less than the element's maximum allowed value length.
The following extract shows how a messaging client's text entry could be arbitrarily restricted to a fixed number of characters, thus forcing any conversation through this medium to be terse and discouraging intelligent discourse.
What are you doing? <input name=status maxlength=140>