Tuesday, January 10, 2012

Distance coefficients---positive?!!

Finally got it! I did two regressions:
Arable rent = arable yield + distance from London (and a bunch of other variables)
Pasture rent = pasture yield + distance from London

Agricultural location theory (and common sense) tells us that the coefficient on yield should be positive (rent goes up for better land) and the coefficient on distance from the main market should be negative. The further away you are, the higher your transport costs, so the lower your rent. So why was I getting a negative distance coefficient for pasture (meaning livestock) and a positive one for arable? Seems contradictory. The pasture sign is fine: the animals had to walk all the way to London, so you would expect a negative. But arable?
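To make the sign check concrete, here is a minimal sketch of the two regressions on toy data. Every number in it is invented and built to reproduce the sign pattern, the helper names solve and ols are hypothetical, distance is in kilometres for convenience, and the "bunch of other variables" is left out. It is not the actual model, just a way of seeing what a positive or negative distance coefficient looks like.

```python
# Toy illustration only: every number below is invented, not taken from the parish data.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(predictors, y):
    """Least-squares fit via the normal equations; returns [intercept, b1, b2, ...]."""
    X = [[1.0] + list(row) for row in predictors]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical parishes: (yield, distance from London in km).
yields = [20.0, 25.0, 22.0, 30.0, 27.0, 24.0]
dist_km = [50.0, 120.0, 200.0, 260.0, 310.0, 380.0]
predictors = list(zip(yields, dist_km))

# Rents constructed so that distance lowers pasture rent and raises arable rent.
pasture_rent = [5 + 2 * y - 0.01 * d for y, d in predictors]
arable_rent = [5 + 2 * y + 0.01 * d for y, d in predictors]

pasture_fit = ols(predictors, pasture_rent)
arable_fit = ols(predictors, arable_rent)

print("pasture distance coefficient:", pasture_fit[2])
print("arable distance coefficient:", arable_fit[2])
```

With real data the rents would of course not be an exact linear function of the predictors, and a statistics package would report standard errors as well as signs.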

I finally got it at about three this morning. Couldn't sleep. Wheat was produced in the eastern counties, and the western counties imported it from the east because they could not produce enough to feed themselves. So those western producers who could grow wheat were in a sense protected by distance from their eastern competitors. So the further you are from London (which is close to the wheat-producing counties), the greater your arable rent.

I checked this by getting averages of wheat prices in 1820 for all counties (no data at the county level available for 1836). The map below shows that the west (darker colour for higher wheat prices) did have higher wheat prices. In effect Cornwall acted as a main market, which is why the prices are highest there. Here is the map.
London is the red dot to the east. The 588 parishes are coloured by their r-squared values. You can see there is a patch of dark red (high r-squared) in north Devon/Somerset. Now I also made a graph of the regression coefficients for distance from London against distance from London. Here:
You can see that at the point closest to London (i.e. furthest from Cornwall) the regression coefficient is 0.00002. As we move towards Cornwall, the coefficient goes down to 0.000008, a 2.5-fold reduction. Notice the interesting shape of the curve (which I think I can model). The 'straightest' or most linear section is between about 200,000 metres and 300,000 metres (marked with vertical red lines). It happens that our high r-squared values lie between these distances. The r-squared values are high there because distance is being accurately incorporated.

It seems that the size of the coefficient is a function of distance. So if I can get the equation of the curve above, I can use integration to get the total cost of any journey, then plug that back into the regression. Hey, the sun is finally shining!
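A rough sketch of that integration idea, under loud assumptions: the curve's equation is not known yet, so the exponential form below is purely a stand-in, and the 400,000 metre distance at which the coefficient reaches 0.000008 is a guess. Only the two coefficient values come from the graph. The function names are hypothetical.

```python
import math

C_NEAR = 0.00002    # coefficient closest to London (read off the graph)
C_FAR = 0.000008    # coefficient towards Cornwall (read off the graph)
D_FAR = 400_000.0   # ASSUMED distance in metres at which C_FAR applies

# ASSUMED functional form: c(d) = C_NEAR * exp(-K * d), with K chosen so c(D_FAR) = C_FAR.
K = math.log(C_NEAR / C_FAR) / D_FAR


def coefficient(d):
    """Stand-in for the fitted curve: distance coefficient at d metres from London."""
    return C_NEAR * math.exp(-K * d)


def total_effect(distance, steps=1000):
    """Integrate coefficient(x) from 0 to distance with the trapezoid rule."""
    h = distance / steps
    total = 0.5 * (coefficient(0.0) + coefficient(distance))
    for i in range(1, steps):
        total += coefficient(i * h)
    return total * h


def total_effect_exact(distance):
    """Closed-form integral of the assumed exponential, used to check the numerics."""
    return C_NEAR / K * (1.0 - math.exp(-K * distance))


print("numerical:", total_effect(300_000.0))
print("closed form:", total_effect_exact(300_000.0))
```

The trapezoid version is the useful part: once the real curve is fitted, only coefficient() needs replacing, whether or not its integral has a neat closed form.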
