Sunday, June 26, 2011

A beautiful day!

No pictures or equations---just a huge sigh of relief. I finally managed to export data from my agent-based model and get it into the Stata statistics package for analysis. The thing that was holding me up was that the attribute data for the 604 parishes in the south-west of England is held in polygons, and when those polygons are imported into the NetLogo agent-based model, some of the data "disappears" or, even worse, reappears in a corrupted form. Converting the polygons to points did the trick.

So---the agent-based model calculates the closest market town for each farmer and tells the farmer its population. It then exports that data so I can work out the effect of distance and population. Sounds simple, doesn't it...

Anyway, I found that we have an interesting "gravity" model here, where the gravity equals the population divided by the square root of the distance. This declining effect makes sense: the average cost per unit distance was higher for shorter distances because of the fixed costs of loading and unloading all those sacks of grain.
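
To make the "gravity" idea concrete, here is a minimal sketch in Python. The function name and the town size are made up for illustration; the only thing taken from the result above is the form, population divided by the square root of distance.

```python
import math

def market_pull(population: float, distance_km: float) -> float:
    """Gravity-style attraction: population divided by the square root of distance."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return population / math.sqrt(distance_km)

# Hypothetical market town of 4,000 people, seen from farms at three distances.
for d in (1, 4, 16):
    print(d, market_pull(4000, d))
```

Because of the square root, quadrupling the distance only halves the pull, which is the declining effect described above.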

It is days like these that I live for.

Friday, April 8, 2011

The effect of soil type on rent elasticity

The railways paper is moving along. We now have 23 estates in the dataset. We have shown that rent rose with the amount of track within a 40km radius of the estate. We have also shown that the year when the railway track near the estate was connected to London was statistically highly significant. This implies that the London market was really dominant. One result which is still a bit puzzling and which we are working on is the differences in elasticity. By elasticity, we mean the percentage change in rent caused by a one percent change in the amount of track. The elasticities are really quite different: the elasticity for Holkham Hall, for example, is twice that of the smallest. This is interesting! The rate of increase may show us something about the relationship between landowner and tenant.

One possible reason is the production type of the farm. We know that dairy and meat prices rose more than wheat prices during the 1832-1869 time period. Folks were getting richer and so could afford a better diet. So farms which were on soil suitable for dairying might be more profitable and so the landowner could charge a higher rent. The map shows the estates, with the green circle proportional to the elasticity. Red soil is good for dairying and light blue for wheat. There seems to be a pattern: smaller elasticities on wheat soil. BUT look at the circle I've drawn around Leconfield and Emmanuel Hospital. They seem to be on the same (red) soil but the elasticities are quite different. Leconfield is the little circle just to the left of the larger Emmanuel circle. Why? Perhaps because Emmanuel Hospital is institutionally owned. Or perhaps the owner of Leconfield wasn't a tough businessman? More work to do.

Thursday, February 24, 2011

Railways and rent

So far we have twelve estates with kilometres of railway track counted up. In the regression we get very satisfactory results, which go to show that railway construction did indeed cut costs for farmers---but those savings were transferred to their landlords in the form of higher rents. However, in my readings I am finding some interesting differences between landlords. Some landlords had political ambitions and wished to be able to direct the votes of their tenants. As a result, the next research step is to try to find out which of our landlords had political ambitions, and then include this in the regression as a dummy variable. Then we can see the effect of politics on rent.

Wednesday, February 16, 2011

Coastal counties got cheaper wheat

I coded the counties with a dummy variable (0 = interior, 1 = coastal) so that I could test whether there was any difference in the 'mark-up' according to whether a county importing wheat was on the coast or not. A lot of wheat was shipped by coastal routes because that mode was often easier than using a horse and cart. I want to test for that. So

price difference = intercept + beta1*distance + beta2*(surplus or deficit) + beta3*coastal

was the regression equation I used. On the left-hand side we have the dependent variable, the difference in wheat price between any one county and the source of the wheat, which was either Cambridgeshire or Lincolnshire. On the right-hand side is the intercept, then the coefficient for distance between the counties; the coefficient for amount of surplus or deficit in wheat for that county; and finally the dummy variable for 'coastal'.

As you can see from the output below, all the explanatory variables are statistically significant at the 95% level (p < 0.05). The coefficient for distance is positive, meaning that the further away, the higher the price difference. In effect this is giving us the transport rate for hauling wheat. It should be positive; makes sense, doesn't it? Surplus/deficit has a negative sign. That means that the more surplus the county has, the smaller the price difference. Again, that makes sense. Why would a county pay more for wheat if they already have lots of it? The coastal dummy is the most interesting. It has a negative sign. So if you lived in a county on the coast, you paid less for wheat, "other things being equal". We are 'controlling for' distance and surplus. So if we happened to have two counties, both the same with respect to distance and surplus, the one on the coast would pay less for wheat. That they got more of their wheat through coastal shipping would seem to be a reasonable explanation! Can you see the power of these techniques? We can learn a huge amount just with a few bits of data scraped off the floor.
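
Here is a small sketch of what "other things being equal" means for a dummy variable. The coefficients below are invented purely to match the signs reported above (they are not the estimates from my output), and the function name is hypothetical.

```python
# Hypothetical coefficients, chosen only to match the signs described in the post.
INTERCEPT = 2.0     # baseline price difference
B_DISTANCE = 0.03   # positive: the further away, the bigger the price difference
B_SURPLUS = -0.5    # negative: more surplus, smaller price difference
B_COASTAL = -1.2    # dummy coefficient: 1 = coastal, 0 = interior

def predicted_gap(distance_km, surplus, coastal):
    """Predicted price difference from the fitted (here: made-up) equation."""
    return (INTERCEPT
            + B_DISTANCE * distance_km
            + B_SURPLUS * surplus
            + B_COASTAL * coastal)

# Two counties identical in distance and surplus; only the dummy differs.
interior = predicted_gap(200, -1.0, coastal=0)
coastal = predicted_gap(200, -1.0, coastal=1)
print(interior, coastal, coastal - interior)
```

Whatever distance and surplus you plug in, the gap between the coastal and the interior county is always the coefficient on the dummy. That is all 'controlling for' the other variables means here.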

Tuesday, February 15, 2011

Difference in wheat market prices by distance

I came across a really special dataset that has wheat (and other grain) prices by the week from 1760 to 1820 for every one of the 40+ English counties. Think of the work some dedicated soul did in copying out all those numbers! I used this dataset in a recent post to show that wheat prices varied quite a lot, especially by 'demand', i.e. whether the destination market was in surplus or deficit.
Malcolm's map showing wheat origins

Malcolm has kindly provided me with a set of measurements from the two major wheat producing counties (Cambridgeshire and Lincolnshire) to each of the counties. See Malcolm's map on the right. We know the price difference between the wheat exporting county and county of importation; the distance between the two; the mean elevation of the journey, and the standard deviation of the elevation of the journey. The last measurement is to proxy the 'roughness' of the journey. I figured that a greater standard deviation of elevation would mean a more expensive journey for the horse and cart operators of that time, and so they would increase the prices accordingly. We only have about 35 observations, but the results are interesting. After some experimentation, I found that regressing the natural log of the price difference against the natural log of the distance produced an acceptable result (p=0.02). The scatter plot of log price difference on the x axis with log distance on the y axis is shown here, together with a trend line of predicted values.

Regression of log price diff against log distance
If we use logs in the regression, then we also have a measure of the elasticity. The coefficient from the regression is 0.47, which means that a one percent increase in distance increases the price difference by 0.47 per cent. (See how useful logs and elasticities are!) This is quite a lot, and more than I had expected. The other measurements, such as roughness of the journey, didn't seem to matter as much. Now I am going to move on to 'control' for other variables, such as whether the county was industrial or agricultural, or on the coast. Why should coastal be interesting? Quite a lot of wheat was shipped by coastal vessels, probably a lot cheaper than by land. More later.
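
Why the slope of a log-log regression is an elasticity can be sketched in a few lines of Python. The data below are made up and noise-free, generated from an assumed power law with exponent 0.47 (the constant 0.8 and the distances are arbitrary), so the fit simply recovers the exponent; the function name is hypothetical.

```python
import math

def loglog_slope(x, y):
    """OLS slope of ln(y) on ln(x): the elasticity of y with respect to x."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# Made-up data: price_gap = 0.8 * distance ** 0.47, with no noise added.
distances = [20, 50, 100, 200, 400]
price_gaps = [0.8 * d ** 0.47 for d in distances]

elasticity = loglog_slope(distances, price_gaps)
print(round(elasticity, 2))

# Effect of doubling the distance on the price difference.
print(round(2 ** elasticity, 3))
```

With an elasticity of 0.47, doubling the distance multiplies the price difference by a little under 1.4 rather than by 2. Real data would of course scatter around the line.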

Thursday, February 10, 2011

Wheat price differentials surplus/deficit areas

Earlier I calculated the 'wheat flow' from counties where they grew more than they ate to counties which grew less wheat and more livestock. Recently I found extraordinarily detailed data which lists the wheat price by county not only by year but by week within the year! I thought that the difference in wheat price between counties might be explicable just by some function of the distance between them. It turns out not to be as simple as that, but more interesting. The difference between counties is much more pronounced when there is a bad harvest. When the harvest is bad, the wheat exporting counties really take advantage and the prices in the importing counties surge. Sounds like the sort of behaviour we see over tickets to hockey games, doesn't it! So I worked out which counties had the most surplus (Lincolnshire and Cambridgeshire) and which had the biggest deficits (Lancashire and Middlesex). Then I subtracted Lincolnshire from Cambridgeshire...because if my theory is correct, the gap between the two surplus counties shouldn't change much. Then I took the difference between the biggest surplus county (Lincolnshire) and the highest deficit county (Lancashire). If things are going my way, then the difference between these two should be accentuated in years of bad harvests. Sure enough, the red line really jumps in years when we know from old books that the harvest was poor. The years 1816/17 are a good example of this. 1800 looks dramatic, but we were at war with the French then (remember Napoleon?) and so there were all sorts of other factors involved. These results have a modern-day significance in terms of food security, especially now that the world looks like it is running out of food. The events in Egypt were basically triggered by high food prices. Learn from history! Next step is to try to quantify the effect. This is fun!

Tuesday, February 8, 2011

Relationship between flow and price in wheat

Negative relationship between surplus/deficit
In my last post I presented a map showing counties which were in surplus or deficit. I am convinced there is a relationship between price and surplus or deficit, with the price difference between counties being due to transport costs. I regressed (there you are again!) the flow against the wheat price, and found a strong statistical relationship (p=0.007). Here is a scatterplot which admittedly looks---well, scattered! But you can tell by eye that the relationship is negative. The greater the deficit (a negative value on the y axis), the more you pay for wheat (the price of wheat is on the x axis). Now I need to try somehow to build a mathematical relationship between flow, distances travelled, and wheat prices. If I can find this relationship, then I can set it to zero on one side of the equation to find the 'limit of cultivation'. If that limit happens to be the border between Devon and Somerset----problem solved and game over!

Wheat price between farm and market

Wheat movements
England is rather regional in wheat production. But of course consumption of wheat (typically in the form of bread) is not regional. Everybody likes a toasted bagel! So that meant that wheat had to be moved (by horse and cart usually) from areas where production exceeded consumption to areas where they ate more than they grew. I've added surplus and deficit to the map: the red areas exported wheat to areas in deficit. I have looked at wheat price records for 1760-1820 trying to associate surplus/deficit with difference in price. Sure enough, areas with surplus had a lower wheat price than areas with a big deficit. The graph below shows the wheat price for Lincoln and Norfolk (big surplus) and Lancashire and Devon (deficit). As you can see, surplus areas were pretty much always cheaper than deficit areas. Isn't it interesting how correlated the prices are? The gap between them is remarkably consistent. So, it might be reasonable to suppose that this gap is due to the cost of moving the grain from one county to another. If this was the case, then we could work out the distance and from that get the transport costs per unit of distance. We would do this by (our friend!) regression, something like this:

difference in prices between two counties = intercept + beta1*distance

beta1 would be the transport cost per unit of distance.

(Now, I've linked the word 'regression' here to a YouTube video I've done in case you need a little revision.) Take a look!
Remarkable price correlations
Why do we want to know this? Look at Devon. Its wheat cost is very high and it is also furthest from the big wheat producing areas. We have also found that the sign on 'distance to market' changes around the Devon/Somerset border. It could just be that this is the point where wheat brought in from Norfolk etc is just too expensive because of transport costs. This leaves the local farmers in control of their own market, so distance to market becomes important. Just a thought!

Monday, February 7, 2011

Wheat Prices and Geography

Mean wheat price 1760-1820
Last post, I showed how the prices of wheat in the counties of interest (Cornwall, Devon, Dorset and Somerset) all seemed pretty much in lock-step over the period 1760-1820, although Somerset seemed marginally higher. Now I've calculated the mean for the whole period for most English counties and plotted them on the map. The darker the green, the higher the price. Originally I thought this might be connected somehow with London, but it isn't. The big wheat growing areas in England then and now are the flat areas in the east (Norfolk, Essex, Cambridgeshire). Our counties of interest didn't really produce much wheat. So you can see that the greater the local wheat production, the lower the price. Or put it another way, if you live far away from the area of production you pay more because of the cost of that horse and cart. The area within the red pencilled bit is the wheat-growing heartland. See how the price of wheat goes up the further away you are. Except for Cornwall! Those people down in the extreme south-west have always gone their own way!

Sunday, February 6, 2011

Wheat price coordination

Wheat price yearly averages for four counties
In a recent post about the theory of agricultural location, I missed out the definition of 'P' in Dunn's equation...thanks Mi for pointing this out. P is the price of the commodity (e.g. wheat) at the market. I've corrected the omission. Mi's remark made me think about the variation in wheat prices on a locality basis. If the wheat price varies from point to point, then Dunn's equation gets more complicated. I've found a remarkable set of wheat prices by county which runs from 1770 to 1820. I have been looking at these to see what regional difference there might be. On the graph I've plotted the yearly mean price for our four counties of interest. Interesting things: the wheat prices converge when the variation is small. But when there are peaks (for example a huge jump in 1800) then there is considerable divergence. And it is always Somerset that is most expensive. Cornwall is always cheapest, and the two counties that lie between them geographically also lie between them in price. I'd guess that this effect is caused by proximity to London. London was a major market, with about 14% of the population living there. This is really interesting (isn't it?).

Thursday, February 3, 2011

Sunshine and wheat yields

Malcolm has been plugging away at data from the UK Meteorological Office, calculating mean hours of sunshine in August for our 715 parishes. Now, the data comes from earlier last century and not from 1836, when the wheat yields were recorded, but that’s the best we can do.

August sunshine is a critical factor in wheat yields: hot is best of course, allowing the ear to ripen fully before harvest. I’ve done a regression of wheat yields against hours of sunshine in August, as well as the other data we have: elevation, amounts of rainfall in different groupings. There is a map of sunshine to the right and the regression output is below. The map is interesting, because you can see the parishes in bright yellow, indicating the most sunshine: along the coasts, and down to the tip of Cornwall. And of course this matches up with where people like to go on holiday. The best wheat yields come from Somerset, inland and with plenty of August sunshine.

Here is the regression output. Elevation has a negative sign, meaning less yield with height. Sunshine has the expected sign. Rainfall is negative above a certain amount: more than 1000 mm in a year waterlogs the wheat plant. This is not a bad result at all, and it will improve once I get data for soil types and amounts of available water into the regression.

Statement of the theory!

We are trying to apply a combination of von Thünen’s rent and distance theory [1] and Ricardo’s rent and land fertility theories, using Dunn’s equation [2] from 1954:

R_ij = E_j (P_j - a_j) - E_j k_j d_i

where R is the rent for crop j at location i, E is the yield of crop j, and a is the production cost. k is the cost per unit distance of transporting crop j and d is the distance from the farm to the next commercial transit point (let’s call it CTP). P is the price of the crop at the market. This might be the mill, or a dealer’s yard---wherever the production next changes hands and where its value needs to be calculated.

The rent right next to the CTP will be highest because the distance is shortest. In the same way there will eventually be a ‘margin of cultivation’ where the cost of transport exactly equals the revenue from production. So at this point the rent will be zero. Farmers will bid for the use of the land at any point along a bid curve we construct just by joining up rent at the CTP and rent at the margin of cultivation:

Farmers won’t bid for land if the rent is set at any point to the right of the red line: it is too expensive. Likewise, landowners won’t accept any offer to the left of the red line, because they think they can get a better offer.
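
The bid curve and the margin of cultivation can be sketched numerically. This assumes Dunn's equation in its usual form, R = E(P - a) - E k d, so that setting R = 0 gives the margin d* = (P - a) / k. The function names and all the numbers below are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def dunn_rent(E, P, a, k, d):
    """Rent per unit of land: R = E*(P - a) - E*k*d."""
    return E * (P - a) - E * k * d

def margin_of_cultivation(P, a, k):
    """Distance at which rent falls to zero: d* = (P - a) / k."""
    return (P - a) / k

# Hypothetical values: yield, market price, production cost, transport cost per unit distance.
E, P, a, k = 20.0, 7.0, 4.0, 0.25

rent_at_ctp = dunn_rent(E, P, a, k, 0)      # highest rent, right next to the CTP
d_star = margin_of_cultivation(P, a, k)     # where the bid curve hits zero
rent_at_margin = dunn_rent(E, P, a, k, d_star)

print(rent_at_ctp, d_star, rent_at_margin)
```

Joining the rent at the CTP to the zero rent at d* gives the straight-line bid curve described above. Note that the yield E scales the rent but drops out of the margin itself.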

In the empirical testing we have been doing, we found that Dunn’s equation worked very well for Cornwall, Devon and Dorset. So far so good. But when I tried to include counties to the east, such as Somerset, the results were no longer statistically significant. In particular, the regression results gave a POSITIVE sign for Ekd, on the right hand side whereas it should (the theory goes) be negative.

In the literature I found several theoretical reasons for why this might happen. First: worker wages are lower further out from the CTP. So this would mean a reduction in ‘a’ in the equation, making the net revenue greater the further away from the CTP. (We see this happening in contemporary business: that’s why firms ‘outsource’ to China and India). Second: the perception of the farmer as to future harvests might differ from place to place. If he/she thinks that the weather will make yields highly variable, then he/she probably won’t bid as much for the land [3]. This second reason is why I have been getting you working on meteorological data these last few weeks.

But actually----I think the solution is simpler (they usually are). I have been mis-specifying the model. Instead of just distance to nearest market town (which worked well enough for Devon), I should have been thinking about the ‘connectivity’ of the towns and villages. The network density of the county/area in which the farm was. If it was highly interconnected with several different routes to the next town, then we would expect the rent to be higher, because the flow of goods in any direction would be less. Just have to figure out how to do that. Less reliance on distance to market town and more on how the towns were all connected together.

[1] von Thünen, J.H. Von Thünen's 'Isolated State': An English Edition. Pergamon, 1966.
[2] Dunn, E.S. The Location of Agricultural Production. University of Florida Press, Gainesville, FL, 1954.
[3] Cromley, R.G. "The von Thünen model and environmental uncertainty." Annals of the Association of American Geographers 72 (1982): 404-10.

Malcolm's comment on Cornish railways

Old Cornish tin-mine pumping station (Wikipedia, Tim Corsler)
Malcolm made a perceptive comment about why distance to market seemed to matter much more down in the 'toe' of Cornwall than in the lusher pastures of Somerset, well to the east. He thought this might have something to do with railways. This is interesting, because railways did develop early on in Cornwall, but mostly for the freighting of minerals, especially tin. The steam engine started off in Cornwall to pump water out of mines. The Cornish railway system was purely local though, and didn't get connected up with the rest of the system until the 1850s. What is interesting about Cornwall, and which Malcolm's comment made me think about, was the amount of food rioting that went on among the tin-miners. As the linked text notes, the rioters were isolated groups of non-local non-farmers who suffered when food prices went up. This isn't just history: we're seeing the same thing these last few weeks in Tunisia and Egypt. Underlying the protests are concerns about high food prices, caused by stock-piling, poor harvests (possibly climate-change driven though this is risky speculation) and the large amount of food production being converted into ethanol (primarily in the US). There is an article here from the British Daily Telegraph about the world being one poor harvest away from chaos. History does repeat itself, and only fools don't learn from the past.

Wednesday, February 2, 2011

Market distance signs: a clear break

I used geographically-weighted regression to get location specific coefficients for market distance in the regression: Rent = wheat yield + market-town population + distance to market-town. As I have gone on and on about before, the theory is that there should be a negative sign for distance, because the further you are from your market, the more it costs you to truck your produce in on market day. The fact that this sign changes has been perplexing me...and there is something interesting going on here. Look at the map below.  Can you see that the locations to the west of that thick red line have a negative sign, those to the right a positive sign? I did put a marking on the map but it is a little faint. Wow! Such a clear boundary. But why is it there and not a few kilometres to the west or east for that matter? What are factors that decide the location of this line? Interesting!

Mountains and food security

Just been reading a really interesting article about food supply chains in mountainous areas. The ideas behind it, that in mountainous areas you rely more on your neighbours and so on, could apply to any time period. The article is a bit light on theory and math, but it gave me the idea that perhaps Devon was unusually reliant on local food markets. That is why the sign for distance to market is so resolutely negative. So it is not Somerset that is the odd county out for having a positive sign for distance to market; it is Devon, because Devon was a bit cut off (in more ways than one) and relied on its local food markets. Here is a map of population density and slopes. The population density I got by looking at the 1831 census and then dividing the number of inhabitants by the area of the county. Having less than one person per square kilometre seems unimaginably empty to us now.

You can see how fewer people and big slopes go together. So in Devon they would have felt pretty isolated, and so relied on their local market towns. I think I'll work on this theory for a bit. I've circled Devon in the map of southwest England below.
Density (in green) and slope (red for small, blue for big)

Tuesday, February 1, 2011

Water and wheat yield

I've been mapping topsoil water availability throughout the year and wheat yield. The relationship is clear: more water gives more yield. A more reliable indicator than rainfall. Here is a map with high water availability matched by high wheat yield.

Monday, January 31, 2011

Predicting wheat yields

I want to get a function that predicts wheat yields on the basis of climate and soil. I have extracted (with Malcolm's help at the GIS end) some basic climate data from an old map of Britain. I did a regression of wheat yields against elevation, amounts of rainfall and length of the growing season. The results are below. This is what the results mean: ELEV is negative, meaning the higher up the farm, the less the yield. Makes sense. GS is growing season....strangely the longer the growing season, the lower the yield. The two 'RAINFALL' variables show the effect of rainfall of 1000mm a year and 1250mm a year. The coefficient for the 1250 is larger in magnitude (at -4.42) than the one for 1000 (at -1.34), meaning that the greater the rainfall, the larger the reduction in yield. See the adjusted R-squared on the right? That gives us the percentage of the variation in wheat yield explained by the model. Here it is 0.3094 or a tad over 30%. This isn't very good, but better than I had expected. Now I need to add soil data and more accurate climate data, such as hours of sunshine.
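
For anyone wondering what the "adjusted" in adjusted R-squared does, here is a minimal sketch using the standard textbook formula. The R-squared value, the number of observations and the number of predictors in the example are assumptions for illustration, not figures read off my output.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: R-squared penalised for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical example: plain R-squared of 0.32, 500 observations, 4 predictors.
print(adjusted_r2(0.32, 500, 4))

# With few observations the penalty bites harder.
print(adjusted_r2(0.32, 20, 4))
```

The adjustment matters most when there are few observations per predictor. With several hundred parishes and four variables it barely moves the number.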

Market distance and wheat yields

I am still having problems sleeping because I can't understand why we have some positive signs for distance to market. The theory is that the sign should be negative, meaning that the further the farm is from the market, the lower the rent. Makes sense, doesn't it? So why am I getting a positive sign for some regions in the south-west of England? It could be related to relatively high yields. Below is a map showing in the top panel our 715 parishes with the wheat yields. In the bottom panel is a map showing the magnitude of the coefficient of the variable 'market distance'. The areas I have circled in the top panel show high yields, in the bottom panel a positive sign. The two areas seem to correspond, don't they?
So it could be that the gains to the farmer of high yields more than make up for distance to market. I'll work on the math to get a function for this and then test it.

Monday, January 24, 2011

Climatic data found

Malcolm has scored what looks like a bull's-eye. I asked him to locate historical data that we could use to test my hypothesis about weather causing a positive sign in the distance to market regression. Today he found some weather maps which look just the thing---I am particularly interested in July rainfall and August/September sunshine. Wheat does best with good rain in July and then a hot dry couple of months. Somewhere also in his find is some historical time-series data. I need that to calculate the variance. The weather maps give an average which is really useful, but it would be nice to know the variance. The two sequences {1,2,3,4,5} and {3,3,3,3,3} have the same mean but very different variances. If you were a farmer, the historical variance in your rainfall and sunshine might alter your cropping pattern and therefore the rent that you might bid for the use of the land.
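
The two sequences above can be checked in a couple of lines with Python's standard `statistics` module. This sketch uses the sample variance (dividing by n - 1); the labels are just mine.

```python
import statistics

steady = [3, 3, 3, 3, 3]
variable = [1, 2, 3, 4, 5]

for name, seq in (("steady", steady), ("variable", variable)):
    # Same mean for both sequences, very different spread.
    print(name, statistics.mean(seq), statistics.variance(seq))
```

Both sequences average 3, but only the second has any spread, which is exactly the information a map of averages throws away.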

Saturday, January 22, 2011

Possible interesting solution to distance to market mystery

I haven't been able to sleep these last two nights for thinking about why distance to market might increase rent (i.e. has a positive sign in the regression). I posted something about this mystery a few days ago. I think the solution might be connected with perceptions of risk. Farmers who face stable long-term conditions with regard to climate can generally out-bid farmers who are concerned only about the short-term. Folks with deep pockets can wait out the troughs because they aren't so worried about bringing home food to their family every single day. They can store food or buy it. So it is possible that the unusual pattern of a positive sign might be caused by highly variable weather conditions in that location. So I need to go back over meteorological records and calculate the coefficients of variation for temperature and rainfall in various parts of the southwest. I'll put the numbers into the regression and see what happens. This is fun!
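
The coefficient of variation is just the standard deviation divided by the mean, which lets you compare variability between places with different typical levels. Here is a minimal sketch; the rainfall figures and location labels are invented, not taken from any weather record.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical July rainfall totals (mm) for two locations with the same mean.
sheltered = [60, 62, 58, 61, 59]
exposed = [30, 95, 45, 80, 50]

cv_sheltered = coefficient_of_variation(sheltered)
cv_exposed = coefficient_of_variation(exposed)
print(cv_sheltered, cv_exposed)
```

Both made-up locations average 60 mm, but the second has a far larger coefficient of variation, and it is that number, one per location, that would go into the regression as the risk variable.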

Track, rent and causality

I have been working on the mathematical model for the relationship between the laying of railway track and changes in rent. A bit of a problem is showing that there is 'causality' between the two. How can we prove that track caused change in rent? The answer is we can't, and the whole area of causality is frankly speaking a philosophical minefield. It is extraordinarily difficult to show that one thing 'causes' another. For our purposes, the only tool we can use is Granger causality, which tests the relationship using time. What we want to see is that rent changes AFTER a change in track, not simultaneously or, even worse, before. Malcolm has just given me the track for our eighth estate and I have done the Granger test on all of them. I'm pleased to say that they all scraped through, some only just. I limited the range of years to 1832-1882, which gives us a half-century. We don't have thorough track measurements for the period after 1872, and wheat and cattle prices were highly volatile in the 1880s. That's OK...we've made the point.

I've been using an interesting form of regression, called Vector Auto-Regression or VAR to get out the stats.  It is simply beautiful! Life doesn't get better than this!
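
The idea behind a Granger test can be sketched in plain Python. This is not the VAR I ran, just a one-lag illustration on synthetic data: compare a regression of today's value on its own lag with one that also includes the lag of the other series, and compute an F statistic from the two residual sums of squares. All names and numbers are made up, and the series are constructed so that 'track' leads 'rent'.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def granger_f(cause, effect):
    """F statistic: does lagged `cause` help predict `effect` beyond lagged `effect`?"""
    y = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r, rss_u = rss(restricted, y), rss(unrestricted, y)
    dof = len(y) - 3  # observations minus parameters in the unrestricted model
    return (rss_r - rss_u) / (rss_u / dof)

# Synthetic series: rent responds to LAST period's track, never the reverse.
rng = random.Random(0)
track = [rng.gauss(0, 1) for _ in range(60)]
rent = [0.0]
for t in range(1, 60):
    rent.append(0.5 * rent[t - 1] + 0.8 * track[t - 1] + rng.gauss(0, 0.1))

f_track_to_rent = granger_f(track, rent)
f_rent_to_track = granger_f(rent, track)
print(f_track_to_rent, f_rent_to_track)
```

A large F in the track-to-rent direction and a small one in the reverse direction is the pattern we are hoping for. Real rent and track series trend over time, so in practice they need differencing or more lags before a test like this means anything.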

Tuesday, January 18, 2011

Wheat flows within Britain

Mi has helped me with the data for wheat flows within Britain towards the end of the 19th century. The map shows whether a county was in surplus or deficit. The calculations are for production by county minus consumption within the county. What is left over could be carted and sold to another county. As you can see, the pattern is predictable: the counties in blue had a surplus. These counties are on the arable lands towards the east and south of the country. The areas most in deficit were the sheep raising and also industrialising counties, in red and orange. The country as a whole had a net deficit which was covered by imports from Ireland and also Prussia, and later on, the United States. The point of the map is to show that not much grain moved within Britain by rail. It went by coastal steamer.

Calculating net flows is a useful step in analysing the agricultural structure of a country. We could if we wanted expand the scope to include European countries and North America. Here we had to make an assumption that the per capita consumption of wheat (in the form of bread) was the same across counties. There is evidence for and against that assumption.
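
The net-flow calculation itself is simple enough to sketch. The county labels, production figures, populations and the per-head consumption figure below are all invented; the only thing taken from the method above is the structure, production minus consumption with the same consumption per head everywhere.

```python
def net_wheat_flow(production, population, per_capita_consumption):
    """Surplus (+) or deficit (-): county production minus county consumption."""
    return production - population * per_capita_consumption

# Hypothetical figures (bushels, people). Per-capita consumption is assumed
# to be identical in every county, as in the post.
PER_CAPITA = 6.0
counties = {
    "arable_county": (3_000_000, 300_000),
    "industrial_county": (1_000_000, 1_300_000),
}

flows = {
    name: net_wheat_flow(production, population, PER_CAPITA)
    for name, (production, population) in counties.items()
}
for name, flow in flows.items():
    print(name, "surplus" if flow > 0 else "deficit", flow)
```

Changing the single per-capita figure shifts every county's balance at once, which is why the equal-consumption assumption deserves a sensitivity check.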

Monday, January 17, 2011

Malcolm's comment about railway shareholding

Malcolm made an interesting comment on my last post. We were discussing whether the fact that the owner of a large estate invested in railway shares was interesting or not. Malcolm pointed out that the very substantial investment by the Earl of Leicester, owner of the very large Holkham Hall estate, in railway shares, was just prior to the expansion of railway track near his estate. This would indeed be interesting if:

  • we could see a pattern, such as other estate owners also making large investments
  • we could infer from this that the estate owners knew that they would be able to increase rents as a result of 'extracting' the savings from their tenants. If this was the case, the estate owners were getting a free ride: their investment in railway shares would (probably) be profitable AND they trousered the extra rents. Nice work if you can get it!
The difficulty with this is finding out whether an estate owner was a shareholder. Records do exist, but it would be a monumental task to go through them. Unless anyone has seen a searchable database?

Friday, January 14, 2011

Connectivity in the railways paper

Some interesting advances today in the railways paper.

1. Malcolm has been calculating the length of track available cumulatively on an annual basis from 1832 within a 40km radius of each of our six estates. So far the regressions of rent against track, controlling for wheat and cattle prices for the period 1832-1899, have been excellent. Today I was writing up the mathematical model for the paper and I am not sure that just length of track is enough. How about if there was a very long track but only one station? Of course that didn't happen, but we need to have a variable that we can prove mathematically. While we'll stick with track lengths for the moment, let's consider other options. We want a measure of connectivity: how easy was it for a farmer to get his cattle on a train? How about number of stations? From this I thought about network theory.....and nodes. I found there is a useful add-in to Excel called NodeXL that will draw a graph of connections and also give us a statistic of how connected a network is. So we could update the network year by year and then use the annual statistics of connectivity as the variables in the regression. We could get names of stations and years when they opened from railway timetables, the famous Bradshaw.

I tried this for Norfolk. Here is the map of modern stations in Norfolk and a bit of the Nodexl output. There is obviously a lot I need to learn about Nodexl but it does have the advantage of having some math theory to back up the output stats.
Nodexl output of some of the map. I got bored before I put in all the stations

Modern rail network

Holkham Hall is close to Fakenham towards the north of the map. When I was a kid we went on family holidays at Hunstanton on the coast to the north-west. And we went on the railways!

2. I was looking for involvement by estate owners in railway companies. Mi found me a book on Holkham Hall and guess what: the Earl of Leicester, the owner of the Holkham Estate, spent a year and a half of the estate's profit in buying railway shares in the 1880s. That was a lot of money. AND he campaigned for a branch line to run near the estate. Mi is getting me more books on this. I'd like to know whether the owners of our other estates also bought railway shares. But is this actually useful data? What do you think?

3. I think I've found someone equally interested in Victorian railways; I found his blog by chance this morning. It is beautifully laid out, with all the sources meticulously included.
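
The year-by-year connectivity idea in item 1 can be sketched in a few lines. This is only a minimal illustration, not the NodeXL calculation: it uses average degree (links per station, counted from both ends) as one simple connectivity statistic, and the station labels and opening years are made up for the example rather than taken from Bradshaw.

```python
def average_degree(num_stations, num_links):
    """Average number of links per station in an undirected network."""
    if num_stations == 0:
        return 0.0
    return 2 * num_links / num_stations


def connectivity_by_year(links, years):
    """Average degree of the network that was open in each year.

    links: list of (station_a, station_b, year_opened) tuples.
    """
    series = {}
    for year in years:
        # Keep only the links that had opened by this year
        open_links = {frozenset((a, b)) for a, b, opened in links if opened <= year}
        stations = set()
        for link in open_links:
            stations |= link
        series[year] = average_degree(len(stations), len(open_links))
    return series


# Hypothetical links for illustration only (not real Norfolk data)
example_links = [
    ("A", "B", 1845),
    ("B", "C", 1850),
    ("A", "C", 1860),
    ("C", "D", 1860),
]
example_series = connectivity_by_year(example_links, [1840, 1845, 1850, 1860])
```

The resulting annual series could then stand in for, or sit alongside, track length as an explanatory variable in the rent regression.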

Thursday, January 13, 2011

Railways paper: six panels up

We now have six 'panels' each representing an estate's rent for the period 1832-1899. That is a total of 337 observations allowing for missing data and the 'instruments' needed to adjust for temporal effects. Malcolm just sent me the track records for the last estate, Badminton....I held my breath while the computer did the work (takes a few minutes). I was blue in the face when the answer came out---still significant! So we have a model that looks like this:

Family home for the owners of the Thorndon Estate
Note the positive sign for track, which means that an increase in railway within 40 km of the estate increases the rent. The effect isn't large, probably because the difference in costs between using the railway and 'droving' the livestock along a road wasn't a large part of the total farm budget. BUT it is statistically significant, and that is what counts.

One of the six estates is Thorndon, pictured here. Think of the heating bill! No wonder they had to raise the rents of all their tenant farmers!

Next I am going to work on a separate regression for Holkham Hall, a big estate in Norfolk for which Mi has been getting me the data. Holkham is one of the six, so we already have some knowledge of this estate. It is in Norfolk, on the east coast, about a hundred miles north of London.


One of the variables that has a considerable impact on yields, both for arable and livestock, is the type of soil. It is clear why for crops such as wheat, but not so obvious for livestock. Don't they just eat grass? Well, yes, but grass won't grow everywhere, or perhaps not as luxuriantly as the livestock would prefer. Malcolm has identified some soil data, and on the map below I've layered the soil data with the 648 parishes. The next step is to add more soil layers (water, slope etc) and then extract the readings for each parish. Then build a model of yields using regression. We'll also add meteorological data, and Mi is helping me with that.

Just from a glance at the map you can see that most of our parishes lie on fairly sandy soil (2) while those to the north-east lie on more clayey soil. Usually clayey soil is better for arable. It will be interesting to test this.

You'll see that there is a gap in the extreme south-west. That is the county of Cornwall, which I wasn't originally going to include. But it looks as though the thesis of 'market integration' is going to be something we'll run with, so Malcolm is working on that data now. That will bring our total number of observations to nearly eight hundred.

Pasture yields

This concerns the 'Devon Rents' paper. The analysis we have done for 648 parishes in six counties in south-west England has used only the arable rent and the wheat yield. But many of the parishes were right in the middle of prime sheep and cattle-raising areas, and in fact livestock farming would have been more important in 1836 than wheat farming. I haven't analysed the livestock sector for the parishes because a lot of the data is missing for livestock: we have livestock yields and livestock rents only for one county, Devon. We don't have livestock yields for the other counties. It is very tricky to calculate livestock yields, so that is probably why the Inspector didn't bother. But we do have livestock rents, which for the Inspector would have been much easier to record; he would just have written them down.

So we aren't currently making use of the livestock rent data...which seems a shame. Hate to waste data! This morning I was out running and I thought of a way that we could use the livestock rent data. We know the livestock yields AND the livestock rent for Devon (n = 96). Could we use the mathematical relationship that we know for Devon yields and rents to construct a simulation for the other counties? We know the soil, the rainfall and the distance to market town: perhaps this is enough to calculate the relationship between rent and simulated yield for our other 600+ parishes? This would be a significant contribution to the literature. And the applications to developing countries are obvious.

Tuesday, January 11, 2011

A distance to market mystery!

I have been working on the geographically-weighted regression tool, exploring the spatial distribution of the various explanatory variables in the 'Devon rents' paper. One variable is distance to the nearest market town of the parish. When I was doing 'ordinary' regression this particular variable caused me some grief because it wasn't always significant. It kept changing its sign depending on the size of the dataset. The theory holds that the sign should be negative. Greater distance to market should lower the rent. Below is a map of the signs and coefficients of the market distances for the parishes. Some areas, notably to the west and the northeast (Devon and Herefordshire) are respectably negative, but Somerset and Dorset are positive. How can being further from the market increase your rent?
Red and ochre colours are positive for market distance, blue and dark blue are (respectable and what we like!) negatives. I can only think that this result is connected somehow with transport to market over longer distance, or integration into the wider market. If you are putting cattle and wheat onto railways and canals, the distance to the nearest market town doesn't matter. Next step is to control for soil and climatic conditions. Fascinating!

Five railway 'panels' significant

Malcolm just sent me the lengths of railway track for the fifth estate---Tavistock. Tavistock is away down in the south-west of England in Devon. I was interested to test for this estate because it took longer for the railway to reach down into this relatively remote part of England. Below is a graph showing track mileage for Tavistock and for Dalemain.
Track mileage for Dalemain (in the north-west) and Tavistock (in the south-west)

I'm pleased to say that the results remain highly significant, regressing rent per acre against wheat and cattle prices, and length of track. This regression is 'longitudinal' over the years 1832-1899 with five 'panels', one for each estate. Next step is to add one further estate (Badminton) and then adjust the track amounts with some new data for 1872. I also want to include some data specific to each estate, such as yields, but there are no consistent and reliable time-series for this. It might be possible to 'control' for climatic conditions, such as average rainfall, but this might not be worth the work. With at least one more estate we already have a good outcome.

Now I am working on the structure of the paper and I will post a draft on Google docs for you to read and comment on very shortly.

Geographically Weighted Regression

I finally found out how to do geographically weighted regression! This allows us to look at local changes in a statistical relationship. I tried out the technique by regressing the natural log of arable rent against the natural log of wheat yield. The coefficient on wheat yield will give us the elasticity: the percentage that arable rent changes for a one percent change in wheat yield. Elasticity is a commonly-used measurement in economics....often used for price and demand. So if a 10% decrease in price for some item caused the demand for that item to go up 10%, we could say that the elasticity was one (ignoring the sign), or unitary. I have used elasticity to try to show how much the landlord keeps from the rent. Here is a map indicating the coefficient for the natural log of wheat yield, or the elasticity. To the west, the elasticity is lowest, but rises gradually as we move to the east, towards London. Just why we should see this very clear trend is highly interesting. Any ideas?
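
For reference, the log-log regression described above can be written out as follows. The symbols $\alpha$, $\beta$ and $\varepsilon_i$ are labels chosen here for illustration, with $i$ indexing parishes; in the geographically weighted version the coefficients are allowed to differ from place to place rather than being one global value.

```latex
\ln(\text{rent}_i) = \alpha + \beta \,\ln(\text{yield}_i) + \varepsilon_i,
\qquad
\beta = \frac{\partial \ln(\text{rent})}{\partial \ln(\text{yield})}
      \approx \frac{\%\,\Delta\,\text{rent}}{\%\,\Delta\,\text{yield}}
```

So a coefficient of, say, 0.5 would mean that a 1% higher wheat yield goes with roughly a 0.5% higher arable rent.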

Monday, January 10, 2011

Cartogram of arable rents and wheat yields

I have been learning how to use some free software called Geoda. Frankly, it is not easy! So far I have constructed two cartograms of arable rent and wheat yields. Cartograms aren't really maps: they represent quantities of interest. Here they are:

The top one is wheat yield and the bottom one is rent. I find it interesting that the red dots don't match: red represents a large positive outlier, green represents normal. So we have quite a few instances of high rents, but rather fewer of high yields. So some landlords were extracting high rents when the yields were only normal. Now, let's not get too carried away with blaming landlords: there might have been other factors, such as closeness to the market. Anyway that is something to test.

Friday, January 7, 2011

The parishes and Thorndon added to panel data

Two advances today:

1. Finally worked out how to join the dataset that Malcolm has been so patiently preparing to the map of the locations of the parishes. The result is below. Looks a bit as though south-west England is suffering from chicken-pox! We have the data on arable rents, yields, distances to nearest market-town and elevations for these parishes, nearly 800 of them. We have already done the basic statistics and found that there is a very strong relationship between rent, yields, distance and elevations. We also found that the elasticity of rent to what the farmer took home increased markedly as we moved east towards London. I don't know why that should be, but I suspect it might be connected to the amount of 'enclosure' that went on in the area. I'll work on that idea. Next step is to use some free software called Geoda to calculate the 'spatial lag', which is the grouping together of rents. Here's the map and then below some notes on Thorndon.
The 800 (or so!) parishes in south-west England: data from the 1836 Tithe Files
 2. The 'railways' paper: Malcolm calculated the amount of track laid on an annual cumulative basis for Thorndon, the fourth of the estates. Thorndon is in Essex, right over on the east coast, not far north of London. As a result, they had railway track early on. I added Thorndon to the other three estates in the panel data set, and I'm delighted to say that the results remain highly significant. It is clear that landlords were extracting the savings from their tenants----but hey! what else is new? Mi has found me useful information on yields which is part of the estate-specific information I will begin adding to the dataset. Think of it this way: we want to isolate the impact of just one factor---track---so we need to hold steady anything else that might have an effect on rents. This is what we can do with panel-data regression and that's why it is such a powerful tool. So much is done with regression....learn it whenever you get a chance. It will be really useful to you. I'll teach you if you like.

Population growths

Mi has helped me to calculate the annual population total for three counties: Cumberland (where the Dalemain estate lies), Norfolk (Holkham Hall) and Sussex (Petworth). There is a graph below. Since the area of the county didn't change, we can use the population figures as representing population densities. I tried including population density in the panel data regression, but it wasn't significant. I think we'll try the population size of the nearest market town to the estate. That worked well for the 'Devon' data. Take a look at the graph: can you see that the population of Sussex more than doubled in half a century? This huge growth meant many more mouths to feed and so required agriculture to increase its yields. The area of cultivatable land is fixed and so the only way to increase output is to increase the yields. This is a close parallel to the world situation today: food prices are climbing because the global population is swelling. That's one reason why the work we are doing has relevance.
I'll spend today at UBC getting more maps of railway construction: the Victorians helped solve their food problem by transporting agricultural output more efficiently. Lessons to be learned!

Wednesday, January 5, 2011

Track around 3 estates significant and elasticities

Two interesting advances today:

 1. For the 'railways and rents' paper:  I have built a 'panel' dataset using annual total railtrack in a 40km radius of three estates: Dalemain, Holkham Hall and Petworth. I regressed the rent against the track, cattle and wheat prices. The three independent variables---track, wheat and cattle---are all significant and with the 'right' signs. This is very satisfying. I'd like to get more data specific to each estate, such as yields etc. Mi is working on this. Soon we will develop a clear picture of how agricultural rents were set in the 19th century. This is something no one has tried before. The techniques will have analytical uses in developing countries where data is --- like 19th century Britain --- a bit sparse.

2. For the 'Devon rents' paper: I built a variable which represents the amount of money a farmer would get after he had paid his farming expenses. I regressed this, together with population of nearest market town and elevation, against rent. The results are highly significant. Then I built four 'windows' moving from west to east, so that I selected only the farms inside the windows. For each window I calculated the 'elasticity', which is the percentage change in rent for a percentage change in farmer's take-home money. I think (!) that this is a measure of the 'surplus extraction' of the landlord: how much he can 'squeeze' out of his tenant. What is remarkable is that the elasticity changes as we move east towards London. It more than doubles over two hundred miles. This is a fascinating result, but at the moment I am at a loss as to how to explain it! I have put the elasticities into the relevant counties in the map below....not quite the same as the moving window but I can't see how else to show you.
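
The west-to-east 'windows' in item 2 can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the actual analysis: it uses a plain log-log regression of rent on take-home money inside each longitude window (leaving out population and elevation), only two windows instead of four, and made-up parish figures constructed so that the elasticity is 0.4 in the west and 0.9 in the east. The function names are hypothetical.

```python
import math


def loglog_slope(xs, ys):
    """OLS slope of ln(y) on ln(x); returns None if the slope is undefined."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    if sxx == 0:
        return None
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return sxy / sxx


def window_elasticities(parishes, edges):
    """Elasticity of rent to take-home money inside each longitude window.

    parishes: list of (longitude, take_home, rent) tuples.
    edges: ascending longitudes; window i covers [edges[i], edges[i + 1]).
    """
    results = []
    for west, east in zip(edges, edges[1:]):
        inside = [(t, r) for lon, t, r in parishes if west <= lon < east]
        if len(inside) < 2:
            results.append(None)  # too few parishes to fit a line
            continue
        take_home, rent = zip(*inside)
        results.append(loglog_slope(take_home, rent))
    return results


# Made-up parishes: rent = take_home**0.4 in the west, take_home**0.9 in the east
parishes = [
    (-4.6, 10.0, 10.0 ** 0.4), (-4.2, 20.0, 20.0 ** 0.4), (-4.1, 40.0, 40.0 ** 0.4),
    (-2.8, 10.0, 10.0 ** 0.9), (-2.5, 20.0, 20.0 ** 0.9), (-2.1, 40.0, 40.0 ** 0.9),
]
elasticities = window_elasticities(parishes, [-5.0, -3.5, -2.0])
```

With real parish data, overlapping windows or more of them would give a smoother picture of how the elasticity drifts towards London.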

It has been a good day. Thank you!

Distances to Market Decreasing With Longitude

I have been analysing data from about 700 parishes in the southwest of England, looking for patterns of how rent was set in the 1830s. The relationship between arable rent and wheat and barley yields is very strong, as we would expect. I started off the analysis with 96 parishes in Devon, in the far southwest of England. Here the relationship between rent and distance to market is very clear....further from market the lower the rent. Over the holidays, Malcolm helped me to increase the dataset, moving east towards London. The larger dataset was initially puzzling, because distance to market was no longer significant and had a positive sign. This morning I regressed distance to market against longitude and found that distance to market decreases as we move east towards London. In other words, there is a higher population density and so the farmer doesn't have to cart his produce so far to sell it. Here is a scatter plot with the regression trend line:

Now, I'll be the first to agree that this looks pretty much like wasps around a honey jar: BUT the relationship is statistically highly significant although with very little explanatory power (r-squared is small). I think this negative relationship goes towards explaining my initially confusing results. [Obvious when you think about it, which is (probably) what Newton thought after the apple landed on his head. ]

I did some more probing and found that the sign for market distance changes from negative and significant to either positive or not significant at about longitude = -3.25. This is close to the eastern borders of our two western counties. Next step is to go to the 1841 census files and get population densities for the six counties we have been analysing. Clearly it wouldn't be a huge surprise if distance to market correlated with population density.

Tuesday, January 4, 2011

Dalemain Results

Malcolm has calculated the track for Dalemain, an estate in the north of England noted for sheep and cattle raising. The results look significant: the graph below shows the actual rent and the rent predicted by the regression model. Not bad....but I need to use some deflated prices and more localised variables, such as population density.

Google ngram as a research tool

There is a highly useful Google tool here:

which allows you to search for a word or phrase through all those books that Google has been scanning and also track the rise and fall of the word or phrase over time. You can select by language and bracket by years. And at the bottom of the graph there is a clickable link to the original books. Naturally I immediately tried 'agriculture','railway' for 1800-1870 and found some useful source documents. Might help you. 

Holkham Track Results

I've done a time-series regression of land rent against the amount of track within a 40km radius of Holkham Hall, controlling for the price of wheat and cattle. I deflated the rent and the commodities to adjust for changes in the cost of living. The result is statistically significant, and I'd love to show you the output but can't work out how to paste it into the blog. Track and cattle have positive signs, but wheat is negative. Why should a drop in the wheat price result in increased rent? I don't know yet! I'll get there. The positive sign for Track is what we had expected to see: more track means increased rent. The savings are being extracted by the landowner. Recall that the equation for locational rent is

where m is the market price of the commodity, c the cost of production, E the yield, f the cost of transportation per unit distance and d the distance. As the distance increases, the right hand side gets smaller, so the rent gets smaller. Eventually the rent would be zero right on the edge of the cultivatable land. By increasing the amount of track, in effect the distance is getting shorter, so the rent goes up.
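
The equation image is not reproduced in this text. A standard von Thunen form of locational rent that matches the symbols just defined is given below as an assumption, with $R$ (rent per unit of land) being a label added here:

```latex
R = E\,(m - c) - E\,f\,d
```

Under this form the rent falls linearly with distance and reaches zero at $d = (m - c)/f$, which corresponds to the edge of the cultivatable land mentioned above.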

Encouraged by this, we're going to build a larger dataset. Malcolm is calculating the track for three more estates: Petworth, Thorndon and Dalemain. A graph of their rents is here:

You can see that there is a bump in the rents in the period around 1850----what a coincidence! Once we have the track data in, I'll do the same type of regression, but this time it will be a panel-data longitudinal regression. This is a very powerful technique, which I'd urge you to learn if you see the chance.

Sunday, January 2, 2011

Track in 40km radius of Holkham Hall

Malcolm has done a wizard job of calculating the amount of track on an annual basis within a 40km radius of Holkham Hall in Norfolk. The graph is here:
Our hypothesis is that the availability of track would reduce the costs of farming to the tenant farmer, but that the landowner would grab the savings in the form of higher land rents---the 'resource extraction' theory. The next step is for me to test this using a time-series regression, controlling for other variables such as the price of wheat and the price of livestock. We need to hold steady the other variables so that we can isolate the effect of the reduced transport cost. Rent is the dependent variable and then the amount of track and the various prices are the independent (or 'explanatory') variables. The equation looks like this:

The regression is a time-series, and so we have to remove the effects of auto-correlation over time. I've omitted all the subscripts for time for clarity. We can use ARIMA for the regression. This is exciting and a great way to spend the holiday! Thanks Malcolm for your speedy work. I'll be back with some statistical output shortly.
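
The equation image is not reproduced in this text. A plausible linear form consistent with the description, offered only as a sketch, is shown below; the coefficient names $\beta_0 \dots \beta_3$, the price labels and the error term $u$ are assumptions, and time subscripts are left off as in the post:

```latex
\text{Rent} = \beta_0 + \beta_1\,\text{Track}
            + \beta_2\,P_{\text{wheat}} + \beta_3\,P_{\text{livestock}} + u
```

The hypothesis corresponds to $\beta_1 > 0$, and the ARIMA step is there to deal with serial correlation in $u$ from one year to the next.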

Saturday, January 1, 2011

New Year's update

So, we have four papers and a book (yes!) to finish off this year. Here is a little run-down job by job:

1. The 'political' paper is under peer review at the moment. Let me know if you want a copy. We showed that there was a strong statistical relationship between the type of crops grown in a political constituency, the attendance at church of the residents, and how the MP for that constituency voted. The parliament of 1841 was very much about 'church and state' and so these findings make sense. We used some novel statistical techniques to get round the 'missing voter' problem. Many MPs didn't turn up for divisions and so our sample size is small. But we know all we need about the MP except for how he would have voted had he turned up. We want to keep this information, not toss it out. We used Anne Sartori's improvement on the Heckman procedure (he won a Nobel for that). Special kudos to Hugh (Salway) from the University of York in the UK for volunteering a huge amount of time and helping us with hard to get data. Come and visit us, Hugh!

2. The 'Devon rents' paper. Here we are using very old data: from the 1836 Tithe Commission. We find that in the county of Devon, there is an amazingly robust relationship between arable rent, wheat yields, elevation and distance to market town. This fits all the 'locational rent' theories: von Thunen, Ricardo etc. So not content with that, we want to see whether this relationship holds in neighbouring counties. Malcolm has built up a large dataset (n>600) of parishes in six contiguous (look it up!) counties. What is interesting is that some parts of the relationship change as we move east, seeming at first sight to indicate less reliance on local markets as we get closer to London. Makes sense. To test this, we'll be using a relatively new statistical technique called Geographically Weighted Regression. In regular regression, we are trying to estimate some 'global' parameter that fits throughout the statistical population. In GWR, we allow the parameter estimates to vary according to local conditions. I am keen to measure the elasticity between rent and what the tenant farmer could take home to his wife and kids: in other words, were some landlords greedier than others, and if so, why? I am half-hopeful that we'll see the hand of the Anglican Church behind all this, but mustn't get my hopes up.

3. The 'railways and rents' paper. A huge amount of track construction went on in 'our' period, see the graph below. This must have had some impact on farming. James Caird, writing in 1851, makes an intriguing reference to a Norfolk farmer saving four hundred pounds a year (a lot: you could have bought a Bentley if they had made them then) because his cattle didn't lose weight when they went by rail; when he walked them to market they ended up quite thin. But amazingly there isn't much published on this. And I can see why. Getting the data is like pulling hen's teeth. Malcolm is figuring out the total amount of track in a 40km radius of Holkham Hall, Norfolk for the years 1836 to 1866 on an annual basis. And Mi is scouring the libraries of the world for any sort of references that might help. Here is the graph:

Growth like this must have had consequences! We hypothesise that the tenant-farmer's savings would have been transferred to the landlord via an increased rent. This is the phenomenon of 'surplus extraction'. Generally you want to be the extractor, not the extracted. But the tenants were in a weak position, so we would expect to see their rents going up in tandem with better transportation. Mi has got us the rents, now we await Malcolm's annual track data. Then I'll use a time-series analysis procedure called ARIMA to test for linkage. If the Norfolk estate gives us a YES, we'll extend the procedure to other estates and use a panel-data approach. Nice cutting edge stuff.

4. The 'supply response' paper. In the 1870s, the price of wheat fell dramatically because those blasted Americans opened up their railroads and shipped in wheat below the domestic price. See the graph of wheat prices below. Quite shocking: halved in price in less than two decades.
Look what has happened: wheat has halved but livestock numbers have shot up. This looks like a structural shift in agriculture. But this is at the national level....individual farmers won't all have been able to shift out of wheat and into livestock. We hypothesise that those estates which were more flexible with what the tenants did with their land probably didn't need to drop their rents as much as those estates which were more rigid. Farmers are not (always) stupid and can adapt pretty well to changing market conditions. Different matter if they can't adapt because of regulations on land use. We will test for a 'breakpoint' in rents, and then use that year as an indicator. Going to borrow from medical statistics, normally for use in working out how long a patient has got to live. Again, nice cutting edge stuff.
5. A book! New idea, inspired by Sarah, one of my two fantastic sisters. This is historical fiction, in other words a story based on real people. The working title is 'Breaking Free From Dr Malthus' and the theme is how the heck did the farmers increase yields enough to allow enough folk to escape from the hard scrabble of agriculture to start off the Industrial Revolution. A new format...left hand side page is fiction, the facing right hand side is economic analysis and commentary on the fiction. Including the highly exciting new field of neuro