I am trying to fit a data set of goals scored to a distribution (probably a Poisson distribution, although I am doubtful of the accuracy of this method) in order to predict the goals scored by a football team from the previous results I already have. I do not need programming help: I have already programmed the database of leagues, teams and fixtures and think I can handle the programming side without any input, which is why I asked in this forum rather than the programming one. Does anyone have any ideas on how I would go about this most successfully? Or, even better (though I doubt this), does anyone have any experience predicting results in sport?

## 8 Replies - 984 Views - Last Post: 13 June 2013 - 10:10 AM

### #1

# Does anyone have any experience with statistical modelling?

Posted 13 June 2013 - 08:50 AM

**Replies To:** Does anyone have any experience with statistical modelling?

### #2

## Re: Does anyone have any experience with statistical modelling?

Posted 13 June 2013 - 08:58 AM

Moved to Software Development.

The Law of Large Numbers and Central Limit Theorem imply that a distribution will approach normality with sufficient data. Why use a Poisson distribution here? A Poisson distribution deals with the average rate of occurrence.

### #3

## Re: Does anyone have any experience with statistical modelling?

Posted 13 June 2013 - 09:12 AM

Given that the average rate of occurrence here is goals per game, I assume that fitting to a Poisson distribution would be reasonable. Also, using the mean of a sample of goals per game should make it possible to calculate the probability of any given number of goals in a future game. Also, I'm pretty sure that even with a large data sample, goals scored in a football match would not approach normality, as the distribution would definitely be skewed towards lower values. For instance, it is logical that a team is much more likely to score only 1 goal in a game (the left tail) than to score an extreme value in the right-hand tail, such as 10 goals.
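A minimal sketch of that idea in Python, assuming the goals per game are held in a plain list (the `goals_per_game` values below are made up for illustration, not real results). The sample mean serves as the Poisson rate, and the probability of each scoreline follows from the Poisson formula.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical sample of goals scored by one team in ten games.
goals_per_game = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0]

# The sample mean is the maximum-likelihood estimate of the Poisson rate.
lam = sum(goals_per_game) / len(goals_per_game)

for k in range(6):
    print(f"P({k} goals) = {poisson_pmf(k, lam):.3f}")
```

With a rate this low, the probabilities fall away quickly as the number of goals rises, which is the skew towards lower values described above.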

### #4

## Re: Does anyone have any experience with statistical modelling?

Posted 13 June 2013 - 09:17 AM

The Poisson distribution is pretty clearly defined. What sort of specific help are you looking for?

As for your point on normality, that's just the thing. Your distribution will approach normality, with means around maybe 2-3, and standard deviations that would place 10 goals as an outlier.

Edit: To add about the Poisson distribution. It is used to answer questions along the lines of "there are on average five defective items in a shipment of 200. What is the probability that in three shipments, only 12 are defective?" I feel like if you just want to predict the total number of goals in a game, a normal distribution will provide a better tool to do this once you have collected sufficient data.
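For what it's worth, that shipment question can be worked through in a few lines of Python. This is a sketch that assumes the defect counts in separate shipments are independent, so the total over three shipments is Poisson with rate 3 × 5 = 15. "Only 12" is read here as either exactly 12 or at most 12, since the wording allows both.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

rate_per_shipment = 5   # average defective items per shipment (from the example)
shipments = 3
lam = rate_per_shipment * shipments  # rate for the three shipments combined

p_exactly_12 = poisson_pmf(12, lam)
p_at_most_12 = sum(poisson_pmf(k, lam) for k in range(13))

print(f"P(exactly 12 defective) = {p_exactly_12:.4f}")
print(f"P(at most 12 defective) = {p_at_most_12:.4f}")
```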

### #5

## Re: Does anyone have any experience with statistical modelling?

Posted 13 June 2013 - 09:28 AM

I attempted fitting it to a Poisson distribution using a hypothesis test and concluded that it was almost certain that the data sample was not from a Poisson random variable. The normal distribution is symmetric and even with the data of all the football results in the world I am pretty sure that football results would not become a symmetrical distribution. I am really just looking for guidance on the method you would follow to find the best model possible with my data sample.
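The post does not say which hypothesis test was used. One common choice is a chi-square goodness-of-fit test, sketched below under that assumption. The `goals` sample is invented for illustration, scores of 4 or more are pooled into one bin so that no expected count is tiny, and the degrees of freedom are the number of bins minus 1, minus 1 more for the estimated rate.

```python
import math
from collections import Counter

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def chi_square_poisson(goals, max_bin=4):
    """Chi-square statistic for goals vs a fitted Poisson; last bin is 'max_bin or more'."""
    n = len(goals)
    lam = sum(goals) / n  # rate estimated from the sample
    observed = Counter(min(g, max_bin) for g in goals)
    stat = 0.0
    for k in range(max_bin + 1):
        if k < max_bin:
            p = poisson_pmf(k, lam)
        else:
            # Pooled upper tail: everything not covered by the lower bins.
            p = 1.0 - sum(poisson_pmf(j, lam) for j in range(max_bin))
        expected = n * p
        stat += (observed.get(k, 0) - expected) ** 2 / expected
    dof = (max_bin + 1) - 1 - 1  # bins - 1 - estimated parameters
    return stat, dof

# Hypothetical results for 200 games (not real data).
goals = [0] * 60 + [1] * 72 + [2] * 40 + [3] * 18 + [4] * 8 + [5] * 2

stat, dof = chi_square_poisson(goals)
CRITICAL_5_PERCENT_DOF_3 = 7.815  # standard chi-square table value
print(f"chi-square = {stat:.3f} with {dof} degrees of freedom")
print("reject Poisson at 5%" if stat > CRITICAL_5_PERCENT_DOF_3 else "no evidence against Poisson at 5%")
```

A statistic above the critical value would count as evidence against the Poisson fit. With a very large sample, even small departures from the model can push the statistic over that threshold.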

Would you not use a Binomial distribution for a problem such as "there are on average five defective items in a shipment of 200. What is the probability that in three shipments, only 12 are defective?"

### #6

## Re: Does anyone have any experience with statistical modelling?

Posted 13 June 2013 - 09:29 AM

Quote

The normal distribution is symmetric and even with the data of all the football results in the world I am pretty sure that football results would not become a symmetrical distribution.

I'm not trying to be argumentative, but yes it would. It does this every single time as enough data is collected. The Central Limit Theorem justifies this. So if your data clusters, the mean will be centered near that cluster. If an outlier is added, the mean may shift a little bit. The standard deviations account for the outliers. With ten elements in the sample, I would agree: don't use a normal distribution. With 25+, using a normal distribution becomes increasingly accurate.

This is really the model I would use.
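One way to check this against data is a small simulation. The sketch below assumes goals per game follow a Poisson distribution with a made-up rate of 1.4, and compares the skewness of individual game scores with the skewness of 25-game averages. The Central Limit Theorem is a statement about sums and averages of many observations, so the two need not look alike.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)   # fixed seed so the run is repeatable
lam = 1.4                # assumed average goals per game, for illustration only

# Scores from individual games.
single_games = [poisson_draw(lam, rng) for _ in range(20000)]

# Averages over blocks of 25 games each.
means_of_25 = [
    sum(poisson_draw(lam, rng) for _ in range(25)) / 25
    for _ in range(2000)
]

print(f"skewness of single-game scores: {skewness(single_games):.3f}")
print(f"skewness of 25-game averages:   {skewness(means_of_25):.3f}")
```

A skewness near 0 indicates a roughly symmetric shape, while a clearly positive value indicates a longer right-hand tail.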

Quote

Would you not use a Binomial distribution for a problem such as "there are on average five defective items in a shipment of 200. What is the probability that in three shipments, only 12 are defective?"

The Poisson distribution is a limiting case of the binomial distribution. So yes.
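To make the relationship concrete: with n items each defective with probability p, the count of defectives is binomial, and as n grows with n·p held fixed the binomial probabilities approach the Poisson ones. The sketch below starts from the 5-per-200 figure in the quoted example. The larger values of n are hypothetical, chosen only to show the convergence.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 5  # average defective items per shipment (from the example)

def max_gap(n):
    """Largest difference between Binomial(n, lam/n) and Poisson(lam) over k = 0..20."""
    p = lam / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21))

gaps = {n: max_gap(n) for n in (200, 2000, 20000)}
for n, gap in gaps.items():
    print(f"n = {n:>5}: largest pmf difference = {gap:.6f}")
```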

### #7

## Re: Does anyone have any experience with statistical modelling?

Posted 13 June 2013 - 09:41 AM

OK, I didn't want to cause a riot either. I will take your comments on board and try to fit the data to a normal distribution. Just to clarify, I do have access to the results of every fixture for the previous 20 seasons of the English Premier League and the lower 4 divisions, so that should be enough. Thanks for the advice.

### #8

## Re: Does anyone have any experience with statistical modelling?

Posted 13 June 2013 - 09:42 AM

Glad I could help! Good luck!

### #9

## Re: Does anyone have any experience with statistical modelling?

Posted 13 June 2013 - 10:10 AM

Do you think I should use the goal difference in each fixture rather than the home goals and away goals?

There is a property of the normal distribution that if X1 and X2 are two independent standard normal random variables with mean 0 and variance 1, then their sum and their difference are each normally distributed with mean 0 and variance 2. Could I then make the assumption that a probability of more than 0 would indicate a home win and a probability less than 0.5 would indicate an away win? The obvious problem is that this doesn't take a draw into account, and there is also the problem of a team's most recent results being more relevant than results far in the past.
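A rough sketch of that property using Python's `statistics.NormalDist`, which supports adding and subtracting independent normal variables. The team means and standard deviations are hypothetical, and treating a goal difference between -0.5 and 0.5 as a draw is an assumption added here to give draws some probability. It is not something the normal model provides by itself.

```python
from statistics import NormalDist

# The stated property: sum and difference of two independent standard normals.
x1 = NormalDist(mu=0, sigma=1)
x2 = NormalDist(mu=0, sigma=1)
total = x1 + x2
diff = x1 - x2
print(f"sum:        mean = {total.mean}, variance = {total.variance:.3f}")
print(f"difference: mean = {diff.mean}, variance = {diff.variance:.3f}")

# Hypothetical goal models for a home and an away team (made-up parameters).
home = NormalDist(mu=1.6, sigma=1.2)
away = NormalDist(mu=1.1, sigma=1.0)
goal_diff = home - away  # assumes the two scores are independent

# Assumption: a difference within half a goal of zero counts as a draw.
p_home = 1 - goal_diff.cdf(0.5)
p_away = goal_diff.cdf(-0.5)
p_draw = 1 - p_home - p_away

print(f"home win: {p_home:.3f}, draw: {p_draw:.3f}, away win: {p_away:.3f}")
```

Weighting recent results more heavily would change how the means and standard deviations are estimated, not this final step.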
