
“Is it dishonest to remove outliers and/or transform data?”

An outlier is a data point that is extremely different from the others in the sample: it lies several standard deviations from the sample mean and follows a completely different pattern from the other data points. Outliers can often be spotted as a high degree of inconsistency across participants. They flaw the design of the experiment and affect the results produced, as well as the write-up. For example, if ten people each read 20 pages of a textbook per hour and one reads 200 pages per hour, the average pages/hour is thrown off by the one person with far above average skill (the mean jumps from 20 to roughly 36 pages/hour). An outlier can even make the difference between the hypothesis being supported or not. But is it dishonest to completely remove them from the data?
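As a rough sketch of how the reading-speed example plays out numerically (the figures are the post's hypothetical ones, and the "three standard deviations" cut-off is just a common rule of thumb, not a universal definition):

```python
import statistics

# Hypothetical reading speeds (pages/hour): ten typical readers and one extreme reader
speeds = [20] * 10 + [200]

mean = statistics.mean(speeds)   # ~36.4 rather than the "typical" 20
sd = statistics.stdev(speeds)    # sample standard deviation

# Flag any point more than three standard deviations from the mean
outliers = [x for x in speeds if abs(x - mean) / sd > 3]

print(f"mean={mean:.1f}, sd={sd:.1f}, outliers={outliers}")
```

The single extreme reader is the only point flagged, and it has already pulled the mean well away from what ten of the eleven participants actually did.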

Many scientists believe that outliers should not be removed from the data. Removing them affects the reliability of the experiment: should other scientists wish to carry out their own versions, the results would simply not match those of the initial test. It also makes it easier to tamper with data; scientists who see that the data will not support the hypothesis could easily remove certain data points so that it then appears supported.

However, other scientists believe that outliers should be allowed to be removed, and that this can even be done before the data is analysed: a participant can be excluded from the experimental condition if it is seen that they are not following the instructions carefully or that they fail to engage with the experiment at all. Not removing these participants affects the validity of the results, and cleaning the data before the final write-up ensures that it properly represents the construct being measured.

In conclusion, I believe it is acceptable to remove outliers before the data is analysed, on the basis of observing an unengaged or despondent participant. However, I do not believe removal should be allowed once the analysis and write-up have begun, because it affects the reliability of the experiment too much: when other scientists carry out the test again, the same results will not be obtained.


Comments on: "Week 3- It is dishonest to remove outliers and/or transform data?" (7)

  1. I agree with the fact that removing outliers is risky business! You mention that removing outliers can mess up your data, but do you think that having outliers within your data set can also cause misleading results? A research report by Jason W. Osborne states that “A participant may make a conscious effort to sabotage the research (Huck, 2000), or may be acting from other motives.” This can be a common problem for researchers, especially in fields such as studying young teenagers and their drug habits, for example. It is safer to remove these outliers than to publish a set of results knowing that some of the data is false.

    So I feel that removing outliers is an essential part of research, and I agree that they should be removed before any analysis has begun.

    Really well written piece of work 🙂

    • Thank you 🙂 Yes, I also agree that having outliers in your data can cause misleading results. I love the reference you’ve included too. I think this is also why psychologists have to be so careful about giving out full information about the experiment before participants take part (I know this raises ethical concerns), because if participants find out the exact purpose of the experiment they may purposely mislead with their answers in order to sabotage the results 🙂

  2. Regarding your argument that some scientists believe outliers should not be removed because the results will not match when the research is repeated: I highly disagree with that point. In further research the sample would be different, and two possibilities could occur: 1. there are no extremes and therefore no outliers; 2. there is an extreme result and therefore there are outliers. In either case, if you agree that outliers should be removed, the end result would be “the same.” If the original data had an outlier removed, then the overall results would be similar to data which didn’t have an outlier, because the participants’ results actually used would again be similar. By leaving the extreme results in, I would have thought you make the results unreliable.
    If this is the case, it shouldn’t matter whether outliers are removed once analysis has begun. Surely it’s harder to distinguish extreme results before you’ve analysed them? If you remove the data in its raw form, how do you truly know that there is a statistical difference which makes it an outlier? Surely you cannot tell until you’ve analysed the results and a graph and/or summary (not raw) data shows a point to be extreme.

  3. This is my blog for week 3; it needs to be commented on next week. The statistics one is the one to be commented on this week, though I'm not sure if this matters or not.

  4. I agree that outliers can be removed if they completely disrupt the data, but I also think that sometimes outliers can help us see other things. So when considering removing one, the researcher should first see what caused it, because it could be a genuine result: perhaps the participant is from a different school and therefore scored higher than students from another school, which would let the researcher see differences between groups in society, and that could lead to other findings and other experiments.

  5. Outliers can be caused by many different things, and they are not all bad. They can be caused by data errors, intentional or motivated mis-reporting, sampling error, standardisation errors, faulty distributional assumptions, or legitimate cases sampled from the correct population (Osborne & Overbay, 2004). So how you treat them should depend on why they happened. If they are due to a form of error they should be rejected from the data, but if they are legitimate they should be kept in and handled carefully so they don’t skew the results.
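To make "kept in but treated so they don't skew the results" concrete, here is a small sketch (using the post's hypothetical reading speeds, not real data) of two common options: reporting a robust statistic like the median, or applying a log transform so the extreme case stays in the data but has less pull. These are illustrations of the general idea, not the specific methods the commenter had in mind.

```python
import math
import statistics

# Same hypothetical reading speeds (pages/hour) as in the post's example
speeds = [20] * 10 + [200]

raw_mean = statistics.mean(speeds)   # ~36.4 -- dominated by the single extreme value
med = statistics.median(speeds)      # 20 -- a robust statistic, barely moved by the outlier

# A log transform keeps the extreme case in the data but shrinks its influence
log_mean = statistics.mean(math.log10(x) for x in speeds)
back_transformed = 10 ** log_mean    # ~24.7 pages/hour, much closer to the typical reader
```

Both routes report a "typical" value near 20 without deleting the legitimate extreme participant from the data set.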
