Not signed in (Sign In)
  1.  
    There has been a lot of discussion about the "median" and "average"... I put this together to help clarify some misconceptions about how easy (or not) it is to manipulate Auctioneer's median buyout values.

    I have been working on the statistics calculations in Auctioneer to more accurately show the middle normal value, I am in the process of using Interquartile Range. To explain let's use the data set 3,5,6,7,9,10,12,14,14,35,50

    The average of the above set is the sum of the values divided by the size of the set:
    http://en.wikipedia.org/wiki/Average
    165/11 = 15 <- The average

    The median is the middle value in a distribution, above and below which lie an equal number of values.
    http://en.wikipedia.org/wiki/Median
    3,5,6,7,9,10,12,14,14,35,50
    ________^_____________ <-- 10 is the median value 5 values below and above

    The IQR (Interquartile range) is the difference between the third and first quartiles.
    http://en.wikipedia.org/wiki/Interquartile_range
    In our set quartile1 is 6.5 and quartile3 is 14 therefore the IQR is Q3-Q1 or 7.5

    Using a standard formula to detect outliers only values in the dataset that are more than
    Q1 - (1.5*IQR) or 6.5 - (1.5*7.5) or -4.75
    AND less than
    Q3 + (1.5*IQR) or 14 + (1.5*7.5) or 25.25

    Soooo... Our dataset would be reduced to
    3,5,6,7,9,10,12,14,14

    Now we can safely calculate the median of the reduced dataset which is:
    3,5,6,7,9,10,12,14,14
    ______^__________ <-- 9 is the median


    In summary, using the dataset above....

    Average = 15
    Median = 10
    IQR Median = 9

    Hope this helps!
  2.  
    Pinned, for all our sakes.
  3.  
    Posted by: Karavirs on Feb 28 2006, 06:44 PM
    I have been working on the statistics calculations in Auctioneer to more accurately show the middle normal value, I am in the process of using Interquartile Range.



    Hopefully your IQR calculations will eliminate absurdly small values also. On my server, one person is posting bid values of 1 copper on some valuable trade items to try and reduce bid values. (The auction is cancelled before a sale happens)

    I would like my Auctioneer to calculate the IQR Median value and use that for posting auctions instead of the median value it uses now. Are you working on enabling this ability? Do you have code I could insert somewhere into Auctioneer to use the IQR median?
  4.  
    Posted by: auctioneeruser on Mar 1 2006, 01:03 PM

    Hopefully your IQR calculations will eliminate absurdly small values also. On my server, one person is posting bid values of 1 copper on some valuable trade items to try and reduce bid values. (The auction is cancelled before a sale happens)


    Yes... IQR will drop extreme highs or lows. The values in the example dataset are small therefore the "floor" was negative. On high value items, extreme highs/lows will be dropped.

    Quote:I would like my Auctioneer to calculate the IQR Median value and use that for posting auctions instead of the median value it uses now. Are you working on enabling this ability? Do you have code I could insert somewhere into Auctioneer to use the IQR median?


    IQR Median calculations will be out in the next alpha release. IQR Median will replace all median calculations.
  5.  
    Posted by: Karavirs on Mar 1 2006, 12:14 PM
    Yes... IQR will drop extreme highs or lows. The values in the example dataset are small therefore the "floor" was negative. On high value items, extreme highs/lows will be dropped.



    IQR Median calculations will be out in the next alpha release. IQR Median will replace all median calculations.


    THats awesome! Look forward to seeing it come out.
  6.  
    Posted by: auctioneeruser on Mar 1 2006, 02:03 PM
    Hopefully your IQR calculations will eliminate absurdly small values also. On my server, one person is posting bid values of 1 copper on some valuable trade items to try and reduce bid values. (The auction is cancelled before a sale happens)

    I would like my Auctioneer to calculate the IQR Median value and use that for posting auctions instead of the median value it uses now. Are you working on enabling this ability? Do you have code I could insert somewhere into Auctioneer to use the IQR median?


    AFAIK, Auctioneer only calculate the Median for the Buyout prices. Bid values of 1cp should have no impact on the median. And if they put buyout value of 1cp, well, buy them :-).
  7.  
    The new IQR based calculations are available for testing in the unstable build 734.
  8.  
    Posted by: Karavirs on Mar 3 2006, 09:18 PM
    The new IQR based calculations are available for testing in the unstable build 734.


    I am not seeing any change in the suggested sell prices using version 734. Is it necessary to do something to activate IQR values? Do you need to do a new auction scan?
  9.  
    I don't think you got IQR right. Using your example:

    3,
    5,
    6, <- 1st quartile
    7,
    9,
    10, <- Median : 2nd quartile
    12,
    14,
    14, <- 3rd quartile
    35,
    50

    Quartile like a median cannot assume a value that isn't a part of the data set.

    Now the IQR is like you say Q3-Q1 giving an IQR of 8. Which moves the values of the outliers as follows:

    Values in the dataset that are less than
    Q1 - (1.5*IQR) or 6 - (1.5*8) or -6
    OR more than
    Q3 + (1.5*IQR) or 14 + (1.5*8) or 26
    Is outliers and should be removed

    Now the reason to remove outliers is to be able to get are more "correct" value when we calculate the mean. However when we remove outliers we do run the risk of removing significant values. In the terms of WoW it is widely know that some people do overprice a lot and some underprice, so it can be justified to remove the outliers. Now what we end up with is a data set which shouldn't include significant outliers, thus using the mean should be more correct than the median. The reason for this is that the median is used not because it is more exact than the mean, but because of the fear of outliers distorting the mean. Hope I havn't lost you here.

    Back to the example. The remaining value as those above -6 and below 26:
    3, 5, 6, 7, 9, 10, 12, 14, 14

    The mean is 8,89
    The median is 9

    Statistically the mean would be a more exact value to use. Especially considering that we have already removed the outliers that could distort it significantly.
  10.  
    Posted by: Sutekh on Mar 4 2006, 07:27 PM
    I don't think you got IQR right. Using your example:

    Statistically the mean would be a more exact value to use. Especially considering that we have already removed the outliers that could distort it significantly.


    I think Sutekh is right in most cases. After removing outliers, the mean would be the better number to have in most cases.

    My question is how to deal with smaller datasets that are the most susceptible to people intentionally manipulating the data. For example, 4,4,5,5,6,20,20,20. If I apply this right, the IQR would be 20-3=17 and 20+IQR wouldn't knock out anything. We would end up with a very distorted mean. The median would be 5.5, which wouldn't be terrible, but the ideal solution would be to knock out the 20's and deal with 4,4,5,5,6 for a median of 5 and a mean of 4.8.

    In tackling this problem, I'm wondering if there is a way to knock numbers out of the dataset based on their percentage difference from the median. For example, could we throw out anything with a value of 200% or 300% of the median? In the above example, it would have knocked out anything above 11 (or 17 at 300%) for the dataset and got us down to that 4 - 6 range that is the true range of legitimate values.

    The IQR approach is good for a lot of scenarios, but it seems that Auctioneer needs a solution that can specifically handle people intentionally trying to throw it off and can work through smaller datasets. I'm not sure that IQR would necessarily fix a lot of the screwey listings I'm seeing these days. When it is an item that is listed in large numbers (20+), things tend to even out. It is those lists of 5 to 10 where I see a lot of intentional and large manipulation.
  11.  
    Actually using to median or mean of a small sample (roughly below 30) doesn't really give a useful value. When you only have a few scans of an item you have no idea of how that item have been priced. Sure some people overprice items, but some underprice them. And with only a handful of seen items the value given by auctioneer really can't be certain.

    However... if we assume that the prices are normally distributed, then it would be possible to give a confidence interval of say 95%. In effect the confindece interval would say that with 95% security the mean price is higher than X and lower than Y. Of course the fewer scans you have of an item the wider the interval between X and Y. A confidence interval sacrifices the ability to give an exact figure in exchange for a greater certainty.
  12.  
    Well - 95% confidence interval is calculated based on the mean and standard deviation, which are both still affected by outliers if they're included. If you eliminate outliers based on IQR (pretty typical methodology as posted above - IQR * 1.5), then yes I think it could be useful. But with small datasets the confidence interval is possibly going to be so large as to not be useful.

    What are people really interested in, though? I'm a complete noob to Auctioneer so I'm not sure how trustworthy the standard "undercut median by 5%" is. Seems to me that on my high-pop server, you can often see over 150 stacks of thorium going for 3.2g to over 5g buyout. People only ever seem to buy the 20-40 stacks that are selling for less than 4g, even though the median may be 5g. In that case I think I would rather see Q1, min/max/range and mean & median to get a fuller picture of what's going on.

    Maybe I'm doing things wrong, or unaware of some key features.
  13.  
    Posted By: Imported (Rynas)What are people really interested in, though? I'm a complete noob to Auctioneer so I'm not sure how trustworthy the standard "undercut median by 5%" is. Seems to me that on my high-pop server, you can often see over 150 stacks of thorium going for 3.2g to over 5g buyout. People only ever seem to buy the 20-40 stacks that are selling for less than 4g, even though the median may be 5g.


    What you need in those cases is some way of gauging demand. Maybe it's something we could enter manually: I tell auctioneer that, say, 50 stacks of Thorium will sell per day, it sees 150 stacks, and sets the price on the assumption that the 33rd percentile will sell (median=50th percentile). Requires more info to be stored for each item, but could be set up only for specified high-volume items.
    • CommentAuthorLemandria
    • CommentTimeOct 13th 2007
     
    [quote][cite]Posted By: Imported (Rynas)[/cite]Well - 95% confidence interval is calculated based on the mean and standard deviation, which are both still affected by outliers if they're included[/quote]

    If you operate on unit prices, the effect is minimal.

    The market manipulators nearly always post up single items at 300%+ value (otherwise the auction fees eat them alive) in order to jack prices, because they know Auctioneer operates on medians.

    If you want to statistically handle outliers correctly, you need a) weighted means (using per-unit prices) and b) std dev. Over the long haul nearly all of those high priced single stacks are eliminated from the resulting average.

    If you use a z-score threshold, manipulators can only have a strong effect on items that do not stack (and this is expensive for them in the long haul, those auction fees do add up).

    Consider (data from my current server on one stackable item):
    1 @ 418c
    13 @ 923c
    20 @ 1000c
    20 @ 1000c
    20 @ 1000c
    20 @ 1000c
    20 @ 1000c
    1 @ 5500000c
    1 @ 5500000c
    1 @ 5500000c

    Weighted mean for this group is waaaaay too ridiculous, 141986.5c. The standard deviation is enormous, 872923.1

    But look at the z-scores (a z-score is number of standard deviations from the mean, a measure of how far out of range from the mean a given data point is)
    1 @ 418c (418-141986.5)/872923.1 -0.162
    12 @ 923 (923-141986.5)/872923.1 -0.161
    20 @ 1000 (1000-141986.5)/872923.1 -0.162
    1 @ 5500000 (5500000-141986.5)/872923.1 +6.138

    For a 95% confidence, any z-score less than -1.65 or greater than +1.65 must be considered suspect.

    So, after dropping the outlier entries from our obvious manipulator:

    1 @ 418c
    13 @ 923c
    20 @ 1000c
    20 @ 1000c
    20 @ 1000c
    20 @ 1000c
    20 @ 1000c

    mean = 985c stdev = 60c

    Looks like a pretty good stab at what the asking price should be, yes?

    Auctioneer is already doing something close to this, but using medians and quartiles. Personally, I'd go with a better statistical method, but it isn't a terrible model as-is.
World of Warcraft™ and Blizzard Entertainment™ are trademarks or registered trademarks of Blizzard Entertainment, Inc. in the U.S. and/or other countries.