Auctioneer Scanner Technical discussion: 'unique' IDs for entries in the image
  • Looking into what may be slowing the scanner down for large AuctionHouses, I've started looking at the ID we allocate to each auction.

    The ID - accessed as data[Const.ID] - is an Auctioneer specific 'unique' ID for each entry in the image. (Not to be confused with itemID!)

    To make them unique we essentially assemble a table containing all existing IDs, then when we want to allocate a new ID we hunt it for a gap.

    I'll post the code later, but it has a couple of flaws:
    1. It is pretty inefficient for large images, because it uses tremove to take entries off the front of a very large table. This behaviour has been found to be the underlying cause of the Stage 4 slowdowns, and I suspect it may also be the main cause of the massive freezes (or even game crashes) at the start of Stage 3.
    2. A given ID is only guaranteed to be unique within the current session. If an auction is deleted in the current session, its ID may be reused in the next session.
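
    The cost difference of removing from the front versus the back of a list can be sketched in plain Lua (drainFront and drainBack are illustrative names, not Auctioneer code):

```lua
-- table.remove(t, 1) has to shift every remaining element down one
-- slot, so draining an n-element list from the front costs O(n^2)
-- element moves in total, while draining it from the back costs O(n).
local function drainFront(t)
    while #t > 0 do
        table.remove(t, 1) -- each call shifts the #t - 1 elements behind it
    end
end

local function drainBack(t)
    while #t > 0 do
        table.remove(t) -- removing the last element is O(1)
    end
end
```

    For a list of, say, 30,000 entries, that is roughly 450 million element moves versus 30,000.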

    Looking further, I don't believe that we actually use the ID anywhere.

    Personally I think we should just remove the ID entry.
    On the other hand, if people think there is a use for it, we should rewrite the functions that allocate IDs to fix the flaws noted above.

    Any comments?

    [edit]
    To clarify: the problems with tremove in Stage 4 were already being worked on in a separate issue, http://jira.norganna.org/browse/ADV-649, and a fix is being tested in r5453.

    However, that fix pointed at something similar being a problem in this specific part of Stage 3 as well.
  • For reference, here's the current code:

    local idLists = {}
    function private.BuildIDList(scandata, serverKey)
        local idList = idLists[serverKey]
        if idList then return idList end
        idList = {0} -- dummy entry ensures that the list is never empty and that counting starts from 1
        idLists[serverKey] = idList
        local image = scandata.image
        for i = 1, #image do
            tinsert(idList, image[i][Const.ID])
        end
        table.sort(idList)
        return idList
    end

    function private.GetNextID(idList)
        local nextId = idList[1] + 1
        local second = idList[2]
        while second == nextId do
            nextId = second + 1
            tremove(idList, 1) -- removing from the front shifts the whole list
            second = idList[2]
        end
        idList[1] = nextId
        return nextId
    end


    BuildIDList is called at the start of Processing; GetNextID is called once for each 'New' auction.
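
    For anyone who wants to poke at it, the gap-hunting behaviour can be reproduced outside WoW with one alias (the sample idList here is hand-built in place of BuildIDList):

```lua
local tremove = table.remove

-- Standalone copy of the posted GetNextID.
local function GetNextID(idList)
    local nextId = idList[1] + 1
    local second = idList[2]
    while second == nextId do
        nextId = second + 1
        tremove(idList, 1) -- O(n) shift on every iteration: the hot spot
        second = idList[2]
    end
    idList[1] = nextId
    return nextId
end

-- Dummy 0 entry, then the sorted existing IDs; 4 is the first gap.
local idList = { 0, 1, 2, 3, 5 }
print(GetNextID(idList)) -- prints 4 (fills the gap)
print(GetNextID(idList)) -- prints 6 (past the end of the used range)
```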
  • Do we really care that the IDs are sequential? If not, why not just take a guaranteed-unused key without caring what it is? There are a lot of potentially hazardous performance characteristics here, due to reallocation when messing around with holes in sparse arrays, so we may be better off with a unique ID that we make no further guarantees about. For example, we could:

    a. Have an ever-increasing key that resets to 0 on restart, but never revisits an old ID, even if it's removed, or
    b. Use #idLists+1 to determine the next available key (#idLists+1 is always guaranteed to be nil).

    The benefit of (a) is that it is, in the most general sense, very fast. The downside is that it creates a potentially sparse array that will never correct itself, resulting in a more likely use of hash tables over arrays internally.

    The benefit of (b) is that it might fill in holes periodically (no guarantees), resulting in the fastest possible indexing as a table with no holes is guaranteed to be an array. The downside is that a reallocation event might occur while filling in a hole, resulting in a sudden shift in where all this data is stored (performance penalty). But I think this should be rare.
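
    Both options are tiny to implement; a rough sketch (names like nextFreeId and idsInUse are made up for illustration):

```lua
-- Option (a): a plain session counter; never reuses an ID, and
-- resets only when the session restarts.
local nextFreeId = 0
local function GetNextID_counter()
    nextFreeId = nextFreeId + 1
    return nextFreeId
end

-- Option (b): lean on the length operator. For any table t,
-- t[#t + 1] is guaranteed to be nil (with holes, # returns a
-- border, and the slot just past a border is free), so the result
-- is always an unused key. It may or may not land in a hole,
-- hence "no guarantees" about filling them in.
local idsInUse = {}
local function GetNextID_length()
    local id = #idsInUse + 1
    idsInUse[id] = true
    return id
end
```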
  • Just realized that the above never got posted; sorry. Posted it now.
  • The purpose appears to be so that a module can save the ID to recognise a specific auction if it sees it again, even if the auction has changed (bid price, seller name filled in, etc.). Or possibly to hunt the image for an auction by its ID.

    The IDs get saved in ScanData across sessions. However the potential reuse of expired IDs means it is probably not useful for a module to keep records of IDs across sessions.

    I think originally there was just a counter with no reset, but this was changed at some point. The principle appears to be to allocate an ID that is not currently in use, but that doesn't get longer and longer in our save file.

    It doesn't need to be sequential (or even a number, though numbers are simplest); it just needs to not be 'currently' allocated to another entry, with the definition of 'currently' being the sticky point.

    If we are only going to guarantee uniqueness in the current session, we should probably not store the IDs in the save file. Strip them out at save time and allocate a fresh set at load time (or possibly even only when the entry is accessed).
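
    A sketch of that strip-at-save / reassign-at-load approach (the ID index and the function names are illustrative, not actual ScanData code):

```lua
local ID = 1 -- stand-in for Const.ID

-- At save time: blank the session-local IDs so stale values are
-- never written to the save file.
local function StripIDs(image)
    for i = 1, #image do
        image[i][ID] = 0
    end
end

-- At load time: hand out a fresh, strictly increasing set; every
-- entry is unique for this session with no gap-hunting at all.
local function AssignFreshIDs(image)
    for i = 1, #image do
        image[i][ID] = i
    end
end
```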

    If we want to save the IDs to be used across sessions, we should provide some guarantee such that a module knows when to discard any IDs it has saved. Preferably in some way that doesn't involve us hunting through lists of allocated IDs to find free ones, or without having to save separate lists of expired-but-recently-used IDs.

    Or again, do we really need this entry at all?
  • Now looking at just removing this field from the auction entry table.

    Initially I will just set the field to 0, and remove all other code in the scanner that modifies it.

    At a later date I will update ScanData to strip out this entry. There are several other fields that could possibly be removed, and we should look at doing them all at the same time: http://jira.norganna.org/browse/SDAT-19