Simulation Improvements


#1

I’ve been thinking through the issues that seem to appear on the simulation board and had some ideas for a way to possibly improve the matchups.

First, here is my current understanding of how matchups are formed:

  1. A player hits the “Simulate” button
  2. That player is assigned a match where one player from each side is selected at random.
    a. The selection is waited heavily toward newer code.
    b. The selection is made by selecting a random time then finding the newest code that is older than that time.
  3. A random seed is created based on the players involved and the time they submitted. This same seed is used every time these two players play against each other until one of them resubmits.

Issues that arise from this are:

  1. Some players having very few simulations if another player happens to submit soon after they did.
  2. New code is often matched against other new code, so a player ends up playing against the same player repeatedly.
  3. Repeatedly playing the same player results in a skewing of the results (positive if you win all the matches, negative if you lose them all)

Anyone that knows better, feel free to correct my misconceptions.

Here is my idea for a smoother simulation process

  1. A pair of queues are created for all matches to be simulated
    a. Priority Queue
    b. Secondary Queue
  • When a player submits their code a set of matches are added to each queue.
    a. 20 matches added to priority Queue - These are the top 20 players on the other side.
    b. 180 matches added to the secondary queue - These are the players ranked 21-200.
    c. Possibly delete any incomplete matches involving this player from the queues.
  1. When a player hits the “Simulate” button they are assigned a match from the priority queue. If the priority queue is empty, then a match is assigned from the secondary queue. If the secondary queue is also empty then a random match is simulated amongst players that are not in the top 200.

Improvements

  1. Every player should fairly quickly see results from their submission.
  2. Top players would constantly be getting feedback against new code.
  3. Matchups would rarely repeat

Concerns

  1. How big would the storage be for these queues? Would it become unmanageable over time? (Maybe we would need to clear the queue every once in a while, not sure how fast it would fill compared to how fast it would empty)
  2. Could the queues easily be purged of matchups with the current player (when a player has resubmitted)

What do you all think? @nick would this be a viable solution?


Possible Error for Ranked Ladder (Dungeon Arena)
Zero sum - extremely slow simulation
#2

Some thing I would like to suggest is that if your code is older it takes priority over the newer stuff because these people have to resubmit it (ME) and everyone’s code will get simulated

And there will be world peace(I wish)


#3

First your “Improvement #2”: Personally, I don’t care if the top people get their code run more often than anyone else, so long as it is fair to all (see fluidity (in the hidden section)). I do agree with the need for all code to be “tested” against the “hard-foes” but that is for the submitter’s benefit not for that of the “top people”.

Improvement #1 is sound :smile: and #3 seems likely (more entries should mean more likely).

My take on your “concerns”: The number of guaranteed simulations (submit --> +200 matches) greatly effects the “storage” and removal resource requirements. (maybe guaranteeing say 20 matches, 1 from each group of ten in the top 200, would lower the resource use and still fulfill the desire for good opponents to score against.)

On the proposed “200” line. As long as we have less than 200 entries (per side) then everyone gets a fresh run for every “submit”. Yeah!! all are happy, but there are no non-queued runs, and everybody is unhappy (except for the system $$s and resources) until they realize that there is no point in rerunning matches already decided, at which point Everybody should be Happy!! :heart_eyes:
(iow: If the seed doesn’t change, neither will the outcome.)

Stuff that only matters if we make it over 200 entries on a side

I have always been suspicious of the Bayesian algos used for ranking competitions (they do have issues, if only it were a perfect world)). So . . . how CoCo’s algo would be effected by this set-up.

Assuming the proposed set-up (yours) and that there are (many) more than 200 entries (currently only 107 “red” and 79 “blue” (on day three of 10)):

Fluidity:

  • How easy/hard is it to make it into (and out of) the top 200?
    (I’m not sure who gets more simulations? Those in the top 200 who get only one per other’s submit, or, those @202+ who get a diminishing %chance per entry but get that chance for every non-queued simulation…???..)
  • For example:
    How hard is it for #202, who only plays people ranked lower to get into the top 200?
    (I say 202, because I assume that #201 gets a simulation when #1-200 submit + the random % for non-queued simulations)

(there are those who would say that the last point is irrelevant, as they feel that if you aren’t upgrading your code, you don’t deserve to move up…)

Speaking of the “current issues” (#'ed 1-3), you addressed #1 directly, and #'s 2 & 3 indirectly via the idea that total randomness -should- remove those issues (and higher entry numbers should increasingly do that).


#4

i would suggest not to split in gorubs of 20 extremes and 180 rest. wouldnt it be better u first check the code in primary queue for the players “playerLevel + 20” so players who are nearly the same like his code?


#5

I agree with this. This site is more about teaching and learning than about catering to the most competitive top players. Rather than just playerLevel + 20, shouldn’t it be PlayerLevel +/- 20?


#6

@vlevo I agree that the idea of having the top 20 hit more often isn’t a big deal. My biggest reason for choosing top 20 was that would require very little calculation to get them. We could instead pick a random member of the top 10, then every 10th member would be added to the list after that, with the others added to the secondary queue.

Thinking through the math of it, if there are a total of 200 in each board and each of those 200 submits once a day, then 80000 matches would be generated each day. I wouldn’t think that those 80000 matches would take up much storage space. Of course with older ladders there are more people involved (Cavern Survival has over 12000 entries) though I’m sure much of the code is old and untouched for weeks if not months and only a few submissions are made each day.

I’ll admit that I don’t really understand the Bayesian algorithm for calculating ranks, so I don’t know how this would affect ranking calculations.


#7

Actually, this idea of “only” playing against the people near the old rank is one of the things I have against the way people run ranking systems (elo/Bayesian/etc).

  1. it is new code . . . it could be worse or light years better than the old code, to saddle it with matches at the old level doesn’t do anyone any good. Yes, when playing chess, people do usually improve gradually, but coding has a better chance of instant great improvement (fix a tiny bug, say ‘>’ vs ‘<’ and BAM!!) (or, of course, the opposite: great failure).

  2. winning against someone lower than you lowers your score. (I don’t know if coco suffers from this?)
    (only two (related) reasons for such a system . . . “so the high and mighty never have to sully themselves by having contact with the teaming masses” and “I had to scratch and claw my way to the top, so $#%, you will too”. Such a system creates or at least reinforces the “play with your own kind” idea.)

I wrote new code in one of the ladder games (here at coco) which (with only small changes) can beat every one of the top ten of the opponents. yet my game languishes near the old rank… (with few matches, but all wins and yes, with the current system most of the problem is the few matches, since I believe there is nothing about rank in opponent selection.(although, if opponent score effects my score then we are back with the “play own kind” problem even if the winner doesn’t go down since playing lower (or even equal) people probably doesn’t raise your score very much, see below.)

[edit: a day later] Oops, my bad. My code isn’t languishing near the old rank, it is below it.

Putting this in as an edit so as to not detract from fuzzicallogic’s reply.[/edit]

I just quickly looked up the idea behind the Bayesian/Elo type ranking algos, and it (very simply put) is based on expectation of winning. Since the lowly person has no chance of beating high person, when it actually happens there is a large change in score, lower the score difference lower the change and from the opposite side, if you are expected to win doing so makes little change in either score.

– My quick understanding –
So the effect, of the “playing top people” proposals, would be to yank you up to the top if you (probably) belong there or leave you around whatever score you were given as your “start” (your old score), after which the random matches shuffle you around to your new (not near top) rank. This seems like it would be good for those who make big improvements and has essentially no change (vs current system) for those who don’t. The “quick overview understanding” isn’t good enough to discuss what happens to those top guys you beat (iow, how far they fall vs the high people you didn’t play). For an understanding of that we’d need someone like @schmatz who (I believe) wrote coco’s ranking system.


#8

I’ve been thinking on this quite a bit, as well. It should be understood that the ranking system and matchmaking systems should not be inherently tied except for events in which ranking should be considered. That’s the way all competitive forums with ranking work and should work. This is why tournaments have different formats and differing reliance (but similar effect) upon ranking. Regarding this, there are some things that should be obviously avoided.

This should never happen. In contrast, losing against higher ranked players should never raise your score. If the ranking system does this, it should be reevaluated as the system is seriously flawed.

This should also never happen. If the code is superior, it should be ranked superiorly, period. Since that is happening, something obviously needs to change. Either a) the ranking system is flawed or b) the matchmaking is not providing opportunities for ranking improvements.

The Player in CodeCombat Ladders

Unlike most other competitive forums, the human player is not actually the competitor. It is actually the code. This means that every time the code is modified, it is a different player and therefore has no ranking. The human player is more like a coach for the code. The immediate goal is to get an approximate rank, so that they may get an accurate rank for weighing against all other players. This is because provisional ranks are highly volatile and detrimental to any ranking process. An additional consideration is that entry is automatic and unconditional, unlike other forums. This means that accurate ranking holds priority above all else.

On Player Ranking

The goal of the rank is to place the player where they are most likely to have a 50/50 win-loss ratio. The closer to this ratio, the more accurate the rank. This is for the benefit of the competitive community, not the player. This allows for faster (and more accurate) ranking of provisional players. It also provides feedback to the player (in this case coach) about their current capabilities and strategies. It should be noted that the ranking of provisional players is the most important of these two as there is little useful feedback if the ranks cannot be accurate.

On Provisional Matchmaking

Provisional Matchmaking occurs in any game system when there is currently no way to accurately gauge the rank of the player (code). In these cases, provisional players should be matched against a wide variety of ranked player skill levels before calculating rank. Essentially a good provisional matchmaking system climbs the ladder quickly until the first loss and then climbs more slowly with each loss. Once the losses balance with wins then normal ranking and matchmaking can occur. In other words, provisional players play against anyone. Matches with a provisional player do not re-rank ranked members. Their only purpose is to find the approximate rank of the provisional player so that regular ranking can occur.

On Matchmaking Modified Code

When new code is written, the goal is to rank that new code. Basing it upon the old rank would actually save time… however, that is only true if the matchmaker uses that information correctly. If it performs similarly to the old code, then normal ranking can continue. 40-60% win rate is appropriate as if ranking is accurate, this is what ranking actually leads to. If the new code is significantly more (or less) skilled, then the old rank is no longer fitting for the code and it should be provisionally ranked as quickly as possible.

Responding to Key Points:

Without the points above, you are correct. Using the points above, this could significantly improve things.

Taking into account the above: Your new code should be re-provisionalized because the old rank is obviously inaccurate. And accurate ranking should be a priority.

The reason for this is because otherwise rank can never be determined to be accurate. Anyone with a higher rank is expected to beat you. If the ranking system is viable, this means that there is never a 50/50 win-loss ratio. This would lead to all players becoming provisional.

This would be far more productive for quickly removing one’s own provisional rank. But once provisional ranking has occured, this is virtually useless as it will only reflect what the ranking system already knows. Once the provisional ranking occurs, the next step would be to play and rank as many other provisional players as possible. Once there are few or none [0-10%] provisional players, then matching ranked players against ranked players can and should occur,

Priority Queue should probably be to rank provisional players and secondary queue should be for those that have been ranked. Of course, there might be benefit for other queues and some load balancing, but again the purpose of the rank is to accurately reflect the skill of all players. So reducing the number of provisional players is much more important. Once the ranks are approximately accurate, revising the ranks is a simple matter of proximal pairing.

Agreed that this is a current problem. New code should rarely play new code, if possible. Otherwise neither player can ever be properly ranked.

#1 Their submission isn’t as important as their ranking. Their ranking is only important if it is accurate (i.e. few provisional players when regarding the conditions met above) Top players don’t matter as much as provisional players… #2:The top player already has their rank well established and should not be modifying their code often and only in little bits. #3 Absolutely true.

It should be very little calculation to get any ranked player that you want, once the system knows everyone’s accurate rank.

This is only true if the players are in control of their own matchmaking. Since there are no conditions upon entry and matchmaking, this should not be a concern.

I don’t know if this helps with perspective at all, but that was the goal.


#9

I just had this happen to me I got matched against myself on treasure grove(with the exact same code) and loss on my human side and won on my orge side


#10

You guys are so smart, I love it. I think all of this is even more needlessly overengineered as our current matchmaking system, though. There is enough of a surplus of ranking power that if we just had an even distribution of random matches, things would work great. Problem is that we don’t have an even distribution because we don’t currently have a fast way to query for a random session to play against.

This is the code that picks a random session to simulate a game for:

findRandomSession = (queryParams, callback) ->
  # We pick a random submitDate between the first submit date for the level and now, then do a $lt fetch to find a session to simulate.
  # We bias it towards recently submitted sessions.
  queryParams.submitted = true
  findEarliestSubmission queryParams, (err, startDate) ->
    return callback err, null unless startDate
    now = new Date()
    interval = now - startDate
    cutoff = new Date now - Math.pow(Math.random(), 2) * interval
    queryParams.submitDate = $gte: startDate, $lt: cutoff
    selection = 'team totalScore transpiledCode submittedCodeLanguage teamSpells levelID creatorName creator submitDate'
    LevelSession.findOne(queryParams).sort(submitDate: -1).select(selection).lean().exec (err, session) ->
      return callback err if err
      callback null, session

We find a random human session and a random ogre session and then they fight. The main problem is that the distribution is uneven: if someone submits one second after you submit, you are unlikely to get any games simulated, since your slice of the random submitDate timeline is very small. Meanwhile, if then it takes hours for anyone to submit after that other player, then a ton of people are going to be playing against her over and over as her game has a high likelihood of being picked.

We used to sort by -submitDate and then pick a random offset from 0 to 200, but it proved to be really slow, so we switched to doing it this way. @schmatz weren’t you saying that a Redis set gives nice and fast random elements like this? What’s a good way to do this just using our current tools of Node and Mongo?


#11

We can build similar in-memory data structures to support such operations in our app servers. We could preload the IDs of sessions to simulate into some array and randomly pick among them and refresh at some regular interval.


#13

Been 20 months since this was last mentioned, has anything ever happened with improving matchmaking? Seems like players that got a very high rank a long time ago don’t get their games reranked against newer code, so they stay at the top even though newer code could beat them. I’d love to take a crack at improving this, but I’m not sure what needs to be done still.