Duel of Wits Simulation

Fuseboy · December 27, 2012, 3:39pm

I’ve programmed a simulator to try to compare the effectiveness of various DoW scripts.

The core of it is an implementation of the DoW mechanics. The mechanics are actually quite complicated, and so after blundering about for a bit, I realized I would never have any confidence that I’d got them right unless I could unit test it all properly, which led to a two-stage process - when the simulator reveals two competitors’ actions, it produces a “resolution plan”.

Here is the resolution plan for Andy’s Point vs. Joe’s Avoid:


[com.trilemma.bw.scripting.VsTest@32f88377[
  tester=Andy
  testerPool=4D
  target=Joe
  targetPool=5D
  meet=[]
  exceed=[Joe loses MoS dispo]
  fail=[]
]]

Andy rolls 4D against Joe’s 5D, who will lose dispo equal to Andy’s MoS.
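For readers who want to see the shape of such a plan in code, here is a minimal Python sketch. It is not the simulator's own code: the class and field names merely mirror the dump above, and the rule that a tie selects "meet" and that MoS is the difference in successes is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VsTest:
    """Illustrative stand-in for one versus-test entry in a resolution plan."""
    tester: str
    tester_pool: int
    target: str
    target_pool: int
    meet: list = field(default_factory=list)
    exceed: list = field(default_factory=list)
    fail: list = field(default_factory=list)

    def branch(self, tester_successes, target_successes):
        """Pick the effect list for already-rolled success counts.

        Assumption: more successes than the target selects "exceed",
        a tie selects "meet", and fewer selects "fail".
        """
        margin = tester_successes - target_successes
        if margin > 0:
            return self.exceed, margin
        if margin == 0:
            return self.meet, 0
        return self.fail, 0


# Andy's Point (4D) vs. Joe's Avoid (5D), as in the dump above.
plan = VsTest("Andy", 4, "Joe", 5, exceed=["Joe loses MoS dispo"])
effects, mos = plan.branch(3, 1)  # Andy rolled 3 successes, Joe rolled 1
```

Keeping the plan as plain data is what makes it unit-testable: a test can assert on the plan itself without rolling any dice.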

Here is the (more complicated) resolution plan for Andy’s Dismiss vs. Joe’s Rebuttal:


[com.trilemma.bw.scripting.StdTest@7b3aa36a[
  tester=Joe
  testerPool=2D
  target=Andy
  ob=0
  meet=[]
  exceed=[Andy loses MoS dispo]
  fail=[]
], com.trilemma.bw.scripting.VsTest@7166c37e[
  tester=Andy
  testerPool=6D
  target=Joe
  targetPool=3D
  meet=[]
  exceed=[Joe loses MoS dispo]
  fail=[]
], com.trilemma.bw.scripting.HesitateNext@2b76fbc2[
  target=Andy
]]

There are three effects happening independently - Andy’s 4D+2D Dismiss against Joe’s 3D defensive rebuttal; Joe’s offensive 2D against Ob 0; Andy hesitating next round whatever else happens.

Joe’s Obfuscate vs. Andy’s Rebuttal is also fun:


[com.trilemma.bw.scripting.VsTest@68b0019f[
  tester=Joe
  testerPool=5D
  target=Andy
  targetPool=2D
  meet=[]
  exceed=[Andy +1 Ob]
  fail=[Andy +1D,
    com.trilemma.bw.scripting.StdTest@7166c37e[
      ob=0
      tester=Andy
      target=Joe
      testerPool=2D
      meet=[]
      exceed=[Joe loses MoS dispo]
      fail=[]
    ]]
]]

Joe tests 5D vs. Andy’s defensive 2D. If Joe succeeds, Andy’s at +1 Ob to his next action. If Joe fails, Andy’s at +1D next volley and, in addition, gets to resolve his offensive dice, 2D vs. Ob 0, with Joe losing MoS dispo.
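The nesting in this plan (a test sitting inside another test's fail list) can be walked recursively. The following Python sketch shows one way to do that. It is an illustration only: the dict layout, the `resolve` function, and the rule that a die succeeds on 4+ are assumptions, not the simulator's actual design.

```python
import random


def roll(pool):
    """Count successes on `pool` six-sided dice (assumption: 4+ succeeds)."""
    return sum(1 for _ in range(pool) if random.randint(1, 6) >= 4)


def resolve(test, roll=roll):
    """Resolve one test and return a flat list of (effect, margin) pairs.

    `test` is a dict with "pool", plus either "target_pool" (versus test)
    or "ob" (standard test), and optional meet/exceed/fail lists whose
    entries are effect strings or nested test dicts.
    """
    successes = roll(test["pool"])
    if "target_pool" in test:
        opposition = roll(test["target_pool"])
    else:
        opposition = test["ob"]
    margin = successes - opposition
    key = "exceed" if margin > 0 else "meet" if margin == 0 else "fail"
    applied = []
    for entry in test.get(key, []):
        if isinstance(entry, dict):
            # A nested test is resolved in turn with its own rolls.
            applied.extend(resolve(entry, roll))
        else:
            applied.append((entry, max(margin, 0)))
    return applied


# Joe's Obfuscate (5D) vs. Andy's Rebuttal (2D defensive, 2D offensive).
obfuscate_vs_rebuttal = {
    "pool": 5,
    "target_pool": 2,
    "exceed": ["Andy +1 Ob"],
    "fail": [
        "Andy +1D",
        {"pool": 2, "ob": 0, "exceed": ["Joe loses MoS dispo"]},
    ],
}
```

Passing the `roll` function in as a parameter lets a unit test script the dice, which is the kind of testing the two-stage design is meant to allow.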

Fuseboy · December 27, 2012, 4:03pm

Once the resolution plan is worked out, it’s applied, which is a much simpler process.

This all happens in the context of a cage match between scripting strategies, i.e. approaches for deciding what to script. This is where the simulation is (so far) very naive: the strategies are very simple. The ones I have in play are:

[ul]
[li]Point x3[/li]
[li]Dismiss x3[/li]
[li]Random Action[/li]
[li]Point or Rebuttal, randomly, with rebuttals half offensive/half defensive[/li]
[li]So-called AdaptiveRebut, which scripts Point x3 against higher-skill or equally matched opponents. Against lower-skill opponents, it scripts Rebuttals, with 1D defensive unless the skill advantage is 3D or more, in which case it’s a 50/50 split.[/li]
[/ul]
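As an illustration of the last strategy in the list, here is a hedged Python sketch of the AdaptiveRebut rule. The function name and the tuple format are invented, and reading "50/50 split" as half the dice on defense (rounded down for odd pools) is an assumption.

```python
def adaptive_rebut_script(my_skill, opponent_skill):
    """Return three scripted volleys as (action, defensive_dice) tuples."""
    advantage = my_skill - opponent_skill
    if advantage <= 0:
        # Higher-skill or equally matched opponent: Point x3.
        return [("Point", 0)] * 3
    if advantage >= 3:
        # Assumption: "50/50 split" means half the dice go to defense.
        defensive = my_skill // 2
    else:
        defensive = 1
    return [("Rebuttal", defensive)] * 3
```

For example, an 8D scripter facing a 4D opponent would get three Rebuttals with 4D on defense, while a 6D scripter facing 4D would keep only 1D on defense.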

I instantiate three of each, with 4D, 6D, and 8D skill, respectively. (All have 11 dispo for the moment.) I then toss them all into a big cage match, where each fights every other competitor (including itself) 5000 times.
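The round-robin itself could look roughly like the Python sketch below. It is not the simulator's code: `cage_match` and the `fight` callback (which would run one whole duel and report each side's remaining dispo) are hypothetical names.

```python
from itertools import combinations_with_replacement


def cage_match(competitors, fight, rounds=5000):
    """Pair every competitor with every other (and itself) `rounds` times.

    `fight(a, b)` is a hypothetical callback that runs one duel and
    returns the remaining disposition of each side as (a_left, b_left).
    Returns the average remaining disposition per competitor.
    """
    totals = {name: 0.0 for name in competitors}
    counts = {name: 0 for name in competitors}
    # combinations_with_replacement includes the (x, x) self-pairings.
    for a, b in combinations_with_replacement(competitors, 2):
        for _ in range(rounds):
            a_left, b_left = fight(a, b)
            totals[a] += a_left
            counts[a] += 1
            totals[b] += b_left
            counts[b] += 1
    return {name: totals[name] / counts[name] for name in competitors}
```

Each unordered pair is visited once, so a self-pairing contributes two results (one per side) to the same competitor's average.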

I’ve also got a very basic genetic approach going, where a random scripter is allowed to evolve weights that indicate relative preferences for the actions. For example:

Dismiss: 48.6% of the time
Point: 23.2%
Obfuscate: 11.8%
Rebut: 11.1%
Avoid: 3.2%
Feint: 1.0%
Hesitate: 1.0%

(For now, I’m only doing this evolving among skill-peers, and the result seems to be that there’s an overwhelming pressure for conformity, while the actual set of preferences can vary wildly - just so long as you’re not that different from your peers. I’ll get more sophisticated at a later date.)
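A scripter driven by weights like these can be sketched in a few lines of Python. This is illustrative only: `weighted_script` is an invented name, the weights are copied from the example above, and the sketch ignores rule constraints such as the forced hesitation after a Dismiss.

```python
import random

# Example preference weights, copied from the percentages listed above.
WEIGHTS = {
    "Dismiss": 48.6,
    "Point": 23.2,
    "Obfuscate": 11.8,
    "Rebut": 11.1,
    "Avoid": 3.2,
    "Feint": 1.0,
    "Hesitate": 1.0,
}


def weighted_script(weights, volleys=3, rng=random):
    """Pick one action per volley, each in proportion to its weight."""
    actions = list(weights)
    return rng.choices(actions, weights=[weights[a] for a in actions], k=volleys)
```

`random.choices` does not need the weights to sum to 100, so an evolving population can nudge individual weights without renormalizing them.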

Fuseboy · December 27, 2012, 4:21pm

The whole thing is naive in a few ways, some of which I’ve mentioned:

[ul]
[li]The scripting strategies are very simple - only one of them takes into account skill disparities, and none take into account the remaining disposition.[/li]
[li]There’s only one skill. A 6D competitor is assumed to be able to muster 6D between skill, stat and forks, regardless of the action.[/li]
[/ul]

Nevertheless, some early tentative opinions:

If you’re totally outclassed, Dismiss is a good choice. The Dismiss x3 script leads the pack of 4D contenders.

Point seems to be better than Rebuttal at minimizing compromise. Rebuttal is theoretically good at soaking up attacks, particularly if the enemy is inferior, but there are a lot of caveats:

Rebuttal is totally vulnerable to Feint. Against Obfuscate, Rebuttal rarely does any damage since you only test defensive dice in the versus test. Rebuttal is also double-penalized by Ob penalties (since you apply the Ob increase to both pools). I also suspect a slight discounting effect: if it takes me an extra volley to come out 0.12 dispo ahead of just scripting Point, that may not be enough to make it worthwhile - the future is uncertain.

Paul_B · December 27, 2012, 4:32pm

Are you testing for various compromise thresholds as well? And how are you accounting for variations in Dispo pools?

I may not want to know the answers, as my head is already melting trying to parse all this out.

noclue · December 27, 2012, 5:49pm

This looks very cool!

One question, why is Joe’s rebuttal offense against an ob2?

Fuseboy · December 27, 2012, 6:29pm

Typo, my bad. It’s Ob 0.

So, there are lots of ways to look at the ‘best’ script, but I’m currently sorting on the basis of average remaining disposition. As for dispo variation, I’m treating disposition as an input. The scripting strategies don’t need to script until after the dispo is rolled.

For the moment, there are no strategies that take dispo into account, but it’s available - it’s possible to write a strategy that takes remaining dispo (and I assume this is vital for a truly great scripting strategy).

For these naive strategies of the sort I have now, dispo differences aren’t that interesting: they simply add linearly to the final result. It’s a bit like having a 100m race, but forcing one competitor to run further than another - the interesting thing is the speed they can achieve. For less naive strategies, I assume it will matter. Dismiss is an easier choice if there’s no next volley to hesitate on, for example.

noclue · December 27, 2012, 6:38pm

2nd question: is Andy throwing a Dismiss or a Rebuttal in the final example?

Fuseboy · December 27, 2012, 7:03pm

Quite right - it was a Rebuttal.

luke · December 27, 2012, 10:54pm

Just to be clear: is the Dismiss x3 hesitating after V1 and going again on V3 (presuming it doesn’t win outright on V1)?

Fuseboy · December 27, 2012, 11:10pm

Some results! This table shows the results of a cage match. All competitors started with 11 dispo, but had skills of either 4D, 6D or 8D. 5000 matches were run between every pair of competitors, and the results were tallied.

The results have been rolled up by strategy and skill advantage. The first row, for example, shows that in fights where you have a 4D advantage, you can expect to win with 5.1 points of dispo left. If you choose the ‘PointRebutScripter’ strategy, you’ll do slightly better with 5.9 dispo left.

Findings (which hold only if these guys are your opponents):

[ul]
[li]Generally, scripting Point x3 is the best way to win with minimal compromise.[/li]
[li]If you have a 4D advantage over your opponent, you can throw in a few 50/50 rebuttals for a very slight edge.[/li]
[li]If you’re at a disadvantage, script all Dismisses. Kick him in the nuts![/li]
[/ul]

Strategy	Avg. Dispo Left
4D Advantage	5.1
  PointRebutScripter	5.9
  PointScripter	5.7
  AdaptiveRebutScripter	5.6
  A:03.4% D:25.6% F:12.8% H:01.5% O:06.4% P:22.5% R:27.9%	4.7
  DismissScripter	4.5
  RandomScripter	4.2
2D Advantage	2.9
  PointScripter	3.9
  PointRebutScripter	3.6
  DismissScripter	2.9
  A:03.4% D:25.6% F:12.8% H:01.5% O:06.4% P:22.5% R:27.9%	2.8
  RandomScripter	2.3
  AdaptiveRebutScripter	2.0
No Advantage	1.5
  PointScripter	1.9
  AdaptiveRebutScripter	1.9
  DismissScripter	1.7
  PointRebutScripter	1.3
  A:03.4% D:25.6% F:12.8% H:01.5% O:06.4% P:22.5% R:27.9%	1.2
  RandomScripter	0.9
2D Disadvantage	0.7
  DismissScripter	0.9
  RandomScripter	0.7
  PointScripter	0.7
  AdaptiveRebutScripter	0.7
  A:03.4% D:25.6% F:12.8% H:01.5% O:06.4% P:22.5% R:27.9%	0.7
  PointRebutScripter	0.3
4D Disadvantage	0.2
  DismissScripter	0.4
  RandomScripter	0.3
  A:03.4% D:25.6% F:12.8% H:01.5% O:06.4% P:22.5% R:27.9%	0.2
  PointScripter	0.2
  AdaptiveRebutScripter	0.2
  PointRebutScripter	0.0
Grand Total	1.9
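A roll-up like the one in this table can be computed with a simple group-and-average. The Python sketch below is only an illustration of that idea: `roll_up` and the record layout are assumptions, and it averages individual match results, which may differ from how the table above was actually produced.

```python
from collections import defaultdict


def roll_up(results):
    """Average remaining dispo per (skill advantage, strategy).

    `results` is an iterable of hypothetical records of the form
    (strategy, my_skill, opponent_skill, dispo_left). The key
    (advantage, None) holds the subtotal for that advantage group.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for strategy, my_skill, opponent_skill, dispo_left in results:
        advantage = my_skill - opponent_skill
        for key in ((advantage, strategy), (advantage, None)):
            sums[key][0] += dispo_left
            sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}
```

Sorting each group's rows by the averaged value would then give the ordering shown in the table.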

Fuseboy · December 27, 2012, 11:13pm

Yes. With this cohort, any [skill] disadvantage means a near-certain loss, so you claw out his eyes with a V1 Dismiss and cut your losses.

luke · December 27, 2012, 11:25pm

So the best strategy is P/P/P and hope your opponent runs out of dispo before you’re done?

Fuseboy · December 27, 2012, 11:36pm

In this group of kindergarteners, yes, unless you’re at a skill disadvantage, in which case Dismiss/Dismiss/Dismiss gives you a very slight edge. Actually, now that I poke at the data a little more, it looks like the Dismiss x3 was looking good mostly because it did better against the real numbskulls like random scripting. When I pull two of the worst strategies, Dismiss x3 doesn’t look good until you’re at a 4D disadvantage.

(This is a general problem of evaluating scripts this way: you can only really tell how good you are against the rest of the contenders. If the field is full of idiots, the clear winner may not be any good against anyone else.)

I’m trying to figure out the best way to let others submit scripting strategies.

I can easily hard-code a static script of any length (e.g. Rebuttal, Obfuscate, Dismiss), so if you have one, please toss it in!

(I just noticed I haven’t implemented Incite.)

SeaWyrm · December 28, 2012, 2:36am

I’m hoping the next stage of this experiment will involve training neural networks.

Durand_Durand · December 28, 2012, 5:39am

We’ve found Dismiss a strong finisher when your opponent is on the ropes. So trying a script that attacks with Dismiss when the opponent’s dispo is low at the start of the exchange would be worth a shot.

Fuseboy · December 28, 2012, 8:52am

Yes, I’ll throw in a contender like that. Perhaps something that estimates what the dispos will be after each action, and throws in a Dismiss whenever a) that would end the battle or b) the opponent is going to win.

As for neural networks: yes, I could! I’ve used the Weka library, though that’s entirely oriented around training classifiers - you train the neural networks by feeding them vast quantities of training data, rather than by having them evolve genetically. So it will take a bit of thought on how to construct a healthy diet.

In general, my goal is to try to reveal some rules of thumb that humans can use, rather than to produce a good AI competitor, but perhaps they would be a useful stepping stone.

Deliverator · December 28, 2012, 8:56am

Worth noting, though perhaps tough to account for: in actual practice the number of dice you roll is going to vary from action to action, even if you script, say, P-P-P or R-R-R, because of fluctuating ForKs and Help. Plus many Duelists don’t even get access to all the DoW actions (for instance people relying on Persuade), which does change the strategy significantly.

Matt

Fuseboy · December 28, 2012, 9:45am

Well, we have a new champion: Point x3, except that on any volley where you suspect that either you or your opponent will be reduced to 1 dispo, you script Dismiss instead (since 2D yields 1 success on average). This has a tiny edge over plain Point x3.
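One possible reading of that rule, as a rough Python sketch: project both dispos forward volley by volley and swap in a single Dismiss once either side is expected to be at 1 dispo or less. The function name, the "only one Dismiss per exchange" choice, and the damage model (each Point removes skill/2 dispo on average) are assumptions for illustration, not the simulator's actual logic.

```python
def point_with_dismiss_finisher(my_skill, my_dispo, opp_skill, opp_dispo, volleys=3):
    """Script Point, with one Dismiss on the first volley where either
    side is projected to be at 1 dispo or less.

    Assumption: a Point removes skill/2 dispo on average, i.e. each die
    succeeds half the time and nothing opposes it.
    """
    script = []
    dismissed = False
    for _ in range(volleys):
        finishing = my_dispo <= 1 or opp_dispo <= 1
        if finishing and not dismissed:
            script.append("Dismiss")
            dismissed = True
        else:
            script.append("Point")
        # Project both sides forward by one volley of expected damage.
        my_dispo -= opp_skill / 2
        opp_dispo -= my_skill / 2
    return script
```

With two 6D duelists at 11 dispo each this stays Point x3, since neither projection drops to 1 within three volleys.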

Paul_B · December 28, 2012, 11:17am

Right, this is why I’m super curious about taking various degrees of compromise into consideration.

I do think at the very least, this is a good mathy reminder as to why compromises have to hurt.

Fuseboy · December 28, 2012, 11:46am

Paul, can you explain what you mean about taking various degrees of compromise into consideration?