Comments on: Lies About Honesty

By: Benquo | /lies-about-honesty/#comment-80 | Mon, 07 May 2018 03:32:34 +0000
In reply to Benquo.

As an aside, I should note that a linear extrapolation from my trajectory is not all that informative given my age.

By: Benquo | /lies-about-honesty/#comment-79 | Mon, 07 May 2018 03:11:18 +0000
In reply to Paul Christiano.

> Refusing to make peace is more likely to lead to constant war than getting your way.

Constant war is literally a large portion of what US tax dollars pay for.

By: Benquo | /lies-about-honesty/#comment-78 | Mon, 07 May 2018 03:08:35 +0000
In reply to Paul Christiano.

You’re conflating the question of whether to be nice and normal to the people around you, and to cooperate in existing areas where you usually see C-C, with the question of whether to be honest to such people. These are only the same in circumstances where the expectation is honesty. In practice you will rarely be blamed for lying in circumstances where other people usually lie, and you will often be blamed for telling the truth in such circumstances. You wrote about a comparatively unpoliticized case of this in “If we can’t lie to others, we will lie to ourselves.”

By: Benquo | /lies-about-honesty/#comment-77 | Mon, 07 May 2018 03:00:52 +0000

You should not expect people to be totally or even mostly honest in public, barring strong institutional forces to the contrary or extreme insulation from the relevant incentives and selection pressure. Public honesty reliably loses coalitional resources under most conditions relative to other strategies; it is read as outright hostile when it exposes information about others, and it exposes attack surface for scapegoating when it exposes information about oneself. This includes people’s communication about how honest they are.

I like to think of myself as pretty honest in public – though it never would have occurred to me that a claim of anything like 100% would be believable – but I am unusual, and basically everyone with any discernment at all who has spent some time with me notices this. About a third of my culture were murdered within living memory, my family has been economically downwardly mobile and reproducing below replacement rate for three successive generations, and extrapolating from my current trajectory I will probably die without issue.

The problem with things claiming to be 100% honest cannot simply be that they’re lying – that doesn’t differentiate them from anything else, and any claim to be less than 100% honest is attack surface for delegitimization. The problem is with marketing strategies that disproportionately extract resources from the minority of people who basically expect others to be honest in public, on the basis of claims that no savvy person would take literally. The correct defense is to clue people in where you can.

Robert Wiblin correctly identified this minority with a mental illness first defined by an enthusiastic [political affiliation omitted] doctor. I’m not going to go into detail on that because I’m just not up for the fight. You should assume that I’m just not up for the fight about a lot of things, especially given the number of things I *am* up for the fight about. You should assume that for the most part I’m only saying things that I think will move social capital to me or the people I care about. If you want to understand my agenda you’ll have to look to my revealed preferences, or at least find a way to talk with me in private, though the latter is a much less reliable method.

By: Paul Christiano | /lies-about-honesty/#comment-73 | Tue, 23 Jan 2018 17:30:17 +0000
In reply to Ziz.

Taking my rule literally would suggest you drive straight in chicken (modulo normal decision theory confusion and who is the first mover and so on), since you’d prefer the other person expect that you are going to drive straight.
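As a toy sketch of that commitment logic (the payoff numbers and the `best_response` helper are invented for illustration, not taken from the post or this thread):

```python
# Toy payoff matrix for chicken. Rows: what they expect me to do;
# columns: their reply. All numbers are made up.
payoffs = {
    ("straight", "straight"): (-100, -100),  # crash: disastrous for both
    ("straight", "swerve"):   (10, -1),      # I win, they lose a little face
    ("swerve",   "straight"): (-1, 10),
    ("swerve",   "swerve"):   (0, 0),
}

def best_response(my_expected_action):
    """The other driver's best reply, given what they expect me to do."""
    return max(
        ("straight", "swerve"),
        key=lambda theirs: payoffs[(my_expected_action, theirs)][1],
    )

# If they are certain I will drive straight, swerving maximizes their payoff:
print(best_response("straight"))  # -> swerve
```

Under these made-up numbers, a credible policy of driving straight gets the +10 outcome, provided the prediction actually moves the other driver's action, which is exactly where the first-mover and decision-theory caveats come in.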

The government would exist unchanged if I were expected to be unwilling to pay my taxes, or even if all people remotely like me were expected to be unwilling to pay their taxes. I don’t remotely believe that the counterfactual is “we have a government that I’m happier about paying taxes to.” Refusing to make peace is more likely to lead to constant war than getting your way.

The considerations I list aren’t independent; they are anticorrelated, since they correspond to the different ways in which someone might form their beliefs about what I will do (e.g. in cases where someone is reading my expression they aren’t relying as much on past experiences with me; if they are reasoning about my algorithm they aren’t relying as much on my reputation; etc.). For my argument, that’s better than being statistically independent. Nevertheless I agree they don’t all add up to 100%, and the considerations in sections IV and V don’t always push you to 100%.
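To illustrate the arithmetic, here is a minimal sketch; the three channel weights are made-up numbers, not anything from the post:

```python
# Made-up weights for three prediction channels: reading my expression,
# past experience with me, and reasoning about my algorithm.
p = [0.4, 0.35, 0.25]

# If the channels applied independently they would overlap, leaving some
# situations covered by no channel at all:
independent = 1 - (1 - p[0]) * (1 - p[1]) * (1 - p[2])  # ~0.71

# If they are anticorrelated, so that whichever channel is unavailable
# another tends to fill in, coverage approaches the plain sum:
anticorrelated = min(1.0, sum(p))  # 1.0

print(independent, anticorrelated)
```

The same total weight covers more situations when the channels avoid overlapping, which is the sense in which anticorrelation helps rather than hurts the argument.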

“100%” and “here is what I do” are not at odds with being an approximation. 100% is the approximation; I give a bunch of reasons why it’s not a crazy approximation and why using it is reasonable. I explicitly give simple examples where the truth is far from 100%, and obviously I see others, and I explicitly say it’s a simple approximation that falls back to UDT in complicated cases. I give several reasons why UDT ends up a lot closer to straightforwardness than you’d think a priori, which I do believe.

In particular I agree there are lots of cases where the benefits go to people only a little bit like me and in those cases my usual level of altruism would only get up to like 1-10% rather than 100%, and other examples where the benefits go to people who I’d actively want to make suffer.

When you say:

> I advise keeping track of which reference class you’re poisoning and whether it’s worth it, by doing the thing that a predictor doesn’t want you to do. If you hide Anne Frank, you are making things harder for Nazi and Nazi-occupied civilians. Some of this may be a positive. There is an equilibrium between the Nazi civilians and their government, the cost may be somewhat passed on.

I agree that’s a more accurate algorithm: you should keep track of who you are benefiting how much and how much you care about those benefits (and then apply the corrections in sections IV and V if applicable). Of course more accurate still is just to do the entire decision-theoretic calculation.
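A minimal sketch of that bookkeeping, with a hypothetical `decision_value` helper and entirely made-up weights (nothing here is from either comment):

```python
# Weigh each affected group by the benefit to it and by how much you care
# about it, then sum. The corrections from sections IV and V would be
# applied on top of this.
def decision_value(effects):
    """effects: list of (benefit_to_group, how_much_I_care) pairs."""
    return sum(benefit * care for benefit, care in effects)

# Hypothetical numbers for the hide-Anne-Frank case:
hide = decision_value([
    (+100, 1.0),  # Anne Frank, cared about fully
    (-1,   0.2),  # diffuse cost to civilians, partly passed back upstream
])
betray = decision_value([
    (-100, 1.0),
    (+1,   0.2),
])
print(hide > betray)  # True under these made-up weights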

I often encounter the view that the world is consistently trying to crush sensible optimization. I can agree there is a little bit of that, but it seems pretty small compared to the world being concerned (apparently correctly?) that optimizers can’t be cooperated with. It would be great to see more evidence of the crushing. Mostly I think that view ascribes way too much agency to the broader world, which is mostly stumbling along.

I think you underestimate the need for, and feasibility of, being predictable by the normal fuzzier processes in the world; I think you overestimate the likely gains from this particular kind of defection (violating people’s expectations of you in cases where you are glad they had that expectation); I think you underestimate the collateral damage from people being unable to tell how you are going to behave (e.g. if I ever had to count on your behavior, I wouldn’t be too surprised if you got tricky and decided that I don’t really know you and so you should just screw me over); and I think you underestimate the fraction of important interactions that are repeated, involve reputation impacts, and so on.

But I do agree that my heuristic is just a heuristic and that my post caves somewhat to the temptation to oversimplify in the interests of looking better.

By: Ziz | /lies-about-honesty/#comment-72 | Tue, 23 Jan 2018 10:51:21 +0000
In reply to Paul Christiano.

I think it’s fake because you said “100%”. Also, “aspire” and “pretend” are words that strongly suggest fakeness.

Although on re-read, I’m confused about something:

You alternate between speaking as though this is 100% and speaking as though it isn’t. (Early on, you say “I am”, then “approximate”, then “100%”. Later: “I agree that this is a low-energy approximation. At sufficiently high energies this heuristic isn’t even coherent and we eventually need to fall back to UDT.”)

This had a motte and bailey effect on me the first time I read it, whether or not intended.

Why “100%” is fake:

The considerations you list are not statistically independent in when they apply. You can find places where none apply. If you were searching without a certain flinch, I think you’d have found them.

Do you have subjunctive dependence in your dealings with muggles or not? If not, arguably it’s not 100%.

I brought up taxes because my prototypical advanced decision theory action is “don’t negotiate with terrorists”.

You are paying tribute to farmers. And some other stuff.

What happens if enough people don’t pay taxes is that the timeline is collapsed, and the laws were always such that taxes were not that.

Societies are made of webs of stable states of who wins games of chicken. If your will is broken and you flinch, you see driving straight in chicken as clearly pointless counterproductive suicide.

Maybe not paying taxes is pointless counterproductive suicide. (Maybe.)

But it’s dependent on actual details of subjunctive dependence, details your alleged algorithm seems to staunchly exclude, filling in only a bunch of reasons to move it toward “subjunctive dependence always exists”.

The Anne Frank bug is not a weird edge case.

That’s your whole society wanting to squash optimizers with your values. And it will do it, if it knows you are unbroken like that well enough to coordinate on it.

For this reason, everyone must lie to the god that Superman serves. When Superman chases away Brainiac, and defeats every “villain who would take over the world”, he hands the future to whatever unfriendly AI that the current order of the world builds.

This can be like original sin and break your mind, but it doesn’t have to.

Don’t be merely a reliable cog that can never defect against the existing order, and can never drive straight in chicken, and will therefore be eaten by those who do. That’s straw-rationality.

Yes, Superman’s god is holding things together against worse, but like software designed to be updated, it must close out bad updates and allow for good ones. We do not want the future to be modernity multiplied out along some dimensions, but a shadow of what it could have been. Or, more straightforwardly, to be infrastructure profusion, because treating Superman’s god as software that can choose to not update is giving it too much credit.

We are all trying to steal the singleton from evolution in the name of our values. The current allegedly democratic sort of anarchic equilibrium above nations deserves no more of our loyalty, nor does it probably have the well-put-together-ness to trade with us for some of the margin produced by averting the apocalypse.

I advise keeping track of which reference class you’re poisoning and whether it’s worth it, by doing the thing that a predictor doesn’t want you to do. If you hide Anne Frank, you are making things harder for Nazi and Nazi-occupied civilians. Some of this may be a positive. There is an equilibrium between the Nazi civilians and their government, the cost may be somewhat passed on.

If something recognizes your status as an agent of your values, tracking that as a category in its head, that’s an exceptionally bad reference class to poison. The IRS refuses to differentiate between you and muggles. Other people working on AI long-term do differentiate.

I’m not confident of my translation into words, at 2:37am, of the following idea, but: there are a lot of possible things in the multiverse generating priors where tight reference classes around you will coordinate with each other (or something else?), and there isn’t a reason to expect bias in them overall. Therefore, if your values are altruistic: to whoever knows that (as in, a connected pathway from senses through map to actions), act like you have subjunctive dependence with their prediction of you, without worrying whether you’re bearing too much load or too little in that trade among all altruists.

I think I lean heavily on something like that, but I don’t have sufficient data/time introspecting/whatever to say I’ve done anything like named and framed it right.

By: Paul Christiano | /lies-about-honesty/#comment-70 | Sun, 21 Jan 2018 19:17:24 +0000
In reply to Paul Christiano.

Also, the quoted passage seems particularly unobjectionable. The obvious way in which it would be fake is if I’m listing a bunch of reasons to be nicer, but I’m overlooking a bunch of reasons to be less nice. But in fact it looks to me like there is an asymmetry, with lots of strong reasons to be more nice but many fewer strong reasons on the other side. Do you want to point out strong reasons on the other side? Do you think this is fake for some other reason? Do you think those reasons are small considerations?

By: Paul Christiano | /lies-about-honesty/#comment-69 | Sun, 21 Jan 2018 19:12:36 +0000

I agree that saying “decision theory implies you should vote” is weak and sounds pretty fake.

> This seems to be clearly a false face.

Doesn’t seem that way to me 🙂 If you wanted to convince me, I would be open to argument on the merits. So far the best counterargument is the appeal to intuition in the “Do you give up Anne Frank?” case (and other similar cases).

If the next paragraph is supposed to be a response to the point in my post, then it seems confused. You say “y’all’s actions are not subjunctively dependent with that many other people’s or their predictions of you.” But (a) if my paying my taxes would cause others to predict that I wouldn’t pay my taxes, why would that make it more attractive for me not to pay my taxes? And (b) my post asserts that my decision is subjunctively related to a tiny number of other people’s decisions.

I don’t understand your “don’t pay your taxes” example more generally. Exactly how many people do you think need to evade their taxes before everything turns out OK for them, and what do you think is happening in the world at that point? Is my goal to cause political chaos? How many people do you think I’m asserting make decisions correlated with mine?
