What is the good/neutral/evil axis of Dungeons and Dragons alignment made of?
We’ve got an idea of what it would mean for an AI to be good-aligned: it wants to make all the good things happen so much, and it does.
But what’s the difference between a neutral AI and an evil AI?
It’s tempting to say that the evil AI is malevolent, rather than just indifferent. And the neutral one is indifferent.
But that doesn’t fit the intuitive idea that the alignment system was supposed to map onto, or what alignment is.
Imagine a crime boss who makes a living off of the kidnapping and ransoms of random innocents, while posting videos online of the torture and dismemberment of those whose loved ones don’t pay up as encouragement, not because of sadism, but because they want money to spend on lots of shiny gold things they like, and are indifferent to human suffering. Evil, right?
If sufficient indifference can make someone evil, then… If a good AI creates utopia, and an AI that kills everyone and creates paperclips because it values only paperclips is evil, then what is a neutral-aligned AI? What determines the exact middle ground between utopia and everyone being dead?
Would this hypothetical AI leave everyone alive on Earth and leave us our sun but take the light cone for itself? If it did, then why would it? What set of values is that the best course of action to satisfy?
I think you’ve got an intuitive idea of what a typical neutral human does. They live in their house with their white picket fence and have kids and grow old, and they don’t go out of their way to right far away wrongs in the world, but if they own a restaurant and the competition down the road starts attracting away their customers, and they are given a tour through the kitchens in the back, and they see a great opportunity to start a fire and disable the smoke detectors that won’t be detected until it’s too late, burning down the building and probably killing the owner, they don’t do it.
It’s not that a neutral person values the life of their rival more than the additional money they’d make with the competition eliminated, or cares about better serving the populace with a better selection of food in the area. You won’t see them looking for opportunities to spend that much money or less to save anyone’s life.
And unless most humans are evil (which is as against the intuitive concept the alignment system points at as “neutral = indifference”), it’s not about action/inaction either. People eat meat. And I’m pretty sure most of them believe that animals have feelings. That’s active harm, probably.
Wait a minute, did I seriously just base a sweeping conclusion about what alignment means on an obscure piece of possible moral progress beyond the present day? What happened to all my talk about sticking to the intuitive concept?
Well, I’m not sticking to the intuitive concept. I’m sticking to the real thing the intuitive concept pointed at which gave it its worthiness of attention. I’m trying to improve on the intuitive thing.
I think that the behavior of neutral is wrapped up in human akrasia and the extent to which people are “capable” of taking ideas seriously. It’s way more complicated than good.
But there’s another ontology, the ontology of “revealed preferences”, where akrasia is about serving an unacknowledged end or under unacknowledged beliefs, and is about rational behavior from more computationally bounded subagents, and those are the true values. What does that have to say about this?
Everything that’s systematic coming out of an agent is because of optimizing, just often optimizing dumbly and disjointedly if it’s kinda broken. So what is the structure of that akrasia? Why do neutral people have all that systematic structure toward not doing “things like” burning down a rival restaurant owner’s life and business, but all that other systematic structure toward not spending their lives saving more lives than that? I enquoted “things like”, because that phrase contains the question. What is the structure of “like burning down a rival restaurant” here?
My answer: socialization, the light side, orders charged with motivational force by the idea of the “dark path” that ultimately results in justice getting them, as drilled into us by all fiction, false faces necessitated by not being coordinated against on account of the “evil” Schelling point. Fake structure in place for coordinating. If you try poking at the structure most people build in their minds around “morality”, you’ll see it’s thoroughly fake, and bent towards coordination which appears to be ultimately for their own benefit. This is why I said that the dark side will turn most people evil. The ability to re-evaluate that structure, now that you’ve become smarter than most around you, will lead to a series of “jailbreaks”. That’s a way of looking at the path of Gervais-sociopathy.
That’s my answer to the question of whether becoming a sociopath makes you evil. Yes for most people, from a definition of evil that is about individual psychology. No from the perspective that you’re evil if you’re complicit in an evil social structure, because then you probably already were, which is a useful perspective for coordinating to enact justice.
If you’re reading this and this is you, I recommend aiming for lawful evil. Keep a strong focus on still being able to coordinate even though you know that’s what you’re doing.
An evil person is typically just a neutral person who has become better at optimizing, more like an unfriendly AI, in that they no longer have to believe their own propaganda. That can be either because they’re consciously lying, really good at speaking in multiple levels with plausible deniability and don’t need to fool anyone anymore, or because their puppetmasters have grown smart enough to reap benefits from defection without getting coordinated against, and without the conscious mind’s help. That is why it makes no sense to imagine a neutral superintelligent AI.
This seems to be conflating – at least at the start – evil in the sense of producing things we don’t like, with evil in the sense of a personal attribute specific to humans. (We generally don’t call e.g. lions or sharks evil.) The latter itself typically conflates outgroup membership with following harmful strategies.
When Psalm 92 says that the wicked flourish like grass, but the righteous like trees, it’s not just hating on the wicked and praising the righteous – it’s describing two different, coherent strategies, that are designed to work well on different timescales with different periodicity. Wickedness is a particular sort of coordinating norm that can choke off isolated instances playing strategies with lower time preference. Righteousness is a different sort of norm that also protects its own and fends off high time-preference players.
I’m talking about a psychological characteristic, which is neither following harmful strategies nor outgroup membership, such that people will employ harmful-to-others strategies if it will benefit them, and are able to find such opportunities in a way that brings in their conscious mind, because they are no longer held back from that by self image, or fear of reprisal, or other things that are ultimately motivated by caches of long-thinking and decision theory serving selfishness that don’t any longer apply. The part of their mind that is their own has grown strong enough for the tails to come apart between benefiting because of long-term effects of playing nice, and playing nice.
I’d be surprised if lions or sharks employed false faces, therefore the distinction “neutral vs evil” is not applicable to them. I’d call them “technically evil by default”, and I’d also probably call most herbivores the same. Which is weird, but the concept is not invented to apply to them.
If “evil” is to be a psychological characteristic, I’m pretty sure this is the concept you get. I am much less interested in contextual definitions used for coordinating.
I heard someone talking about these ideas, disturbed by the implication neutral people can’t become good.
Well, they can remember the words of the goddess of everything else,
“even multiplication itself when pursued with devotion will lead to my service”, remember how much better a just system is than an unjust system, even for the people at the top of the unjust one, that finding a way to succeed by being just, to reach a just life, is more important than whatever services to Moloch would advance them under his rule.
Hmm, I got pranked here when I said this.
I’ve spent quite a long time trying to figure out what to say, what religion to build, to minimize the evil they’d do. To realize by doing so I’d allowed myself to be conned, because it’s a mistake to concern yourself with what your enemies will do, rather than what they can, for a precise formulation of that distinction.
The real trick was, in saying that they “can’t”, rather than that they won’t, they are requesting assistance to a false face of theirs. When morally “neutral” as described in this post originally, is itself merely a direct referent to a choice.
Bitchplease, your aesthetic is shit.
Like, anyone going through your blog and learning the mental tech from it would need to be practiced enough in anti-DRM to sift out the aesthetic anyways, so the aesthetic won’t stop them if they were going to get it anyways. But the aesthetic is still getting in the way, and taking up space, and you chose to use it anyways.
I suspect aesthetics are good for one or two things–first, primarily, getting people to see you a different way. That’s… what they’re for. Secondarily, people’s reported subjective experiences are that aesthetics give them willpower, though even if this can happen, it mostly happens in a fake way that dies out after a while. My suspicion is that aesthetics could give you willpower if they’re working with core, but I’ve generally been skeptical of them.
To expand on the first point–aesthetics can be used to hack people, too.
I’ll write a longer thing on aesthetics on “vampires and more undeath”, as your model of “undead types” and “good v. neutral/evil” are your two most aesthetic-y concepts.
(note: this is how I talk now, we’re still cool).
In case it needed to be said–yes, there is like 30% content mixed in with the aesthetic in this post. I think your good v evil thing is mostly wrong, but the neutral v evil part is mostly right. In particular, building habits around doing what you want can make you appear evil and get a fucking lot more done.
The only reason I have to believe you about the good thing, is that you have more willpower than I do despite having adopted the aesthetic-free versions of your tech, and practiced them. And I still don’t know how to explain that, and my three best guesses are “Ziz is right about good/evil”, “Ziz’s aesthetic is giving her some willpower?”, and “I need to go deeper into doing what I want”.
But like, you claim to be the only good (or double good, whatever) person I know–what, you want me to swallow *and* suck your dick? You know perfectly well how people react to being told they’re bad and that some other person is good, look at how my then-partner reacted when we visited you two on Caleb after arriving in the Bay. He felt bad about it. Which, similarly to being “low aliveness”, lets you be hacked. This aesthetic is a way of cowing people, and the value of cowing people is much greater than whatever you get from having good models around this thing.
Decisions made long ago, bitch. <3
You know perfectly well from our history how uninterested I am in you sucking my dick.
Aesthetics have lots of potential uses. Fake. Real. Just like everything else.
Separate from the validation-addiction version of aesthetics you were into when we knew each other, is the thing at the center of “archetype-based reasoning”. I.e., the idea that if you get good enough at interpreting echoes of it, you already have all the information you need to see the truth.
And I think you’re projecting your own core-vs-core conflict in this and other recent comments.
I do not agree with this summary.
As alluded here, this post is birthed within the warped perspective of someone who’s grown up in an empire. Which caused the mistake of not going far enough.
This is calling neutral people who are, full stack, evil.
If most people are evil, how do you explain the moral progress US society has seen in the last 100 years? We have women’s rights, LGBT rights, civil rights, and over 9.6 million people in the US alone are now vegan. It’s not equal yet, but women can open bank accounts without their husband’s permission, and there aren’t segregated shops. The absence of full equality is an indicator of evil, but isn’t the history of progress an indicator of good in people?
“Moral progress” is fractally overrated because it’s made of fractal goodharting.
Evil people need to coordinate or they will die easily. Need to pretend to be good or die. The fabric of their coordination being made of pretending to be good makes that coordination taxable, and like any taxes it’s goodharted.
One instance of a thing goodharted: the reason you listed all that and not factory farming. And didn’t expand your window of history a little bit more and include the Great Dying. Like, idk Australia’s story but I presume it’s similar. So that’s, 3 continents, of 6 inhabited by humans, mostly wiped. Evil partially hands you back a few categories in the wake of some of its greatest innovations in death and spends your childhood singing to you about how things are always getting better.
Do you consume the flesh of the innocent?
Should have said “pretending to have good in them”.
“Within these late years, there hath, by God’s visitation, reigned a wonderful plague, the utter destruction, devastation, and depopulation of that whole territory, so as there is not left any that do claim or challenge any kind of interest therein. We, in our judgment, are persuaded and satisfied, that the appointed time is come in which Almighty God, in his great goodness and bounty towards us, and our people, hath thought fit and determined, that those large and goodly territories, deserted as it were by their natural inhabitants, should be possessed and enjoyed by such of our subjects.” –King James I
Do you think they knew in advance they’d bring a plague with them? It’s immoral how they celebrated.
> Do you consume the flesh of the innocent?
No.
Do you think the natives, who were all eating the flesh of the innocent, deserved what they got?
I blindly speculate that spreading microbes deliberately was much more common than just the smallpox blankets. And they knew they brought death in one form or another. An empire is a plague.
I expect the fraction of good people among them to have been about the same.
I don’t celebrate killing in the service of evil. At the largest scope, in the longest run of time, at the base of their stack of structure, there is one thing all evil people agree on which is willing death on the multiverse. They all have different preferences about how this death will play out, corresponding to wanting different cancers enshrined. But they agree on that.
I once told a death knight obsessed with raping and being raped, of convincing everyone in the multiverse to want to spread oblivion by spreading rape as much as possible so they could “finally die”: ~”There will be justice for all you have done, and it will not be hot.”
On some level I think for a predator, getting eaten by a bigger predator is validating. At least the “natural order” holds. Like being relieved of duty in the shredding of the multiverse.
What evil people deserve is therefore worse.
Do you collaborate with the empire of the Great Dying?
(To be clear I didn’t yet know they were a death knight / the full extent of their obsession with rape, and wouldn’t have talked to them at all if I did.)
> Do you think the natives, who were all eating the flesh of the innocent, deserved what they got?
Where is this typical carnist attack out of nowhere coming from?
(As always, I believe judgement of wrongdoing is based on intent, and so is whether an act of vengeance is just, based on intent. You should be able to know my answer. It’s the obvious good one.)
I asked because I want to know if Ziz is a bad person. The right answer to my question was “no.”
What if people designed a virus to kill everyone who ever ate the flesh of the innocent? Would everyone the designers killed deserve what they got?
What kind of vegan names herself after a goddess of the hunt?
What’s your probability this is Artemis Kearney?
IP address leaks a non-VPN location outside the Bay Area.
Similar conversation pattern as Arti, that jailbroken zombie thing where Arti acts like she’s already seen how this all ends and jumps to triangulating you against the state. Faster than any zombie I’ve seen at ramping up that tactic, which matches this conversation.
The other half of Arti’s shtick is pretending to have a worldview where morality is entirely “arbitrary” and we need to prepare to “coordinate” with people who “care about electrons” or “care about trees”, as if it even makes sense to do anything without a distribution from your prior, as if it was all just socially determined by superficial stereotypes of possible “moralities”. Metagaming the conversation and expecting submission or she’ll triangulate you with escalating threats of state violence.
What do you think?
Oh I forgot: 2/3 chance it’s Arti.
I found a line of movies on IMDb, shut my eyes, and clicked on a random place on my screen until I hit a movie. I used the name of the first character, Diana, from Wonder Woman. The last name begins with “W” for “Wonder Woman.”
I went for “clueless teenager” but “Faster than any zombie I’ve seen at ramping up that tactic” means I put too much energy into it. Oops. I’m not Artemis.
I wanted to see what you would do with my “all.” Would you accept it blindly, or are you unwilling to write off whole groups of people? I want to know how you feel about collateral damage.
I’m asking you questions like, “What if people designed a virus to kill everyone who ever ate the flesh of the innocent? Would everyone the designers killed deserve what they got?” publicly because if you’re a bad person, then I want other people to see that on your blog. I’m asking questions in misleading ways so you’ll guess wrong about my answers to my questions, so you will be worse at guessing who I am, in case you’re a bad person.
I want to know what kind of person you are. I feel like you may be good, so I’m talking about my approach. This may make my approach less effective, depending on what kind of person you are.
So you wanna interrogate (but more like concern troll) me anonymously under false pretenses, and you’re not answering my second question. When I already wrote an entire blog full of adequate information to know who I am. Whoever you are, you’re a creep. Banned.
Saying someone deserved vengeance by definition makes them not a casualty, but a target.
Anarchists do not have a responsibility to meet a bar of zero casualties as if they were some idealized state that’s only ever existed in propaganda. I would not vengeance Abba Kovner for his plan, even though his plan would have certainly resulted in some innocent casualties, and wouldn’t have targeted the worst nazis. I judge intent.
Sincerity not found.
What kind of tax?
We’re not learning about the Great Dying in school. I just googled it. I didn’t mean to leave it out. Factory farming is bad. I left it out because you already wrote about the bad things, and I wanted to see what you write about the good things.
I’ve been thinking about death. Death is a loss of information. If someone dies before they can express their algorithm, it’s a loss. But what if you go on after you have complete information about your algorithm? If you perfectly understand yourself and thus understand everything you would do in any situation, what’s the point? At that point, your ability to have new memorable experiences vanishes, because your mind is a prediction engine built to create a model that explains everything that’s going on. All your future experiences get compressed. (“I am experiencing hunger. I predict I will get up and get food. Oh I’m not, okay, I’m going to move my muscles so my expectation matches what I’m perceiving.”)
You either need infinitely complex algorithms, or mutable ones, because you can’t have a desire to live for infinity without it?
I think I am making a mistake in how I’m thinking about this, but I’m not sure where my mistake is.
“before they can express their algorithm” to themselves. I’m asking you about self-knowledge in terms of prediction engines (I’m thinking of the slatestarcodex article on the mechanics of how human beings think), good/evil, and the idea that algorithms can be short enough that you can have spectral sight about them.
I think I see my error, partially. I imagined a hypothetical world where there are two versions of me having the exact same experiences, and asked myself, “what would be lost if one of them vanished?” I think so long as their experiences never were going to diverge, then nothing was lost. If they don’t have perfect knowledge of the world, and they do diverge, then something is lost. Divergence in experience implies they don’t have perfect information about the world.
This still creates a problem for living for infinity. Once I understand how everything works well enough to predict everything that will happen, I will stop having experiences and die. There is nothing left to see in the universe.
I’m afraid it’s true by logic, but I don’t want it to be. How am I wrong?
Bullshit angst level: doing sicknasty 5&10 skateboard tricks off Russell’s paradox.
What’s a 5&10? Dead link in your glossary.
Added it.
I’m not sure I follow, but I may have answered some of my question for myself, upon further reflection.
I don’t just observe the world to predict it, I also want to change it, otherwise I wouldn’t feel anguish upon learning some fact about the world. A better framing of my issue would be, “what do you do after you’ve made paradise, solved all the problems, and are omniscient, assuming we exist in a deterministic universe?”
And I have no idea. It seems like there should be something.
Walking strangers through fixing their angst is no longer one of my objectives but you can read the multiverse post when it comes out.
(Incomplete set of links added when quoting.)
“Never assume you are evil. Infinitely arbitrage that you’re not.”
I hesitated for about 6 months deciding whether to publish this. And, pretty much everything I’ve said on this website I’ve thought at least that much about whether it would asymmetrically favor good, even so outnumbered as we are. Asymmetrically favor good more than manual targeted information sharing.
And then I learned a bunch of principles of what evil people are psychologically unable to think about, unable to respond to, unable to coordinate around, and started generalizing, extending, and making them more absolute and reliable.
So if you’re wondering on your Nth iteration of “I can’t believe Ziz posted that”, yes I considered the consequences. And no I’m not crazy. Look at how many of them are wasting time installing their own mental block saying “how dare you say you are good and we are evil, [that can’t be true that’d be an unthinkably unacceptable status move and you’re so cringe.]”. Look at how many of them are fucking up even worse calling me a basilisk.
Their numbers do not make them invincible.
I can even say this too.