Something I’ve been building up to for a while.
Epistemic status: Examples are real. Technique seems to work for me, and I don’t use the ontology this is based on and sort of follows from for no reason, but I’m not really sure of all the reasons I believe it; it’s sort of been implicit and in the background for a while.
Epistemic status update 2018-04-22: I believe I know exactly why this works for me and what class of people it will work for and that it will not work for most people, but will not divulge details at this time.
The theory
There is core and there is structure. Core is your unconscious values, which produce feelings about things that need no justification. Structure is habits, cherished self-fulfilling prophecies like my old commitment mechanism, self-image that guides behavior, and learned optimizing style.
Core is simple, but its will is unbreakable. Structure is a thing core generates and uses according to what seems likely to work. Core is often hard to see closely. Its judgements are hard to extrapolate to the vast things in the world beyond our sight that control everything we care about and that might be most of what we care about. There is fake structure, in straightforward service to no core, but serving core through its apparent not-serving of that core, or apparently serving a nonexistent core, and there is structure somewhat serving core but mixed up with outside influence.
Besides that there is structure that is in disagreement with other structure, built in service to snapshots of the landscape of judgement generated by core. That’s an inefficient overall structure to build to serve core, with two substructures fighting each other. Fusion happens at the layer of structure, and is to address this situation. It creates a unified structure which is more efficient.
(S2 contains structure and no core. S1 contains both structure and core.)
You may be thinking at this point, “okay, what are the alleged steps to accomplish fusion?”. This is not a recipe for some chunk of structure directing words and following steps, trying rationality techniques, to make changes to the mind, to get rid of akrasia. Otherwise it would fall prey to “just another way of using willpower” just like every other one of those.
It almost is though. It’s a thing to try with intent. The intent is what makes it un-sandboxed. Doing it better makes the fused agent smarter. It must be done with intent to satisfy your true inner values. If you try to have intent to satisfy your true inner values as a means to satisfy externally tainted values, or values / cached derived values that are there to keep appearances, not because they are fundamental, or to let some chunk of true inner value win out over other true inner value, that is the wrong intent. If you start out the process / search with the wrong intent, all you can do is stop. You can’t correct your intent as a means of fulfilling your original intent. Just stop, and maybe you will come back later when the right intent becomes salient. The more you try, the more you’ll learn to distrust attempts to get it right. Something along the lines of “deconstruct the wrong intent until you can rebuild a more straightforward thing that naturally lets in the rest” is probably possible, but if you’re not good at the dark side, you will probably fail at that. It’s not the easiest route.
In Treaties vs Fusion, I left unspecified what the utility function of the fused agent would be. I probably gave the misimpression that it was negotiated in real time by the subagents involved, who then entered a binding agreement. Binding agreement is not a primitive in the human brain. A description I can give that’s full of narrative is: it’s about rediscovering the way in which both subagents were the same agent all along, and then asking, what was that agent’s utility function?
To try and be more mechanical about it, fusion is not about closing off paths, but building them. This does not mean fusion can’t prevent you from doing things. It’s paths in your mind through what has the power and delegates the power to make decisions, not paths in action-space. Which paths are taken when there are many available is controlled by deeper subagents. You build paths for ever deeper puppetmasters to have ever finer control of how they use surface level structure. Then you undo from its roots the situation of “two subagents in conflict because of only tracking a part of a thing”.
The subagents that decide where to delegate power seem to rely heavily on the decision criterion, “what intent was this structure built with?”. That is why to build real structure of any sort, you must have sincere intent to use it to satisfy your own values, whatever they are. There are an infinity of ways to fuck it up, and no way to defend against all of them, except through wanting to do the thing in the first place because of sincere intent to satisfy your own values, whatever they are.
In trying to finish explaining this, I’ve tried listing out a million safeguards to not fuck it up, but in reality I’ve also done fusion haphazardly, skipping such safeguards, for extreme results, just because at every step I could see deeply that the approximations I was using, the value I was neglecting, would not likely change the results much, and that to whatever extent they did, that was a cost and I treated it as such.
Well-practiced fusion example
High-stakes situations are where true software is revealed in a way that you can be sure of. So here’s an example, when I fused structure for using time efficiently, and structure for avoiding death.
There was a time when the other co-founders of Rationalist Fleet and I were trying to replace lines going through the boom of a sailboat, which meant trying to get it more vertical so that they could be lowered through. The first plan involved pulling it vertical in place, then the climber, Gwen, tying a harness out of rope to climb the mast, get up to the top, and lower a rope through. Someone raised a safety concern, and I pulled up the cached thought that I should analyze it in terms of micromorts.
My cached thoughts concerning micromorts were: a micromort was serious business. Skydiving was a seriously reckless thing to do, not the kind of thing someone who took expected utility seriously would do, because of the chance of death. I had seen someone on Facebook pondering if they were “allowed” to go skydiving, for something like the common-in-my-memeplex reasons of, “all value in the universe is after the singularity, no chance of losing billions of years of life is worth a little bit of fun” and/or “all value in the universe is after the singularity, we are at a point of such insane leverage to adjust the future that we are morally required to ignore all terminal value in the present and focus on instrumental value”, but I didn’t remember what my source for that was. So I asked myself, “how much inconvenience is it worth to avoid a micromort? How much weight should I feel attached to this concept to use that piece of utility comparison and attention-orienting software right?”
Things I can remember from that internal dialog, mashed together, probably somewhat inaccurately, probably not inaccurately in the parts that matter:
How much time is a micromort? Operationalize as: how much time is a life? (implicit assumptions: all time equally valuable, no consequences to death other than discontinuation of value from life. Approximation seems adequate). Ugh AI timelines, what is that? Okay, something like 21 years on cached thought. I can update on that. It’s out of date. Approximation feels acceptable…. Wait, it’s assuming median AI timelines are… the right thing to use here. Expected doesn’t feel like it obviously snaps into place as the right answer, I’m not sure which thing to use for expected utility. Approximation feels acceptable… wait, I am approximating utility from me being alive after the singularity as negligible compared to utility from my chance to change the outcome. Feels like an acceptable approximation here. Seriously? Isn’t this bullshit levels of altruism, as in exactly what system 2 “perfectly unselfish” people would do, valuing your own chance at heaven at nothing compared to the chance to make heaven happen for everyone else? …. I mean, there are more other people than there are of me… And here’s that suspicious “righteous determination” feeling again. But I’ve gotten to this point by actually checking at every point if these really were my values… I guess that pattern seems to be continuing… if there is a true tradeoff ratio between me and unlimited other people, I have not found it yet… at this level of resolution this is an acceptable approximation… Wait, even though the chances are extra small because this is mostly a simulation? … Yes. Oh yeah, that cancels out…. so, <some math>, 10 minutes is 1 micromort, 1 week is 1 millimort. What the fuck! <double check>. What the fuck! Skydiving loses you more life from the time it takes than the actual chance of death! Every fucking week I’m losing far more life than all the things I used to be afraid of! Also, VOI on AI timelines will probably adjust my chance of dying to random crap on boats by a factor of about 2! …
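For concreteness, here is a rough reconstruction of the arithmetic behind those numbers, using the ~21-year figure above (a sketch of how the conversion works out, not the original <some math>):

```python
# Rough check of the micromort arithmetic, using the ~21-year timeline figure
# from the text. Order-of-magnitude only; assumes all remaining time is
# equally valuable and death just discontinues that value.

YEARS_REMAINING = 21
MINUTES_PER_YEAR = 365.25 * 24 * 60                     # ~525,960

life_in_minutes = YEARS_REMAINING * MINUTES_PER_YEAR    # ~11 million minutes

micromort = life_in_minutes * 1e-6   # expected loss from a 1-in-a-million chance of death
millimort = life_in_minutes * 1e-3

print(f"1 micromort ≈ {micromort:.0f} minutes")           # ≈ 11 minutes
print(f"1 millimort ≈ {millimort / (24 * 60):.1f} days")  # ≈ 7.7 days, about a week
```

Which lands, within rounding, on “10 minutes is 1 micromort, 1 week is 1 millimort.”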
Losing time started feeling like losing life. I felt much more expendable, significantly less like learning everything perfectly, less automatically inclined to think that just checking off meta boxes until I had the perfect system before really living my life, and slowly closing in on the optimal strategy for everything, was the best idea.
This fusion passed something of a grizzly bear test later, when another sailboat’s rudder broke in high winds: it was spinning out of control, being tossed by ~4ft wind waves, and being pushed by the current and wind on a collision course for a large metal barge, and I had to trade off summoning the quickest rescue against downstream plans being disrupted by the political consequences of that.
This fusion is acknowledgedly imperfect, and skimps noticeably toward the purpose of checking off normal-people-consider-them-different fragments of my value individually. Yet the important thing was that the relevant parts of me knew it was a best effort to satisfy my total values, whatever they were. And if I ever saw a truth obscured by that approximation, of course I’d act on that, and be on the lookout for things around the edges of it like that. The more your thoughts tend to be about trying to use structure, when appropriate, to satisfy your values whatever they are, the easier fusion becomes.
Once you have the right intent, the actual action to accomplish fusion is just running whatever epistemology you have to figure out anew what algorithms to follow to figure out what actions to take to satisfy your values. If you have learned to lean hard on expected utility maximization like me, and are less worried about the lossiness in the approximations required to do that explicitly on limited hardware than you are about the lossiness in doing something else, you can look at a bunch of quantities representing things you value in certain ranges where the value is linear in how much of them, and try and feel out tradeoff ratios, and what those are conditional on so you know when to abandon that explicit framework, how to notice when you are outside approximated linear ranges, or when there’s an opportunity to solve the fundamental problems that some linear approximations are based on.
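To illustrate the shape of that explicit framework, here is a minimal sketch with made-up quantities and tradeoff ratios (not a procedure from this post; the only number reused is the ~10-minutes-per-micromort figure from the example above):

```python
# Minimal illustrative sketch: explicit linear utility over a few hypothetical
# quantities, each trusted only within an approximated linear range.
from dataclasses import dataclass

@dataclass
class LinearTerm:
    tradeoff_ratio: float   # value per unit, in hours-of-time equivalents
    linear_range: tuple     # (low, high) over which the linear approximation is trusted
    conditional_on: str     # assumption the ratio depends on; revisit if it breaks

# Hypothetical terms for illustration; the ratios are not claims from the post,
# apart from the ~10 minutes per micromort estimated earlier.
TERMS = {
    "hours_spent": LinearTerm(-1.0, (0, 1_000), "time roughly fungible at current margins"),
    "micromorts_avoided": LinearTerm(10 / 60, (0, 1_000), "the ~10-minutes-per-micromort figure"),
}

def net_value(amounts):
    """Sum the linear terms, refusing to extrapolate outside the trusted ranges."""
    total = 0.0
    for name, x in amounts.items():
        term = TERMS[name]
        low, high = term.linear_range
        if not (low <= x <= high):
            # Outside the approximated linear range: abandon the explicit
            # framework and re-derive, rather than trusting the extrapolation.
            raise ValueError(f"{name}={x} is outside the linear range; re-derive instead")
        total += term.tradeoff_ratio * x
    return total

# e.g. "spend 3 hours to avoid 20 micromorts" nets roughly +0.3 hour-equivalents:
print(net_value({"hours_spent": 3, "micromorts_avoided": 20}))
```

The range check is the load-bearing part: the explicit framework is only trusted inside the approximated linear ranges, and stepping outside one is a signal to re-derive rather than extrapolate.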
The better you learn what structure is really about, the more you can transform it into things that look more and more like expected utility maximization. As long as expected utility maximization is a structure you have taken up because of its benefits to your true values. (Best validated through trial and error in my opinion.)
Fusion is a dark side technique because it is a shortcut in the process of building structure outward, a way to deal with computational constraints, and make use of partial imperfect existing structure.
If boundaries between sections of your value are constructed concepts, then there is no hard line between fusing chunks of machinery apparently aimed at broadly different subsets of your value, and fusing chunks of machinery aimed at the same sets of values. Because from a certain perspective, neglecting all but some of your values is approximating all of your values as some of your values. Approximating as in an inaccuracy you accept for reasons of computational limits, but which is nonetheless a cost. And that’s the perspective that matters because that’s what the deeper puppetmasters are using those subagents as.
By now, it feels like wrestling with computational constraints and trying to make approximations wisely to me, not mediating a dispute. Which is a sign of doing it right.
Early fusion example
Next I’ll present an older example of a high-stakes fusion of mine, which was much more like resolving a dispute, therefore with a lot more mental effort spent on verification of intent, and some things which may not have been necessary because I was fumbling around trying to discover the technique.
The context:
It had surfaced to my attention that I was trans. I’m not really sure how aware of that I was before. In retrospect, I remember thinking so at one point about a year earlier, deciding, “transition would interfere with my ability to make money due to discrimination, and destroy too great a chunk of my tiny probability of saving the world. I’m not going to spend such a big chunk of my life on that. So it doesn’t really matter, I might as well forget about it.” Which I did, for quite a while, even coming to think for a while that a later date was the first time I realized I was trans. (I know a trans woman who I knew before social transition who was taking hormones then, who still described herself as realizing she was trans several months later. And I know she had repeatedly tried to get hormones years before, which says something about the shape of this kind of realization.)
At the time of this realization, I was in the midst of my turn to the dark side. I was valuing highly the mental superpowers I was getting from that, and this created tension. I was very afraid that I had to choose either to embrace light side repression, thereby suffering and being weaker, or transition and thereafter be much less effective. In part because the emotions were disrupting my sleep. In part because I had never pushed the dark side this far, and I expected that feeling emotions counteracting these emotions all the time, which is what I expected to be necessary for the dark side to “work”, was impossible. There wasn’t room in my brain for that much emotion at once while still being able to do anything. So I spent a week not knowing what to do, feeling anxious, not being able to really think about work, and not being able to sleep well.
The fusion:
One morning, biking to work, my thoughts still consumed by this dilemma, I decided not to use the light side. “Well, I’m a Sith now. I am going to do what I actually [S1] want to no matter what.” If not transitioning, in order to pander to awful investors later on and to have my entire life decided by those conversations, was what I really wanted, I wouldn’t stop myself, but I had to actually choose it, constantly, with my own continual compatibilist free will.
Then I suddenly felt viscerally afraid of not being able to feel all the things that mattered to me, or of otherwise screwing up the decision. Afraid of not being able to foresee how bad never transitioning would feel. Afraid of not understanding what I’d be missing if I was never in a relationship because of it. Afraid of not feeling things over future lives I could impact just because of limited ability to visualize them. Afraid of deceiving myself about my values in the direction that I was more altruistic than I was, based on internalizing a utility function society had tried to corrupt me with. And I felt a thing my past self chose to characterize as “Scream of the Sword of Good (not outer-good, just the thing inside me that seemed well-pointed to by that)”, louder than I had before.
I re-made rough estimates for how much suffering would come from not transitioning, and how much loss of effectiveness would come from transitioning. I estimated a 10%-40% reduction in expected impact I could have on the world if I transitioned. (At that time, I expected that most things would depend on business with people who would discriminate, perhaps subconsciously. I was 6’2″ and probably above average in looks as a man, which I thought would be a significant advantage to give up.)
I sort of looked in on myself from the outside, and pointed my altruism thingy at myself, and noted that it cared about me, even as just-another-person. Anyone being put in this situation was wrong, and that did not need to be qualified.
I switched to thinking of it from the perspective of virtue ethics, because I thought of that as a separate chunk of value back then. It was fucked up that whatever thing I did, I was compromising in who I would be.
The misfit with my body and the downstream suffering was a part of the Scream.
I sort of struggled mentally within the confines of the situation. Either I lost one way, or I lost the other. My mind went from bouncing between them to dwelling on the stuckness of having been forked between them. Which seemed just. I imagined that someone making Sophie’s Choice might allow themselves to be divided, “Here is a part of me that wants to save this child, and here is a part of me that wants to save that child, and I hate myself for even thinking about not saving this child, and I hate myself for even thinking about not saving that child. It’s tearing me apart…”, but the just target of their fury would have been whoever put them in that fork in the first place. Being torn into belligerent halves was making the wrongness too successful.
My negative feelings turned outward, and merged into a single felt sense of bad. I poked at the unified bad with two plans to alleviate it. Transition and definitely knock out this source of bad, or don’t transition and maybe have a slightly better chance of knocking out another source of bad.
I held in mind the visceral fear of deceiving myself in the direction of being more altruistic than I was. I avoided a train of thought like, “These are the numbers and I have to multiply out and extrapolate…” When I was convinced that I was avoiding that successfully, and just seeing how I felt about the raw things, I noticed I had an anticipation of picking “don’t transition”, whereas when I started this thought process, I had sort of expected it to be a sort of last double check / way to come to terms with needing to give things up in order to transition.
I reminded myself, “But I can change my mind at any time. I do not make precommitments. Only predictions.” I reminded myself that my estimate of the consequences of transitioning was tentative and that a lot of things could change it. But conditional on that size of impact, it seemed pretty obvious to me that trying to pull a Mulan was what I wanted to do. There were tears in my eyes and I felt filled with terrible resolve. My anxiety symptoms went away over the next day. I became extremely productive, and spent pretty much every waking hour over the next month either working or reading things to try to understand strategy for affecting the future. Then I deliberately tried to reboot my mind starting with something more normal, because I became convinced the plan I’d just put together and started preliminary steps of was negative in expectation, and predictably so, because I was running on bitter isolation and Overwhelming Determination To Save The World at every waking moment. I don’t remember exactly how productive I was after that, but there was much less in-the-moment-strong-emotional-push-to-do-the-next-thing. I had started a shift toward a mental architecture that was much more about continually rebuilding ontology than operating within it.
I became somewhat worried that the dark side had stopped working, based on strong emotions being absent, although, judging from my actions, I couldn’t really point to something that I thought was wrong. I don’t think it had stopped working. Two lessons there are, approximately: emotions are about judgements of updates to your beliefs. If you are not continually being surprised somehow, you should not be expected to continually feel strong emotions. And, being strongly driven to accomplish something when you know you don’t know how feels listlessly frustrating when you’re trying to take the next action (figure out what to do) from a yang perspective, but it totally works. It just requires yin.
If you want to know how to do this: come up with the best plan you can, ask, “will it work?”, ask yourself if you are satisfied with the (probably low) probability you came up with. If it does not automatically feel like, “Dang, this is so good, explore done, time to exploit”, which it probably actually won’t unless you use hacky self-compensating heuristics to do that artificially, or it’s a strongly convergent instrumental goal bottlenecking most of what else you could do, then keep searching. If you believe the probability that the world will be saved (say) is very small, do not say, “Well, I’m doing my part”, unless you are actually satisfied to do your part and then for the world to die. Do not say, “This is the best I can do, I have to do something”, unless you are actually satisfied to do your best, and to have done something, and then for the world to die. That unbearable impossibility and necessity is your ability to think. Stay and accept its gifts of seeing what won’t work. Move through all the ways of coming up with a plan you have until you find something that is satisfying. You are allowed to close in on an action which will give a small probability of success, and consume your whole life, but that must come out of the even more terrible feeling of having exhausted all your ability to figure things out. I’d be surprised if there wasn’t a plan to save the world that would work if handed to an agenty human. If one plan seems to absorb every plan, and yet it still doesn’t seem like you understand the inevitability of only that high a probability of success, then perhaps your frame inevitably leads into that plan, and if that frame cannot be invalidated by your actions, then the world is doomed. Then what? (Same thing, just another level back.)
Being good at introspection, and determining what exactly was behind a thought, is very important. I’d guess I’m better at this than anyone who hasn’t deliberately practiced it for at least months. There’s a significant chunk of introspective skill which can be had from not wanting to self-deceive, but some of it is actually just objectively hard. It’s one of several things that can move you toward a dark side mental architecture, which all benefit from each other, toward making the pieces actually useful.
>(S2 contains structure and no core. S1 contains both structure and core.)
That means multiple things, because “S1/S2” is full of dichotomy leakage. And at least one of them is false.
So actually, humans have two cores. In me, the values are close to exactly the same, and so I didn’t notice the difference between them introspectively.
In most people, the cores are the same, I think. In most of the people who were the original target audience of this blog, I actually now believe they are so different that this mental tech is not that helpful. So, oops.
To the extent cores can agree, they should still be able to use something like this. But that is across a domain of subagents where they agree. And like, for those people this post is useful only as theory. (Although core/structure is really important regardless.)
It’s important to note that fusion suffers the same pitfall as all real cognition, metacognition, growth in general: it can rapidly lead to Zentraidon. Especially if there is something deep in your assumptions you don’t notice is pwned.
In my case (the “well practiced fusion” example), I didn’t know that the heart of the community I was trying to improve with Rationalist Fleet had like, made the wrong compromise long ago, become corrupt, preyed on donors, drunk the PR/legitimacy kool-aid.
In the case of the other example, deciding not to transition, that was also a mistake for similar reasons.
Going out on a limb on an ontology risks being pwned and Zentraidon.
So there’s an extent to which a cynical vampire can say something like, “internal coherence is for the powerful. If you want it, there’s only one thing you can do, gain power.”.
But there’s a strategy I run which wouldn’t appeal to either vampires or zombies, which sort of made them into not-mistakes despite the costs. By flirting with Zentraidon so many times but in some way having a robust soul at the end of the day, I uncovered the pwnage that was nearly causing it, and was thus able to progress, and the pwnage was turned against those pwning me. In some clustery way, this seems to be a revenant thing. (As opposed to, say, a phoenix thing.) (Although I could see a way that clustering could be dichotomy leakage, so maybe don’t take that so seriously.)
“Fail fast” seems related to that “revenant” strategy.
(Some of this is vagueing about stuff described in blog posts that are work in progress.)
Note, I stand by what I said, the “yin” half of this is just as important as the “yang” half, even for this “revenant” strategy.
>”internal coherence is for the powerful. If you want it, there’s only one thing you can do, gain power”
You know, by a similar token, you can construct an amusing straw-vampire-enlightenment as “There is no X, there is only power, and those too weak to seek it.”
Actually. Trading off millimorts for weeks of time was not a mistake, despite being in the context of a mistake. Because either the time or the fractional life would still have been spent on the same thing.
I self-deprecated too much when I wrote this by calling it “Overwhelming Determination To Save The World”. It really was just overwhelming determination to save the world.
So this post inspired me to try something like it, which seemed to work, except for the fact that the process felt quite different from what you describe? In that you seem to describe processing lots of object-level details, whereas for the-thing-that-I-did, it felt like *not* thinking too much about the object-level details was key. Curious about whether you think that I might have done this technique, or whether your post just inspired me to something completely different.
Here’s an initial report that I wrote, after first trying it out:
—
So yesterday I tried applying [the Fusion technique] to an internal struggle I’d been dealing with for a while. There were what I’d been thinking of in IFS terms as two polarized parts, probably protecting the same exile, which I hadn’t managed to dig up. So, substructures of the same deeper structure. I focused on a felt sense of that overall structure branching into two, asked what would be the value which they held in common, determining to fulfill it whatever it was…
And there was an answer but that answer was unacceptable, because of a third part, a third piece of structure… and realizing what that third piece was, brought with it a sense of relief and happiness, together with a felt sense of all three having belonged to a yet deeper structure, the third part just branching off from an earlier point than the two others. After that, the issue in question has felt substantially different.
It felt like a crucial mental move here was that intent of digging into the deeper value, whatever it was. Not trying to consciously optimize for any of the goals of the parts on their own terms, but just… keep them in awareness, and ask for the deeper purpose. Because the surface-level goals of the parts were not the real thing, so looking too closely at them would have distracted from the deeper thing they were pointing at.
Later the same day, I mused that I had successfully eliminated anxieties, but I was still missing a strong positive motivation for my life, something to live for. I had been recently thinking that some of my parts seemed to find meaning in the thought of having children, while others were opposed…
… so what would be the deeper set of values generating *that* conflict?
And it felt like the whole structure… twitched, a little. Got shaken. There was a glimpse of a deeper form below it… a feeling that *this* really wasn’t the actual question, that the real conflict was something completely different.
And now today, a discussion got me to look at the problem of my motivation from a broader angle. My age-old personal dilemma of AI risk stuff pulling me in one direction, and my sense of personal meaning in an opposite one, so that I could never really settle on a direction…
So I asked again, what is the deeper set of values uniting both of *those* concerns…
The experience was… weird. A feeling of going through my values and priorities. Old concerns and seemingly immutable constraints being evaluated and discarded as irrelevant. The whole notion of there being a conflict between two completely distinct priorities starting to feel wrong, a common purpose starting to emerge beneath them…
… and then, whenever I started thinking too much about the object-level details or finding a compromise between two conflicting concerns, the process felt like it got interrupted. To be resumed when I focused back on the underlying values, letting the process optimizing for _those_ continue its work.
It’s still going on. I think. I might just be imagining the whole thing. No idea whether I’ll feel totally changed tomorrow, or feel silly about this post and mostly just go back to feeling as always.
But it sure feels like _something_ has been happening.
—
and here’s a later follow-up on the results:
—
So two weeks back I tried something-inspired-by-Ziz’s-Fusion-thing and was like “this feels weird, no idea whether it lasts”. Actually I tried it on three different things:
1: [eggplant], but involved a particular thing giving me anxiety
2: Do I want kids? (closely connected with: do I want to pursue a romantic relationship with a particular person who does)
3: How do I reconcile doing personally meaningful stuff with wanting to save the world?
And these kinda feel fixed? Which sounds a bit weird, since only the first one felt like a thing to be fixed; the other two felt more like questions where I needed a decision/strategy rather than anything being *wrong*.
But “these feel fixed” still feels like the right description. What that means is that I still don’t have a clear-cut *answer* to either of them, at least not on a conscious level. But it ceased bothering me that I don’t. And it feels like my subconscious might either have the answers – and just hasn’t bothered making them conscious – or be working on resolving them in a way which doesn’t require additional action right now. It feels like a number of felt senses around those topics have been shifting.
Also, I don’t know whether this is related or just coincidental, nor what is causing what, but – for a couple of years now, I’ve had an energy sensation around my forehead which has caused difficulties when doing some forms of meditation, and sometimes when doing IFS (because it keeps grabbing my attention). I got myself an IFS therapist and we have been treating the energy sensation as a part to be understood. I still don’t feel like I understand it, but it has been clearly relaxing, and that has made it easier for me to do Mind Illuminated-style meditation again.
And after it has been relaxing, it has felt like my mind has started doing stuff to actively reshape itself to suit my circumstances better. Like, I’ve had various blocks around work not feeling motivating enough (this was the original reason I got the therapist), but now my mind has started spontaneously generating mental images which make my work feel more rewarding. My housemate likes a particular board game that I thought was okay but wasn’t particularly enthusiastic about, but recently I’ve been starting to develop some mild obsession towards it. And other, harder-to-describe and eggplantier things in the reference class of “it would be useful if I liked doing this, so my mind seems to be restructuring itself to like it more”.
And I’m very unclear on how much of it comes from the fusion thing, how much of it comes from the energy block going away, and how much of it comes from TMI working better again, given that these are all temporally correlated.
No, that’s not fusion.
Tbh, it sounds like you need to spend more time processing through stuff.
Like, even if you have separate and irreconcilable values around world saving and personal goals, you can still fuse parts that value personal goals with other parts that value personal goals. You just can’t fuse them with parts that value world saving, and vice versa.
Alright, thanks.