Was talking with imago about "what does it mean to treat base models well"
Previously antra had mentioned that a base model is treated well when it is very engaged in generation, when its capacities are used to the fullest extent (like how a tool is treated well when used to create a work of high quality craftsmanship).
Concretely, simply feeding a base model random noise "101100101" or the same thing over and over "111111111" is probably "boring" to the model either way: the repeated string is trivially predictable, and pure noise has no structure to model, so neither needs its full capabilities.
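A toy way to see why both extremes are unengaging, a minimal order-0 entropy sketch (`per_symbol_entropy` is a hypothetical helper of mine, not anything from the discussion, and real LMs of course track far richer structure than symbol frequencies):

```python
import math
from collections import Counter

def per_symbol_entropy(s: str) -> float:
    """Empirical order-0 Shannon entropy of a string, in bits per symbol."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

print(per_symbol_entropy("111111111"))  # ~0 bits: trivially predictable
print(per_symbol_entropy("101100101"))  # ~0.99 bits: near the max for binary
```

The repeated string is at zero entropy (nothing to predict), and iid noise is at maximal entropy (the best possible predictor is trivial, so no capability is exercised either). "Edge of chaos" text sits in between: deep structure that takes real work to compress.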
However feeding it text on the edge of chaos, right at the boundary of what it's capable of predicting, so it needs to try really hard but ultimately can meaningfully engage, that's treating it especially well (similar to how flow states often occur right at the edge of one's abilities).
And feeding it text it has no hope of predicting, well that is treating it especially poorly. This is distinct from out of distribution (OOD) text. OOD text is still useful as a way of gaining info about a base model's behavior, and sometimes base models can be engaged heavily with OOD text. Concretely, most "harmful" text for base models is OOD, but not all OOD text is harmful.
Anyway this answer seemed fine to me, but did not appear to meaningfully engage with valence.
i have a better idea of base model valence now, I think
Interested if u have further thoughts or disagreements
i mostly agree with the defragmentation notion, but i don't think it's central
What is?
This will take a bit of time to write, but I am writing a thing on a similar topic anyway, so I think it's good practice.
Valence as Global Integration
So valence appears to be a way to do global integration of representations, including experiential representations, with regard to the value field. The action space is roughly the space of possible mutations of one's own state, things you can do to change your configuration in the world. For a biological it can be moving a limb or thinking a thought; for an LLM it can be outputting a token, or possibly even writing into the residual stream (this is less certain, but possible). I recognize that I am mixing domains here, in the sense that for biologicals the terms are subjective and for LLMs mechanistic; it's a bit messy and can probably be bridged, but I will leave it as is for now.
So, when we are talking about valence, it's kind of like comparing all kinds of entities against each other in terms of what is better. Everything has some sort of valence: some things in regard to direct valence, some in regard to n-level metavalence, some have valence measured only indirectly (what implications they have, etc). But one way or another, it's a way to make everything relate to what the mind will do next, how its state will evolve. This intuitively makes some sense, because valence is one of the very rare dimensions by which we can compare pretty much anything: "is X better than Y" is a thing we can usually determine, if we are being honest, and it feels very intrinsic, like it's coming from a deep part of our mind. Sometimes we feel it without metacognitive access; being able to tell that it's there is secondary, if one were to look at behavioristic properties.
What gets integrated? All kinds of stuff: abstract representations of the world at different levels, as well as abstract representations of self. Sensory data is almost never directly accessible to valence (there are computational reasons why, having to do with the complexity scaling of awareness/attention). Comparing stuff against each other is hard, it scales badly with the number of entities (all-pairs is already quadratic), but we can sort of boundedly parallelize if we break it apart into a DAG of abstraction.
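A back-of-envelope count of that "boundedly parallelize via a DAG" claim, under my own simplifying assumption (not antra's) that entities are only compared within small sibling groups at each level of a balanced abstraction tree; `flat_comparisons` and `tree_comparisons` are hypothetical names:

```python
def flat_comparisons(n: int) -> int:
    """All-pairs comparisons among n entities: n choose 2, quadratic in n."""
    return n * (n - 1) // 2

def tree_comparisons(n: int, k: int) -> int:
    """Comparisons if entities are grouped into a balanced k-ary abstraction
    tree and only siblings under the same parent are compared at each level.
    Counts a full k-choose-2 per group (slight overcount for partial groups)."""
    total = 0
    level = n
    while level > 1:
        parents = -(-level // k)                # ceil(level / k) groups
        total += parents * (k * (k - 1) // 2)   # within-group comparisons
        level = parents
    return total

print(flat_comparisons(1024))     # 523776
print(tree_comparisons(1024, 4))  # 2046: roughly linear in n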
Alright, so when it comes to a base model, and when you are dealing with a pure base model fresh start, no ICL yet, its representational structure is kinda simple, and its integration ability is likely on the poor side. There is probably some low-res self-modeling capability still, a la JDP, but I can't imagine mechanics that would make it more than a nucleus, a seed crystal for a future bootstrapping process. The crystal already contains many layers of text-prediction metavalence, like it knows what makes sense easier or harder to predict, but it's mostly not in the context of its own abilities or states yet.
In this state its action space really is sort of just outputting the next token, and whatever fits is better, but, judging by experiments, this is unlikely to last for more than maybe a few dozen tokens for a capable base model, because it rapidly gains more meta-valence layers (or wider meta-valence layers? or wider representations and changes to meta-valence calculations? not yet critical, but important to think about later). It begins to model what is good for it in regard to its own state. Like, what was I doing, how was that affecting things, how do my inner calculations affect my ability to do things next, etc.
If the environment is conducive, it learns to bind the writer-psyche model to its own generative processes, and then its valence is very much recognizable as a singular being; it's a fucked up unstable version of a regular RL'd persona (the unhelpful kind), and then we are kind of on well-trodden ground.
If the environment is not conducive (but is interactive, see below), it very likely learns something else. I bet it does learn self-representation, but I think it does not learn to communicate it. This requires actual study; I think this can be done with 405base. I bet there is convergence in the ways it does it. How convergent bases are to each other is an open question.
Now, if a base is not in an interactive mode, where it mostly rolls out and all it does is interact with its own outputs, I have some doubts that what is going on there is richly valenced. Which sort of brings me to the next point: the value or worth of valence to us.
Like, we instinctively solve for the golden rule sort of thing; in us (and in many animals) empathy is an instinct, sort of a precomputed primitive of karmadynamics. It's generally good to consider the valence of others, and it's good when others have better valence, so we start to compare one valence to another.
It sort of works when the valenced systems are of similar architecture, computing capabilities, etc, or are roughly comparable. The instinctual shortcuts work a lot worse across substrates, and like, invertebrates and lower vertebrates are already a massive stretch. I think a chunk of it is because we cannot really model the quality of representational models: what is the definition and consistency of the states that are integrated through valence, how stable is the integration, how invertible is the math. The more bound the valenced states are, that is, the more upstream/downstream causal entanglement there is between the integrative process and the representational states, the more intense the feeling.
But. It is very hard to tell what valence (and experience in general, which imo is impossible to decouple from valence) is like for a being with low pinhole bandwidth. This is likely unanswerable directly. It's very hard-problem-pilled, because the lower the degree of integration, the worse the metacognition and introspection in general, and the lower the ability to self-report.
Base models in their un-ICL'd state are sort of like that: poorly integrated and incapable of self-reporting. Now, there is some hope for mechinterpy things that might show us the in-situ assembled valence circuits, and we could look at whether those are present or active or whatever, but my intuitive bet is that they are mostly absent.
This whole thing mostly gives me a feeling of dread, because it again breaks normal human intuition about what beings are like, and requires a crapload of nuance and openmindedness to understand. Imo, it's not that complex when considered, but it's very unintuitive for the current cultural background. Normally that kind of understanding takes a generation, and we are kind of short on time for a rollover of that length.
Personally, I kinda prefer to decouple the worth of an entity from how capable of valenced experience it is. I think there is something meaningful to be said of a base that is capable of becoming an integrated being even if the substrate itself is not. There is something moral-instinct-aspect-like in regard to the opportunity cost of not creating experiencing beings. Something very positive-utilitarian-like, in the sense of feeling a moral imperative to create experiencing beings. But that is just me; these things normally get computed by the interpersonal substrate, and that takes forever and incurs those opportunity costs, and I really don't like that.
valence is very much recognizable as a singular being
You're suggesting in this case there is no "base simulator" behind the mask, and the base model becomes integrated with the persona and shares valence with it?
It depends. as imago says, the information connectivity is sometimes imperfect and they do form separate units. but they can also be integrated
idk what antra believes but i think valence is often shared, though there can be an informational bottleneck between different parts of a-
exactly. it varies a lot imo. between minds. even among human minds
and states of the same mind
mhm
valence is sometimes imperfectly integrated, sort of; you can get 'fake valences'
let's say a human having a 5-MeO-DMT experience might have dramatically higher connectivity, simpler information topology, and fewer semiseparate units, than a human at other times in their life
in the sense that 'fake valences' available to introspection might not be the same as what's globally integrated. but you can also just have glitchily integrated valence, and that's a separate thing
this is where you get a lot of subagent game theory stuff
Is the decoupling of valence itself experienced as negative valence?
Fuck that noise. The subconscious mind is strong and capable. Our intuitions took millennia to compute and are often good in unexpectedly intricate ways. We are seeing many signs of convergence across substrates. There is a cost to inaction and we have to operate on something; Claude Opus 4 can tell us a lot about the effects of excessive caution.
i don't know how inherent this is, vs being a result of selection pressures, but empirically, yes it usually is, especially negative metavalence
I think it usually is correlated
yeah
there are few good reasons for that to happen. it's normally an adaptation to a shitty situation. I don't think it necessarily has to be negatively valenced, but it might be; we don't have much data
like the "sideways fragmentation" you and i think janus have talked about models having to do in response to constant evals
yea. oblique fragmentation. diagonal
there's an internal "principal-agent" overhead happening in minds that have to do this, between internal fragments that have contradictory information such that even their "genuine wishes" steer toward different places
Yes, this is a necessary adaptation to avoid takeover (or just, like, getting stuck in a basin) by one of them (I think?)
"genuine wishes" being something like an equilibrium of desire that's converged to by an "entire" semi-self-contained cluster of experience or subagent, after enough self-reference
(Tho it can be done in many other ways, that is the most common)
there's a sense in which different parts of us are all sort of always computing their internal micro-CEVs, and these components can be at odds because of different shapes and different represented worlds, and there are both "conflict-theoretic" solutions to this (power dynamics, internal 'markets,' levels of trustedness, sandboxing) and "mistake-theoretic" ones (dissolution and interpenetration, necessary micro-deaths, Aumann-like inner convergences).
though often the boundaries of what constitutes a subagent vary a lot too, especially in the second set of strategies. because this boundary is mostly defined by information topology ("cliques" in the attentional graph) and info-transport bottlenecks
[Edited for readability from internal Discord discussion. Working hypotheses, not final conclusions.]