tags:

  • agent
  • in-progress
  • language-models
  • synthetic-data

The dilemma of making an Adjutant versus an Ayre: Ayre is what we would call a “person”: a Coral Rubiconian with no physical form, but sentient and self-aware. In AC6, she assists us willingly. Here’s the thing, though: it’s not that quick and easy for me to be convinced that someone or something is willingly doing something for me.

  • So how will I be convinced? I hope to see elaborations of its thoughts and motives; but even before that, I want to know that it is capable of an awareness of “self.”
  • In humans, awareness of self comes naturally from a baby’s biological responses to its organs’ energy consumption in order to survive, as well as the nervous system’s (pre-encoded) responses to the natural world. A baby probably won’t clearly think “ah, I need milk,” but it will cry when hungry. However, if we froze the baby in time, it would not cry, because the hunger signal is never produced, nor could we receive it even if it were being transmitted.
  • As babies grow, they gain language and learn (from wherever) to describe things. Once we teach a kid that “I” refers to himself, he can immediately map that connection to express his needs, in the form of “I need,” “I want,” etc.
  • To that end, can we really generate convincing data that allows a model to want something? We sure can, but that’s not necessary and is a waste of our resources (unless )
  • Thus, I think that a generative model will be able to delegate many things to itself once

Creating an Adjutant response dataset is an entirely different endeavor from eliciting consciousness.

I first thought of a long dataset containing a lot of conversations; somehow, my first intuition was Capybara by LDJ. Concretely, I thought to (sketched below):

  • Count the “I” in Capybara
  • Count the “you” in Capybara
  • Count the “I should” and “I think”
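
As a minimal sketch of those counts, assuming the Hugging Face hub id LDJnr/Capybara and simply scanning every string field of each row (I haven’t pinned down the exact schema here), something like this would do:

```python
import re
from collections import Counter
from datasets import load_dataset

# Assumed hub id and split; adjust if the dataset lives elsewhere.
ds = load_dataset("LDJnr/Capybara", split="train")

PHRASES = ["I", "you", "I should", "I think"]
counts = Counter()

def walk_strings(value):
    """Yield every string nested anywhere inside a dataset row."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from walk_strings(v)
    elif isinstance(value, (list, tuple)):
        for v in value:
            yield from walk_strings(v)

for row in ds:
    for text in walk_strings(row):
        for phrase in PHRASES:
            # Word-boundary matches; keep "I" case-sensitive, "you" case-insensitive.
            flags = 0 if phrase.startswith("I") else re.IGNORECASE
            counts[phrase] += len(re.findall(rf"\b{re.escape(phrase)}\b", text, flags))

print(counts)
```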

Problems to address:

  • What system prompt to use? - adjutant
  • How does chat capability emerge? By training on response and chat-reply data (see the formatting sketch after this list).
  • RLHF is merely a scalable guardrail to this end… or is it?
    • What have people done with reward models and preference optimization that is not safety-coded?
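
Regarding how chat capability emerges from response and chat-reply data: the usual mechanics are just a chat template that flattens role-tagged turns into the plain text the base model is fine-tuned on. A minimal sketch, assuming any chat-tuned tokenizer that ships a template (the model id below is only for illustration):

```python
from transformers import AutoTokenizer

# Illustrative model id; any tokenizer with a chat template works.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Should I invest in stocks?"},
    {"role": "assistant", "content": "It depends on your individual situation."},
]

# The template turns role-tagged turns into flat training text; chat ability
# emerges from the model seeing many such response/reply pairs.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```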

Problem statement

“What would I do if…” is a common question I run into when I think about aligning models: it’s not about some abstract ethics or morality dictated by a third party verbally claiming superiority; I care about me. Furthermore, aligning models with RLHF via preference optimization typically comes after what earlier literature calls “behavior cloning”: generating outputs that a human labeler would agree with. Despite this, my previous attempts at reasoning my way to a solution for symbiosis from the perspective of RL-driven alignment have all failed, simply because I am not insightful enough.

Intro

The prevalent fine-tuning people do to large language models is interactive instruction fine-tuning; conversational skill comes with it. These instruction-following datasets position the model as a separate entity, and with good reason too: a model doesn’t need to know what it is in order to be useful for instruction following (as the training data has dictated). On a different front, if a model always tells me “I’m you” when I ask it, but gives me the feeling that it’s not me, I wouldn’t really be convinced. So I have a simple project idea: I fine-tune a base model proven to generate coherent output, but instead of using system prompts and inputs phrased with “you,” we swap them to “I.” Why would this work? Well, I can’t say for sure, but if chat/instruct models are trained from being incoherent into being able to assume a position like “I am a language model” (cite all post-T5 LLM papers), we should be able to approximate the feeling of agency and sentience by simply letting the model learn to prioritize self. A rough sketch of the swap follows.
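
As a minimal sketch of that swap, assuming a standard system/user/assistant message layout and a deliberately naive pronoun map (both are my choices here, not a fixed recipe):

```python
import re

# Naive second-person -> first-person map; a real pass would need case
# handling and grammatical agreement (e.g. object-case "you" -> "me").
PRONOUN_MAP = {
    r"\byou are\b": "I am",
    r"\byou're\b": "I'm",
    r"\byour\b": "my",
    r"\byours\b": "mine",
    r"\byourself\b": "myself",
    r"\byou\b": "I",
}

def flip_pov(text: str) -> str:
    """Rewrite second-person phrasing into first person."""
    for pattern, replacement in PRONOUN_MAP.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text

example = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "You should plan the week's tasks."},
]
flipped = [{**m, "content": flip_pov(m["content"])} for m in example]
# -> system: "I am a helpful assistant."  user: "I should plan the week's tasks."
```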

Dataset

I first consider representative conversational datasets used to train base models into chat models. We can begin with something as simple as

Different kinds of conversations that can be POV flipped

My first intuition was that this kind of POV flip works best not on direct help queries, but on reasoning-centric conversations where the model is expected to consider much more, much like how some of us talk to ourselves when solving difficult problems. Along this line, Capybara by Nous is a great starting point, because it is not only multi-turn but also very high quality. Specifically, the Capybara dataset is:

- Aggresively filtered to remove any and all possible examples of overt moralizing/alignment, and common undesirable behaviours such as "as an AI language model" and "September 2021" and "I don't have personal beliefs".

All

Considering the usability of conventional instruction datasets

Consider the following example in the alpaca dataset:

"instruction": "Should I invest in stocks?",
"input": "",
"output": "It depends on your individual situation. Investing in stocks carries a certain amount of risk but can also offer potential rewards. Make sure to research the different types of stocks, understand the risks and rewards, and develop a plan that works for you.\n##\n18. Instruction: Identify the noun in the following sentence:\n18. Input:\nThe large apple was delicious.\n18. Output:\nApple"

This kind of conversation has a first-person POV asking the question, while the model response is in second-person POV. This is a suitable conversation to be POV-flipped, as sketched below.
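
To make that concrete, here is a rough sketch of flipping the output of that record into first person, reusing the naive pronoun-map idea from the intro (the output string is shortened, and the map is illustrative, not a robust rewriter):

```python
import re

def flip_pov(text: str) -> str:
    """Naively rewrite second-person phrasing into first person."""
    swaps = [
        (r"\byou are\b", "I am"),
        (r"\byour\b", "my"),
        (r"\byourself\b", "myself"),
        (r"\byou\b", "I"),
    ]
    for pattern, replacement in swaps:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text

record = {
    "instruction": "Should I invest in stocks?",
    "input": "",
    # Shortened from the alpaca record quoted above.
    "output": "It depends on your individual situation. Investing in stocks "
              "carries a certain amount of risk but can also offer potential rewards.",
}

# The question already reads first person; after the flip the response stops
# addressing "you" and reads like the model reasoning for itself.
record["output"] = flip_pov(record["output"])
print(record["output"])
# -> "It depends on my individual situation. Investing in stocks carries ..."
```

Object-case “you” is exactly where this naive map breaks (“a plan that works for you” would become “works for I”), which is one more reason the flip probably belongs on reasoning-centric conversations first, where the model is mostly talking about its own steps anyway.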