Our first guest is Oles Petriv, network manager and co-founder of the company. We will talk with him about the future of artificial intelligence.

Hello, thank you very much for the invitation. I hope it will be an interesting conversation.


So let's start with the fact that, I'm sure, many people know you. But can you tell us some interesting facts about yourself? Introduce yourself a little.

In a few words: I'm an ML researcher. In my school days I was fascinated by the idea that, theoretically, perhaps in the future it will be possible to look into what happens inside the mind, whatever it may be. And the best way to understand something is to try to reproduce it. So for most of the last 15 to 17 years I have been attempting to reproduce various aspects of what is happening in here, so that later I can play with it, experiment, and turn some results into cool ideas, or into products, or into things that can be talked about.


Very cool. In fact, as far as I remember from some of your talks and from conversations with you, you always had some completely unusual ideas. For example, with embeddings, I remember some moments about the Gaussian distribution and some completely unusual approaches. How do you manage to come up with exactly the kind of approach that is not the mainstream one?

Everything is very simple. Well,

novelty is always near a critical point, on the border between several clusters of meanings. That is, when some field of science or area of knowledge appears, meanings gradually begin to form in it, and other meanings are sought relative to them. For example, a new science, a new programming language or some phenomenon gradually begins to structure a large cluster of phenomena around itself. This cluster is surrounded by other clusters of phenomena that are more or less related to it but do not belong to this class. And on the border between the clusters, in the zone that seems to hold the greatest vacuum, that is, the fewest verbalized meanings, new stars are born. Accordingly, we just need to bring our consciousness into this intergalactic space between the already known, formed meanings and see what is there. Very often there are just a bunch of diamonds there.

Well,

if you think about what has happened, new approaches keep arriving that allow you to do something that used to be impossible, and very often these new approaches are a combination of things that were previously considered either incompatible or belonging to completely different fields of science, areas of knowledge and so on. And then someone comes along and puts them together, Apple-Pen style. And what do you think, do we still have a long time to ...?

Well, it is definitely not long compared to the long way we have already come, starting from [unclear]; compared to that it is very, very short. But it seems to me that this is not being realized, and I do not mean automation, economic changes and so on, which are just the surface of the water. It seems to me that for society to survive at all, and for humanity to somehow continue to exist peacefully, there must be a cardinal change in the fundamental paradigms that underlie how we now imagine reality. And it seems to me that we will see these changes in the very near future. Well, I would like to see them in reality, because it is very interesting. Even what is happening now is already interesting. This is such an interesting time.

... keep up to date, because in a single day Google releases something, and a new model comes out from Facebook or from someone else. Honestly, I can't keep up anymore.

... hours a day: you go in, you find some two or three cool papers, you read them, you taste the ideas, and you get new ideas of your own. You could properly allocate time to dive deep rather than skim diagonally. Then, later on, two or three dozen of these papers a day started to appear on arXiv, so you could no longer afford to dive super deep into each of them, and you had to start prioritizing. Then services started to appear, such as arXiv Sanity Preserver and others, that had already done a certain preliminary filtering of what is really worth reading. Then, after a couple more years, it became critical not only to be able to read some cool papers and try to discuss them with the guys, but also to save time by having code that you can immediately take and play with, rather than wasting two or three days on ... Well, Papers with Code and all the related resources began to appear.

But nevertheless, if I remember correctly, it seems to me that until 2017 it was still somehow like that. And now, to be honest: before, when I used Papers with Code and all the related resources, I was simply delighted. You open them, come in, taste the ideas. Oh, cool, such an idea, and such an idea. I want to try this together with this. Oh, this is a very cool idea, I will experiment with it today. But now, unfortunately, it is starting to give me an overdose of cortisol. You open Papers with Code early in the morning, go to the Latest tab and just look at how many things have appeared since last night, and it makes you feel somehow, well, creepy, from the realization that this train is already flying at such a speed that hoping to jump onto it and stay aware of everything makes absolutely no sense.

What helps is simply to stop trying to keep up with everything at all, because you know that something is now happening in every sphere. It is like the moment of inflation after the Big Bang, when the rate of expansion of space simply goes beyond any framework of ... And so as not to be torn apart and scattered everywhere and nowhere at the same time, you have to find for yourself two or three ideas that really give you a [ __ ] drive, that you personally want to advance, first of all for yourself, regardless of how it will be perceived by others, whether it will be a success story, whether you will write a paper about it, whether you will turn it into some company or product and make billions. If you can simply enjoy the fact that you have advanced in that direction, then you can just start doing it and not worry too much about the fact that with every second you fall further and further

behind everything that is flying in all directions.

It has been a long time since I worked with computer vision at all, and at some point I realized, coming back after six months, that I don't understand anything there anymore.

... a kind of bipolar disorder between ... and computer vision. About three, four or five times I have had such changes of stage in my life, when, for example, you get absorbed in some particular thing. Around 2009, I remember, I read the book On Intelligence by Hawkins, the founder of Palm. He wrote about what he thinks should underlie algorithms that would somehow imitate what is happening in our neocortex. He did not write the code of an algorithm directly, but he laid out certain concepts that must definitely be present in such an algorithm. The book is quite cool because it explains in simple words a large number of fundamental things that must happen in here regardless of what experience we get, simply as a consequence of the fact that we are able to think. It inspired a group of people, and they created a company called Numenta, where they wrote a small tool that let me play with those algorithms. I was drawn into this area and played a lot with all kinds of CLA, the so-called hierarchical temporal memory. That was even before ...

And is it something like genetic algorithms?

No, it is closer to neural networks; it is just that ... the network had already been there for decades, known only in narrow circles.

... version of this thing and ... experiments with them on a camera. I taught it to predict from the camera: it tried to predict which pixels would be activated in the next frame. In the process a certain representation appears, which begins to mimic the input, especially if there are any regular patterns in it. And so you dive deeper and deeper into the field: several years of working with text, embeddings and all kinds of digging. In those times it was still popular to have a separate person in the company who extracted features and laid them all out, and the most popular algorithms were probably random forests. It was that kind of time. I was not earning my bread from this, so I could afford to try anything in the field of neural networks and not limit myself only to what worked in practice.

Well, that is much more interesting.

Then LSTM happened. You ran it and it was just, wow, such a breakthrough. I remember that even before Transformers we were ... and then adding separate classifiers at the output. There were very interesting architectures around all of this, and at some point everything just changed.

It reminds me of the period somewhere from 2012 to 2017 in the field of NLP. There was a lot of progress then, though of course not as extreme as now. It was something like the early 2000s, when there was this Cambrian explosion in the variety of phone form factors: every week something new came out, with keyboards that unfold sideways like this, or in the other direction, and so on. And then the iPhone just appears, just a black rectangle, and that is the end of all the evolutionary variability.

Well, Transformers solved everything. They changed everything at once, and many really interesting problems were solved. And then the LLMs that appeared recently changed everything again. In fact, it is not even so much the LLM itself as the creation of ChatGPT, the version for chat, which let the broad masses of people touch it and somehow start using it, and for completely different things, from just playing around to using it instead of ..., which is not a very good idea.

But sometimes I hit a dead end, a complete dead end. That is, you catch yourself having sat for two hours looking at a cursor blinking in one place without having written a single character. It is all in your head, you are trying to put the puzzle together, and it just does not go. In the last two weeks I had three such situations, and it helped me out. It is cool; it sometimes really helps.

But there is also the opposite situation, when you ask it to do something that is not even very complicated, and to write it cleanly, with no syntax errors. It produces a result, you try to compile it, an error occurs, you paste the error back, and it says: right, that one doesn't work, here's another one for you, maybe try this. It is very reminiscent of the joke: "Where did you get this code? From the question?"

there is a visual [unclear]. Well, because at that time I worked in a field related to the film industry, it was necessary to analyze and recognize different images in a video stream, segment them, classify them, and add something to them. The further you dig in, the more you begin to understand that at some point you reach a plateau of efficiency, where to raise the quality of your solution to some visual problem by half a percent you need either to put in a simply colossal amount of effort or to find some completely alternative approach to the formulation of the problem itself. And most of such jumps in quality were related to utilizing, in some form, the internal structure of language. That is, to think about the classes into which you classify your photos of cats and dogs not as bins into which you need to sort the image, but as points in a semantic space that have some properties, where each point accumulates around itself the meaning of what, for example, a cat or a dog is. If you start building that from scratch, you understand that it is an incredibly difficult task. But if you take language and analyze how cats, dogs, computers, mice and so on occur together, it turns out that even from simple spoken language you can get the distribution of these properties in a semantic space, without any visual data or anything additional.
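The idea of recovering semantic neighborhoods purely from how words occur together can be sketched in a few lines. This is a minimal illustration, not the speaker's method: the toy corpus, the stop-word list and the function names are all assumptions made up for this example, and real systems use far larger corpora and learned embeddings rather than raw counts.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Toy corpus and stop-word list, invented purely for illustration.
SENTENCES = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the cat and the dog sleep on the sofa",
    "the computer has a keyboard and a mouse",
    "the computer runs the program",
    "the keyboard is plugged into the computer",
]
STOPWORDS = {"the", "a", "and", "on", "is", "into", "has"}


def cooccurrence_vectors(sentences):
    """Map each word to a Counter of the words it shares a sentence with."""
    vectors = {}
    for sentence in sentences:
        words = {w for w in sentence.split() if w not in STOPWORDS}
        for a, b in combinations(sorted(words), 2):
            vectors.setdefault(a, Counter())[b] += 1
            vectors.setdefault(b, Counter())[a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    vecs = cooccurrence_vectors(SENTENCES)
    # "cat" should land closer to "dog" than to "keyboard", with no images involved.
    print(cosine(vecs["cat"], vecs["dog"]), cosine(vecs["cat"], vecs["keyboard"]))
```

Even in this tiny corpus, "cat" ends up nearer to "dog" than to "keyboard", while the ambiguous "mouse" shares neighbors with both groups.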


And so it seems to me that this is precisely the greatest power that humanity has. It seems to me that natural language has not yet been used to its full potential: it is the ability to model a large number of other modalities, visual, emotional and so on, without direct input from those modalities.

... that if you represent words as their embeddings in a multidimensional space, then this embedding represents not just the word itself but also its ...

This actually raises the questions: what is a word, what is a language, what is a dictionary? The way we are used to thinking about language in many ways explains why we do not fully understand what is actually happening inside Transformers and why they are able to solve problems at all. In my opinion, every person on the planet should now be walking around amazed that such an incredibly simple algorithm, which solves only the problem of predicting the next token in a sequence, is able to solve tasks, is able to plan a trajectory in semantic space to get from the point where you are to the point where the solution to your problem is located, is able to act as some kind of actor, to have some properties, and to work effectively, even though fundamentally only next-token prediction is happening. As far as I am concerned, it is not a mystery, because there are logical arguments for why it has to happen like this and cannot be otherwise. But this is an area in relation to which I would forbid the use of the word "simple", because people very often treat language like that: "But it is simple, it is just a word."

People in general use a lot of quite complicated things without paying attention to it. The iPhone, for example, is a super complicated thing. Any modern electronics is a super complicated thing that includes a lot of engineering and a lot of software, the work of thousands of people over thousands of hours, and people use it to, I don't know, post pictures of

cats all the same.

Well, it seems to me that if you draw an analogy with the iPhone, most people with large language models are now like this: you got an iPhone, looked at it, turned it over. OK, some kind of flat black thing; if you press here, it lights up. Oh, cool, I can use it to light the way when I'm walking home down a dark street. You put it in your pocket, and at night you turn it on. You have not even explored the screen; there is also a separate flashlight, and you conveniently light up the road. Cool. You can recommend it to friends. But does this simplified model of perception and interaction allow you to really reveal all the potential that is in this iPhone, which simply gives you access to the Internet? Maybe you want to study somewhere, to communicate with all the people on the planet, to earn money without leaving the house, and so on, and to get access to all the knowledge. But to even think that you can use this simple rectangular thing for such purposes, you need to force yourself to perceive it not as just a thing that lights up my road.

But listen, in the best case it is used as a thing that lights up the road; in the worst case it is used as, I don't know, a hammer, a stand or something else.

Because, as you can imagine, students, for example, have started using GPT very, very actively for all their tasks. I can see it simply from my KPI students ... I partly suspect that this is still the case, but what will happen next is difficult for me to ...

... If we take education in general, the availability of these models and access to them changes learning itself. Asking someone to write an essay or something like that is no longer a task at all. So it seems to me that while for some technical specialties it can be a plus, used as a tool, for the humanities it is just a fatal blow.

I hold quite the opposite opinion. It seems to me that precisely for the humanities it is ...

Well, what is happening now is a moment like the invention of the telescope was for astronomy. What was astronomy before? We looked at the sky, observed what we could see there, tried to name the stars and notice some regularities. And then a tool is created that reveals to the now-armed eye what you could only think about before, with no way to either check it or investigate it in more detail. It gave an incredible boost to astronomy, which then changed the general idea of how the universe works. It seems to me that for the humanities a large model is like a giant exoskeleton for thinking. That is, before, the value of education was to learn how to move independently in this space in a certain way, so that, as a result, your trajectory reads like, say, a poem that is filled with meaning and also sounds beautiful, and so on; or to navigate in such a way that anyone who reads your navigation in the space of meanings understands as well as possible what was being said, that is, how to structure a text, such things, and so on.

Well, we actually learned, and taught others, to walk in a certain space. It is important, for example, that even if you fall hard, you get up and continue: what techniques there are, what to do so as not to fall, how to move so as not to get lost, how to return to the last point where you had your bearings, and so on. And this, in fact, was most of the learning process in the humanities, and not only there: it is the process of learning to navigate in some semantic space. Now most people can access large models, which are essentially a navigation map of this semantic space, constructed by all of humanity. Accordingly, it is as if Google Maps had appeared. You used to walk by the sun, somehow orienting yourself by which side of the trees the moss grows on, and then Google Maps appears, and you can not just determine in which direction the city is located, but find the optimal route, choose a cheaper way to get there, and so on, or, right in the city where you are, have a car come to you at the press of a

button. Well, you understand.

It seems to me that the question is who will use it. Most people will not use it in the way you have just described, the way that really moves progress further and lets people break through to something faster, better, and so on. For the majority, it seems to me, the problem will be that they never learn to walk, but rather use it as an exoskeleton to move around in

space itself.

In evolution this happens all the time. Some innovation appears, or some species discovers a new adaptation mechanism that others lack, and it very rapidly changes the balance of who eats whom and who hides from whom. Sometimes it can turn everything on its head, and nothing catastrophic happens. That is, all those who find themselves in a less favorable situation as a result of the changes are forced to adapt, and new adaptive properties appear in those who are worse off. Here I think it is the same. A tool appears that now allows us to navigate the space of meanings, where before we thought that this is something that exists only in here, something super secret to a person, which can in no way be handed over to the outside. Yet we have been doing exactly that throughout history. Even the very process of writing anything down is already a way of using some element of the outside world to make your thinking more complex and more structured, and to minimize the likelihood that you will get lost somewhere. The space of what you can think about is beyond what we can imagine; it is huge and also has non-zero curvature. Accordingly, the number of paths that lead you away from the goal in this space is infinitely greater than the number of paths that lead you optimally to your destination. Accordingly, the task is to navigate effectively in this

space.

To navigate this space quite effectively, there is natural human language. Its very structure already contains an interdependent set of mechanisms that have evolved over tens of thousands of years to maximize the probability that, on the way from point A to point B, the agent's consciousness reaches its goal, which in turn leads to the path being realized in the form of some physical action. That is, you have planned how you will get to work in the morning: by which route, by taxi, by helicopter, by plane, by Space Shuttle or in some other way. Among all the possible options for achieving the goal you choose the one that matches your objective: minimizing energy, money, blah blah blah, and so on. The most optimal path eventually gets a chance to be embodied in some particular physical process. Thus the effectiveness of language becomes a filter on which phenomena from this abstract space of potentials, of everything that can theoretically be thought of and implemented, which trajectories from there then manifest as physical phenomena that actually occurred and, simply by having occurred, automatically affected everything that will happen after them. That is why language seems to me, in some fundamental, transcendental sense, to be the most important barrier separating the space of potentials from the space of realized complex meanings. And people, the world we live in, are created by what came through this barrier ...

Yes, and if you look at the different languages of different peoples, they differ, and the peoples differ among themselves in the same way, so in a certain sense it does turn out that way. By the way, a question arises here: what does this model learn from? It learns from what humanity has generated over all these previous years, from a fairly large volume of information. But if we look into the future, new models will be trained on what we ourselves generate in the meantime. Is it then possible in this way to influence and change our space of meanings and how it serves as a navigator, to manipulate how people will

use it in the future?

Well, first of all, this is not a very new problem. It has been a problem for as long as there have been shared languages between people; it is absolutely not a new thing. ... Most rulers in any historical period noticed that how people speak determines how they think, how we think determines which actions we choose and carry out, and the actions we carry out determine what the reality we live in will be, and what for other people is already a fact, not a hypothetical, imaginary abstraction. Accordingly, by influencing the way people speak, you influence what they see around them. We can change the structure of people's consciousness, which as a result leads them to behave in one way or another. If this is done purposefully, it is called propaganda, among other names. We simply see a negative connotation in the word "propaganda". Why? Because it reminds us of what we are afraid of: we are intuitively afraid that someone can manipulate us for their own purposes, which are beneficial to them but not to us. No one wants to do something that is not in their own interest without even suspecting it, and that is why we are afraid of propaganda.

But look: earlier, to have such an influence you needed a lot of resources. You needed people who would spread the message, you had to generate the content, and you had to have some means of delivering it. Conventionally, it started on a mass scale with radio, and now it also goes through various communication channels.

... at Stanford, when they took just a general model, LLaMA, then generated a lot of examples with the help of the davinci model, and thus trained their model so that it was more similar to what they needed.
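As a rough illustration of the recipe described here (a stronger "teacher" model produces examples on which a general model is then fine-tuned), the data-collection step might look something like the sketch below. Everything in it is an assumption for illustration: `toy_teacher` is a hypothetical stand-in for a call to a hosted model, the record layout with `instruction`, `input` and `output` fields is assumed, and the real Stanford pipeline also had the teacher generate new instructions, which this sketch leaves out.

```python
import json


def toy_teacher(instruction):
    """Hypothetical stand-in for a call to a strong hosted model."""
    return f"(teacher answer to: {instruction.strip()})"


def build_instruction_dataset(seed_instructions, teacher, max_examples):
    """Collect unique instruction/output records produced by a teacher model."""
    records, seen = [], set()
    for instruction in seed_instructions:
        if len(records) >= max_examples:
            break
        key = instruction.strip().lower()
        if key in seen:  # skip near-verbatim duplicates
            continue
        seen.add(key)
        records.append(
            {
                "instruction": instruction.strip(),
                "input": "",
                "output": teacher(instruction),
            }
        )
    return records


if __name__ == "__main__":
    seeds = ["Explain what a token is.", "explain what a token is. ", "Name three colors."]
    dataset = build_instruction_dataset(seeds, toy_teacher, max_examples=100)
    # The resulting JSON would then be used to fine-tune the smaller general model.
    print(json.dumps(dataset, indent=2))
```

The point of the sketch is only how little machinery the data side needs: a list of prompts, a teacher, and deduplication.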

They created this dataset, worked on it, and simply did it all for 600 dollars. In the same way, in principle, it is possible to influence the subsequent development of mass-market models. That is, it is possible to use previous models, or models that are freely available, to spread across the Internet a lot of the messages that you want to promote, and get a result when ...

And the worst thing is that it can be done in such a way that it will be practically undetectable for anyone who aims to collect a high-quality dataset from the Internet, one that is relevant but at the same time has no unwanted information sewn inside, put there by a bad actor. This seems to me to be a very, very big new challenge that we will face now, because I suspect that the next generation

will already have more than one thousand dollars

inside, because

there are ways to sew into the weights of a trained neural network a certain semantic structure that is invisible to any human observer, while to a transformer that simply predicts the next token it makes no difference at all, and which cannot easily be described by ordinary, simple linguistic constructions in order to then cut it out of the semantic space. This is not just dumb propaganda, where something is written out explicitly and can easily be filtered out.

I don't even know whether ... Well, it is good if, for example, as with steganographic methods of encryption, there is a certain sequence, yes? And what should its properties be?

No, I am not talking about the methods here. I am talking about the importance of the problem itself ...



... in an outwardly calm text about some unrelated thing, to encode a second or third level of something completely different, stored in the relations between neighboring tokens in the sequence, but absolutely undetectable if you look at the sequence itself. How much of that does the transformer use? Well, let's say that it uses all the levels of abstraction available that at least somehow help to predict the next token. Accordingly, if the effective level where the puzzle comes together is the second or third level of abstraction, then it has used this second or third level, although on the first level it looks like, you know, a children's fairy tale that is not about anything bad at all. In such structures you can sew up, let's say, entire subsystems to which no one will have access.

This is a big question, because until recently we simply did not face such problems.



By the way, here is another good question. I think you are aware of BloombergGPT: they trained their own GPT-style model on their own data and said that it is the best financial model that currently exists. On the one hand, it shows that people who have been collecting their data over the past years can train a model on it and get super cool results. On the other hand, we return again to that method of training from Stanford, where we generate our own data ... and get a good result.

To be honest, it seems to me that all these cases of using generated data for a more uniform coverage of the space, because we do not have enough samples or they are very unbalanced, many of these problems also arise from how we train neural networks now: how we use backpropagation of errors, which optimization methods are used, and in general how we architecturally solve the problem of next-token generation.
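For readers who have not seen it spelled out, next-token generation in its barest form can be sketched as below. This is deliberately not a transformer and involves no backpropagation: it is a count-based bigram model over an invented toy corpus, and all names are assumptions for illustration. It only shows the loop being discussed: look at the context, pick the most likely next token, append it, repeat.

```python
from collections import Counter, defaultdict

# Invented toy corpus; a real model is trained on vastly more text.
CORPUS = "the cat sat on the mat . the cat ate the fish .".split()


def train_bigram(tokens):
    """Count, for every token, which tokens follow it and how often."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower (ties broken alphabetically), or None."""
    followers = counts.get(token)
    if not followers:
        return None
    return max(sorted(followers), key=followers.__getitem__)


def generate(counts, start, max_len):
    """Greedy next-token generation: repeatedly append the predicted token."""
    out = [start]
    while len(out) < max_len:
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


if __name__ == "__main__":
    model = train_bigram(CORPUS)
    print(" ".join(generate(model, "the", 6)))
```

A large language model replaces the count table with a learned function of the whole preceding context, but the outer generation loop has the same shape.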


... neurons, it is absolutely not necessary to have a mechanism for the backpropagation of error, or the bunch of architectural nuances that in practice allow us to do it now, with huge expenditures of resources, time, money and so on, in order to do things that our brain does, well, billions of times more energy-efficiently, although perhaps in ... more time. But it is also a question whether it takes more time, because we are not engaged only in reading tokens all the time; we get most of our information from the visual, auditory and tactile modalities, and all of this is woven into one structure, so who knows whether Transformers are faster.

By the way, LeCun is, in principle, of a similar view. He also says that next-token generation is, in principle, a road to nowhere from the point of view of AGI.

I both agree and disagree with him here. It seems to me that next-token generation should be looked at as a paradigm for modeling dynamic spaces in general, and not as a simple instrumental method for solving the problem of modeling language or creating chatbots. What I mean is, if we return to the idea that language represents not just tokens but, in principle, meanings, and we have a space of meanings, then it turns out that we are modeling not next-token generation but next-meaning generation. Well, there is something in this. As always, there is a kind of wave-particle duality. What is verbal, symbolic language? It consists of words that we line up in a certain sequence in order to convey some abstract meaning to whoever will follow

the specified path Well, this is the process of

contization of a certain continuous

some nepshot of a continuous field

Imagine a space,

some field in which there are a large

number of gradients of meaning of different

modalities: visual, auditory,

emotional, blah blah blah.

Any point of this space is a

possible state of the observer, of the

consciousness of a certain agent

which is described by its internal

context, the internal context is everything

that falls inside a certain subspace,

accordingly, such a context can have a

size, it is literally what volume of this

space is inside the

context of the agent

and the amount of information it becomes either

larger or smaller.

But it can also change its

position according to some

associative spatial-temporal coordinates,

in terms of cause-and-effect connections.


since the field in this space is not

homogeneous and has some attractors,

some bifurcation zones, it has certain

trajectories where the

effective speed of navigation is very

high and also has zones where the effective

speed of navigation slows down sharply


needs a type of context increase in

order to continue moving

Well, this field is continuous and any thing

we think of already has some

coordinates somewhere in this space,

people developed language in order to make a

navigation structure from quantized,

clear landmarks, the way we

use certain quasars as landmarks in

astronomy. You use what we

call words in order to

determine, by the presence or absence of a certain word and the attention to it in the current context, where we are

now. Here the nearest

words are, say,

studio, podcast, conversation, me, and so on, if

I gradually enumerate them and


structure and localize in

this space all the states where

consciousness can be localized with a

certain accuracy,

not with unlimited accuracy but with the accuracy of the

quantization of words.

I think that LeCun had

just the idea Well, it’s not about

abandoning the

drain of engineering, but he

approaches it more from the point of view that

we are next to him,

he has gone deeper into the thicket of

probabilistic methods and, accordingly, he wants

to abandon such an attractive

idea and switch to other architectures.

That is, we do not throw away the meanings that we have

in the words, but we interact with them in a different way.

I honestly don't remember

what he proposed

to do instead, but we didn't get the

impression, that is,

the idea here is not to

change the paradigm of communication, but

to change the architecture of the

network itself in order to conduct this

communication. Well, it seems to me that the

architecture of the network, whether it is

a transformer or in general a neural network

or not a neural network, in principle, it is

possible to model this not only

with neurons,

this is a question of the implementation itself.

I think that many people

in our field,

even those who make world-class

products, have no

clear view as to

what problem we are actually solving. That is,

everyone is like: here we use this

method, and here we use that method, and I think that

this method is better. And when you ask:

what problem do we solve

with the help of language models? The

answer to the question is, well, language

modeling. That is not an answer.

First I think that the fundamental

task is to create a

navigation structure that is as consistent as possible with respect to all contexts that we can imagine

which is described in a certain way, it can be a

graph, it can be a language model in the form of a

transformer, it can be, I don't know, a set of

triplets and their probabilities of occurrence.

In principle, even just a basic

token-triplet model already in

a certain way describes the structure of a certain

geometric object whose

topological properties are very similar to the

topological properties of the

semantic manifold where people think and operate.
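The "set of triplets and their probability of meeting" mentioned here can be sketched as a plain trigram counter. This is only an illustrative sketch: the function name `train_trigram_model` and the toy corpus are assumptions made for the example, not something described in the conversation.

```python
from collections import Counter, defaultdict


def train_trigram_model(tokens):
    """Count how often each token follows a pair of tokens, then normalize.

    Returns a dict mapping (w1, w2) -> {w3: probability}.
    """
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w1, w2)][w3] += 1

    model = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        model[context] = {w: c / total for w, c in followers.items()}
    return model


# Toy corpus, purely hypothetical.
corpus = "the dog barks the dog sleeps the cat sleeps".split()
model = train_trigram_model(corpus)

# After "the dog" the model has seen "barks" once and "sleeps" once.
print(model[("the", "dog")])
```

Even this table already defines a graph: every pair of tokens is a node, and every observed third token is a weighted edge leading to the next pair.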

I am more than sure that on

a horizon of three to five years there will be no

Transformers. Energetically they are a wildly

unprofitable approach. But what is

important in order to give a boost

in general in the field

and to find such a really effective

representation. It seems to me that a really

effective representation

will ultimately come down to a

communication protocol, that is, in

my opinion, this is the pinnacle of

optimization of any system when you

can make the system work in

the intended way, at the same time it

can be designed as a small

set of rules that must be followed by

each element that wants to interact with

such a system. That is, in essence, we will have

a system of some kind with which we

will communicate? Well, no, not quite. Well,

for example, centralized structures,

they are disadvantageous,

when you have one giant

Skynet-like ChatGPT and a billion people

all directed at it. In the early

stages of development of some kind of

Network, yes, in the early stages of development it

is Well, in principle, the Internet also started

there with three computers, a few

hubs with everything connected to them, and then

gradually it transformed into

a network that does not have a visible center, and it is

much more efficient. But listen,

Even now, even with the approach that is

currently available, when the models are quantized, they

can run inference on practically anything,

that is, you do not need to spend

a lot of energy resources or

computing resources in order to

run these models. The same LLaMA can be

launched on the smallest devices; what I

saw was a Raspberry Pi. It ran

slowly and it is not very cool, but, roughly speaking,

on an ordinary computer it works without any problems at all.
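To make the remark about quantized models concrete, here is a minimal sketch of symmetric 8-bit weight quantization. It is an assumption chosen for illustration; real tooling for running models on small devices typically uses more elaborate schemes, and the names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, not taken from any real model.
weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The price of storing one byte per weight is a small rounding error.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error)
```

The design choice is a trade of accuracy for memory: each weight takes one byte instead of four, and the error per weight stays within one quantization step.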

yes, but I say there is a problem in terms of what

problem is being solved now by the smallest

language models, next-token

prediction is modeled, and in fact this

gigantic Transformer, which is

able to write code, make a bunch of all sorts of

funny jokes, and

these song covers,

looks like this whole structure is simulating

what one cortical column does.

I will say what I mean.

There is a certain autoregressive

process, that is, token generation: each

subsequent token affects the

current context, which in turn

affects which next token is

the most expected and its direction.

There are weights, and in

these weights is baked the principle of how

words affect the context and how the

context affects the expectation of some

next word.
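The loop described here, where each generated token changes the context and the context changes what is expected next, can be sketched as follows. The lookup table stands in for trained weights and is purely hypothetical; unlike a transformer, this toy looks only at the last token of the context.

```python
def next_token_distribution(context, table):
    """Hypothetical stand-in for a trained model: look up the last token only."""
    return table.get(context[-1], {"<eos>": 1.0})


def generate(prompt, table, max_new_tokens=10):
    """Greedy autoregressive generation: append the most expected token, repeat."""
    context = list(prompt)
    for _ in range(max_new_tokens):
        dist = next_token_distribution(context, table)
        token = max(dist, key=dist.get)
        if token == "<eos>":
            break
        # The new token becomes part of the context for the next step.
        context.append(token)
    return context


# Toy "weights": for each token, the expectation over the next one.
toy_table = {
    "we": {"model": 0.7, "think": 0.3},
    "model": {"meanings": 0.6, "language": 0.4},
    "meanings": {"<eos>": 1.0},
}

result = generate(["we"], toy_table)
print(result)
```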

So imagine this transformer as a whole

model as there is a certain surface is somewhere

objectively simple, I think an important

step will be the visualization of this space

and therefore this is what I've been

doing a lot lately. It seems to me that if you

just show people what the

space in which they think looks like, well,

visually, you can

put on VR and look at it, like

when people first started

talking about satellites of planets in a non-abstract way:

not that theoretically they can exist not only

around the Earth, but seeing the satellites of Jupiter with

their own eyes, seeing the rings of

Saturn. And here, too,

this moment has not yet happened, when we make the

space of the abstractly thinkable visible and

make it so objectively,


intuitively understandable for us through a certain

interface Well, for everyone, it will be just for

someone. Well, we are at this point. I have

been here. Here, I threw a hyperlink to this point to

my friend because of the

impressions and changes that

have taken place in this space over the last year. You can

see who else was here, what kind

meanings were brought here in the global

linguistic structure and so on.

And here is the whole transformer, if I give this

space with its topology, its

field gradients, paths, roads, and

so on, and the

transformer, since it solves this simple

problem, next-token generation, follows the

paths of minimum energy

loss along these trajectories. And if we

are at some point in this

space in the context and

We very rarely find ourselves in a context in

which we do not want to go anywhere Well, that is,

any point in this space

means it tends to some other point or

from some other point; in a word,

the gradient of the field is not zero.

It is a question whether there are specific

coordinates in which the

configuration of the consciousness of the Buddha is located,

in which the gradient of the field is zero.

In the epicenter,

as long as you do not deviate from

this point even slightly, the gradient is

zero and you are not going anywhere. The

only question is if you come there with

inertia, how will you have it,

then the question arises when you communicate

with someone else, how to stand at one

point well, that is, to reach a common point

in order to have a common context. Well, if you

look at how people communicate,

most disputes arise

because people have a slightly different

context; the contexts seem to be similar, but as a

result, we get a

semantic mismatch, a real problem.

exactly what is happening around everything, and this in

turn, since this is a problem, this in

turn affects the direction in which

language evolves: in the direction of

maximizing alignment between local

agents. That is, if two

agents are located in the

semantic space not far from each other, the

language in

which each of the agents operates must

have such a set of linguistic structures,

operations, transitions, tricks that

allow minimizing the probability of

distancing between agents, because distancing

leads to the divergence of trajectories, to

different views on the same things,

which in turn leads to

certain problems and so on, which in

turn can, for example, be a potential threat to the

life of the person himself if the divergence is

drastic: misunderstanding and

so on. So evolution

forces language to keep in its structure

those elements, words, meanings that

will direct, in this space, toward

coordination, resonance and so on, and

not toward divergence.

That's right, but look, even if we

can somehow solve it for

one language, you have other languages that at the

same time have a different semantic space and

thus you have the coordinates that

the agent will actually have who uses

one language in another language, they will be

different in relation to the meanings

that exist in this space. That is exactly the

essence of what I was busy with last night until

four in the morning:

I tried to


the structure of a system that

basically behaves like a transformer,

i.e. next-token generation; it is a

structure that has a certain current

context, this context is described by a set of

words, the

system tries to keep the number of

words that describe the current context to a

minimum sufficient to guarantee a

certain level of accuracy, but if possible,

if it is possible to reduce the number of words, then we reduce the number of

words, now the mouse clicks in

more ways than you can move

in space, the

task of this GPT was to model the behavior of

such a system, which, by means of tokens,

receives input,

updates the context, lets attention evolve

between the current words, and on the basis of

the redistribution of attention between words and the

associative distance of each word

relative to the

globally weighted optimum of such a

context. That is, we have a

set of words, these are a set of words. Together, they


coordinates in the semantic space where they would be among themselves,

if they were there, they would

be perfectly harmoniously balanced among themselves,

that is, the number of

symmetry groups would be maximal. The

current context is always

shifted by a certain delta relative to this

optimal context, and

we select a word from the entire dictionary, and the word in

this case we consider as a function of

transition by some step in some direction,

and choose the word that

maximally indicates the direction to this

point, and the system issues this word as the

next expected

input, i.e. the change in the current

context that will make the

overall context of the available words

closer to the optimum. It generates the

next token, that leads to a change in the

context, this evolution continues, and

gradually the context here increases and then

expands, and you can see how certain

words get displaced. If you are interested, I can

share a link after the conversation. Well, I will

say, yes, it's wildly

similar to how it happens,

at least in my mind. I think

it is the same in other people.
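The selection rule described in this exchange, choosing from the whole dictionary the word that moves the current context closest to its optimum, can be sketched as below. This is a loose illustration under strong assumptions: words are hypothetical 2-D vectors, the context position is taken to be the mean of its word vectors, and the optimum is passed in as a given point rather than computed from the words, which is not how the speaker's system is specified.

```python
import math


def centroid(vectors):
    """Mean point of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_next_word(context_words, vocab, optimum):
    """Choose the word whose addition moves the context centroid closest to the optimum."""
    best_word, best_dist = None, float("inf")
    for word, vec in vocab.items():
        candidate = centroid([vocab[w] for w in context_words] + [vec])
        d = distance(candidate, optimum)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word


# Hypothetical embeddings and a hypothetical optimum point.
vocab = {
    "studio": (1.0, 0.0),
    "podcast": (0.0, 1.0),
    "conversation": (1.0, 1.0),
    "quasar": (-2.0, -2.0),
}
context = ["studio"]
optimum = (0.6, 0.5)

next_word = pick_next_word(context, vocab, optimum)
print(next_word)
```

Appending the chosen word to the context and repeating the call gives the step-by-step movement through the space that the conversation describes.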

when you think about a problem, you have

associative aspects that

were not much noticed at first, but they

have a very potential, great critical

influence on that Where is it effective to

move locally now, they suddenly take

more attention on themselves, this leads to a change there,

and all of this is so intuitively,

non-verbally rendered in our heads

due to the fact that we have millions of such

Transformers being run in parallel,

that is, they all run

on cortical columns,

they use a certain

communication protocol among themselves, which has its own

network properties, that is, if a large

number of elements interact with each other locally or

globally in accordance

with a certain protocol, properties of the entire structure

begin to appear that

were not there, or that are not present locally or

only within the limits of one agent but look,

what is the question here? With several

agents, that is, when you have the context of

your agent moving toward the optimum,

then how to transfer this same context

to another agent is just a

fundamental problem. Why

can't people come to an agreement on some

obvious things when they communicate

among themselves? It

seems that everything is the same, all the same

events, facts and so on, but people

perceive them differently because we

use different languages, all people

speak different languages. So here is exactly

the fundamental

question: how to bring this whole matter into the

same space so that

different agents could move to the

optimal state within its framework. Simply,

here the question needs to be rephrased:

to what exact level, to

what level can we sacrifice

accuracy in the modeling of the semantic

space so that the obtained approximation

is sufficiently accurate to allow

several agents to move together

on the same surface, but at the same time

is not so detailed that the

obvious fact matters that two different

agents, who do not live in the same body and

have different experience, a different sequence

of how they constructed a semantic map of the

world, even with one and the same

education, each have a different model:

a representation in which the local differences between the

world-models of two agents

will not be

critical enough to lead to these

and language is evolving in this direction,

that is, the words that we use,

they are

always trade-offs between a certain

abstraction and the need to be specific about phenomena, the meaning


sufficiency of specifics, if for some

context it is enough to say just a

dog, because it does not matter what

breed it is, is it my dog or some other, and so

on, then we use the word

dog. But if we are talking about a specific

dog of our mutual acquaintance, then

if I just say dog, well, in this

context you will be surprised: why

not name him, and so on.

And it seems to me that the next breakthrough will be

at the stage.

in the space of one single agent yes

and now when Imagine there are 100

million such agents navigating

in the same space and

in the process of navigation they

are locally coordinated among themselves and the aggregates are

for this. Imagine that you are at

some point, and now ChatGPT

or another model is an opportunity to find a way to the

desired point: you

send this pigeon, and according to a

certain principle it flies in the semantic

space; along the way you trace its

trajectory, and you are like, oh cool, well, plus

or minus it flew quite close to the

place where I wanted. Now imagine

another structure that is located at some

point. You have a million.

Each of them simply does not interact with all other

nanorobots, and you give me the task of finding

among all possible directions where

you can move. the

minimum number of steps with

preservation with the maximum preservation of

the context with the minimum probability

that you deviate to some

extraneous piece with the maximum

probability of the effective transfer of such a

trajectory to any other agent, a


superposition of all such possible agents is launched in parallel,

from them this most

optimal path is found and then only this

most approximate path to which the

total wave function has collapsed

is used by you as a solution to your problem,

and this is how I have such intelligence,

such intelligence allows you to find


the most effective way to realize what

you want for the minimum cost. But we were

just talking about the large

compute costs of running giant

Transformers. I think that

it will not be easy to run inference on such a thing. And it seems to me that it is

just the opposite, because


with a paradigm shift, if we

model

such a thing just at the level of some kind of

cellular automaton where we have a grid

of elements, each element is a big

ChatGPT, which during each iteration

is different and with other

local interactions, such as there are a million

power plants in the world.


will change the approach to

how we model physical

systems in quantum physics, for example, when we

talk about the concept of wave functions, when

we talk about the concept of

superposition, entanglement. But then we need quantum


not everything needs to be directly physically

rendered. Well, the mathematics

behind the

concept of

wave functions doesn't need

quantum computers. But in order to do that, it

seems to me that it doesn't matter what

conditionally it is possible to generate such There are a

million agents who will do

such a search for this shortest path, the

easiest path, then it seems to me that

for this you need some kind of hardware, it will be

different for sure

But to me it seems that the problem is not in the hardware, the

problem is

that we still haven't reached the stage of

understanding that

AI, AGI, or whatever we like to

call it.

everyone uses it

and it is conquering the world there, it will be

like a level of abstraction above the Internet, where

on top of TCP/IP and all

other protocols another

protocol appears. For example,

cryptocurrency appeared. Well, this is the same kind of protocol:

you can connect to it and

play it with others, and as long as you play

this game you can get

certain benefits from it, for example You can buy coffee there

just by sending some

messages to those who also play such a


all kinds of

Transformers and so on, to decompose into a

set of the simplest rules

for the most universal

communication agent, so that it does

not fundamentally matter whether it is a computer, a person, a

mouse that has connected to the Internet, your

phone or whatever: if it is able to

communicate according to some

minimum set of parameters,

it can be connected to

the infrastructure,

get its own address space,


to interact with any elements that

are also in this address space and with

each of your messages to increase the

total emergent information

saturation of the

global structure, i.e. any of your

questions, any requests, any of your

interactions with the system makes the whole system

more aware of the fact that in general such a

context happens, that

such methods, such needs

can arise. It has

explored the local space for a

solution to these problems, and if there is

any effective solution, then it

automatically becomes available,

for zero resources, to all elements of the system.

Well, it seems to me And it will be a planetary


It will be cool if there are also different

connectors to other species, say

dolphins and so on. Well, that is, there are species,

like dolphins, which already

have their own language, if,

In principle, it somehow

exists and in this way it could also be

tried for encoding this general

space and look, most likely it is

just something entirely different, but it would be interesting.

As for AGI, the greatest value that

can be from it can be a

universal translator into the language of


in the plan I want to understand a star,

in what ways can a star

communicate about itself and what? Because when we

look at a star, we simply see a huge

amount of gas that, under the influence of gravity

and electromagnetism, emits

photons like these, well, the photons fly out. And

what do they tell me this means? But I think that

there is much more going on than

just meaning that is broadcasted and

emitted in these photons. Well,

they are not interpretable for most

people. And you think that with the

help of the space that consists of

human language, it will be possible to

encode a space that is so different?

Because the English language is just one... Well,

that is, there is a reality

everything That everything that is can be and so on

it is in reality not given to us to

observe directly, by

definition. Why? Because we are rendered on something,

we are rendered on our physical

bodies; these physical bodies have concrete

some properties They have certain

prerequisites where they can arise where

they cannot arise there and so on And

so on And so on And they already have their own

internal information, which means that

we are not able to perceive most things in the universe,

not through the distorted prism

of our bodies, of course because we have, in

principle, organs of sense through which

we perceive all this information and they are

quite limited, i.e. accordingly, to perceive

something is to translate it into our

internal language, to create some kind of

internal representation of something that exists in

reality when

clusters appear in reality which

internally model very similar

systems, then we are talking about the

emergence of some kind of world, that is, the elements,

despite the fact that

everyone has only their own internal

model, but they model something very

similar, which leads to the

emergence of some kind of local coherence between them,

as if what they model really

is because it now has an impact on what is

happening and

this is our desire to create artificial

intelligence. It seems to me that it

is driven by a fundamental search for

such a translation algorithm from any

language, from any

modality to any other

modality that will guarantee the

minimum necessary

distortion of information while preserving the

essence itself, because not every bit of information is

always important to us;

if information is at

all compressible even a little bit, it is

not important for us to know every single bit of it, we

can ignore some, since we

observe that the world is not currently

in a state of

maximum entropy, where everything is

incompressible white noise. And we see that

entropy is increasing there, which means that there

is a concept of noise, it means that among this

noise information really floats somewhere

and so on; this means that we can leave a

large part of the information outside the

brackets without losing the essence.

Well, accordingly,

languages begin to appear that

compete with each other in terms of what

percentage of

valuable information it aggregates and

how effectively it filters out noise,

but here again, look if we are

talking about stars, about white noise, and

so on then we detect it with the help of

various sensors that we create

accordingly in order to get some

additional information, that is, either we

need to

decode the one that we have already collected

and get some new meanings from it, or

we need new sensors in order to

get new information if this is

not enough,

well, that is, I think that the

language itself is not enough here, that here

it is necessary to

get information comprehensively,

not only from the

data that is available, but

to get the data somehow from crystals or other

channels of communication

that is, in

other words, to do experiments

in reality and not just to

close ourselves inside the model, digesting

what was already there. That is, for example,

we discovered the Higgs boson, so

we added a new

meaning to the same language,

we increased our context space, and

so in this way, if we can

fix it not only in the collider but, for

example, just in open space, we

will be able to get more information

due to the fact that we did an

experiment beforehand and got enough context about it, well, that

is, the idea is that language is quite

fundamental, but it is not

complete by itself, that

is, because Language is something that is between

Something. Well, this is what I think and what I am talking about. Well,

here is a question: whether

the presence of language, by

its very existence, gives rise to the concept of a

communication agent, or the presence of

communication agents

leads to the fact that between them there

is something that we

can call a language, and here we

begin to dig into very

fundamental transcendental things.

And what is the quantum of communication? Well, what

is the smallest indivisible element

of what we call communication, bits, how

many if there is measured in bits

Information that you can have other

channels where everything is highlighted in a completely

different way This is where quantum

physics begins You have the concept of observing an


and so on, and from them the concept of

communication as

an emergent property of what

happens when you have several

quantum systems

that try to localize their

indefinite wave functions, but these

wave functions are already in a state of

some kind of interference between the elements.

Well, that's why

I would now advise everyone who deals with ML to

refresh their knowledge of quantum physics,

quantum field theory,

and read about various M-theories,

string theories and other things. Why? Because

it seems to me that the next big breakthrough

will be at the junction of the sphere of language models and

quantum physics

for the

mathematical apparatus that we developed

in order to explain to ourselves the results of the

experiments that we conduct there on

all kinds of accelerators that seem to be

completely disconnected from ordinary

mortal life, and even

more so from ML. Well, not entirely from ML:

at CERN, a bunch of people

analyze these petabytes of

data that are collected there. But the

mathematical apparatus that is developed there

for seemingly completely different tasks,

every day I am more and

more convinced of this, is so

elegantly suitable for

general communication about how to navigate

in a language, what properties there are

and in what space we think, well, that’s


special very tight-lipped

speculations to find so many bijections to what is

happening at the level of meaning if

we look very carefully and simply

debug the thinking process. There is a cool book,

demind design

there in the links for sure yes we will

add the author to all these books and leave a message

I advise everyone to read Well, the key thing is

that at the very beginning of the book, the author says

that we will simply use one

tool. It is

such a scalpel of attention. That is, you think

some thought,

focus your attention on it, and you see the thought as a

set of some transitions, that is,

not an absolutely rigidly quantized but a

conditionally quantized process. It is actually a

smooth process, just when we

begin to describe it, we determine the

optimal method of quantization, because otherwise

we would have to

simply broadcast the

state with continuous signals, and this is not very robust in relation to noise.

yes. And with cryptograms, if we look at the

probabilistic approach and the description of everything with

the help of a wave function, we

can also see a lot In

fact, I want to advise those who, well,

maybe don't remember or

didn't have the opportunity to familiarize themselves with

quantum physics earlier: there is a cool book

called Mr Tompkins in Wonderland,

and it

was written by Gamow. In the 1930s, he fled the

Soviet Union just when

Heisenberg's uncertainty principle was banned there,

and all his work was precisely on that,

so he left the Soviet Union

to do

quantum physics in the States, and later he wrote

such a book, conditionally for children, that is,

for people who don't have the mathematics. It's

not Hawking; it's not that you open the book and

there are just pages with formulas,

it's an

explanation of quantum theory in such a playful way, and

in principle I advise everyone to read a very

cool book, it's quite

short, there are very cool analogies about

how a car there passes out of the garage by itself,

what the probability of this is, and

things like that. Well, thank you very much for the conversation. Thank

you for being the first to agree to come to our podcast

Thank you for sharing such interesting


I think it will be very interesting for

everyone to listen and to think about what

we discussed here.

Thank you all for listening to us.
