Geo Carncross: how to program computers (kos)

this is a transcript of this video: Geo Carncross: How to program computers (kos)

Intro

Geo Carncross

* Big: 1-6TB lzma'd logfiles
  per day from all over the world!

* Fast: Deadline is 50msec

* 100% Uptime is normal for us.

Hello, my name is Geo and I work for a company called Telemetry. We are ad server. Basically web serving for money. We get paid to physically deliver video ad. So if our systems don't work correctly, we don't get paid. My aims are motivated by programs that work. We do a lot of programming and most of it's not in k but today I want you to look at k. Because today I want to talk about code bloat.

Our logfiles are money!

(we do other things. like figthing bad guys committing ad fraud,
 and everyone in my office is enjoying a beer right now...

 let me know if this is something you want to know more about)

---

* I'm part of the reason ads play fine, but your Youtube stutters

How to program computers - code bloat

* Give it a meaningful definition

* Don't be distracted by other,
  informal uses of "code bloat"

* Work the problem as it appears

I define code bloat strictly as redundant code inside a source tree. Features that nobody wants, and whether a compiler can identify and eliminate redundancy or other topics. Code bloat costs a lot. For one, it makes the source code big which makes binaries big and that makes them slow. But multiple programmers working on a large code base cannot know exactly what each other is doing and thinking, so there is bound to be some duplication of effort. So code bloat is obviously bad, but it might be inevitable. However it might not. I am going to tell you about kOS.

kos is a small team (4 people)
kos is tiny

* k is arthur's language; something like scheme+apl
* kdb is a database faster than popular dbms by a factor of 1000x
* z is a gui and window manager. icons, mouse and font
* kos kernel is modesetting. memory, vmm, isr and filesystem

kOS is a graphical operating system with dynamic window management.
A programming language called k.
A programmer's editor for k.
A file manager.
This presentation tool.
A database design for big memory queries involving trillion row datasets.

The binary image that is what boots is around 90 kilobytes and that does not use any {packers?} to achieve that. While this may not impressed everyone, it does not impressed (because) you didn't understand what I just said.

kOS is written in k, C and assembly. These lines here:

one system                  kos is tiny
all devices
                             * k is 300 lines of C across 5 files
                             * kdb adds only about 100 lines of C
                             * z adds another 60 lines of C
                             * kos kernel is 100 lines of code/assembly.

that I'm referring to fill my screen the way words on a book fill a page. This provides some significant advantages by making programs as readable as a book and it contributes to why kOS does not contain much redundancy.

I do not however have much time to go into k in general/in detail. But I say the interpreter is very simple and it uses basic table based dispatch. No fancy assembly dispatch or jumping or dynamic recompilation. Does not do any subroutine threading or any program optimisation. Doing a simple loop in k versus C, I would expect every C compiler to optimise this triviality.

kos is how to program computers

  for(i=0;i<1000000;++i)        i:1000000(1+)/1
  real  0m0.008s                real    0m0.191s
  user  0m0.003s                user    0m0.185s
  sys   0m0.002s                sys     0m0.003s

But k won't. Yet you've probably heard of k or kdb at some point, googling, maybe you're googling it now... There are benchmark hiding away in various places on stack and whatnot: kdb is notoriously fast.

     msec
k       1
sql  6400
sqlA 4200

 kdb: select last bid by symbol from quote where date=d,sym in S

sql
  select sym, last(bid order by time) from quote
  where date=d and sym in (select sym from S) group by sym

sqlA
  select sym,bid from
    (select sym,bid,row_number()
        over(partition by sym order by time desc)
        as row from quote where date=d and sym in(select sym from S)
    ) AS q where row=1

So the question comes up: "Why is k fast?". I have said that I think, we should be asking "why is X (or whatever x is) why is it so slow?". And I don't think this is a restatement just by inverting things to be clever. I think it's- I kind of honesty that we don't generally have as programmers about why is our program slow? Why is a program slow? Why is a large commercial relational database made by a company that begins with O slow? And I wish we had time to discuss it further.

* Unnecessary code is a bug and we try to remove it.

* Slow programs are the programmers responsibility.

* Bugs are mistakes we programmers make;
  they are not inevitable or ok.

* The programmer is responsible that the program is used
  correctly.

* "O" complexity ignores that computers are made of hardware

  * Arrays might be O(1) when small but O(log n) when bigger
  * B-trees are a terrible data structure for storing data
  * Hashtables aren't remotely close to O(1)

* "Best practices" are a straw man that make having
  real discussions about making systems difficult.

  * That includes syntax hilighting
  * ...  and object-oriented programming
  * ...  and what is "clean code"?
  * ...  and unit-tests
  * ...  and debian packages
  * ...  and puppet/chef/etc

* Programming correctly is hard!

* Can you write an algorithm shorter or faster than ...

  If so I'd like to talk to you: PUTITYYEUOP

But I want to show you some non-trivial k source code that I did not write. I want to show you how I produce code. And I want to show you how I minimise the amount of redundant code that I produce.

This

c::a$"\n";b::0,1+c;d::(#c),|/-':b;h:{c[x]&y+b x};i::x,j-b x:b'j;l:{h/0|x&d-1}
j::'|k;K:{k::2#0|x&*|c}:s::1&s|i-w-2;D:{s::0|x&d-w};px:{D s*w'x};Wx:{D s*4*x}
J:{K$[S;|\k[0],x;x]};lx:{J j+x};ux:{J l i x,0};hx:{J l 1*d'x};mx:{J l s*_x%F}
kx:kr:{U,:,(-\k[0],#x;'k_a};e[k]x};kb:{S[-/k;K j-1 D;];kx"\"
e:{a::?[a;x;y];K(*x)+#y};cz:{S[#u;e/ u;]};cc:{9'*k_a};cx:{kx cc'};cv:{kx@9''}
f:"\";O::aS,f;cf:{cg f::0'"\"};cg:{K(g,1#g)1+g'k}
t::{a x;q[x]|/7 4 4*~(k'x;2!(,/g)'x:^(j 9'a)?x:h\:/s+!:'1+w));q::n 9'a
A:(r::a::S[#x;1:x;,"\n"];n::x|    u::()};co:(A#0'"\"};cs:{n::S[#n;n;e'"\"]1:a;r::a}
A@'x;w::_W%F;x::(r-a)_"'",n;y::F'1.s;z::(2'(0;W;F's;F'd);1'0,t)

[presentation video was blurry, expect mistakes in the code transcription]

is Arthur's editor. So the syntax here is pretty simple, [audience laughs]. No, seriously 2+2 is four. -6 negates six. It's very easy. Semicolon separates things. This double colon thing that I want to look at first is interesting 'cause it creates a view. It's something that doesn't exist in a lot of programming languages. I am not actually aware of any programming languages that actually exists and it's not like an SQL view. It's actually kinda like it what Excel does. It's a memoized function, I guess... that keeps track of its dependencies automatically. So this expression right here only gets reevaluated whenever whatever dependent variables change. [explains Arthur's e.k editor].

As we can see k is a very dense language and there is a lot here. When I first program, I start with something that I want to do, the first thing that I did here in k -my first day programming, learning k- was to deal with the fact that my Mac does not have any home or end keys on it. I use Emacs so I'll actually add ctrl-a and ctrl-e. Now to do this, let's look at the documentation for the windowing interface.

[reads the documentation and writes the implementation.]

what else?

So, is code bloat inevitable? Code written like this definitely violates what a generally considered "best practices" everywhere. One can write a tiny hello world that runs fast and without bugs. And it's generally accepted that this doesn't extend to larger programs. I hope that I've demonstrated that this is false.

kOS is not ready for users. I don't know when it will be and I don't really want to talk about that. But I do want to talk about programming and specifically how to program and I hope that by giving you this view of kOS we can have a conversation about how to program. That's a little different. There turns out to be a lot of usefulness that follows from this idea that I'm pointing to here... whatever this is [shows the screen running kOS]. And I am certain that because it's so useful that this maybe evidence that we as a human race don't actually understand programming very well. It's because of that the term best practices so suspicious to me, it's actually banned where I work, the term. I mean you have to justify even if everybody else is doing it.

Just some final notes, I get asked about something called kona that maybe is easier to find on the internet. It is an open source re-implementation of an old version of k. I don't know very much about it. I know that it's bigger and slower than k. I know that it does not contain kdb. And so probably does not represent k very well if you get it looking to learn k, this is not going to help. If you want to try k, you can just download the 32bit version kdb from kx's website. And just push backslash \ to get from q to k. And I've covered about as much as I can in the time that has been available to me. So if anybody wants to talk about programming, get in touch. Last four lines by the way,

n:0;p:0:"p.md";co:{p::0:0'""}
c::(&"#"=*:'p),#p;a::*(1 0+c[n,n+1])_p;x::($n),p c n
z::{1'((x**F),0;a[x])}'!#a;
lx:{2'(0;W;7);n::|/0,&/(-2+#c),n+x}

are the source code to the presentation tool that you are looking at there [audience slightly laughs]. Thank you [applause].