
Starting to design closures, small refactor

Branch: master
Author: Gil Mizrahi (9 months ago)
Commit: 7d5cd9c2a2
7 changed files with 148 additions and 32 deletions:

1. build.hs (+1 -0)
2. design.org (+51 -10)
3. src/gc.c (+9 -11)
4. src/types.c (+50 -0)
5. src/types.h (+26 -3)
6. src/utils.c (+4 -4)
7. todo.org (+7 -4)

build.hs (+1 -0)

@@ -1,5 +1,5 @@
#!/usr/bin/env stack
-- stack --resolver lts-16.26 script --package shake --package bytestring --package cereal --package hspec --package process


-- initial version taken from https://shakebuild.com/manual


design.org (+51 -10)

@@ -56,18 +56,27 @@ such as booleans.
 The heap contains all other kinds of objects. We differentiate between
 two kinds of objects: boxed compound objects and byte arrays.
 
-The heap object has two parts, the info of the heap object and the actual data.
+The heap object has two parts, the ~info~ of the heap object and the actual ~data~.
 
-The info uses 2 bytes to represent the following:
+The info uses ~2 bytes~ to represent the following:
 
-1. The first 14 bits represents the size of the object -
-for byte arrays this is in bytes, for boxed compound objects this is
-in 64-bit words.
-2. when the 15 bit is set, the heap object represents a byte array,
-otherwise it is an array of stack objects.
-3. the LSB is used by the garbage collector to mark live data.
+1. The LSB is used by the garbage collector to mark live data.
+2. The next 3 bits represent the type of the object:
+- 000 - an array of 64-bit stack objects. Then the next 12 bits represent the array size.
+- 001 - a bytearray. Then the next 12 bits represent the number of bytes in the array.
+- 010 - A closure. Then the next 6 bits represent the current number of applied arguments,
+and the next 6 bits represent the total number of arguments.
 
-Then the actual data is either an array of bytes or an array of stack objects.
+| type                  | 12 bits                   | 3 bits | 1 bit |
+|-----------------------+---------------------------+--------+-------|
+| array of heap objects | array size (64-bit words) | 000    | GC    |
+| bytearray             | array size (bytes)        | 001    | GC    |
+
+| type    | 6 bits               | 6 bits                 | 3 bits | 1 bit |
+|---------+----------------------+------------------------+--------+-------|
+| closure | total number of args | current number of args | 010    | GC    |
+
+Then the actual data is represented after that accordingly.
 *** Opcodes
 Opcodes in BIP use 1 byte. Some opcodes will have data after them, some will not.

@@ -135,7 +144,39 @@ We separate gen0 (often called the nursery) and gen1 for a few reasons:
 2. gen0 getting full is probably a decent indication to when we should
 start garbage collecting.
 *** Function calls and closures
-TODO
+We add another stack to our virtual machine: a call stack. The call stack stores the locations we need to continue from after executing a function.
+
+We add a few new instructions: ~CALL~, ~JMP~, ~RET~, ~APPLY~, ~CLOS~ and ~UNWRAP~.
+
+- ~CALL <pos>~ - Pushes the current IP (+1) onto the call stack and changes the IP to <pos>
+- ~JMP <pos>~ - Changes the IP to <pos> without changing the call stack
+- ~RET~ - Pops the call stack and sets the popped IP as the new IP
+- ~CLOS <num>~ - Takes an address (top of the stack) and creates a closure expecting <num> arguments
+- ~APPLY~ - Adds an argument to the closure on the top of the stack
+- ~UNWRAP~ - Loads the arguments and the IP of a closure onto the stack
+
+Some rules:
+
+- The first item on the call stack should be the IP of a ~HALT~ instruction.
+- A function is responsible for cleaning its arguments off the stack.
+- ~CLOS~ should also clean the stack of its closed arguments.
+- To get tail call optimization, compilers should identify tail positions and use ~JMP~ instead of ~CALL~.
+
+**** Closures
+A heap object.
+
+***** heap info:
+
+| 6 bits               | 6 bits                 | 3 bits    | 1 bit |
+|----------------------+------------------------+-----------+-------|
+| total number of args | current number of args | (tag) 010 | GC    |
+
+***** data:
+
+| 8 bytes | N bytes |
+|---------+---------|
+| <IP>    | <args>  |
+
 ** Code structure
 Currently we only have a few files in our project:



src/gc.c (+9 -11)

@@ -32,9 +32,8 @@ void gen0_gc(struct VM* vm) {
 
     // sweep / copy
     for (int i = 0; i < GEN0_SIZE; ++i) {
-        if (vm->gen0[i]->info & IS_MARKED_TAG) {
-            // clear is_marked - @TODO refactor
-            vm->gen0[i]->info = ((vm->gen0[i]->info >> 1) << 1);
+        if (is_gc_marked(vm->gen0[i])) {
+            clear_gc_marked(vm->gen0[i]);
 
             if (vm->gen1p >= GEN1_SIZE) {
                 gen1_gc(vm);
@@ -70,7 +69,7 @@ void gen0_gc(struct VM* vm) {
 void gen1_gc(struct VM* vm) {
     for (int i = 0; i < GEN1_SIZE; ++i) {
         // clear is_marked - @TODO refactor
-        vm->gen1[i]->info = ((vm->gen1[i]->info >> 1) << 1);
+        clear_gc_marked(vm->gen1[i]);
     }
     mark(vm);
 
@@ -88,7 +87,7 @@ void gen1_gc(struct VM* vm) {
 
     // copy / sweep
     for (int i = 0; i < GEN1_SIZE; ++i) {
-        if (vm->gen1[i]->info & IS_MARKED_TAG) {
+        if (is_gc_marked(vm->gen1[i])) {
             gen1_temp[gen1p_temp++] = vm->gen1[i];
         } else {
             free(vm->gen1[i]);
@@ -120,12 +119,11 @@ void gen1_gc(struct VM* vm) {
 // Recursively mark live heap objects
 // We could probably find a faster way to do this but this is enough for now
 void mark_pointer(HeapObject* obj) {
-    if (obj->info & IS_MARKED_TAG) {
+    if (is_gc_marked(obj)) {
         return;
     }
-    // set is_marked @TODO - rector
-    obj->info |= IS_MARKED_TAG;
-    if (obj->info & IS_BYTEARRAY_TAG) {
+    set_gc_marked(obj);
+    if (is_bytearray(obj)) {
         return;
     }
     uint16_t size = getHeapInfoLogicalSize(obj->info);
@@ -133,7 +131,7 @@ void mark_pointer(HeapObject* obj) {
     StackObject data;
     for (uint16_t i = 0; i < size; ++i) {
         data = *(StackObject*)(&(obj->data[i*sizeof(StackObject)]));
-        if (data.integer & IS_INTEGER_TAG) {
+        if (is_integer(data)) {
         } else {
             mark_pointer(data.pointer);
         }
@@ -143,7 +141,7 @@ void mark_pointer(HeapObject* obj) {
 void mark(struct VM* vm) {
     // mark
     for (int i = 0; i < vm->sp; ++i) {
-        if (vm->stack[i].integer & IS_INTEGER_TAG) {
+        if (is_integer(vm->stack[i])) {
             continue;
         } else {
             // When we'll have heap object that can pointer to other objects, we'll chase these pointers.


src/types.c (+50 -0)

@@ -0,0 +1,50 @@
+#include "types.h"
+
+// Stack Objects //
+
+bool is_integer(StackObject obj) {
+    return (obj.integer & INTEGER_TAG) > 0;
+}
+bool is_pointer(StackObject obj) {
+    return (obj.integer & POINTER_TAG) == 0;
+}
+
+// Heap Objects //
+
+uint8_t get_heap_object_tag(HeapObject* obj) {
+    return (obj->info >> 1) & 0b111;
+}
+
+bool is_bytearray(HeapObject* obj) {
+    return (get_heap_object_tag(obj) & BYTEARRAY_TAG) > 0;
+}
+bool is_heap_array(HeapObject* obj) {
+    return (get_heap_object_tag(obj) & HEAPARRAY_TAG) > 0;
+}
+bool is_closure(HeapObject* obj) {
+    return (get_heap_object_tag(obj) & CLOSURE_TAG) > 0;
+}
+
+// Closures
+
+uint8_t get_closure_argsize(HeapObject* obj) {
+    return obj->info >> 10;
+}
+
+uint8_t get_closure_applied(HeapObject* obj) {
+    return (obj->info >> 4) & 0b111111;
+}
+
+// GC
+
+bool is_gc_marked(HeapObject* obj) {
+    return (obj->info & GC_MARKED_TAG) > 0;
+}
+
+void clear_gc_marked(HeapObject* obj) {
+    obj->info = (obj->info >> 1) << 1;
+}
+
+void set_gc_marked(HeapObject* obj) {
+    obj->info |= GC_MARKED_TAG;
+}

src/types.h (+26 -3)

@@ -2,6 +2,7 @@
 #define TYPES_H
 
 #include <stdint.h>
+#include <stdbool.h>
 
 #define DEBUG 0
 #define USE_ASSERTS 1
@@ -12,10 +13,15 @@
 #define GEN1_SIZE 32
 #define STACK_SIZE 1024
 
-#define IS_MARKED_TAG 1
-#define IS_BYTEARRAY_TAG 2
+// For heap objects
+#define GC_MARKED_TAG 0b1
+#define HEAPARRAY_TAG 0b000
+#define BYTEARRAY_TAG 0b001
+#define CLOSURE_TAG 0b010
 
-#define IS_INTEGER_TAG 1
+// For stack objects
+#define INTEGER_TAG 0b1
+#define POINTER_TAG 0b0
 
 
 typedef struct HeapObject {
@@ -46,4 +52,21 @@ struct VM {
     struct HeapObject* temp_ptr0;
 };
 
+bool is_integer(StackObject obj);
+bool is_pointer(StackObject obj);
+
+uint8_t get_heap_object_tag(HeapObject* obj);
+uint8_t get_stack_object_tag(StackObject* obj);
+
+bool is_heap_array(HeapObject* obj);
+bool is_bytearray(HeapObject* obj);
+bool is_closure(HeapObject* obj);
+
+uint8_t get_closure_argsize(HeapObject* obj);
+uint8_t get_closure_appliednum(HeapObject* obj);
+
+bool is_gc_marked(HeapObject* obj);
+void clear_gc_marked(HeapObject* obj);
+void set_gc_marked(HeapObject* obj);
+
 #endif

src/utils.c (+4 -4)

@@ -10,7 +10,7 @@ void fprint_stack(FILE* fp, unsigned int size, StackObject* stack) {
 }
 
 void fprint_stackobj(FILE* fp, int verbosity, StackObject stack_obj) {
-    if (stack_obj.integer & IS_INTEGER_TAG) {
+    if (is_integer(stack_obj)) {
         fprintf(fp, "%ld ", (stack_obj.integer >> 1));
     } else {
         if (verbosity) {
@@ -31,7 +31,7 @@ void fprint_heap(FILE* fp, unsigned int size, HeapObject** heap) {
 }
 
 void fprint_heapobj(FILE* fp, HeapObject* heap_obj) {
-    if (heap_obj->info & IS_BYTEARRAY_TAG) {
+    if (is_bytearray(heap_obj)) {
         fprintf(fp, "[%.*s] ", getHeapObjectSize(heap_obj), ((char*)(heap_obj->data)));
     } else {
         uint16_t size = getHeapInfoLogicalSize(heap_obj->info);
@@ -51,7 +51,7 @@ uint16_t getHeapObjectSize(HeapObject* ptr) {
 }
 
 uint16_t getHeapInfoSizeInBytes(uint16_t info) {
-    if (info & IS_BYTEARRAY_TAG) {
+    if ((info >> 1) & BYTEARRAY_TAG) {
         return (info >> 2);
     } else {
         return (info >> 2) * sizeof(StackObject);
@@ -59,7 +59,7 @@ uint16_t getHeapInfoSizeInBytes(uint16_t info) {
 }
 
 uint16_t getHeapInfoLogicalSize(uint16_t info) {
-    if (info & IS_BYTEARRAY_TAG) {
+    if ((info >> 1) & BYTEARRAY_TAG) {
         return (info >> 2);
     } else {
         return (info >> 2);


todo.org (+7 -4)

@@ -81,14 +81,17 @@ I don't exactly know what I'm doing, but I have a rough idea. So please be patie
 - New opcode: Index <index> will put the <index> element in the
 heap object on the stack
 **** DONE General design doc
-**** NOW Add tests for Cons and Indexing
+**** NOW Add Functions and Closures
+- [ ] We need to change the way we represent heap objects and add an additional type: closure.
+- [ ] We need to add a few new instructions: ~CALL~, ~JMP~, ~RET~, ~CLOS~, ~APPLY~, ~UNWRAP~.
+- [ ] We need to add functions and an entry point; a program becomes a list of functions.
+*** Next
+**** TODO Add tests for Cons and Indexing
 We can represent lists, arrays and complex objects. Let's write some tests.
 **** TODO Add tests for pointer chasing in GC
+**** TODO Add Jumps and Conditional Jumps
 **** TODO Write some comments and documentation
 Go over the code, find non-obvious things, move things around, write comments.
-*** Next
-**** TODO Add Jumps and Conditional Jumps
-**** TODO Add Functions and Closures
 **** TODO Add QuickCheck Tests
 **** TODO Standard library functions
 - int-to-str, concat, change print to only print strings

