Install the Software
To demonstrate the message of this book, we use the BareLogic tool.
What to Install First
| essential | what | notes |
|---|---|---|
| ✔ | bash | e.g. inside VSCode, or in googlecode, or on a terminal in Linux or Mac, or the WSL for Windows |
| ✔ | a good code editor | e.g. VSCode, or nvim (but consider the merits of something very lightweight like micro) |
| ✔ | Python | version 3.13 (or later) |
| ✔ | Git | |
| ✔ | Gawk | or awk, version 5 or later |
| ❌ | htop | or some other CPU monitor |
| ❌ | pylint | or some other static linter |
| ❌ | docco | or some other simple documentation generator |
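For example, on a Debian/Ubuntu-style Linux, most of these can be installed as below. This is a sketch only: package names vary by platform, your distribution's python3 may be older than 3.13, and docco comes from npm rather than apt.

    sudo apt update
    sudo apt install git gawk htop pylint python3   # core tools plus optional extras
    python3 --version                               # confirm it reports 3.13 or later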
Install code and data sets
Now you need some code and some sample data sets.
1. In a new directory, check out our test data:

        mkdir newDir    # call it anything you like
        cd newDir
        git clone http://github.com/timm/moot
        # do not change directories; go to the next step

2. Go to https://github.com/timm/barelogic and click the Fork button.

3. Clone your fork to your local machine:

        git clone https://github.com/timm/barelogic.git
        cd barelogic

4. Fetch all branches and check out the v0.6 branch:

        git fetch origin
        git checkout -b v0.6 origin/v0.6

5. Create a new branch for your changes:

        git checkout -b my-feature

6. Test your install (see below).

7. Now you can start working. Make your edits in this directory. Frequently, push your changes to your on-line repo (a sketch for syncing with the upstream repo follows this list):

        git add .    # add whatever is new
        git commit -m "Describe your changes"
        git push origin my-feature
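If you later want to pick up new commits from the original repo, one standard git pattern (an assumption on my part; the repo does not mandate any particular workflow) is to track it as an upstream remote:

    git remote add upstream https://github.com/timm/barelogic.git
    git fetch upstream
    git merge upstream/v0.6    # fold upstream updates into your branch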
Test your install
Are you running Python3?
python3 --version
You should see Python 3.13 (or higher).
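If your python3 is older than that, one common fix (assuming you use the pyenv version manager; other tools work too) is:

    pyenv install 3.13    # build the latest 3.13.x
    pyenv local 3.13      # use it inside this directory
    python3 --version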
Is your data in the right place?
cd barelogic/src
make stats
If this works, you should see something like this:
x y rows
------ ------ ------
3 1 197 ../../moot/optimize/config/wc+rs-3d-c4-obj1.csv
3 1 197 ../../moot/optimize/config/wc+sol-3d-c4-obj1.csv
3 1 197 ../../moot/optimize/config/wc+wc-3d-c4-obj1.csv
...
If this test fails, check that you have installed gawk and that the data is in the right place; i.e., seen from the src directory, the data should be in ../../moot/optimize.
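Both requirements can be checked with standard tools (just a sketch):

    cd barelogic/src
    gawk --version | head -1    # want major version 5 or later
    ls ../../moot/optimize      # should list the sample data directories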
Is your active learner working?
cd barelogic/src
python3 -B bl.py --quick | column -s, -t
This will run an experiment. Thirty times, it will run the active learner using 8, 16, 20, 30, 40 samples, then statistically compare the results. Best results are marked with an "a"; second best with a "b"; and so on.
#['rows' 'lo' 'x' 'y' 'ms' 'b4' 40 20 16 8 'name']
[398 '0.17' 4 3 5 'c 0.56 ' 'a 0.17 ' 'a 0.21 ' 'a 0.24 ' 'b 0.26 ' 'auto93']
Notice that everything is "a" from 16 samples and up. That is to say, there was no win here above 16 samples (the sketch below shows one way to read off that number).
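To make that reading concrete, here is a tiny illustrative Python sketch (not part of BareLogic) that finds the smallest budget ranked "a", using the (sample size, rank) pairs from the auto93 row above:

    # Illustrative only: pairs copied from the auto93 row shown above.
    results = {40: "a", 20: "a", 16: "a", 8: "b"}
    best = min(n for n, rank in results.items() if rank == "a")
    print(f"no win above {best} samples")   # prints: no win above 16 samples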
Is your tree learner working?
cd barelogic/src
python3 -B bl.py --tree
This test does the active learning (with 32 samples) then builds a tree from the labeled data.
auto93.csv
o{:mu1 0.556 :mu2 0.265 :sd1 0.162 :sd2 0.064}
d2h win n
---- ---- ----
0.50 -4 32
0.43 13 27 Volume <= 350
0.39 25 19 | Model > 80
0.30 45 2 | | origin == 2 ;
0.36 31 13 | | origin == 3
0.29 49 2 | | | Volume <= 85 ;
0.37 28 11 | | | Volume > 85
0.36 31 7 | | | | Volume <= 91
0.36 31 5 | | | | | Model > 81 ;
0.36 30 2 | | | | | Model <= 81 ;
0.39 23 4 | | | | Volume > 91
0.39 24 2 | | | | | Volume <= 107 ;
0.40 21 2 | | | | | Volume > 107 ;
0.52 -8 4 | | origin == 1
0.44 12 2 | | | Volume > 232 ;
0.60 -28 2 | | | Volume <= 232 ;
0.55 -15 8 | Model <= 80
0.52 -8 6 | | Volume <= 119
0.43 13 3 | | | Clndrs > 3 ;
0.60 -28 3 | | | Clndrs <= 3 ;
0.64 -38 2 | | Volume > 119 ;
0.86 -92 5 Volume > 350
0.84 -88 2 | Volume <= 440 ;
0.87 -95 3 | Volume > 440 ;
Here:

- `n` is the number of rows in each branch;
- `;` denotes a leaf;
- `d2h` is the distance of the mean score of the rows in each branch to an optimal zero point (so lower numbers are better);
- `win` normalizes `d2h` as `100 - 100 * int(1 - (d2h-min)/(mu-min))` (so higher numbers are better and 100 is best).
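To make d2h concrete, here is a minimal sketch of one common way such a distance can be computed (an assumption about the general technique, not a copy of bl.py's code): normalize each goal to 0..1, then measure the Euclidean distance to each goal's ideal value.

    import math

    # Hypothetical d2h sketch. Each goal is (lo, hi, heaven), where heaven
    # is 0 for "minimize" and 1 for "maximize".
    def d2h(row, goals):
        total = 0
        for value, (lo, hi, heaven) in zip(row, goals):
            norm = (value - lo) / (hi - lo + 1e-32)  # scale raw value to 0..1
            total += (norm - heaven) ** 2            # squared gap to the ideal
        return math.sqrt(total / len(goals))         # lower is better

    # e.g. minimize weight, maximize mpg (made-up ranges and row):
    print(d2h([2000, 40], [(1500, 5000, 0), (10, 50, 1)]))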
Is your tree learner working, on all data sets?
cd barelogic/src
time make trees
This will run the active learner (with 32 samples), then the tree learner, on all data sets. On my machine (with 10 CPU cores) this takes under a minute. The thing to check here is that there are no crashes.
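For reference, the effect of that target is roughly the following shell loop. This is a hypothetical sketch: the actual Makefile may pass files to bl.py differently, and may run them in parallel.

    for f in ../../moot/optimize/*/*.csv; do
      python3 -B bl.py --tree "$f" || echo "crash: $f"
    done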
How good are those trees?
This final test will launch 250 Python processes, so shut down everything else you are doing before trying this. And if it freezes your computer, just do a reboot.
cd barelogic/src
time make aftersReport
This takes five minutes to run on my 10-core machine. If it works, it prints a little report showing how good the trees learned from 20, 30, 40, 50 samples (selected by active learning) are at selecting good examples in the unlabeled space.
In that second round of sampling, if you allow 20 additional samples, then:
| initial samples | additional | 10 | 30 | 50 | 70 | 90 |
|---|---|---|---|---|---|---|
| 50 | 20 | 67 | 94 | 98 | 100 | 100 |
| 40 | 20 | 70 | 91 | 97 | 100 | 100 |
| 30 | 20 | 71 | 89 | 96 | 100 | 100 |
| 20 | 20 | 72 | 86 | 95 | 100 | 100 |
| 256 | 1 | -14 | 33 | 68 | 78 | 92 |
| 128 | 1 | 22 | 56 | 71 | 85 | 97 |
| 64 | 1 | 12 | 55 | 70 | 79 | 95 |
| 32 | 1 | 15 | 43 | 56 | 76 | 95 |
| 16 | 1 | 110 | 31 | 46 | 67 | 89 |
| 8 | 1 | 7 | 15 | 28 | 38 | 58 |

(The 10 to 90 columns are percentiles of the results; the 50 column is the median.)
This report says that (say) after 64 initial samples, plus 1 more selected by the tree, these trees select examples in the unlabeled space that are 70% (median) of the way to optimal. Which is pretty amazing.
Further, in terms of predicting future labels, there is little win after 64 initial samples.
Pull requests
If you do something really cool, or if you fix a bug in my code, I will ask you for a pull request:
- On GitHub, go to your fork and click "Compare & pull request".
- Set the pull request's base to timm/barelogic, branch v0.6, and compare it with YOUR_USERNAME/my-feature.
- Add a descriptive title and message, then click "Create pull request" (or use the command-line sketch below).
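If you prefer the command line, GitHub's gh tool can open the same pull request (assuming gh is installed and you are authenticated):

    gh pr create --repo timm/barelogic --base v0.6 \
       --title "Describe your changes" --body "What changed, and why"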