A blog about software and making.

Feature Extraction

A talk on turning raw data into features for learning and modelling, followed by a short presentation on linear modelling.

  • Add new features that describe how the data changes over time (e.g. position/velocity/acceleration/jerk).
  • Figure out which variables are important so you can reduce dimensions. Fewer dimensions can reduce error.
  • PCA - project the data onto new axes (principal components), each chosen to capture as much of the remaining variance as possible.
  • Choose components based on proportion of variance (How much of the total variance does this component account for?)
  • PCA may make things worse! There may be too many relationships between variables and we don’t want to lose any.
  • Over-fitting - Using too much local data that doesn’t account for variance. The model becomes fitted to the data you are seeing instead of the relationships between variables.
  • Taking the log of a feature can spread out values that are bunched together, which makes the data easier to separate.
  • Linear modelling - Fit a line that minimizes the error. Works best if the error is normally distributed (most errors are close to zero).
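
Two of the ideas above, deriving time-based features and fitting a line, can be sketched in a few lines of Python. This is a minimal illustration and not the code from the talk: the helper names `finite_difference` and `fit_line`, the sample positions, and the one-second sampling interval are all assumptions made up for the example.

```python
from statistics import mean


def finite_difference(values, dt):
    """Return successive differences divided by the time step.

    Applied to positions this approximates velocity; applied again it
    approximates acceleration, and once more, jerk.
    """
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical samples: position measured once per second.
positions = [0.0, 1.0, 4.0, 9.0, 16.0]
dt = 1.0

# New features derived from the raw positions.
velocity = finite_difference(positions, dt)
acceleration = finite_difference(velocity, dt)

# Fit a line to velocity over time. Each velocity estimate belongs to the
# midpoint of the interval it was computed from.
times = [0.5, 1.5, 2.5, 3.5]
slope, intercept = fit_line(times, velocity)

# Residuals are the errors the fit is minimizing (in the squared sense).
residuals = [v - (slope * t + intercept) for t, v in zip(times, velocity)]
```

With real, noisy measurements the residuals will not be zero, and looking at how they are distributed is one way to judge whether a linear model is a reasonable choice.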

Meetup Event

Expert Android Developers Speak

Presentations on Kotlin and using JNI for Android development.

  • I really like Kotlin’s idea of separating nullable and non-nullable references. It seems similar to the option/maybe monad idea (apologies if you design type systems and languages).
  • The gist of the JNI presentation seemed to be that you really don’t want to use it, and the extra layers will probably make your code slower. It’s a way of using C/C++ libraries and not a way to increase performance.
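
To make the nullable versus non-nullable distinction concrete, here is a rough analogue using Python type hints. It is only a sketch: the function names and the `users` data are hypothetical, and unlike Kotlin, Python does not enforce these annotations itself. An external checker such as mypy is needed to flag passing a possibly-missing value where a plain `str` is expected.

```python
from typing import Dict, Optional


def find_email(users: Dict[str, str], name: str) -> Optional[str]:
    """May return None; the Optional[...] return type says so explicitly."""
    return users.get(name)


def domain_of(email: str) -> str:
    """Takes a plain str, so callers must rule out None before calling."""
    return email.split("@", 1)[1]


def email_domain(users: Dict[str, str], name: str) -> Optional[str]:
    email = find_email(users, name)
    # Roughly what a Kotlin safe call such as `email?.let { ... }` expresses:
    # only run the next step when a value is present.
    return domain_of(email) if email is not None else None


users = {"ada": "ada@example.com"}  # hypothetical data
known = email_domain(users, "ada")
unknown = email_domain(users, "grace")
```

The "only continue if a value is present" step in `email_domain` is the part that resembles mapping over an Option/Maybe value.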

Meetup Event

Coverage Is Not Strongly Correlated With Test Suite Effectiveness

No great surprises here. I think people’s time is better spent testing the common paths through their code and capturing important functional requirements. I found the ideas in “Functional Core, Imperative Shell” and “There Are Only Two Roles of Code” helpful in figuring out how to spend my test-writing budget.

We found that there is a low to moderate correlation between coverage and effectiveness when the number of tests in the suite is controlled for. In addition, we found that stronger forms of coverage do not provide greater insight into the effectiveness of the suite. Our results suggest that coverage, while useful for identifying under-tested parts of a program, should not be used as a quality target because it is not a good indicator of test suite effectiveness.

Study

Odds & Ends - December 2015

Thoughts, terms, and ideas I’ve come across over the last few months.

  • Tail call benefit - The callee reuses the caller’s stack space, which reduces memory pressure and makes it more likely that that chunk of the stack will stay in the CPU cache.
  • Cores are black boxes that instructions enter and leave in sequential order. Inside the black box, instructions can run in any order.
  • Out of order execution
    • Fetch multiple instructions each cycle and decode into µ-ops.
    • µ-ops are put into the re-order buffer (ROB) where they can be processed out of order if data is ready.
    • To prevent pipeline stalls, the core predicts which way a conditional branch will go and speculatively executes the predicted path. If the prediction turns out to be wrong, the speculative work is thrown away.
  • Hyperthreading - Presents two virtual processors that share one physical core’s resources (such as the reorder buffer and execution units). This gives the core more independent work during general workloads to keep the execution units busy.
  • Pipelines improve performance but can cause nightmares if they stall!
    • CPUs run best when instructions and data are in order.
    • Keep data in order, adjacent, and consecutive to prevent data stalls.
    • Don’t jump around because the CPU won’t be able to predict where you want to go.
    • Take comparisons whose result doesn’t change between iterations out of loops to prevent mispredicted branches. For { If { … } } => If { For { … } }
  • Think about the period of time when we are waiting for followers to acknowledge a write as our uncertainty window. As followers acknowledge the write, we become increasingly certain that it has been captured, and eventually we can advance the uncertainty window past it.
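
The uncertainty window idea can be sketched as a small leader-side bookkeeping class. Everything here is an assumption for illustration: the `ReplicatedLog` name, the integer offsets, the `acks_required` threshold, and the rule that acknowledging an offset also acknowledges every earlier one. It is not taken from any particular replication protocol.

```python
class ReplicatedLog:
    """Leader-side view of which writes followers have acknowledged."""

    def __init__(self, follower_ids, acks_required):
        self.acks_required = acks_required  # assumed threshold for "certain"
        self.last_written = 0  # highest offset the leader has written
        # Highest offset each follower has acknowledged so far.
        self.acked = {f: 0 for f in follower_ids}

    def write(self):
        self.last_written += 1
        return self.last_written

    def acknowledge(self, follower_id, offset):
        # Acks may arrive out of order; never move a follower backwards.
        self.acked[follower_id] = max(self.acked[follower_id], offset)

    def certain_up_to(self):
        """Highest offset acknowledged by at least acks_required followers."""
        ranked = sorted(self.acked.values(), reverse=True)
        return ranked[self.acks_required - 1]

    def uncertainty_window(self):
        """Offsets written but not yet acknowledged by enough followers."""
        return range(self.certain_up_to() + 1, self.last_written + 1)


log = ReplicatedLog(["f1", "f2", "f3"], acks_required=2)
log.write()
log.write()  # offsets 1 and 2 are now in flight
before = list(log.uncertainty_window())

log.acknowledge("f1", 2)  # one ack is not enough yet
after_one = list(log.uncertainty_window())

log.acknowledge("f2", 1)  # offset 1 now has two acks
after_two = list(log.uncertainty_window())

log.acknowledge("f3", 2)  # offset 2 now has two acks
after_three = list(log.uncertainty_window())
```

Each acknowledgement either leaves the window unchanged or shrinks it from the front, which matches the intuition of certainty growing as followers respond.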