David Forshner's Blog

M101P MongoDB for Developers

Good course on how to build APIs with Python and MongoDB.
Does a good job explaining the sweet spot where document databases work well.
I’m not sure about the long-term viability of document databases. It seems like RDBMS vendors are just going to adopt document database features.

Odds & Ends - September 2014

Thoughts, terms, and ideas I’ve come across over the last few months.

Errors are values.
DRY - Is about having a single source of truth, not about avoiding copying and pasting code.
When you cross a service boundary things that appear to be the same may have a different context and different data store. Avoid sharing code between boundaries that have different contexts (Microservices).
Repository Pattern - Layer that exists between business logic and data store. Isolates your code that interacts with your data store in one place.
Decorator Pattern - Enabling a chain of behavior determined by composition not inheritance.
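A minimal sketch of the decorator idea, assuming plain one-argument functions as the things being decorated. The names slowSquare, withLogging, and withCache are made up for illustration.

```javascript
// Base behavior: a plain function with no knowledge of logging or caching.
function slowSquare(n) {
  return n * n;
}

// Decorator: wraps any one-argument function and records each call.
function withLogging(fn, log) {
  return function (arg) {
    log.push('call(' + arg + ')');
    return fn(arg);
  };
}

// Decorator: wraps any one-argument function and caches results by argument.
function withCache(fn) {
  const cache = new Map();
  return function (arg) {
    if (!cache.has(arg)) {
      cache.set(arg, fn(arg));
    }
    return cache.get(arg);
  };
}

// The chain of behavior is chosen by composition at the call site.
const log = [];
const cachedLoggedSquare = withCache(withLogging(slowSquare, log));

console.log(cachedLoggedSquare(4)); // 16
console.log(cachedLoggedSquare(4)); // 16, served from the cache
console.log(log);                   // [ 'call(4)' ]
```

Swapping the order of the wrappers changes the behavior (logging every call vs. only cache misses) without touching slowSquare or adding a subclass.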
A bad test breaks in response to any change in production code without verifying correct behavior.
Map - Transform every element of a data set into a new collection without mutating the original.
Reduce - Aggregate or merge results into a single value.
Scope - Where variables and functions are accessible and in what context they are being executed.
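Map and reduce from the notes above, sketched with the built-in array methods. The orders data is made up for illustration.

```javascript
const orders = [
  { item: 'book', price: 12 },
  { item: 'pen', price: 3 },
  { item: 'bag', price: 25 },
];

// map: build a new array from each element; `orders` is left untouched.
const prices = orders.map(function (order) {
  return order.price;
});

// reduce: fold the mapped values into a single aggregate.
const total = prices.reduce(function (sum, price) {
  return sum + price;
}, 0);

console.log(prices); // [ 12, 3, 25 ]
console.log(total);  // 40
```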

Closures - Expressions (usually functions) that work with variables set within a certain context. An inner function referring to local variables of its outer function.

function add(x) {
  return function(y) {
    return x + y; // The returned function closes over x (5 in the case of add5)
  }
}
var add5 = add(5);
console.log(add5(3)); // 8

Enumerable - Pull-based. The consumer pulls from the producer.
Observable - Push-based. The producer pushes new values to the consumer.
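A rough sketch of the pull vs. push difference. The pull side uses a standard generator; the push side uses a hand-rolled createSubject helper, which is an assumption for illustration and not the API of any particular reactive library.

```javascript
// Pull: the consumer decides when to ask the producer for the next value.
function* numbers() {
  yield 1;
  yield 2;
  yield 3;
}

const pulled = [];
const iterator = numbers();
let step = iterator.next();
while (!step.done) {
  pulled.push(step.value);
  step = iterator.next();
}

// Push: the producer decides when to hand values to subscribed consumers.
function createSubject() {
  const subscribers = [];
  return {
    subscribe: function (onNext) {
      subscribers.push(onNext);
    },
    next: function (value) {
      subscribers.forEach(function (onNext) {
        onNext(value);
      });
    },
  };
}

const pushed = [];
const subject = createSubject();
subject.subscribe(function (value) {
  pushed.push(value);
});
subject.next(1);
subject.next(2);
subject.next(3);

console.log(pulled); // [ 1, 2, 3 ]
console.log(pushed); // [ 1, 2, 3 ]
```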
Functional Reactive Programming
- Properties - Values that change over time. Every property is a function f(t) that gives a value at a given moment in time.
- Functional - Compose together functions to create complex behavior. Functions can have time-dependent relationships.
- Immutable - Values are something that happened in the past so they need not change.
- Event streams - Events at a particular point in time. Capture event changes in a discrete manner. Operators (map/filter/reduce) create new streams out of old streams (no mutation).
- Switching - Change the system in response to events. A stream of streams (meta-stream). Ex: stream URLs, map to requests, return responses in future stream of promises.
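The event stream point above, sketched with a tiny hand-rolled stream. createStream and its map/filter operators are hypothetical and only cover the "operators create new streams" idea; properties and switching are not modeled.

```javascript
// A tiny event stream: subscribers are notified of each emitted event.
function createStream() {
  const subscribers = [];
  const stream = {
    subscribe: function (fn) {
      subscribers.push(fn);
    },
    emit: function (value) {
      subscribers.forEach(function (fn) {
        fn(value);
      });
    },
    // map returns a NEW stream; events on the source are not changed.
    map: function (transform) {
      const out = createStream();
      stream.subscribe(function (value) {
        out.emit(transform(value));
      });
      return out;
    },
    // filter returns a new stream that only forwards matching events.
    filter: function (predicate) {
      const out = createStream();
      stream.subscribe(function (value) {
        if (predicate(value)) {
          out.emit(value);
        }
      });
      return out;
    },
  };
  return stream;
}

const readings = createStream();
const doubledEvens = readings
  .filter(function (x) { return x % 2 === 0; })
  .map(function (x) { return x * 2; });

const seen = [];
doubledEvens.subscribe(function (value) {
  seen.push(value);
});

[1, 2, 3, 4].forEach(function (x) {
  readings.emit(x);
});

console.log(seen); // [ 4, 8 ]
```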
68-95-99.7 Rule - For a normal distribution, about 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
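A quick numeric check of the rule, assuming a standard normal distribution (mean 0, standard deviation 1). The shareWithin helper is made up for illustration and integrates the normal density with the trapezoid rule.

```javascript
// Density of the standard normal distribution at z.
function normalPdf(z) {
  return Math.exp(-0.5 * z * z) / Math.sqrt(2 * Math.PI);
}

// Share of values within k standard deviations of the mean,
// estimated with the trapezoid rule over [-k, k].
function shareWithin(k, steps) {
  const width = (2 * k) / steps;
  let area = 0;
  for (let i = 0; i < steps; i++) {
    const left = -k + i * width;
    area += 0.5 * (normalPdf(left) + normalPdf(left + width)) * width;
  }
  return area;
}

[1, 2, 3].forEach(function (k) {
  console.log(k + ' sd: ' + (100 * shareWithin(k, 10000)).toFixed(2) + '%');
});
```

The three printed shares come out at roughly 68%, 95%, and 99.7%.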
Fit indexes in RAM if possible. Index on the hash of a string instead of the string itself.
Use a write-back cache to batch database writes to the back-end.
Use locking to make sure that when the cache expires the database doesn’t get slammed with multiple copies of the same query.
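One way to sketch the locking idea inside a single process: remember the in-flight load per key so that concurrent callers share it instead of each running the query. createCache and slowQuery are hypothetical names, and a multi-server setup would need a shared lock instead of this in-memory map.

```javascript
// Hypothetical cache that shares one in-flight load per key.
function createCache(ttlMs) {
  const entries = new Map();  // key -> { value, expiresAt }
  const inFlight = new Map(); // key -> Promise for a load already running

  return function get(key, load) {
    const entry = entries.get(key);
    if (entry && entry.expiresAt > Date.now()) {
      return Promise.resolve(entry.value);
    }
    // A load for this key is already running: wait for it instead of re-querying.
    if (inFlight.has(key)) {
      return inFlight.get(key);
    }
    const pending = new Promise(function (resolve) {
      resolve(load(key));
    })
      .then(function (value) {
        entries.set(key, { value: value, expiresAt: Date.now() + ttlMs });
        return value;
      })
      .finally(function () {
        inFlight.delete(key);
      });
    inFlight.set(key, pending);
    return pending;
  };
}

// Stand-in for an expensive database query.
let queryCount = 0;
function slowQuery(key) {
  queryCount += 1;
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve('row for ' + key);
    }, 10);
  });
}

const getCached = createCache(1000);
const first = getCached('user:1', slowQuery);
const second = getCached('user:1', slowQuery); // arrives while the first load is running

Promise.all([first, second]).then(function (rows) {
  console.log(rows, 'queries run:', queryCount);
});
```

Both callers get the same promise, so the expired key triggers one query rather than one per caller.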
You can decouple a sender and receiver with the command pattern and observer pattern but to decouple them in time you will want to use an event queue.
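A minimal in-memory sketch of decoupling in time: the sender enqueues and moves on, and the receiver drains the queue whenever it gets around to it. createEventQueue and the event shapes are assumptions for illustration.

```javascript
// Hypothetical in-memory queue: senders enqueue now, a receiver drains later.
function createEventQueue() {
  const pending = [];
  return {
    enqueue: function (event) {
      pending.push(event);
    },
    // Handle every queued event in arrival order and report how many ran.
    drain: function (handler) {
      let handled = 0;
      while (pending.length > 0) {
        handler(pending.shift());
        handled += 1;
      }
      return handled;
    },
  };
}

const queue = createEventQueue();

// Sender side: nothing is handled at this point.
queue.enqueue({ type: 'play_sound', id: 'jump' });
queue.enqueue({ type: 'play_sound', id: 'land' });

// Receiver side: runs later, on its own schedule.
const played = [];
const handledCount = queue.drain(function (event) {
  played.push(event.id);
});

console.log(handledCount, played); // 2 [ 'jump', 'land' ]
```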
Keep data structures flat and linear (array > linked list). Every pointer you have to chase to find data adds a likely cache miss while flat arrays can be pre-fetched by the memory system.
- Adding/Removing values from the middle of std::vector is faster than std::list if the elements are POD types and no bigger than 64 bytes (one cache line). Lists have too many cache misses.
- For larger POD types, non-POD types, or if you already have a pointer into the list then std::list will win.

Google I/O Extended 2014

14-06-25

I attended the Google I/O Extended conference at Google’s KW location. The main theme this year seemed to be about getting the rest of the world online.

Event Notes

Currently there are 2.6 billion people on the internet, but there will be 5 billion in the future.
Free and open information access can lead to regime change.
+4% GDP due to internet access.
It’s not just search. We have to drive the information by making it relevant and fun.
Information access -> open communication -> drives change by allowing silent groups to have a voice.
~300 people at the KW office.
Most effective ad length on YouTube is 12 seconds maximum.
google.com/design - Unified design guidelines that are driven by material design.
Minimize interactions with the device by showing relevant data first. Use sensors to understand the user’s current context.
Maps Features: Fast, Accurate, Easy
One Android SDK for all platforms: TV, phone, car, watch, etc.
Chromebook Features: Speed, Simplicity, Security
Google Drive encrypts your data both in transit and in storage.
Google cloud
- Allows small teams to run big operations.
- Cloud.Debug allows live debugging on servers
- Request tracing to see all service requests
- Can set custom alerts on metrics
Cloud DataFlow
- Same code for both batch and streaming
- Parallel data pipelines.
- One pipeline for both batch (ETL) and streaming (continuous analysis).
Blink - Chrome’s rendering engine
Cross Platform Design
- Material Metaphor: Shared experience, Shared knowledge, back story for design, pre-defined information.
- Ex: Paper has a long history, exists in the world, is tangible.
- Magical Material - Not for the sake of artifice but for information. Ex: on touch the surface rises to the finger.
- One typeface with various sizes and weights.
- Use a color hierarchy to focus the gaze.
- Material lives at the same scale as the device and expands to fill space.
- Obey physics so no teleporting.

Odds & Ends - June 2014

14-06-01

Thoughts, terms, and ideas I’ve come across over the last few months.

There is no such thing as “plain text”. Text is binary data plus an encoding scheme.
- When someone says “plain text” always ask “what encoding?”
- Try to use UTF-8 if possible.
- Try to always make the encoding explicit in code. Don’t rely on the platform/system default encoding. Ex: String text = convert(byteArray, Encoding.UTF8);
- Glyph - Visual representation (A).
- Code Point - Numeric representation in the character set being used (65).
- Code Unit - Binary representation determined by the encoding scheme (01000001).
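The glyph / code point / code unit split above can be sketched with the standard TextEncoder and TextDecoder classes, assuming a runtime that provides them as globals (modern browsers and Node.js do). The variable names are made up for illustration.

```javascript
const glyph = 'A';

// Code point: the number assigned to the character in Unicode.
const codePoint = glyph.codePointAt(0); // 65

// Code units: the bytes produced by a specific encoding. TextEncoder always emits UTF-8.
const utf8Bytes = new TextEncoder().encode(glyph);
const bits = utf8Bytes[0].toString(2).padStart(8, '0'); // '01000001'

// A character outside ASCII needs more than one UTF-8 code unit.
const accentedBytes = new TextEncoder().encode('\u00e9'); // e with acute accent

// Decoding names the encoding explicitly instead of relying on a default.
const decoded = new TextDecoder('utf-8').decode(accentedBytes);

console.log(codePoint, bits);       // 65 01000001
console.log(accentedBytes.length);  // 2
console.log(decoded === '\u00e9');  // true
```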
Slow Database Table Scans - Query requests a row or range of rows and there are no available indexes to support it. This causes a sequential scan from top to bottom of the table.
- Any routine query that takes more than a second will be a problem eventually.
- A table scan might not be noticeable if the table is small enough to fit in memory or disk cache but as it grows it will increasingly be pulled from disk (slow). This also adds to the disk contention with any other running queries.
Database Concurrency Contention - Too many users competing for the same resources. Sometimes caused by table scans.
- Database locks up resources waiting to serve requests.
- Transactions lock row/tables for writes and other users cannot read from those rows.
- Transactions that span too many rows cause problems if a query has a table scan or update. Resources are locked for the duration of the transaction so try to limit it to under 1 second.
- There is a limited number of connections to the RDBMS available and it’s possible to have them all blocked waiting on resources.
Slow Database Writes - As a table grows in size, write speeds often show a “hockey stick” curve.
- The typical culprits are table indexes. This is especially true when there are multiple indexes on a single large table.
- A B-Tree requires more computational and disk resources as the index tree grows in size.
- Limit indexes to what is required and use caution when data gets big.
