Thoughts, terms, and ideas I’ve come across over the last few months.
Errors are values.
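One way to read this note in JavaScript is a function that returns a result object instead of throwing, so the caller handles a failure like any other data. This is only a sketch: `parsePort` and the `{ ok, value, error }` shape are hypothetical names chosen for illustration, not part of any library.

```javascript
// Hypothetical helper: reports failure as a returned value instead of throwing.
function parsePort(text) {
  const port = Number(text);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    return { ok: false, error: new Error(`invalid port: ${text}`) };
  }
  return { ok: true, value: port };
}

const good = parsePort("8080"); // { ok: true, value: 8080 }
const bad = parsePort("http");  // { ok: false, error: Error }
```

The caller branches on `ok` and can log, wrap, or aggregate the error like any other value.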
DRY - Means having a single source of truth, not merely avoiding copying and pasting code.
When you cross a service boundary things that appear to be the same may have a different context and different data store. Avoid sharing code between boundaries that have different contexts (Microservices).
Repository Pattern - Layer that exists between business logic and data store. Isolates your code that interacts with your data store in one place.
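A minimal sketch of the repository idea, assuming an in-memory `Map` stands in for the data store. `UserRepository` and `renameUser` are hypothetical names; a real repository would wrap a database client behind the same two methods.

```javascript
// Hypothetical in-memory repository; all data-store access lives in this class.
class UserRepository {
  constructor() {
    this.rows = new Map(); // stands in for the real data store
  }
  save(user) {
    this.rows.set(user.id, { ...user }); // store a copy
    return user.id;
  }
  findById(id) {
    const row = this.rows.get(id);
    return row ? { ...row } : null;
  }
}

// Business logic depends only on the repository's methods, not on storage details.
function renameUser(repo, id, name) {
  const user = repo.findById(id);
  if (!user) return false;
  repo.save({ ...user, name });
  return true;
}

const repo = new UserRepository();
repo.save({ id: 1, name: "Ada" });
renameUser(repo, 1, "Grace");
```

Swapping the `Map` for a real store would change only `UserRepository`, which is the isolation the pattern is after.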
Decorator Pattern - Enabling a chain of behavior determined by composition not inheritance.
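A small sketch of decoration through composition, using plain functions rather than classes. `withLogging` and `withCache` are hypothetical helpers written for this example.

```javascript
// Each decorator wraps a function and returns a new function with extra behavior.
const withLogging = (fn, log) => (...args) => {
  log.push(`call(${args.join(", ")})`);
  return fn(...args);
};

const withCache = (fn) => {
  const cache = new Map();
  return (x) => {
    if (!cache.has(x)) cache.set(x, fn(x));
    return cache.get(x);
  };
};

const calls = [];
const square = (x) => x * x;
// Compose the chain: the cache wraps the logger, which wraps the base function.
const decorated = withCache(withLogging(square, calls));

decorated(4);
decorated(4); // served from the cache, so the logger runs only once
```

Reordering or dropping a wrapper changes the behavior without touching `square` or defining a subclass.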
A bad test breaks in response to any change in production code without verifying correct behavior.
Map - Transform every element of a data set into a new collection without mutating the original.
Reduce - Aggregate or merge results
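A short illustration of map and reduce with the built-in array methods. The `orders` data is made up for the example.

```javascript
const orders = [
  { item: "book", price: 12 },
  { item: "pen", price: 3 },
  { item: "bag", price: 25 },
];

// map: build a new array; the original array and its objects are left untouched.
const prices = orders.map((order) => order.price);

// reduce: fold the mapped values into a single aggregate.
const total = prices.reduce((sum, price) => sum + price, 0);
```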
Scope - Where variables and functions are accessible and in what context they are being executed.
Closures - Expressions (usually functions) that work with variables set within a certain context. An inner function referring to local variables of its outer function.
    function add(x) {
      return function (y) {
        return x + y; // The returned function closes over x (here x = 5)
      };
    }
    var add5 = add(5);
    console.log(add5(3)); // 8
Enumerable - Pull-based. The consumer pulls from the producer.
Observable - Push-based. The producer pushes new values to the consumer.
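The pull/push difference can be sketched with a generator on one side and a callback on the other. `countTo` and `subscribe` are hypothetical names; `subscribe` is a deliberately tiny stand-in, not the API of any observable library.

```javascript
// Pull: a generator produces a value only when the consumer asks for one.
function* countTo(n) {
  for (let i = 1; i <= n; i++) yield i;
}
const pulled = [];
const iterator = countTo(3);
for (let step = iterator.next(); !step.done; step = iterator.next()) {
  pulled.push(step.value); // the consumer decides when to call next()
}

// Push: the producer calls the consumer's callback whenever it has a value.
function subscribe(values, onNext) {
  values.forEach((value) => onNext(value)); // the producer drives delivery
}
const pushed = [];
subscribe([1, 2, 3], (value) => pushed.push(value));
```

Both sides end up with the same values; what differs is which side controls the timing.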
Functional Reactive Programming
Properties - Values that change over time. Every property is a function f(t) that gives a value at a given moment in time.
Functional - Compose functions together to create complex behavior. Functions can have time-dependent relationships.
Immutable - Values are something that happened in the past so they need not change.
Event streams - Events at a particular point in time. Capture event changes in a discrete manner. Operators (map/filter/reduce) create new streams out of old streams (no mutation).
Switching - Change the system in response to events. A stream of streams (meta-stream). Ex: stream URLs, map to requests, return responses in future stream of promises.
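A rough sketch of the event stream idea: operators return new streams and leave the source untouched. The `Stream` class is hypothetical and far smaller than a real FRP library; it only shows `map` and `filter`.

```javascript
// Hypothetical minimal event stream.
class Stream {
  constructor() {
    this.listeners = [];
  }
  subscribe(listener) {
    this.listeners.push(listener);
  }
  emit(value) {
    this.listeners.forEach((listener) => listener(value));
  }
  // Each operator returns a new stream instead of modifying this one.
  map(fn) {
    const out = new Stream();
    this.subscribe((value) => out.emit(fn(value)));
    return out;
  }
  filter(predicate) {
    const out = new Stream();
    this.subscribe((value) => {
      if (predicate(value)) out.emit(value);
    });
    return out;
  }
}

const clicks = new Stream();
const evenDoubled = clicks.filter((n) => n % 2 === 0).map((n) => n * 2);

const seen = [];
evenDoubled.subscribe((value) => seen.push(value));
[1, 2, 3, 4].forEach((n) => clicks.emit(n)); // seen becomes [4, 8]
```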
68-95-99.7 Rule - For a normal distribution, about 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
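The three percentages can be checked numerically by integrating the standard normal density. This is a sketch using the trapezoidal rule; `pdf` and `within` are names chosen for the example.

```javascript
// Standard normal probability density function.
const pdf = (z) => Math.exp(-0.5 * z * z) / Math.sqrt(2 * Math.PI);

// Probability of falling within k standard deviations of the mean,
// estimated with the trapezoidal rule over [-k, k].
function within(k, steps = 100000) {
  const h = (2 * k) / steps;
  let area = 0.5 * (pdf(-k) + pdf(k));
  for (let i = 1; i < steps; i++) area += pdf(-k + i * h);
  return area * h;
}

// Roughly 0.6827, 0.9545, 0.9973
const coverage = [1, 2, 3].map((k) => within(k));
```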
Fit indexes in RAM if possible. Index on the hash of a string instead of the string itself.
Use a write-back cache to batch database writes to the back-end.
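A possible shape for a write-back cache, assuming writes are buffered in memory and flushed once a batch size is reached. `WriteBackCache`, `writeBatch`, and `batchSize` are hypothetical names, and an array stands in for the database.

```javascript
// Hypothetical write-back cache: writes land in memory and are flushed in batches.
class WriteBackCache {
  constructor(writeBatch, batchSize) {
    this.writeBatch = writeBatch; // persists an array of [key, value] pairs
    this.batchSize = batchSize;
    this.cache = new Map();
    this.dirty = new Map(); // entries not yet written to the back-end
  }
  set(key, value) {
    this.cache.set(key, value);
    this.dirty.set(key, value); // repeated writes to one key collapse into one
    if (this.dirty.size >= this.batchSize) this.flush();
  }
  get(key) {
    return this.cache.get(key);
  }
  flush() {
    if (this.dirty.size === 0) return;
    this.writeBatch([...this.dirty.entries()]);
    this.dirty.clear();
  }
}

const batches = []; // stands in for the database
const cache = new WriteBackCache((batch) => batches.push(batch), 3);
cache.set("a", 1);
cache.set("a", 2);
cache.set("b", 3);
cache.set("c", 4); // third distinct dirty key triggers one batched write
```

The trade-off is durability: anything still in the dirty buffer is lost if the process dies before a flush.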
Use locking to make sure that when the cache expires the database doesn’t get slammed with multiple copies of the same query.
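Within a single process, one way to get this effect is to let concurrent misses for the same key share one in-flight load. This is a sketch under that assumption; `SingleFlightCache` is a hypothetical name, the loader stands in for a database query, and a multi-server setup would need a shared lock instead.

```javascript
// Hypothetical single-flight cache: concurrent misses for one key share one load.
class SingleFlightCache {
  constructor(loader) {
    this.loader = loader; // async function standing in for a database query
    this.values = new Map();
    this.inFlight = new Map(); // acts as a per-key lock
  }
  get(key) {
    if (this.values.has(key)) return Promise.resolve(this.values.get(key));
    if (this.inFlight.has(key)) return this.inFlight.get(key);
    const pending = this.loader(key)
      .then((value) => {
        this.values.set(key, value);
        return value;
      })
      .finally(() => this.inFlight.delete(key)); // release the lock either way
    this.inFlight.set(key, pending);
    return pending;
  }
}

let queries = 0;
const cache = new SingleFlightCache(async (key) => {
  queries += 1;
  return `row for ${key}`;
});
const first = cache.get("user:1");
const second = cache.get("user:1"); // joins the in-flight load, no second query
```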
You can decouple a sender and receiver with the command pattern and observer pattern but to decouple them in time you will want to use an event queue.
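A bare-bones event queue sketch showing the decoupling in time: the sender returns immediately and the receiver processes events whenever it chooses to drain. `EventQueue` and the sound events are hypothetical.

```javascript
// Hypothetical event queue: the sender enqueues now, the receiver drains later.
class EventQueue {
  constructor() {
    this.pending = [];
  }
  publish(event) {
    this.pending.push(event); // returns immediately; nothing is handled yet
  }
  drain(handler) {
    let handled = 0;
    while (this.pending.length > 0) {
      handler(this.pending.shift()); // first in, first out
      handled += 1;
    }
    return handled;
  }
}

const queue = new EventQueue();
const played = [];
queue.publish({ type: "sound", id: "jump" });
queue.publish({ type: "sound", id: "land" });
const beforeDrain = played.length; // still 0: the sender did not wait
queue.drain((event) => played.push(event.id));
```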
Keep data structures flat and linear (array > linked list). Every pointer you have to chase to find data adds a likely cache miss while flat arrays can be pre-fetched by the memory system.
Adding/Removing values from the middle of std::vector is faster than std::list if the elements are POD types and no bigger than 64 bytes (one cache line). Lists have too many cache misses.
For larger POD types, non-POD types, or if you already have a pointer into the list then std::list will win.
I attended the Google I/O Extended conference at Google’s KW location. The main theme this year seemed to be about getting the rest of the world online.
Event Notes
Currently there are 2.6 billion people on the internet, but there will be 5 billion in the future.
Free and open information access can lead to regime change.
+4% GDP due to internet access.
It’s not just search. We have to drive the information by making it relevant and fun.
Information access -> open communication -> drives change by allowing silent groups to have a voice.
~300 people at the KW office.
Most effective ad length on YouTube is 12 seconds maximum.
google.com/design - Unified design guidelines that are driven by material design.
Minimize interactions with the device by showing relevant data first. Use sensors to understand the user’s current context.
Maps Features: Fast, Accurate, Easy
One Android SDK for all platforms: TV, phone, car, watch, etc.
Chromebook Features: Speed, Simplicity, Security
Google Drive encrypts your data both in transit and in storage.
Google cloud
Allows small teams to run big operations.
Cloud.Debug allows live debugging on servers
Request tracing to see all service requests
Can set custom alerts on metrics
Cloud DataFlow
Same code for both batch and streaming
Parallel data pipelines.
One pipeline for both batch (ETL) and streaming (continuous analysis).
Thoughts, terms, and ideas I’ve come across over the last few months.
There is no such thing as “plain text”. Text is binary data plus an encoding scheme.
When someone says “plain text” always ask “what encoding?”
Try to use UTF-8 if possible.
Always make the encoding explicit in code. Don’t rely on the platform/system default encoding. In pseudocode: String text = convert(byteArray, Encoding.UTF8);
Glyph - Visual representation (A).
Code Point - Numeric representation in the character set being used (65).
Code Unit - Binary representation determined by the encoding scheme (01000001).
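The three terms can be seen side by side in JavaScript. This sketch assumes a runtime where `TextEncoder` and `TextDecoder` are available as globals (modern browsers and recent Node.js versions); the variable names are illustrative.

```javascript
const glyph = "A";
const codePoint = glyph.codePointAt(0); // numeric value in Unicode: 65

// TextEncoder always encodes to UTF-8, so the encoding is explicit.
const utf8Bytes = new TextEncoder().encode(glyph);
const bits = utf8Bytes[0].toString(2).padStart(8, "0"); // "01000001"

// One glyph can need several code units: the euro sign takes 3 bytes in UTF-8.
const euroBytes = new TextEncoder().encode("€");

// Decoding names the encoding explicitly instead of relying on a default.
const decoded = new TextDecoder("utf-8").decode(euroBytes);
```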
Slow Database Table Scans - A query requests a row or range of rows and there are no available indexes to support it. This causes a sequential scan from top to bottom of the table.
Any routine query that takes more than a second will be a problem eventually.
A table scan might not be noticeable if the table is small enough to fit in memory or disk cache but as it grows it will increasingly be pulled from disk (slow). This also adds to the disk contention with any other running queries.
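As an in-memory analogy only (not a database), the cost difference between a scan and an indexed lookup can be sketched with an array standing in for the table and a `Map` standing in for the index. All names and the row count are made up for the example.

```javascript
// An array stands in for a table, a Map for an index on the id column.
const table = Array.from({ length: 100000 }, (_, i) => ({ id: i, name: `user${i}` }));

// Without an index: check rows from top to bottom until one matches.
function scan(rows, id) {
  let examined = 0;
  for (const row of rows) {
    examined += 1;
    if (row.id === id) return { row, examined };
  }
  return { row: null, examined };
}

// With an index: built once, then each lookup goes straight to the row.
const index = new Map(table.map((row) => [row.id, row]));

const scanned = scan(table, 99999); // examines all 100000 rows
const indexed = index.get(99999);   // single lookup
```

The scan's cost grows with the table, which is why it tends to go unnoticed while the table is still small.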
Database Concurrency Contention - Too many users competing for the same resources. Sometimes caused by table scans.
Database locks up resources waiting to serve requests.
Transactions lock rows/tables for writes, and other users may be blocked from reading those rows.
Transactions that span too many rows cause problems if a query has a table scan or update. Resources are locked for the duration of the transaction so try to limit it to under 1 second.
There is a limited number of connections to the RDBMS available and it’s possible to have them all blocked waiting on resources.
Slow Database Writes - As a table grows in size, write speeds often show a “hockey stick” curve.
The typical culprits are table indexes. This is especially true when there are multiple indexes on a single large table.
A B-tree index requires more computational and disk resources as the index tree grows in size.
Limit indexes to what is required and use caution when data gets big.