20240426
Last updated
Last updated
These are my notes from PGDay Chicago 2024! They may or may not be useful for you! They are here mostly as a reference for me. :)
Time dimensions: This is something beyond common sense in the real life, however, we need them
GiST has exclusion constraints - solves the temporal primary key problem for periods
Fully temporal queries are still tricky (we don't know how to write them!)
The refinements of manipulation semantics
INSERT
UPDATE
CORRECTION
INACTIVATE
DELETE
Effective vs assertive time: our knowledge of what will be (effective), where that knowledge is current as of a certain time (assertive)
"I am little bit like electron, everywhere and nowhere"
In real life, there are very few objects that are constantly changing. Storage costs correlate with changes that occur, as is natural.
EXPLAIN output illustrates the execution of the query
Deepest elements of the query are executed first, naturally - see the deeper elements of the returned tree to understand the formation of the cost total
width=20 : bytes consumed cost=0.0..1.0 : startup cost, total cost time=0.0..1.0 : startup time, total time loops=2 : total number of executions of the node
cost is figurative, an arbtirary unit. bigger number means it's harder to run. don't compare cost between queries; cost scale is only consistent within EXPLAIN results for a single query
seq scan is a literal sequence scan, one row at a time. can sometimes be cheaper than index scans.
bitmap index scan involves two indexes, choosing the result of a boolean operation between two index scans (could be the same index twice, or two different indexes)
nested loop: for each element from a scan, do another scan based on that element
hash join: creates an in-memory hash table from the results of another scan, then scans the keys of that hash table
merge join: sorts two data sets, merges them
(aside: "LLibran"? the "ll_" from Hettie's talk?)
tools
pgadmin
explain.depesz.com - expert
explain.dalibo.com - visual tree thing, send to Matt
pgmustard - beginner-friendly
google this: "explaining the unexplainable" (depesz or something)
EXPLAIN ANALYZE, EXPLAIN BUFFERS (woah what is buffers)
hypopg: "what would be the estimated cost of this query, if this index existed?"
index selection should account for write overhead, not just query efficiency
"costs are arbtirary units. they do not represent milliseconds or any other unit of time. instead, they are anchored to a single read of a sequential page, which costs 1.0 unit." -Tadeas Petak, paraphrasing pg docs
suggestion: EXPLAIN all table queries, collect all recurring scans
Index Write Overhead = the estimated size of an index write (in bytes), based on the index definition, divided by the size of the average table row
Optimization: (1) Find a good solution to a problem; (2) How? Heuristics; Exact methods (MIP, CP, etc.)
whitepaper: A Constraint Programming Approach for Index Selection in Postgres
1. Minimize the costs, resulting in x 2. New rule: the cost cannot be higher than x + 10% (tolerance!) 3. Maximize the performance, while respecting cost + tolerance
INSERT INTO test SELECT *, 'testdatatestdata' FROM generate_series(1, 100000);
EXPLAIN (GENERIC_PLAN) SELECT * FROM test WHERE data = $1;
remember HOT updates are an option, but only for unindexed data
Harvard University Implicit Bias test
"You can't compete with me — I want you to win, too."
hubspot: 90% of women suffer from imposter syndrome
Deprogramming: Delete, Delegate, Delay, Do
allyship: use one's power, position, or privilege to uplift others
advocacy, mentorship, amplification
"I want to be a part of the people that make meaning"
w3.org/International/questions/qa-personal-names
linuxmafia.com/kb/Essays/marriage.html
"this is what happens if you think you can model famliies with a database schema" -- see: sony family plans
ultimately, the takeaway is to only include operationally necessary elements in your data model, and only require what is strictly necessary for operation
Under-model. Under-collect. Over-communicate. Focus on the core need.
DOB is very discoverable btw - not useful for authentication
extensions used in toast recovery: pageinspect, pg_visibility, pg_surgery
pgmustard.com/docs/explain
UNION, EXCEPT, and INTERSECT default to being unique. add ALL to get everything.
things Isaac doesn't know:
window functions
ROLLUP()
MATERIALIZED
can I use MEMOIZE to replace the loose index scans that the Mechanic shop scheduler is using?
a "project" you do is pronounced differently than your intent to "project"
"tid" and "ctid" refer to toast ID
TRUNCATE is a DDL, not a DML, and as such isn't supported by EXPLAIN
custom scan providers
HNSW for pgvector
money_type is text:
postgres queries support hints??????
create index (user_id, id, money_type) - Index Scan Backward
create index (user_id) INClUDE (id, money_type)
Hugging Face: open-source ai models
you need to choose the right model for your use-case
don't be afraid. just learn, and enjoy your journey. I scratch my head, but we all do for our day-to-day thing. AI is no big thing. we can hack it.
JSON_TABLE in pg17
mechanic: time to start storing task results as a json array of objects, probably
(years ago) I chose postgres - I was a rebel.
today, I am proud to use postgres. I am proud to be a rebel.
in memoriam: Simon Riggs
"when analyzing satellite data, he discovered that he had a passion for databases"
keynote at pgconf europe 2023: "PostgreSQL: The Next 20 Years"