20240426

These are my notes from PGDay Chicago 2024! They may or may not be useful for you! They are here mostly as a reference for me. :)

Hettie

Time dimensions: This is something beyond common sense in the real life, however, we need them

GiST has exclusion constraints - solves the temporal primary key problem for periods

Fully temporal queries are still tricky (we don't know how to write them!)

The refinements of manipulation semantics

  • INSERT

  • UPDATE

  • CORRECTION

  • INACTIVATE

  • DELETE

Effective vs assertive time: our knowledge of what will be (effective), where that knowledge is current as of a certain time (assertive)

"I am little bit like electron, everywhere and nowhere"

In real life, there are very few objects that are constantly changing. Storage costs correlate with changes that occur, as is natural.

Drako

EXPLAIN output illustrates the execution of the query

Deepest elements of the query are executed first, naturally - see the deeper elements of the returned tree to understand the formation of the cost total

width=20 : bytes consumed cost=0.0..1.0 : startup cost, total cost time=0.0..1.0 : startup time, total time loops=2 : total number of executions of the node

cost is figurative, an arbtirary unit. bigger number means it's harder to run. don't compare cost between queries; cost scale is only consistent within EXPLAIN results for a single query

seq scan is a literal sequence scan, one row at a time. can sometimes be cheaper than index scans.

bitmap index scan involves two indexes, choosing the result of a boolean operation between two index scans (could be the same index twice, or two different indexes)

nested loop: for each element from a scan, do another scan based on that element

hash join: creates an in-memory hash table from the results of another scan, then scans the keys of that hash table

merge join: sorts two data sets, merges them

(aside: "LLibran"? the "ll_" from Hettie's talk?)

tools

  • pgadmin

  • explain.depesz.com - expert

  • explain.dalibo.com - visual tree thing, send to Matt

  • pgmustard - beginner-friendly

google this: "explaining the unexplainable" (depesz or something)

Lukas

EXPLAIN ANALYZE, EXPLAIN BUFFERS (woah what is buffers)

hypopg: "what would be the estimated cost of this query, if this index existed?"

index selection should account for write overhead, not just query efficiency

"costs are arbtirary units. they do not represent milliseconds or any other unit of time. instead, they are anchored to a single read of a sequential page, which costs 1.0 unit." -Tadeas Petak, paraphrasing pg docs

suggestion: EXPLAIN all table queries, collect all recurring scans

Index Write Overhead = the estimated size of an index write (in bytes), based on the index definition, divided by the size of the average table row

Optimization: (1) Find a good solution to a problem; (2) How? Heuristics; Exact methods (MIP, CP, etc.)

whitepaper: A Constraint Programming Approach for Index Selection in Postgres

1. Minimize the costs, resulting in x 2. New rule: the cost cannot be higher than x + 10% (tolerance!) 3. Maximize the performance, while respecting cost + tolerance

INSERT INTO test SELECT *, 'testdatatestdata' FROM generate_series(1, 100000);

EXPLAIN (GENERIC_PLAN) SELECT * FROM test WHERE data = $1;

remember HOT updates are an option, but only for unindexed data

Karen, Stacey

Harvard University Implicit Bias test

"You can't compete with me — I want you to win, too."

hubspot: 90% of women suffer from imposter syndrome

Deprogramming: Delete, Delegate, Delay, Do

allyship: use one's power, position, or privilege to uplift others

advocacy, mentorship, amplification

"I want to be a part of the people that make meaning"

Christophe

w3.org/International/questions/qa-personal-names

linuxmafia.com/kb/Essays/marriage.html

"this is what happens if you think you can model famliies with a database schema" -- see: sony family plans

ultimately, the takeaway is to only include operationally necessary elements in your data model, and only require what is strictly necessary for operation

Under-model. Under-collect. Over-communicate. Focus on the core need.

DOB is very discoverable btw - not useful for authentication

Derk

extensions used in toast recovery: pageinspect, pg_visibility, pg_surgery

Bruce Momjian

pgmustard.com/docs/explain

UNION, EXCEPT, and INTERSECT default to being unique. add ALL to get everything.

things Isaac doesn't know:

  • window functions

  • ROLLUP()

  • MATERIALIZED

  • can I use MEMOIZE to replace the loose index scans that the Mechanic shop scheduler is using?

  • a "project" you do is pronounced differently than your intent to "project"

  • "tid" and "ctid" refer to toast ID

  • TRUNCATE is a DDL, not a DML, and as such isn't supported by EXPLAIN

  • custom scan providers

Jonathan

HNSW for pgvector

Shane

money_type is text:

  • postgres queries support hints??????

  • create index (user_id, id, money_type) - Index Scan Backward

  • create index (user_id) INClUDE (id, money_type)

Rekha

Hugging Face: open-source ai models

you need to choose the right model for your use-case

don't be afraid. just learn, and enjoy your journey. I scratch my head, but we all do for our day-to-day thing. AI is no big thing. we can hack it.

David

JSON_TABLE in pg17

mechanic: time to start storing task results as a json array of objects, probably

Drako

(years ago) I chose postgres - I was a rebel.

today, I am proud to use postgres. I am proud to be a rebel.

Jimmy

in memoriam: Simon Riggs

"when analyzing satellite data, he discovered that he had a passion for databases"

keynote at pgconf europe 2023: "PostgreSQL: The Next 20 Years"

Last updated