Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

(Timescale person here)

Just an observation that the parent post was the unordered "set" nature of tables, and not strictly about ASOF joins.

I should point out there is an alternative reason for this in TimescaleDB (and many other databases): to support loose-time-order writes (or even data backfill), where records don't always arrive in perfect timestamp order.

So some of these engineering decisions are to balance tradeoffs across insert vs. query rates.

In TimescaleDB, for example, we write data to time intervals in arrival order, not timestamp order, in order to achieve higher writes rates. On the other hand, we support the automated asynchronous rewriting of data to reorder data after it reaches a certain age, according to arbitrary columns.

This can be used to reorder data precisely in timestamp order. It's also highly useful and commonly used for composite reorderings, such that you can then reorder on (some_id, timestamp DESC), such that, within a time interval (chunk in TimescaleDB), data belonging to each some_id is collocated on disk, and within each id, it's sorted on disk again by timestamp. (Here, some_id might be a unique hostname, device_id, metric_id, stock ticker symbol, etc.)

These allow you to take much faster queries for SELECT * from table WHERE some_id = foo and time > X and time < Y.

We also take a similar approach when performing asynchronous columnar-style compression, where we "segment" together many records belonging to the same id into a single internal row, then order those records in array style within each row, sorted by some column (typically timestamp) before applying a type-specific compression algorithm to then. Again, all metrics then belonging to some_id are collocated, ordered by timestamp as an array on out-of-line pages so that only the requested columns need to be fetched from disk. And this gets hidden behind a SQL view that makes it look like you are still interacting with a standard row-based table.

Anyway, totally appreciate that we haven't prioritized ASOF JOINs, as our experience is that these typically come up in more specific financial use cases, rather than the vast majority of time-series use cases.

And we also find that our "gapfill/LOCF" functionality can be used to bucket-aggregate data with different ids for related behavior. That is, it's a data modeling choice to store different ticker symbols in different tables (and then ASOF JOIN), and not necessarily the only way to achieve the desired goals.

https://docs.timescale.com/latest/api#add_reorder_policy https://docs.timescale.com/latest/using-timescaledb/compress... https://docs.timescale.com/latest/api#time_bucket_gapfill

And thanks clatonjy for your use!



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: