Metabase tips and tricks.
flexible date bin'ed dashboards
Your consumers want to view the same data over different timeframes You want to build a dashboard once to prevent definition drifts.
E.g Total Payments by Day, Total Payments by Month, etc
We handle this with a few tricks that the data consumers understand without issue.
Helper Model
We defined a Helper Model with the time periods we offer for analytics.
We're lucky that we don't need super granular stats and can just offer down to hour
.
You could have custom periods but you will need to replace date_trunc
with date_bin
in the examples below and take care to read caveat 1.
SELECT
time_period
FROM (
VALUES
('hour'),
('day'),
('week'),
('month'),
('quarter'),
('year'),
) t(time_period)
Filtering the result set
In any query that requires the ability to date bin we use one of the following as part of the query definition
Use alongside date field filter
This works great for simple queries where you are filtering on a single date field as the date field filter has a bunch of useful settings like inclusive/exclusive end, ability to exclude days of the week etc that are often requested by our consumers
SELECT
date_trunc({{time_period}}, created_at)::date AS time_period,
SUM(amount)
FROM payments
WHERE
1=1
[[AND {{payment_created_at_field_filter}}]]
GROUP BY time_period
When you can't use the field filter
Metabase doesnt support date field filters on thee virtual tables created by CTE's, and there are times where you want to filter multiple tables to the same time range.
In those cases we add a number of periods
filter that defines the how far to look back over instead
SELECT
date_trunc({{time_period}}, created_at)::date as time_period,
sum(amount)
FROM payments
JOIN users on payments.user_id = users.id
WHERE
payments.created_at >= date_trunc({{time_period}}, NOW() - ({{num_periods}} || ' ' || {{time_period}})::interval)
AND
users.last_sign_in_at >= date_trunc({{time_period}}, NOW() - ({{num_periods}} || ' ' || {{time_period}})::interval)
GROUP BY time_period
Caveat 1: Quarter
date_trunc
works with quarter
but interval
doesnt.
It would be a lovely addition1 but for now we add a case statement that turns 1 quarter into 3 months. We're Ok with a slight inaccuracy here.
Use a similar technique in you have custom time_periods.
You need to do some math to make the multiplication understandable. e.g num_periods = 4
& time_period = '15 minutes'
you want to multiple by 15 before converting to an interval.
payments.created_at >= date_trunc(
{{time_period}},
CASE {{time_period}}
WHEN 'quarter' THEN
date_trunc('quarter', NOW()) - ((({{num_periods}})::integer * 3) || ' month')::interval
ELSE
NOW() - ({{num_periods}} || ' ' || {{time_period}})::interval
END
)
Caveat 2: Incomplete periods
We include data from the incomplete current period but we can change this by adding a less than version to complement each filter
SELECT
date_trunc({{time_period}}, created_at)::date AS time_period,
SUM(amount)
FROM payments
payments.created_at >= date_trunc({{time_period}}, NOW() - ({{num_periods}} || ' ' || {{time_period}})::interval)
AND payments.created_at < date_trunc({{time_period}}, NOW())
The majority of our data analysis is done on historic data, but of course you can:
- Flip the signs to only look forward
- Use it with between and
NOW() + num_periods
&NOW() - num_periods
to look equally in both directions - Define both
backward_periods
&forward_periods
for use in the above if they must be unique
Optional targets
You'll probably have a tracking board, daily or monthly that the leadership team love to watch.
They will also want to compare past periods to the current one without going to a new dashboard. Eg (looking at today vs the big 1 day sale last week)
Combining COALESCE
with optional filters and you can inject SQL defaults.
This gets around the fact that raw date filters can't be set to today
.
Something like this works
with filtered_payments as (/* some query */)
SELECT
COUNT(*)
FROM filtered_payments
WHERE
date_trunc('day', filtered_payments.created_at) = date_trunc('day', COALESCE( [[{{date_to_look_at}} ,]] NOW() ))
Percent function
To remove some error prone casting and coalesce'ing related to calculating percentages, especially over counts, we added a function to make it more readable
CREATE OR REPLACE FUNCTION public.percent(portion numeric, total numeric, round_level integer DEFAULT 1)
RETURNS numeric
LANGUAGE sql
IMMUTABLE PARALLEL SAFE STRICT
AS $function$
select
CASE
WHEN total = 0 THEN 0
ELSE ROUND(100 * (portion / total), round_level)
END
$function$
It replaces the following, while also handling counts that return 0
SELECT
ROUND(100 * cast(count(*) FILTER (where amount > 100) as float)/cast(count(*) as float)) AS old_big_payment_rate,
percent(
COUNT(*) FILTER (where amount > 100),
COUNT(*)
) AS new_big_payment_rate
FROM payments
Commonly accessed questions
We do periodic cleaning of the available dashboards; we'll archive things that havent been accessed in the last 6-12 months to keep the number of questions and dashboards down. The below will give you the commonly accessed queries, flipping or changing the order to last_execution will show you the ones that aren't often accessed. Use an Anti-Join to find the reports that haven't been run at all in the period you're checking.
with card_data as (
select
card_id,
COUNT(*) as execution_count,
MAX(started_at) as last_execution
from query_execution
where
started_at > NOW() - '12 months'::interval
GROUP BY 1
)
select
report_card.id,
report_card.name,
CONCAT('https://data.example.com/questions/', report_card.id),
card_data.execution_count,
card_data.last_execution
from report_card
join card_data on card_data.card_id = report_card.id
where
report_card.archived = false
ORDER BY card_data.execution_count DESC