Daniel Wakefield

Metabase tips and tricks.

Posted on

flexible date bin'ed dashboards

You want to avoid duplicating questions when your data consumers want different timeframe views over the data.

E.g Total Payments by Day, Total Payments by Month, etc

We handle this with a few tricks that we can apply easily that the data consumers understand without issue.

Helper Model

We defined a Helper Model with the time periods we offer for analytics. We're lucky that we don't need super granular stats and can just offer down to hour but you could use date_bin to support things like 15 minutes if needed.

  SELECT
    time_period
  FROM (
  VALUES
    ('hour'),
    ('day'),
    ('week'),
    ('month'),
    ('quarter'),
    ('year'),
  ) t(time_period)

Filtering the result set

In any query that requires the ability to date bin we use one of the following as part of the query definition

Use alongside date field filter

This works great for simple queries where you are filtering on a single date field as the date field filter has a bunch of useful settings like inclusive/exclusive end, ability to exclude days of the week etc that are often requested by our consumers

SELECT
  date_trunc({{time_period}}, created_at)::date AS time_period,
  SUM(amount)
FROM payments
WHERE
  1=1
  [[AND {{payment_created_at_field_filter}}]]
GROUP BY time_period

When you can't use the field filter

Metabase doesnt support date field filters on columns inside CTE's, and there are times where you want to filter multiple tables to the same time range.

In those cases we add a number of periods filter that defines the how far to look back over instead

SELECT
  date_trunc({{time_period}}, created_at)::date as time_period,
  sum(amount)
FROM payments
JOIN users on payments.user_id = users.id
WHERE
  payments.created_at >= date_trunc({{time_period}}, NOW() - ({{num_periods}} || ' ' || {{time_period}})::interval)
  AND
  users.last_sign_in_at >= date_trunc({{time_period}}, NOW() - ({{num_periods}} || ' ' || {{time_period}})::interval)
GROUP BY time_period
Caveat 1: Quarter

date_trunc works with quarter but interval doesnt. It would be a lovely addition1 but for now we add a case statement that we're happy with if the question actually requires this size of window. This is how you would handle using date_bin('15 minutes', ...) as well, you'd have to do the math in order to move back over the right number of the periods.

payments.created_at >= date_trunc(
  {{time_period}},
  CASE {{time_period}}
    WHEN 'quarter' THEN
      date_trunc('quarter', NOW()) - ((({{num_periods}})::integer * 3) || ' month')::interval
    ELSE
      NOW() - ({{num_periods}} || ' ' || {{time_period}})::interval
  END
)

Image

Caveat 2: Incomplete periods

We include data from the incomplete current period but we can change this by adding a less than version to complement each filter

SELECT
  date_trunc({{time_period}}, created_at)::date AS time_period,
  SUM(amount)
FROM payments
  payments.created_at >= date_trunc({{time_period}}, NOW() - ({{num_periods}} || ' ' || {{time_period}})::interval)
  AND payments.created_at < date_trunc({{time_period}}, NOW())

The majority of our data analysis is done on historic data, but of course you can flip the signs, or add both complements so the query both before and after NOW() if your data has a more leading than lagging profile.

Optional targets

You'll probably have a tracking board, daily or monthly that the leadership team love to watch. They will also want to compare past periods the current one without going to a new dashboard. Combining COALESCE with optional filters and you can inject SQL defaults.

This gets around the fact that raw date filters can't be set to today.

Something like this works

with filtered_payments as (/* some query */)

SELECT
  COUNT(*)
FROM filtered_payments
WHERE
  date_trunc('day', filtered_payments.created_at) = date_trunc('day', COALESCE( [[{{date_to_look_at}} ,]] NOW() ))

Percent function

To remove some error prone casting and coalesce'ing related to calculating percentages, especially over counts, we added a function to make it more readable

CREATE OR REPLACE FUNCTION public.percent(portion numeric, total numeric, round_level integer DEFAULT 1)
  RETURNS numeric
  LANGUAGE sql
  IMMUTABLE PARALLEL SAFE STRICT
AS $function$
    select
        CASE
        WHEN total = 0 THEN 0
        ELSE ROUND(100 * (portion / total), round_level)
        END
$function$

It replaces the following, while also handling counts that return 0

SELECT
  ROUND(100 * cast(count(*) FILTER (where amount > 100) as float)/cast(count(*) as float)) AS old_big_payment_rate,
  percent(
    COUNT(*) FILTER (where amount > 100),
    COUNT(*)
  ) AS new_big_payment_rate
FROM payments

Commonly accessed questions

We do periodic cleaning of the available dashboards; we'll archive things that havent been accessed in the last 6-12 months to keep the number of questions and dashboards down. The below will give you the commonly accessed queries, flipping or changing the order to last_execution will show you the ones that aren't often acessed. Use an Anti-Join to find the reports that haven't been run at all in the period you're checking.

with card_data as (
  select
    card_id,
    COUNT(*) as execution_count,
    MAX(started_at) as last_execution
  from query_execution
  where
    started_at > NOW() - '12 months'::interval
  GROUP BY 1
)

select
  report_card.id,
  report_card.name,
  CONCAT('https://data.example.com/questions/', report_card.id),
  card_data.execution_count,
  card_data.last_execution
from report_card
join card_data on card_data.card_id = report_card.id
where
  report_card.archived = false
ORDER BY card_data.execution_count DESC