Digging into Postgresql and DEV
Earlier I was reviewing the draft Community Wellness badge pull request; and with my head deep in SQL these days, I thought I’d give a go at crafting a query to create this logic.
The following query finds the user IDs and weeks since today in which a user has written at least two comments that don’t have a negative moderator reaction.
-
user_id
: The user’s database ID. -
weeks_ago
: The number of weeks (since today) in which we’re grouping comments -
number_of_comments_with_positive_reaction
: How many positive reaction comments did they have for theweeks_ago
.
SELECT user_id,
COUNT(user_id) as number_of_comments_with_positive_reaction,
/* Get the number of weeks, since today for posts */
(trunc((extract(epoch FROM (current_timestamp- created_at))) / 604800)) AS weeks_ago
FROM comments
/* Only select comments from the last 32 weeks that
don't have a negative moderator reaction */
INNER JOIN
/* Find all comments in the last 32 weeks */
(SELECT DISTINCT reactable_id
FROM reactions
WHERE reactable_type = 'Comment'
AND created_at > (now() - interval '224' day)
/* Omit any comments that got a negative moderator reaction */
EXCEPT
SELECT DISTINCT reactable_id
FROM reactions
WHERE reactable_type = 'Comment'
AND created_at > (now() - interval '224' day)
AND category IN ('thumbsdown', 'vomit')) AS positve_reactions
ON comments.id = positve_reactions.reactable_id
INNER JOIN
/* Find the users who have at least two comments in the last week */
(SELECT count(id) AS number_of_comments,
user_id AS comment_counts_user_id
FROM comments
WHERE created_at >= (now() - interval '7' day)
GROUP BY user_id) AS comment_counts
ON comments.user_id = comment_counts_user_id
AND comment_counts.number_of_comments > 1
/* Don’t select anything older than 224 days ago, or 32 weeks ago */
WHERE created_at > (now() - interval '224' day)
GROUP BY user_id, weeks_ago
The above query creates multiple rows per user id. Which is fine, but if you want to loop through things, you’ll need to bust out some temporary variable magic.
I was wondering if I'd be able to get this down to one query. With the help of some folks at Forem, I wrote the following query aggregates that information for you; but you need to do some assembly work.
The columns are:
-
user_id
: The user’s database ID. -
serialized_weeks_ago
: A comma separated list of the weeks in which we had comments. -
weeks_ago_array
: An array of integers that is the non-string representation ofserialized_weeks_ago
; we want to see how ActiveRecord handles this array of integers. It's a the simpler version of theserialized_weeks_ago
. -
serialized_comment_counts
: A comma separated list of the number of comments.
The first number of the serialized_weeks_ago
maps to the first number of the serialized_comment_counts
. And you get one row per user.
SELECT user_id,
/* A comma separated string of "weeks_ago" */
array_to_string(array_agg(weeks_ago), ',') AS serialized_weeks_ago,
/* Will active record convert this to an array of integers? */
array_agg(weeks_ago) AS weeks_ago_array,
/* A comma separated string of comment counts. The first value in this string
happens on the week that is the first value in serialized_weeks_ago */
array_to_string(array_agg(number_of_comments_with_positive_reaction), ',') AS serialized_comment_counts
FROM
/* This is the same query as the first example query */
(SELECT user_id,
COUNT(user_id) as number_of_comments_with_positive_reaction,
/* Get the number of weeks, since today for posts */
(trunc((extract(epoch FROM (current_timestamp- created_at))) / 604800)) AS weeks_ago
FROM comments
/* Only select comments from the last 32 weeks that
don't have a negative moderator reaction */
INNER JOIN
/* Find all comments in the last 32 weeks */
(SELECT DISTINCT reactable_id
FROM reactions
WHERE reactable_type = 'Comment'
AND created_at > (now() - interval '224' day)
/* Omit any comments that got a negative moderator reaction */
EXCEPT
SELECT DISTINCT reactable_id
FROM reactions
WHERE reactable_type = 'Comment'
AND created_at > (now() - interval '224' day)
AND category IN ('thumbsdown', 'vomit')) AS positve_reactions
ON comments.id = positve_reactions.reactable_id
INNER JOIN
/* Find the users who have at least two comments in the last week */
(SELECT count(id) AS number_of_comments,
user_id AS comment_counts_user_id
FROM comments
WHERE created_at >= (now() - interval '7' day)
GROUP BY user_id) AS comment_counts
ON comments.user_id = comment_counts_user_id
AND comment_counts.number_of_comments > 1
/* Don’t select anything older than 224 days ago, or 32 weeks ago */
WHERE created_at > (now() - interval '224' day)
GROUP BY user_id, weeks_ago
) AS user_comment_counts_by_week GROUP BY user_id
I am eager to share these Postgresql approaches as they can help circumvent running lots of smaller queries. I also had the chance to pair up with two folks to make sure we wrote the correct logic and it was performant enough.