If you use GROUP BY and ORDER BY as shown below, the query will take a long time (3000-5000ms), but if you delete either of them again, it will finish in about 40ms
.
The reason for using GROUP BY
is to avoid duplication of dataUsing DISTINCT
is very time consuming, so we try to use GROUP BY
.
SELECT "videos_video"."id"
FROM "videos_video"
LEFT OUTER JOIN "videos_video_tags" ON ("videos_video"."id" = "videos_video_tags"."video_id")
LEFT OUTER JOIN "videos_tag" ON ("videos_video_tags"."tag_id" = "videos_tag"."id")
WHERE ("videos_video"."is_public"
AND "videos_video"."published_at" <= '2021-12-20 03:22:05.731220+00:00'
AND ("videos_video"."title" &@~ 'word'
OR "videos_tag"."name" &@~ 'word')
AND "videos_video"."published_at" < '2021-12-13 11:30:00+00:00')
GROUP BY "videos_video"."id"
ORDER BY "videos_video"."published_at" DESC
LIMIT 20;
EXPLAIN
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=53084.58..53084.64 rows=21 width=24)
-> Sort (cost=53084.58..53091.26 rows=2671 width=24)
Sort Key: videos_video.published_at DESC
-> Group (cost=52744.50..53012.57 rows=2671 width=24)
Group Key: videos_video.id
-> Gather Merge (cost=52744.50..53007.00 rows=2226 width=24)
Workers Planned: 2
-> Group (cost=51744.48..51750.04 rows=1113 width=24)
Group Key: videos_video.id
-> Sort (cost=51744.48..51747.26 rows=1113 width=24)
Sort Key: videos_video.id
-> Hash Left Join (cost=27707.20..51688.16 rows=1113 width=24)
Hash Cond: (videos_video_tags.tag_id = videos_tag.id)
Filter: ((videos_video.title &@~ 'word'::character varying) OR (videos_tag.name &@~ 'word'::character varying))
-> Parallel Hash Left Join (cost=27485.37..49853.70 rows=556770 width=133)
Hash Cond: (videos_video.id = videos_video_tags.video_id)
-> Parallel Seq Scan on videos_video (cost=0.00..11396.61 rows=120572 width=117)
Filter: (is_public AND (published_at <= '2021-12-20 03:22:05.73122+00'::timestamp with time zone) AND (published_at < '2021-12-13 11:30:00+00'::timestamp with time zone))
-> Parallel Hash (cost=16712.83..16712.83 rows=557083 width=32)
-> Parallel Seq Scan on videos_video_tags (cost=0.00..16712.83 rows=557083 width=32)
-> Hash (cost=152.48..152.48 rows=5548 width=29)
-> Seq Scan on videos_tag (cost=0.00..152.48 rows=5548 width=29)
(22 rows)
Also, when I try to run EXPLAIN ANALYZE
, I get the following error message I can’t get the result.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lostAttempting reset: Failed.
Is there something wrong with the SQL statement?
By removing either GROUP BY
or ORDER BY
, the index will be used.