Site icon Wellness Fox

Using GROUP BY and ORDER BY, the query takes a long time to execute

If you use GROUP BY and ORDER BY as shown below, the query will take a long time (3000-5000ms), but if you delete either of them again, it will finish in about 40ms.

The reason for using GROUP BY is to avoid duplication of dataUsing DISTINCT is very time consuming, so we try to use GROUP BY.

SELECT "videos_video"."id"
FROM "videos_video"
LEFT OUTER JOIN "videos_video_tags" ON ("videos_video"."id" = "videos_video_tags"."video_id")
LEFT OUTER JOIN "videos_tag" ON ("videos_video_tags"."tag_id" = "videos_tag"."id")
WHERE ("videos_video"."is_public"
       AND "videos_video"."published_at" <= '2021-12-20 03:22:05.731220+00:00'
       AND ("videos_video"."title" &@~ 'word'
            OR "videos_tag"."name" &@~ 'word')
       AND "videos_video"."published_at" < '2021-12-13 11:30:00+00:00')
GROUP BY "videos_video"."id"
ORDER BY "videos_video"."published_at" DESC
LIMIT 20;

EXPLAIN

                                                                                                          QUERY PLAN    
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=53084.58..53084.64 rows=21 width=24)
   ->  Sort  (cost=53084.58..53091.26 rows=2671 width=24)
         Sort Key: videos_video.published_at DESC
         ->  Group  (cost=52744.50..53012.57 rows=2671 width=24)
               Group Key: videos_video.id
               ->  Gather Merge  (cost=52744.50..53007.00 rows=2226 width=24)
                     Workers Planned: 2
                     ->  Group  (cost=51744.48..51750.04 rows=1113 width=24)
                           Group Key: videos_video.id
                           ->  Sort  (cost=51744.48..51747.26 rows=1113 width=24)
                                 Sort Key: videos_video.id
                                 ->  Hash Left Join  (cost=27707.20..51688.16 rows=1113 width=24)
                                       Hash Cond: (videos_video_tags.tag_id = videos_tag.id)
                                       Filter: ((videos_video.title &@~ 'word'::character varying) OR (videos_tag.name &@~ 'word'::character varying))
                                       ->  Parallel Hash Left Join  (cost=27485.37..49853.70 rows=556770 width=133)
                                             Hash Cond: (videos_video.id = videos_video_tags.video_id)
                                             ->  Parallel Seq Scan on videos_video  (cost=0.00..11396.61 rows=120572 width=117)
                                                   Filter: (is_public AND (published_at <= '2021-12-20 03:22:05.73122+00'::timestamp with time zone) AND (published_at < '2021-12-13 11:30:00+00'::timestamp with time zone))
                                             ->  Parallel Hash  (cost=16712.83..16712.83 rows=557083 width=32)
                                                   ->  Parallel Seq Scan on videos_video_tags  (cost=0.00..16712.83 rows=557083 width=32)
                                       ->  Hash  (cost=152.48..152.48 rows=5548 width=29)
                                             ->  Seq Scan on videos_tag  (cost=0.00..152.48 rows=5548 width=29)
(22 rows)

Also, when I try to run EXPLAIN ANALYZE, I get the following error message I can’t get the result.

server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lostAttempting reset: Failed.

Is there something wrong with the SQL statement?

By removing either GROUP BY or ORDER BY, the index will be used.

Exit mobile version