I have been working with the pg_stat_statements extension for PostgreSQL, and it gives us a way to see the actual SQL statements being executed by lemmy_server and the number of times they are being called.

This has less overhead than cranking up logging, and several cloud computing services enable it by default (example), so I don’t believe it will significantly slow down the server.

A DATABASE RESTART WILL BE REQUIRED

It does require that PostgreSQL be restarted, which typically takes 10 or 15 seconds.

Debian / Ubuntu install steps

https://pganalyze.com/docs/install/self_managed/02_enable_pg_stat_statements_deb

Following the conventions of “Lemmy from Scratch” server install commands:

sudo -iu postgres psql -c "ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';"

Followed by a restart of the PostgreSQL service.
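
A minimal sketch of those remaining steps in the same convention (the service name postgresql and the database name lemmy are assumptions; adjust for your setup). Note that after the restart, the extension still has to be created inside the database before the pg_stat_statements view can be queried:

sudo systemctl restart postgresql
sudo -iu postgres psql -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements;" lemmy

You can confirm it is collecting with a quick count:

sudo -iu postgres psql -c "SELECT count(*) FROM pg_stat_statements;" lemmy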

  • RoundSparrow@lemmy.ml (OP, mod) · edited · 1 year ago

    Update July 24, 2023

    Some major stored procedure SQL problems were overlooked until 2 days ago. I submitted revised statements to fix a massive write operation that ran on every single post and comment creation on a local site: every row of the site_aggregates table was being modified instead of a single row. https://github.com/LemmyNet/lemmy/pull/3704

    I was curious why pg_stat_statements didn’t draw more attention to the INSERT statements hitting so many rows, and I found out that by default (pg_stat_statements.track = top) it does not track statements executed inside stored procedures! https://stackoverflow.com/questions/56741860/pg-stat-activity-how-to-see-current-activity-inside-a-running-stored-procedure

    pg_stat_statements.track = all

    Now I’m seeing far more activity than I had been looking at for the past 2 months… the stored procedure statements are showing up!
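
    To apply that setting in the same style as above, a sketch (unlike shared_preload_libraries, this setting only needs a configuration reload, not a restart):

    sudo -iu postgres psql -c "ALTER SYSTEM SET pg_stat_statements.track = 'all';"
    sudo -iu postgres psql -c "SELECT pg_reload_conf();"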

    Install steps: https://gist.github.com/rcanepa/535163dc249539912c25

  • RoundSparrow@lemmy.ml (OP, mod) · edited · 1 year ago

    Once the extension is installed and enabled, you can reset the stats whenever you want; it will start building a new list of queries and, I assume, reset the average execution times.
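
    The reset is a single function call, and the rebuilt list can be read straight from the view. A sketch, assuming PostgreSQL 13 or newer (older versions name the columns total_time/mean_time instead of total_exec_time/mean_exec_time) and a database named lemmy:

    sudo -iu postgres psql -c "SELECT pg_stat_statements_reset();"
    sudo -iu postgres psql -c "SELECT calls, round(mean_exec_time::numeric, 2) AS mean_ms, query FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;" lemmy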

    I found a curious query that seems to take 10 or 11 seconds on my server:

    SELECT "person"."id", "person"."name", "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated", "person"."actor_id", "person"."bio", "person"."local", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin", "person"."bot_account", "person"."ban_expires", "person"."instance_id", "person_aggregates"."id", "person_aggregates"."person_id", "person_aggregates"."post_count", "person_aggregates"."post_score", "person_aggregates"."comment_count", "person_aggregates"."comment_score" FROM ("person" INNER JOIN "person_aggregates" ON ("person_aggregates"."person_id" = "person"."id"))
    WHERE (("person"."admin" = $1) AND ("person"."deleted" = $2)) ORDER BY "person"."published"
    

    This query is being called over 1,000 times an hour, even though my instance is mostly just handling incoming federation content. Should this be optimized by doing a subSELECT for the WHERE clause before doing the INNER JOIN? A sketch of that idea follows.
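
    One possible shape for that rewrite, purely as a sketch (untested against Lemmy’s schema, and the planner may already push the WHERE clause down into the scan, in which case an index would help more than restructuring):

    SELECT p.*, pa.*
    FROM (SELECT * FROM "person"
          WHERE "admin" = $1 AND "deleted" = $2) AS p
    INNER JOIN "person_aggregates" AS pa ON (pa."person_id" = p."id")
    ORDER BY p."published";

    If the filter itself is the bottleneck, and assuming the common call is admin = true and deleted = false, a partial index along the lines of CREATE INDEX ON "person" ("published") WHERE "admin" AND NOT "deleted" would be the more conventional fix.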

    • RoundSparrow@lemmy.ml (OP, mod) · 1 year ago

      IMPORTANT NOTE: I am rusty on my PostgreSQL; it’s been over 10 years since I designed and ran a major time-critical website with it. That is not “SECONDS”, it is “MILLISECONDS”, but it is still a slow query being called frequently.