The Why
So I find myself writing increasingly complex SQL scripts and it's at the stage where we need to optimize the queries because some scripts are noticeably slow (as observed by the customer...) and then others not.
The What
I'm going to run these benchmark tests against a system that is both up and running via the front-end and back-end. It's MediaWiki CMS used by Wikipedia.org and the like. I like queries against this database because it involves linking a lot of tables and outputting... just articles and their titles.
I have another table holding the audit trail of content approvers on the system. Approvers can e-sign an article (approving it) by clicking on a button. We want to bring back the articles that aren't listed in the audit table (articles yet to be approved).
So this will be based on the SQL query to get just your articles from your MediaWiki CMS:
-- to get articles from MediaWiki CMS: Let's call it STATEMENT1 SELECT wikimedia_page.page_id AS PageID, CONVERT(wikimedia_page.page_title USING latin1) AS PageTitle, CONVERT(wikimedia_text.old_text USING latin1) AS PageContent FROM wikimedia_page LEFT JOIN ( wikimedia_revision, wikimedia_text) ON wikimedia_page.page_latest=wikimedia_revision.rev_id WHERE wikimedia_page.page_namespace=0 AND wikimedia_revision.rev_text_id=wikimedia_text.old_id AND wikimedia_text.old_text'' AND wikimedia_page.page_is_redirect=0 -- yields: let's call it MyTable1: Query time: 0.187s. 880 rows returned. PageID PageTitle PageContent ---------------- ------------------ --------------------- 1144 7911_and_7912_t... ==Description==...
- -- to get articles from MediaWiki CMS: Let's call it STATEMENT1
- SELECT
- wikimedia_page.page_id AS PageID,
- CONVERT(wikimedia_page.page_title USING latin1) AS PageTitle,
- CONVERT(wikimedia_text.old_text USING latin1) AS PageContent
- FROM
- wikimedia_page
- LEFT JOIN (
- wikimedia_revision, wikimedia_text)
- ON
- wikimedia_page.page_latest=wikimedia_revision.rev_id
- WHERE
- wikimedia_page.page_namespace=0
- AND wikimedia_revision.rev_text_id=wikimedia_text.old_id
- AND wikimedia_text.old_text''
- AND wikimedia_page.page_is_redirect=0
- -- yields: let's call it MyTable1: Query time: 0.187s. 880 rows returned.
- PageID PageTitle PageContent
- ---------------- ------------------ ---------------------
- 1144 7911_and_7912_t... ==Description==...
-- to get a row from my custom audit table SELECT ApprovalID, ArticleID, DateSigned FROM Audit_Publishing WHERE ArticleID IS NOT NULL -- yields: let's call it MyTable2: Query time: 0.016s. 4367 rows returned. ApprovalID ArticleID DateSigned ---------------- ------------------ --------------------- 17 1144 2010-06-18 12:25:22
- -- to get a row from my custom audit table
- SELECT
- ApprovalID, ArticleID, DateSigned
- FROM
- Audit_Publishing
- WHERE
- ArticleID IS NOT NULL
- -- yields: let's call it MyTable2: Query time: 0.016s. 4367 rows returned.
- ApprovalID ArticleID DateSigned
- ---------------- ------------------ ---------------------
- 17 1144 2010-06-18 12:25:22
The How
And here is the dilemma. The 3 below SELECT statements were derived from a post to the Stack Overflow website about the best way to check for rows in one table which don't exist in another.
-- STATEMENT2 SELECT PageID, PageTitle, PageContent FROM MyTable1 WHERE PageID NOT IN (SELECT ArticleID FROM MyTable2)
- -- STATEMENT2
- SELECT PageID, PageTitle, PageContent
- FROM MyTable1
- WHERE PageID NOT IN (SELECT ArticleID FROM MyTable2)
-- STATEMENT3 SELECT PageID, PageTitle, PageContent FROM MyTable1 WHERE NOT EXISTS (SELECT ArticleID FROM MyTable2 WHERE MyTable2.ArticleID = MyTable1.PageID)
- -- STATEMENT3
- SELECT PageID, PageTitle, PageContent
- FROM MyTable1
- WHERE NOT EXISTS
- (SELECT ArticleID
- FROM MyTable2
- WHERE MyTable2.ArticleID = MyTable1.PageID)
-- STATEMENT4 SELECT PageID, PageTitle, PageContent FROM MyTable1 LEFT OUTER JOIN MyTable2 ON (MyTable1.PageID= MyTable2.ArticleID) WHERE MyTable2.ArticleID IS NULL
- -- STATEMENT4
- SELECT PageID, PageTitle, PageContent
- FROM MyTable1
- LEFT OUTER JOIN MyTable2
- ON (MyTable1.PageID= MyTable2.ArticleID)
- WHERE MyTable2.ArticleID IS NULL
Results | STATEMENT1 | 880 rows | 0.187s |
STATEMENT2 | 741 rows | 49.933s | |
STATEMENT3 | 741 rows | 48.844s | |
STATEMENT4 | 741 rows | 49.603s |
And Vice-Versa?
Well actually I wanted all the articles that have NOT been approved yet... So would this work if I switched MyTable1 with MyTable2? Um I'll get back on this article when I work this out without using two SQL scripts.
Currently I've returned to the process where one SQL query returns all possible results and the PHP script runs another SQL query for each row to check the ID does not exist in the audit table...
-- Note: This article was updated on 2011-04-19 11:54