Common queries for MySQL 5
  Posted by aKMlxlNJExwO on 2 November 2023




Extending Chapter 9





Aggregates
  • Cascading aggregates
  • Cross-aggregates
  • Group by datetime period
  • Per-group aggregate lists of specified size
  • Per-group aggregates
  • Per-group aggregates across multiple joins
  • Show only one child row per parent row
  • Skip repeating values
  • Within-group aggregates
  • Within-group quotas (Top N per group)

Aggregates and statistics
  • Average the top 50% of values per group
  • Averages from bands of values
  • Count unique values of one column
  • Median
  • Mode
  • Rank order

Data comparison
  • Backslashes in data
  • Compare data in two tables

Date and time
  • Age in years
  • Appointments available
  • Count business days between two dates
  • Count Tuesdays between two dates
  • Date of first Friday of next month
  • Datetime difference
  • Find available reservation periods
  • Find sequenced duplicates
  • Is a given booking period available?
  • Last business day before a reference date
  • Make a calendar table
  • Sum time periods
  • Sum time values

DDL
  • Add auto-incrementing primary key to a table
  • Auto-increment: reset next value
  • Compare structures of two tables
  • Find child tables
  • Find parent tables
  • Find primary key of a table
  • Find the size of all databases on the server
  • List differences between two databases
  • List users with access to a database
  • Rename Database
  • Show Create Trigger
  • Show Tables

Files
  • Exporting query result to a file

Frequencies
  • Display column values which occur N times
  • Display every Nth row

Graphs and Hierarchies
  • Trees, networks and parts explosions in MySQL

JOIN
  • Approximate joins
  • Cascading JOINs
  • Data-driven joins
  • Many-to-many joins
  • Parties who have contracts with one another
  • Join or subquery?
  • Parents without children
  • The unbearable slowness of IN()
  • The [Not] Exists query pattern
  • What exams did a student not register for?

NULLs
  • List NULLs at end of query output
  • Show NULLs as blank cells

Ordering resultsets
  • Next row
  • Order by month name
  • Suppress repeating ordering values

Pagination
  • Pagination

Pivot tables
  • Automate the writing of pivot table queries
  • Column value associations
  • Group column statistics in rows
  • Pivot table with CONCAT
  • Pivot table without GROUP_CONCAT

Relational division
  • All possible recipes with given ingredients
  • Parties with candidates in all districts
  • Who makes all the parts for a given assembly?

Schedules
  • Game schedule
  • Pivot table schedule

Sequences
  • Find blocks of unused numbers
  • Find missing numbers in a sequence
  • Find previous and next values in a sequence
  • Find sequence starts and ends
  • Find specific sequences
  • Gaps in a time series
  • Make values of a column sequential
  • Track stepwise project completion

Spherical geometry
  • Great circle distance

Statistics without aggregates
  • Moving average
  • Multiple sums across a join
  • Percentiles
  • Random row selection
  • Running Sum
  • Sum across categories
  • Top ten

Stored procedures
  • A cursor if necessary, but not necessarily a cursor
  • Emulate sp_exec
  • Variable-length arguments for query IN() clause

Strings
  • Count substrings
  • Proper case
  • Strip HTML tags



Cascading aggregates



When you have parent-child-grandchild tables, e.g. companies, users, actions, and the query must return per-parent aggregates from the child table and per-child aggregates from the grandchild table, cascading joins yield spuriously multiplied counts, and correlated subqueries fail because the second correlated subquery cannot see a joining column.

One solution is to use derived tables. Assuming ...

CREATE TABLE companies (id int, name char(10)); 
 CREATE TABLE users (id INT,companyid INT); 
 CREATE TABLE actions (id INT, userid INT, date DATE);


then...

  • Join companies & users once to establish a derived company-user table.
  • Join them a second time, this time aggregating on users.id to generate user counts per company.
  • Join the first derived table to the actions table, aggregating on actions.id to report actions per user per company:

Here is the SQL:


 SELECT cu1.cid, cu1.cname, cu1.uid, cu2.uCnt, ua.aCnt 
 FROM ( 
   SELECT c.id AS cid, c.name AS cname, u1.id AS uid  
   FROM companies c 
   INNER JOIN users u1 ON u1.companyid=c.id 
 ) AS cu1 
 INNER JOIN ( 
   SELECT c.id AS cid, COUNT(u2.id) AS uCnt 
   FROM companies c 
   INNER JOIN users u2 ON u2.companyid=c.id 
   GROUP BY c.id 
 ) AS cu2 ON cu1.cid=cu2.cid 
 INNER JOIN ( 
   SELECT u3.id AS uid, COUNT(a.id) AS aCnt 
   FROM users u3 
   INNER JOIN actions a ON a.userid=u3.id 
   GROUP BY u3.id 
 ) AS ua ON ua.uid=cu1.uid; 





Cross-aggregates



Given the table authorbook(authid INT, bookid INT), what query finds the books that have authors with more than one book in the table?

Even one level of recursion can induce a mild trance. Escape the trance by taking the problem one step at a time. First write the query that finds the authors with multiple books. Then join an outer query to that on authid, and have the outer query select bookid:

SELECT a1.bookid 
 FROM authorbook a1 
 INNER JOIN ( 
   SELECT authid,count(bookid) 
   FROM authorbook a2 
   GROUP BY authid 
   HAVING COUNT(bookid)>1 
 ) AS a3 ON a1.authid=a3.authid; 





Group by datetime period



To group rows by a time period whose length in minutes divides evenly into 60, use this formula:

GROUP BY ((60/periodLen) * HOUR( thistime ) + FLOOR( MINUTE( thistime ) / periodLen ))

where thistime is the TIME column and periodLen is the period length. So to group by 15-min periods ...

SELECT ...
GROUP BY ( 4 * HOUR( thistime ) + FLOOR( MINUTE( thistime ) / 15 ))
...
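
As a concrete sketch of the 15-minute case, assume a hypothetical table hits with a TIME column thistime (the table and its rows are made up for illustration). The period number is computed once, given an alias, and reused in GROUP BY:

```sql
-- Hypothetical table and rows, for illustration only
CREATE TABLE hits (thistime TIME);
INSERT INTO hits VALUES ('09:02:00'),('09:14:59'),('09:15:00'),('10:47:30');

-- Four periods per hour: period_no = 4 * hour + quarter-hour index (0..3)
SELECT 4 * HOUR(thistime) + FLOOR(MINUTE(thistime) / 15) AS period_no,
       MIN(thistime) AS first_time,
       COUNT(*)      AS n
FROM hits
GROUP BY period_no
ORDER BY period_no;
```

With these four rows the query should return period numbers 36, 37 and 43, with counts 2, 1 and 1. Because only the time of day enters the formula, rows from different days fall into the same period unless the date is added to the grouping.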

When the desired grouping period is a value returned by a MySQL date-time function, matters become simpler: just group by the desired value. Thus to group by week number (WEEK() repeats every year, so use YEARWEEK() instead if the rows span more than one year), write ...

SELECT ...
GROUP BY WEEK( datecol)
...

If there is no MySQL date-time function that returns the desired grouping period, you will need to write your own stored function.






Per-group aggregate lists of specified size



Find the values of a table column c1 for which there are a specified number of listed values in another column c2.

To get an overview of the values of c2 for each value of c1:

SELECT 
   c1,  
   GROUP_CONCAT(c2 ORDER BY c2) AS 'C2 values' 
 FROM tbl 
 GROUP BY c1;


To retrieve a list of c1 values for which there exist specific values in another column c2, you need an IN clause specifying the c2 values and a HAVING clause specifying the required number of items in the list. Note that COUNT(c2) counts matching rows, so a repeated c2 value is counted each time it occurs ...


 SELECT c1  
 FROM tbl 
 WHERE c2 IN (1,2,3,4) 
 GROUP BY c1 
 HAVING COUNT(c2)=4;


This is easy to generalise to multiple column expressions, and a HAVING clause specifying any number of items from the IN list.

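To list c1 values that have at least one instance of each listed c2 value, however often a value repeats, add DISTINCT to the count (to insist on exactly one instance of each, keep the COUNT(c2)=4 test as well):

SELECT c1  
 FROM tbl 
 WHERE c2 IN (1,2,3,4) 
 GROUP BY c1 
 HAVING COUNT(DISTINCT c2)=4; 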
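
The difference between counting rows and counting distinct values is easiest to see on a few rows. The sketch below uses made-up data; tbl, c1 and c2 are the placeholder names used above:

```sql
-- Hypothetical sample data, for illustration only
CREATE TABLE tbl (c1 CHAR(1), c2 INT);
INSERT INTO tbl VALUES
  ('a',1),('a',2),('a',3),('a',4),          -- one of each listed value
  ('b',1),('b',1),('b',2),('b',2),          -- four matching rows, two distinct values
  ('c',1),('c',2),('c',3),('c',4),('c',4);  -- every value present, one repeated

-- Counts matching rows: should return a and b
SELECT c1 FROM tbl WHERE c2 IN (1,2,3,4)
GROUP BY c1 HAVING COUNT(c2)=4;

-- Counts distinct values: should return a and c
SELECT c1 FROM tbl WHERE c2 IN (1,2,3,4)
GROUP BY c1 HAVING COUNT(DISTINCT c2)=4;

-- Both tests together, exactly one row per listed value: should return only a
SELECT c1 FROM tbl WHERE c2 IN (1,2,3,4)
GROUP BY c1 HAVING COUNT(c2)=4 AND COUNT(DISTINCT c2)=4;
```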


Per-group aggregates



This is the simplest grouping query pattern. For column foo, display the first (smallest), last (largest) or average value of column bar.

SELECT foo, MIN(bar) AS bar 
 FROM tbl 
 GROUP BY foo


To return the highest value, and order top to bottom by that value:

SELECT foo, MAX(bar) AS Count 
 FROM tbl 
 GROUP BY foo 
 ORDER BY Count DESC;


Ditto for AVG(), COUNT() etc. It is easily extended for multiple grouping column expressions.






Per-group aggregates across multiple joins



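Given a parent table and two child tables, a query which sums values in both child tables, grouping on a parent table column, returns sums that are too large: each child's sum is multiplied by the number of matching rows in the other child table. With two rows per child, as in this example from the MySQL General Discussion list, the sums are exactly twice as large as they should be: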

CREATE TABLE packageItem ( 
  packageItemID INT,  
  packageItemName CHAR(20),  
  packageItemPrice DECIMAL(10,2) 
 ); 
 INSERT INTO packageItem VALUES(1,'Delta Hotel',100.00); 

 CREATE TABLE packageCredit ( 
  packageCreditID INT,  
  packageCreditItemID INT,  
  packageItemType CHAR(10),  
  packageCreditAmount DECIMAL(10,2) 
 );   
 INSERT INTO packageCredit VALUES 
 (1,1,'Deposit',25.00), 
 (2,1,'Balance',92.00); 

 CREATE TABLE packageItemTax ( 
  packageItemTaxID INT,  
  packageItemTaxItemID INT,  
  packageItemTaxName CHAR(5),  
  packageItemTaxAmount DECIMAL(10,2) 
 ); 
 INSERT INTO packageItemTax VALUES 
 (1,1,'GST',7.00), 
 (2,1,'HST',10.00);


The query ...


 SELECT  
   packageItemID 
 , packageItemName 
 , packageItemPrice 
 , SUM(packageItemTaxAmount) as Tax 
 , SUM(packageCreditAmount) as Credit 
 FROM packageItem 
 LEFT JOIN packageCredit ON packageItemID=packageCreditItemID 
 LEFT JOIN packageItemTax ON packageItemTaxItemID=packageItemID 
 GROUP BY packageItemID 
 ORDER BY packageItemID



returns ...

+---------------+-----------------+------------------+-------+--------+ 
 | packageItemID | packageItemName | packageItemPrice | Tax   | Credit | 
 +---------------+-----------------+------------------+-------+--------+ 
 |             1 | Delta Hotel     |           100.00 | 34.00 | 234.00 | 
 +---------------+-----------------+------------------+-------+--------+


With a third child table the inflation compounds. Why? Because the joins pair every credit row with every tax row for the item, and the query aggregates across the whole joined result. How then to get the correct results? With correlated subqueries:

SELECT  
   packageItemID, 
   SUM(packageItemPrice), 
   ( SELECT SUM(c.packageCreditAmount)  
     FROM packageCredit c  
     WHERE c.packageCreditItemID = packageItemID 
   ) AS CreditSum, 
   ( SELECT SUM(t.packageItemTaxAmount)  
     FROM packageItemTax t  
     WHERE t.packageItemTaxItemID = packageItemID 
   ) AS TaxSum 
 FROM packageItem 
 GROUP BY packageItemID; 

 +---------------+-----------------------+-----------+--------+ 
 | packageItemID | SUM(packageItemPrice) | CreditSum | TaxSum | 
 +---------------+-----------------------+-----------+--------+ 
 |             1 |                100.00 |    117.00 |  17.00 | 
 +---------------+-----------------------+-----------+--------+ 

 If subqueries are unavailable or too slow, replace them with temp tables. 
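
Another option, on servers that support subqueries in the FROM clause, is to aggregate each child table in its own derived table before joining, so that each join contributes at most one row per item. This is a sketch using the tables defined above; the aliases i, t, c and itemID are illustrative:

```sql
SELECT i.packageItemID,
       i.packageItemName,
       i.packageItemPrice,
       t.Tax,
       c.Credit
FROM packageItem AS i
LEFT JOIN (
  -- one row per item: total tax
  SELECT packageItemTaxItemID AS itemID, SUM(packageItemTaxAmount) AS Tax
  FROM packageItemTax
  GROUP BY packageItemTaxItemID
) AS t ON t.itemID = i.packageItemID
LEFT JOIN (
  -- one row per item: total credit
  SELECT packageCreditItemID AS itemID, SUM(packageCreditAmount) AS Credit
  FROM packageCredit
  GROUP BY packageCreditItemID
) AS c ON c.itemID = i.packageItemID
ORDER BY i.packageItemID;
```

With the sample rows this should return Tax 17.00 and Credit 117.00 for item 1, matching the correlated-subquery result.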






Show only one child row per parent row



Given tables parent(id int not null primary key, etc...) and child (id int not null primary key, pid int not null references parent (id), etc...), how do we write a query that retrieves only one child row per pid even when the child table has multiple matching rows? MySQL 5 permits GROUP BY to return a column that is neither grouped nor aggregated, so this will work, though which child id it returns for each parent is arbitrary, and servers running with ONLY_FULL_GROUP_BY enabled (the default from MySQL 5.7.5) reject it:


 select p.id, c.id  
 from parent p  
 join child c on p.id=c.pid 
 group by p.id; 
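
To make the choice of child row deterministic, one option is to aggregate the child key. A minimal sketch, assuming the lowest child id is the row wanted (first_child_id is an illustrative alias):

```sql
-- MIN() picks the lowest child id per parent; use MAX() for the highest
SELECT p.id, MIN(c.id) AS first_child_id
FROM parent p
JOIN child c ON p.id = c.pid
GROUP BY p.id;
```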





Skip repeating values



You want to report all unique values of a column and skip all rows repeating any of these values.


 SELECT col, ... 
 FROM foo 
 GROUP BY col 






Within-group aggregates



You have a products table with columns item, supplier, price. Multiple suppliers offer various prices for the same item. You need to find the supplier with the lowest price for each item.

Your first thought may be to GROUP BY item, but that is not guaranteed to return the correct supplier value for each minimum item price. Grouping by both item and supplier will return more information than you want. Nor can you write WHERE price=MIN(...) because the query engine will evaluate the WHERE clause before it knows the MIN value.

This is the problem of aggregating within aggregates. It is sometimes called the 'groupwise aggregates' problem, but the term 'groupwise' is ambiguous at best, so we think better names for it are subaggregates, inner aggregates, or within-group aggregates.

It's easy to show that the within-group aggregates problem is a form of the problem of returning values from non-grouping columns in an aggregate query. Suppose you write ...


 SELECT item,supplier,MIN(price) 
 FROM products 
 GROUP BY item;



Will this tell you which supplier offers the minimum price per item? No. Unless there is exactly one supplier per item, the supplier value returned will be arbitrary.

The simplest and often best-performing solution to the within-group aggregates problem is an outer self exclusion join...

SELECT p1.item,p1.supplier,p1.price 
 FROM products AS p1 
 LEFT JOIN products AS p2 ON p1.item  = p2.item AND p1.price > p2.price 
 WHERE p2.item IS NULL; 


...because in the resultset built by joining on left item=right item and left price larger than right price, the left-sided rows for which there is no smaller right-sided price are precisely the per-item rows with the smallest prices.
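
To watch the exclusion join at work, here is a sketch with made-up rows; the column types and supplier names are assumptions, since the article does not give a schema for products:

```sql
-- Hypothetical rows, for illustration only
CREATE TABLE products (item INT, supplier CHAR(10), price DECIMAL(10,2));
INSERT INTO products VALUES
  (1,'Acme',10.00),(1,'Bolt',8.00),(1,'Cord',9.50),
  (2,'Acme',5.00),(2,'Dyna',5.00);

-- Keep only the rows that have no cheaper row for the same item
SELECT p1.item, p1.supplier, p1.price
FROM products AS p1
LEFT JOIN products AS p2 ON p1.item = p2.item AND p1.price > p2.price
WHERE p2.item IS NULL
ORDER BY p1.item, p1.supplier;
-- item 1: Bolt at 8.00
-- item 2: Acme and Dyna, both at 5.00 (tied minimums are all returned)
```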

You can also accomplish this by building a table of aggregated minimum prices. Before MySQL 4.1, it has to be a temporary table:


 CREATE TEMPORARY TABLE tmp ( 
   item INT, 
   minprice DECIMAL(10,2) DEFAULT 0.0   -- match the type of products.price; bare DECIMAL has no decimal places 
 ); 
 LOCK TABLES products READ; 
 INSERT INTO tmp  
   SELECT item, MIN(price)  
   FROM products  
   GROUP BY item; 


to which you then join the products table:


 SELECT products.item, supplier, products.price  
 FROM products  
 JOIN tmp ON products.item = tmp.item 
 WHERE products.price=tmp.minprice; 
 UNLOCK TABLES; 
 DROP TABLE tmp;



From MySQL 4.1 on, the temporary table can be replaced by a correlated subquery. This is the most intuitively obvious syntax for the problem. Often it's also the slowest solution:


 SELECT item, supplier, price 
 FROM products AS p1 
 WHERE price = ( 
   SELECT MIN(p2.price) 
   FROM products AS p2 
   WHERE p1.item = p2.item 
 );



Another solution, sometimes the fastest of all, is to move the aggregating subquery from the WHERE clause to the FROM clause:


 SELECT p.item, p.supplier, p.price 
 FROM products AS p 
 JOIN ( 
   SELECT item, MIN(price) AS minprice 
   FROM products 
   GROUP BY item 
 ) AS pm ON p.item = pm.item AND p.price = pm.minprice; 


Try all solutions to find which is fastest for your version of the problem.

To find more than one value per group, you might think the LIMIT clause would work, but LIMIT is limited in subqueries. See Within-group quotas.






Within-group quotas (Top N per group)



A table has multiple rows per key value, and you need to retrieve, say, the first or earliest two rows per key.

If the groups are fairly small, this can be done efficiently with a self-join and counts. For example the following table (based on a tip by Rudy Limeback) has three small data groups:


 DROP TABLE IF EXISTS test; 
 CREATE TABLE test ( 
   id INT,  
   entrydate DATE  
 ); 
 INSERT INTO test VALUES 
 ( 1, '2007-5-01' ), 
 ( 1, '2007-5-02' ), 
 ( 1, '2007-5-03' ), 
 ( 1, '2007-5-04' ), 
 ( 1, '2007-5-05' ), 
 ( 1, '2007-5-06' ), 
 ( 2, '2007-6-01' ), 
 ( 2, '2007-6-02' ), 
 ( 2, '2007-6-03' ), 
 ( 2, '2007-6-04' ), 
 ( 3, '2007-7-01' ), 
 ( 3, '2007-7-02' ), 
 ( 3, '2007-7-03' );



The first two rows per ID are the rows which, for a given ID, have no more than two rows dated on or before them, counting the row itself. If we use an inequality join with the COUNT(*) function to count those rows per ID ...

SELECT t1.id, t1.entrydate, COUNT(*) AS earlier 
 FROM test AS t1 
 JOIN test AS t2 ON t1.id=t2.id AND t1.entrydate >= t2.entrydate 
 GROUP BY t1.id, t1.entrydate 
 +------+------------+---------+ 
 | id   | entrydate  | earlier | 
 +------+------------+---------+ 
 |    1 | 2007-05-01 |       1 | 
 |    1 | 2007-05-02 |       2 | 
 |    1 | 2007-05-03 |       3 | 
 |    1 | 2007-05-04 |       4 | 
 |    1 | 2007-05-05 |       5 | 
 |    1 | 2007-05-06 |       6 | 
 |    2 | 2007-06-01 |       1 | 
 |    2 | 2007-06-02 |       2 | 
 |    2 | 2007-06-03 |       3 | 
 |    2 | 2007-06-04 |       4 | 
 |    3 | 2007-07-01 |       1 | 
 |    3 | 2007-07-02 |       2 | 
 |    3 | 2007-07-03 |       3 | 
 +------+------------+---------+


... then we get our result immediately by removing rows where the 'earlier' count exceeds 2:

SELECT t1.id, t1.entrydate, count(*) AS earlier 
 FROM test AS t1 
 JOIN test AS t2 ON t1.id=t2.id AND t1.entrydate >= t2.entrydate 
 GROUP BY t1.id, t1.entrydate 
 HAVING earlier <= 2; 
 +------+------------+---------+ 
 | id   | entrydate  | earlier | 
 +------+------------+---------+ 
 |    1 | 2007-05-01 |       1 | 
 |    1 | 2007-05-02 |       2 | 
 |    2 | 2007-06-01 |       1 | 
 |    2 | 2007-06-02 |       2 | 
 |    3 | 2007-07-01 |       1 | 
 |    3 | 2007-07-02 |       2 | 
 +------+------------+---------+


This works beautifully with smallish aggregates. But the query algorithm compares every within-group row to every other within-group row. As the size N of a group increases, execution time increases in proportion to N*N. If the query takes one minute for groups of 1,000, it will take 16 minutes for groups of 4,000, and more than four hours for groups of 16,000. The solution does not scale.

What to do? Forget GROUP BY! Manually assemble the desired query results in a temporary table from simple indexed queries, in this case, two rows per ID:

DROP TEMPORARY TABLE IF EXISTS earliers; 
 CREATE TEMPORARY TABLE earliers( id INT, entrydate DATE); 
 INSERT INTO earliers  
   SELECT id,entrydate FROM test WHERE id=1 ORDER BY entrydate LIMIT 2; 
 INSERT INTO earliers  
   SELECT id,entrydate FROM test WHERE id=2 ORDER BY entrydate LIMIT 2; 
 INSERT INTO earliers  
   SELECT id,entrydate FROM test WHERE id=3 ORDER BY entrydate LIMIT 2; 


You need one INSERT statement per grouping value. To print the result, just query the earliers table:


 SELECT * FROM earliers 
 ORDER BY id, entrydate; 
 +------+------------+ 
 | id   | entrydate  | 
 +------+------------+ 
 |    1 | 2007-05-01 | 
 |    1 | 2007-05-02 | 
 |    2 | 2007-06-01 | 
 |    2 | 2007-06-02 | 
 |    3 | 2007-07-01 | 
 |    3 | 2007-07-02 | 
 +------+------------+ 
 DROP TEMPORARY TABLE earliers;


Most useful reports run again and again. If that's the case for yours, automate it in a stored procedure: using a cursor and a prepared statement, auto-generate an INSERT statement for every grouping value, and return the result:

DROP PROCEDURE IF EXISTS listearliers; 
 DELIMITER | 
 CREATE PROCEDURE listearliers() 
 BEGIN 
   DECLARE curdone, vid INT DEFAULT 0; 
   DECLARE idcur CURSOR FOR SELECT DISTINCT id FROM test; 
   DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET curdone = 1; 
   DROP TEMPORARY TABLE IF EXISTS earliers; 
   CREATE TEMPORARY TABLE earliers( id INT, entrydate DATE); 
   SET @sql = 'INSERT INTO earliers SELECT id,entrydate FROM test WHERE id=? ORDER BY  entrydate LIMIT 2'; 
   OPEN idcur; 
   REPEAT 
     FETCH idcur INTO vid; 
     IF NOT curdone THEN 
       BEGIN 
         SET @vid = vid; 
         PREPARE stmt FROM @sql; 
         EXECUTE stmt USING @vid; 
         DROP PREPARE stmt; 
       END; 
     END IF; 
   UNTIL curdone END REPEAT; 
   CLOSE idcur; 
   SELECT * FROM earliers ORDER BY id,entrydate; 
   DROP TEMPORARY TABLE earliers; 
 END; 
 | 
 DELIMITER ; 
 CALL listearliers(); 
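
On MySQL 8.0 or later, which adds window functions, the cursor and the temporary table can be avoided altogether. This sketch will not run on the MySQL 5 servers this article targets; rn and ranked are illustrative names:

```sql
-- Number the rows of each id by date, then keep the first two
SELECT id, entrydate
FROM (
  SELECT id, entrydate,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY entrydate) AS rn
  FROM test
) AS ranked
WHERE rn <= 2
ORDER BY id, entrydate;
```

Against the test table above this should return the same six rows as the earliers table.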





Average the top 50% of values per group



Each row of a games table records one game score for a team:


 DROP TABLE IF EXISTS games; 
 CREATE TABLE games(id INT, teamID INT, score INT); 
 INSERT INTO games VALUES  
   (1,1,3),(2,1,4),(3,1,5),(4,1,6),(5,2,6), 
   (6,2,7),(7,2,8),(8,2,7),(9,2,6),(10,2,7);



How would we write a query that returns the average of the top 50% of scores per team?

The per-team median value is its middle value--lower than the highest 50% and higher than the lowest 50% of values for that team--so a shortcut is to query the team medians, then aggregate on a join that selects per-team scores above the medians.

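How to find per-team medians? If a resultset has an odd number of rows, at least one row has the true median score. If it has an even number of rows, the median score is an average of two central values. The following query adapts Joe Celko's formula in "SQL for Smarties" averaging "low" and "high" medians. Be aware that because it groups on teamid alone, AVG(p1.score) runs over every score of the team, so the figure it reports is really the team mean: 6.8333 for team 2, whose two middle scores are both 7. With this data the top-half averages that follow come out the same either way: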

DROP TABLE IF EXISTS medians; 
 CREATE TABLE medians 
 SELECT p1.teamid, AVG(p1.score) AS median 
 FROM games AS p1, games AS p2     -- aliases in one case throughout: they are case-sensitive on Unix 
 WHERE p1.teamid=p2.teamid 
 GROUP BY p1.teamid 
 HAVING ( 
   SUM(CASE WHEN p2.score <= p1.score THEN 1 ELSE 0 END) >= ((COUNT(*) + 1) / 2) 
   AND  
   SUM(CASE WHEN p2.score >= p1.score THEN 1 ELSE 0 END) >= (COUNT(*)/2 + 1) 
 ) 
 OR ( 
   SUM(CASE WHEN p2.score >= p1.score THEN 1 ELSE 0 END) >= ((COUNT(*) + 1) / 2) 
   AND  
   SUM(CASE WHEN p2.score <= p1.score THEN 1 ELSE 0 END) >= (COUNT(*)/2 + 1) 
 ); 
 +--------+--------+ 
 | teamid | median | 
 +--------+--------+ 
 |      1 | 4.5000 | 
 |      2 | 6.8333 | 
 +--------+--------+ 


Now join games to medians accepting only top-half values:

SELECT g.teamid, AVG(g.score) AS Top50Avg 
 FROM games g 
 JOIN medians m ON g.teamid = m.teamid AND g.score >= m.median 
 GROUP BY g.teamid 
 ORDER BY Top50Avg DESC; 
 +--------+----------+ 
 | teamid | Top50Avg | 
 +--------+----------+ 
 |      2 |   7.2500 | 
 |      1 |   5.5000 | 
 +--------+----------+ 
 DROP TABLE medians; 
 
 
 Yes, all the logic can be moved into one query: 
 

 SELECT g.teamid, AVG(g.score) AS Top50Avg 
 FROM games g 
 JOIN ( 
   SELECT p1.teamid, AVG(p1.score) AS median 
   FROM games AS p1, games AS p2 
   WHERE p1.teamid=p2.teamid 
   GROUP BY p1.teamid 
   HAVING ( 
     SUM(CASE WHEN p2.score <= p1.score THEN 1 ELSE 0 END) >= ((COUNT(*) + 1) / 2) 
     AND  
     SUM(CASE WHEN p2.score >= p1.score THEN 1 ELSE 0 END) >= (COUNT(*)/2 + 1) 
   ) 
   OR ( 
     SUM(CASE WHEN p2.score >= p1.score THEN 1 ELSE 0 END) >= ((COUNT(*) + 1) / 2) 
     AND  
     SUM(CASE WHEN p2.score <= p1.score THEN 1 ELSE 0 END) >= (COUNT(*)/2 + 1) 
   ) 
 ) AS m ON g.teamid = m.teamid AND g.score >= m.median 
 GROUP BY g.teamid 
 ORDER BY Top50Avg DESC;  
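
A stricter per-team median, one that tests each row against its own team rather than testing the team as a whole, can be sketched as below. It assumes games.id is unique, as in the sample data, and takes the median as the average of the distinct middle scores; mid is an illustrative alias:

```sql
SELECT teamid, AVG(DISTINCT score) AS median
FROM (
  -- keep a row when at least half of its team scores at or below it
  -- and at least half of its team scores at or above it
  SELECT p1.teamid, p1.score
  FROM games AS p1
  JOIN games AS p2 ON p1.teamid = p2.teamid
  GROUP BY p1.teamid, p1.id, p1.score
  HAVING SUM(CASE WHEN p2.score <= p1.score THEN 1 ELSE 0 END) >= COUNT(*) / 2
     AND SUM(CASE WHEN p2.score >= p1.score THEN 1 ELSE 0 END) >= COUNT(*) / 2
) AS mid
GROUP BY teamid;
```

With the sample rows this should give 4.5 for team 1 and 7 for team 2. Like the other self-join solutions here, it compares every row with every other row of its team, so it suits small groups.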





Averages from bands of values



To count and average scores in bands of 10:

SELECT 10 * FLOOR( score / 10 ) AS Bottom, 
        10 * FLOOR( score / 10 ) + 9 AS Top,  
        Count( score ), 
        Avg( score )  
 FROM scores 
 GROUP BY 10 * FLOOR( score / 10 ); 





Count unique values of one column




 SELECT col_name, COUNT(*) AS frequency  
 FROM tbl_name 
 GROUP by col_name 
 ORDER BY frequency DESC; 





Median



Statistically, the median is the middle value--the value that is smaller than that found in half of all remaining rows, and larger than that found in the other half:

SELECT l1.hours AS Median 
 FROM BulbLife AS l1, BulbLife AS l2    -- table names are case-sensitive on Unix 
 GROUP BY l1.hours 
 HAVING SUM(CASE WHEN l2.hours <= l1.hours THEN 1 ELSE 0 END) >= (COUNT(*)+1) / 2  
    AND SUM(CASE WHEN l2.hours >= l1.hours THEN 1 ELSE 0 END) >= (COUNT(*)/2) + 1; 





Mode



Statistically, the mode is the most frequently occurring value. Given tables parent(id int) and child(pid int, cid int), where child.pid references parent.id as a foreign key, what query finds the parent.id most often represented in the child table, that is, the modal value of child.pid?


 SELECT pid, COUNT(*) AS frequency 
 FROM child 
 GROUP BY pid 
 ORDER BY frequency DESC 
 LIMIT 1; 





Rank order



MySQL 5 has no RANK() function of the kind found in MSSQL (window functions, RANK() among them, arrived in MySQL 8.0), so how do we display rank order in a MySQL query, for example from a table like this?

CREATE TABLE votes( name CHAR(10), votes INT ); 
 INSERT INTO votes VALUES 
   ('Smith',10),('Jones',15),('White',20),('Black',40),('Green',50),('Brown',20);


The query is a two-step:
1. Join the table to itself on the value to be ranked, handling ties
2. Group and order the result of the self-join on rank:

SELECT v1.name, v1.votes, COUNT(v2.votes) AS Rank 
 FROM votes v1 
 JOIN votes v2 ON v1.votes < v2.votes OR (v1.votes=v2.votes and v1.name = v2.name) 
 GROUP BY v1.name, v1.votes 
 ORDER BY v1.votes DESC, v1.name DESC; 
 +-------+-------+------+ 
 | name  | votes | Rank | 
 +-------+-------+------+ 
 | Green |    50 |    1 | 
 | Black |    40 |    2 | 
 | White |    20 |    3 | 
 | Brown |    20 |    3 | 
 | Jones |    15 |    5 | 
 | Smith |    10 |    6 | 
 +-------+-------+------+ 
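
On MySQL 8.0 or later the same ranking can be sketched with RANK() directly. The alias vote_rank is used here because RANK became a reserved word in 8.0:

```sql
-- Ties share a rank and the following rank is skipped (1, 2, 3, 3, 5, 6)
SELECT name, votes,
       RANK() OVER (ORDER BY votes DESC) AS vote_rank
FROM votes
ORDER BY votes DESC, name DESC;
```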






Backslashes in data



Backslashes multiply weirdly:

SELECT 'a\b' RLIKE 'a\b';


returns 1, as does...

SELECT 'a\\b' RLIKE 'a\\\\b';


because the backslashes are halved twice: first the SQL parser reads each \\ in a string literal as one backslash, then the regular-expression engine reads each remaining \\ in the pattern as one literal backslash. The data literal passes through only the first step and the pattern through both, so each backslash in the data needs four in the RLIKE argument. To find a column value containing aa\\bb (two literal backslashes), you therefore need to write...

SELECT `desc` FROM xxx WHERE `desc` RLIKE 'aa\\\\\\\\bb';


That's eight backslashes to match two! (The column name desc is a reserved word, hence the backticks.)






Compare data in two tables



This query UNIONs queries for matching column names from two tables, and keeps just those rows which occur once in the union. Those are the rows unique to one table or the other. Customise your column list { id, col1, col2, col3 ...} as desired.


SELECT  
   MIN(TableName) as TableName, id, col1, col2, col3, ... 
 FROM ( 
   SELECT 'Table a' as TableName, a.id, a.col1, a.col2, a.col3, ... 
   FROM a 
   UNION ALL 
   SELECT 'Table b' as TableName, b.id, b.col1, b.col2, b.col3, ... 
   FROM b 
 ) AS tmp 
 GROUP BY id, col1, col2, col3, ... 
 HAVING COUNT(*) = 1 
 ORDER BY ID; 


Age in years



Computing age in years is a natural job for a stored function:

DELIMITER | 
 CREATE FUNCTION Age( dob DATE, today DATE ) RETURNS INTEGER 
 DETERMINISTIC 
 BEGIN 
  -- convert the day difference to a pseudo-date and read off its year 
  RETURN DATE_FORMAT(FROM_DAYS(TO_DAYS(today) - TO_DAYS(dob)), '%Y') + 0; 
 END; 
 | 
 DELIMITER ; 
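
The FROM_DAYS() arithmetic works from a day count rather than from calendar dates, so results close to a birthday may be off by one. An alternative sketch uses the built-in TIMESTAMPDIFF(), which counts completed years directly; the dates and column aliases are made-up examples:

```sql
-- TIMESTAMPDIFF(YEAR, ...) counts only completed years between the two dates
SELECT TIMESTAMPDIFF(YEAR, '1980-06-15', '2007-06-14') AS age_day_before_birthday,  -- 26
       TIMESTAMPDIFF(YEAR, '1980-06-15', '2007-06-15') AS age_on_birthday;          -- 27
```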






Appointments available



Given a clinic of physicians, patients and appointments, how to find an available appointment time for a given physician?

This is a variant of the [Not] Exists query pattern. Though we can write it with subqueries, performance will be crisper with a join. But finding data that is not there requires a join to data which is there. So in addition to tables for appointments, doctors and patients, we need a table of all possible appointment datetimes. Here's a schema illustrating the idea ...


 CREATE TABLE a_dt (         -- POSSIBLE APPOINTMENT DATES AND TIMES 
   d DATE, 
   t TIME 
 ); 
 CREATE TABLE a_drs (        -- DOCTORS 
   did INT                   -- doctor id 
 ); 
 CREATE TABLE a_pts (        -- PATIENTS 
   pid INT 
 ); 
 CREATE TABLE a_appts (      -- APPOINTMENTS 
   aid INT,                  -- appt id 
   did INT,                  -- doctor id 
   pid INT,                  -- patient id 
   d DATE, 
   t TIME 
 );


Now we can apply the [Not] Exists query pattern. To find free appointment datetimes for a given doctor in a given datetime range, we left join possible appointments to existing appointments on date and time and doctor, add Where conditions for desired appointment datetimes, and finally add a Where condition that the appointment slot be null, i.e. free...


SELECT a.d, a.t 
 FROM a_dt AS a 
 LEFT JOIN a_appts AS ap  
    ON ap.d = a.d  
   AND ap.t = a.t  
   AND ap.did = 1            -- the doctor of interest 
 WHERE a.d BETWEEN desired_start_date AND desired_end_date 
   AND a.t BETWEEN desired_start_time AND desired_end_time 
   AND ap.aid IS NULL;       -- no appointment occupies this slot 


Count business days between two dates



Given a table named dates with date columns d1,d2, if the two dates are in the same year, the solution is simply the date difference in days minus twice the difference in week numbers (two weekend days per week):

SELECT d1, d2, DATEDIFF(d2, d1) - (WEEK(d2) - WEEK(d1)) * 2 AS BizDays 
 FROM dates 
 ORDER BY d1, d2;


For dates that span different years, week numbers won't work. The answer is the number of raw days, minus twice the number of whole weeks (because there are two weekend days/week), minus the number of weekend days in any remainder part-week. This algorithm works when the start and stop dates are themselves business days (but needs refinement to work when passed weekend dates--anybody want to try?):

SET @d1='2007-1-1'; 
 SET @d2='2007-3-31'; 
 SET @dow1 = DAYOFWEEK(@d1); 
 SET @dow2 = DAYOFWEEK(@d2); 
 SET @days = DATEDIFF(@d2,@d1); 
 SET @wknddays = 2 * FLOOR( @days / 7 ) + 
                 IF( @dow1 = 1 AND @dow2 > 1, 1,                                
                     IF( @dow1 = 7 AND @dow2 = 1, 1,                
                         IF( @dow1 > 1 AND @dow1 > @dow2, 2,        
                             IF( @dow1 < 7 AND @dow2 = 7, 1, 0 )    
                           ) 
                       ) 
                   ); 
 SELECT FLOOR(@days-@wkndDays) AS BizDays;


The algorithm is easily encapsulated in a function:


 DROP FUNCTION IF EXISTS BizDayDiff; 
 DELIMITER | 
 CREATE FUNCTION BizDayDiff( d1 DATE, d2 DATE )  
 RETURNS INT 
 DETERMINISTIC 
 BEGIN 
   DECLARE dow1, dow2, days, wknddays INT; 
   SET dow1 = DAYOFWEEK(d1); 
   SET dow2 = 

Last edited on 8 November 2023
