## Thursday, February 4, 2021

### Today's "Stroke Damaged My Brain" SQL Question

The query:

FROM incidents
WHERE ([cause]='MI' or [cause]='MI?')
GROUP BY (int (year/10)*10);

This produces three columns:

1660 1 2
1800 1 8
1860 7 27
1870 4 19
1880 12 55
1890 19 78
1900 43 183
1910 21 83
1920 7 31
1940 2 18
1960 2 21
1970 6 27
1980 8 45
1990 9 48
2000 4 55
2010 11 170
2020 1 5

You will notice that there are no rows for the decades 1810 through 1859.  This is because there are no incidents where the cause was clearly or likely mental illness.  (If this seems odd, schizophrenia rates in Britain roughly octupled from the17th to 19th centuries, and similar dramatic increases happen in America over this same period.  One possible cause is cats going from mousekillers in barns into homes as pets with toxoplasmosis in cat feces, as one possible environmental trigger.)  So how do I get rows with 0 incidents into that query?

I tried:

FROM Table2
WHERE ((cause_ID=15) or (cause_ID=16))
UNION
FROM Table2
WHERE ((cause_ID<>15) and (cause_ID<>16))
GROUP BY (int (year/10)*10) ;
but the complaint is that "Your query does not include the specified expression int(year/10)*10 as an aggregate function."

Thanks to all.  This mostly fixes it:
FROM Table2
WHERE ((cause_ID=15) or (cause_ID=16))
GROUP BY (int (year/10)*10)
UNION SELECT int(year/10)*10 AS decade, 0, 0
FROM Table2
WHERE ((cause_ID<>15) and (cause_ID<>16))
GROUP BY (int (year/10)*10);

However, if the first SELECT has matches, the UNION gives me two rows before DECADE, and some decades are still missing:

1650 0 0
1660 1 2
1680 0 0
1690 0 0
1720 0 0
1750 0 0
1780 0 0
1800 0 0
1800 1 8
1820 0 0
1830 0 0
1840 0 0
1850 0 0
1860 0 0
1860 7 27
1870 0 0
1870 4 19
1880 0 0
1880 12 55
1890 0 0
1890 19 78
1900 0 0
1900 43 183
1910 0 0
1910 21 83
1920 0 0
1920 7 31
1930 0 0
1940 2 18
1950 0 0
1960 0 0
1960 2 21
1970 0 0
1970 6 27
1980 0 0
1980 8 45
1990 0 0
1990 9 48
2000 0 0
2000 4 55
2010 0 0
2010 11 170
2020 0 0
2020 1 5

I think instead of a UNION I need to only do the second SELECT if there is no record from the first SELECT.

1. You need to include the group by in the first half before the union. A union combines two complete queries, so each half has to stand alone.

2. You need two GROUP BY clauses, one for each SELECT statement. Add a copy of the GROUP BY clause before the UNION.

3. I'm not an expert in SQL, but have you tried putting the same GROUP BY after both WHERE clauses? Your problem might be that you're trying to take the union of two result sets that don't have the same structure, i.e. the first isn't grouped, but the second one is.

4. Clayton:

Try using conditional sums (sum( case ....)). This will report all years, but only aggregate data for the categories in question.

For example:

SELECT
SUM(
CASE
WHEN cause_id=15 THEN 1
WHEN cause_id=17 THEN 1
ELSE 0
END
) AS Incidents,
SUM(