Version: Base version, SP1, SP2 (latest), or SP2 with various patches (as of Jul 2017)
Edition: Standard, Enterprise
This post is about MS SQL Server 2014 parallel query bug that rarely & randomly occurs which causes long execution, but not widely documented yet.
Problem DescriptionA query that used to take 18 sec suddenly switched its execution plan and took more than 16 hr. The data volume only increased by 100 rows, although it was joining 10 tables. The largest table size was 40,000 rows, while the rest is < 10,000 for each table, yet the bug hit with 100 rows increase in 4 out of 10 tables, which is consider < 0.1% of the table size.
Using SSMS Activity Monitor, one can see there are many sessions with same Session ID, as well as the WaitType=CXPACKET.
Backup/restore this DB and try in MS SQL Server 2014 Standard Edition, Enterprise Edition, vanilla version, SP1, SP2, SP2 with latest Jul 2017 patch, VMware, physical server, etc, all are producing the same result.
The behavior is consistently reproducible by running the query through SSMS even remove the WHERE clause.
AnalysisCPU is consistently 40% (of multi CPU) immediately when the query fire, and last for entire day. Modified the query to have TOP 1, and the query took ~ 1 min
Disabled parallelism by setting "Max Degree of Parallelism" to 1, and it is 50% better, and not using parallel query execution in the plan, but still slower than before
In Activity Monitor, one can see there are multiple sessions (all with same Session ID) due to parallelism query execution plan (the more CPU core you have, there more Session ID it will spawn). Except 1 is waiting on SOS_SCHEDULER_YIELD. This occurs consistently with 40% total CPU utilization for > 16 hr (for something that used to take 20 sec a day ago).
Isolated that there is no storage sub-system issue, Windows patch, DB patch, DB startup parameter change, antivirus patch, virus, firewall, network issue, etc.
A closer look at the SQL execution plan reviews that all spawned Session ID are in SUSPENDED mode with waittype=CXPACKET, expect with waittype SOS_SCHEDULER_YIElD that is running.
Tied to limit the parallelism to 1 in the SQL scope like folow:
SELECT ...[columns]... FROM [table1] INNER JOIN [table2]..... WHERE ...[conditions] ... OPTION (MAXDOP 1, RECOMPILE)
Although the SQL execution plan is not using parallelism, the execution time is still > 15 min (cancelled early).
Re-gathered all indexes using following procedure, and no change in the behavior
Checked database, and confirmed it is not corrupted. No different:
Updated statistics for all tables. No different:
Checked sp_who2 that there is nothing blocking, and no deadlock
Found it spawned 16 child sessions, 15 waiting for CXPACKET for entire 16 hr
Examined the query execution plan (by choose any of the child session ID) and see 40% - 50% of cost for Parallelism (Repartition Streams), and one of the index has 100x more rows in the Estimated Number of Rows.
What if only select TOP 1? How long it will takes? So let's try it out
SELECT TOP 1 ...[columns]... FROM [table1] INNER JOIN [table2]..... WHERE ...[conditions] ...
It turns out it took ~ 1 min with only 1 row. So there is something very wrong with the query optimizer in MS SQL Server 2014, which we can't workaround it by more CPU, set DOP=1, set DOP=8, set DOP=# CPU core, rebuild index, faster storage, update statistics. As the number of rows increase progressively, it may just flip and suddenly take 5000x longer to run.
Last resort, force DB engine to use SQL Server 2012 query optimizer, using trace flag 9481
SELECT ...[columns]... FROM [table1] INNER JOIN [table2]..... WHERE ...[conditions] ... OPTION (QUERYTRACEON 9481, RECOMPILE)
BINGO! Query took 20 sec to run. The execution plan is not using any of the parallel query execution, and no CXPACKET waittype at all (in Activity Monitor).
Depending on the application, if there is noway to specify this trace flag in query, or session, then it can only specified in the global scope as startup parameter
Conclusion: MS SQL Server 2014 has bug in the query optimizer engine, which will take 5000x to 10000x to run the same SQL with 0.01% of row increase. We are talking about the timing 1000x which far exceeds extraction of all the relevant tables in the join (table full scan or index full scan).
I consider this as a bug due to the fact that it took much longer than disregard DOP is 1 (disable), 2, 8, 16, max up to CPU core, the SQL will take 5000x longer to run, even the SQL is not using parallel query execution.
This SQL Server 2012 cardinal estimator compatibility level also can be set by setting the database compatibility level to SQL Server 2012 (110), which limits the entire database not to leverage SQL Server 2014's feature.
For SQL Server 2016, you can set database compatibility to 2016, yet uses "alter database scoped configuration set legacy_cardinality_estimation = on" to gives an equivalent effect of trace flag 9481. This can be set in the UI
Anyway, I still consider this as temporary workaround given the SQL takes 5000x longer to run