Comparing the Performance of DISTINCT and GROUP BY in MySQL

Changsha Yidu Software Training  2022-04-06 07:42:01

MySQL is one of the most popular relational databases. A relational database stores data in separate tables rather than in one large repository, which improves speed and flexibility. In MySQL, DISTINCT removes duplicate rows, and GROUP BY also collapses duplicates once the rows are grouped. Which of the two keywords is more efficient at removing duplicates? This article compares the performance of DISTINCT and GROUP BY.
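
To make the equivalence concrete, here is a minimal sketch on a throwaway table. The name `dedup_demo` and its values are made up for illustration and are not part of the test that follows. Both queries return one row per distinct value, although without an ORDER BY the row order is not guaranteed.

```sql
-- Hypothetical scratch table, used only for this illustration.
CREATE TABLE dedup_demo (num INT NOT NULL);
INSERT INTO dedup_demo (num) VALUES (1), (2), (2), (3), (3), (3);

-- Both statements return the three distinct values 1, 2 and 3.
SELECT DISTINCT num FROM dedup_demo;
SELECT num FROM dedup_demo GROUP BY num;

-- GROUP BY can also compute an aggregate per group.
SELECT num, COUNT(*) AS occurrences FROM dedup_demo GROUP BY num;

DROP TABLE dedup_demo;
```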

1. Test procedure:

Prepare a test table

CREATE TABLE `test_test` (
    `id` int(11) NOT NULL auto_increment,
    `num` int(11) NOT NULL default '0',
    PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;

Create a stored procedure that inserts 100,000 rows into the table

-- the delimiter lines are only needed in the mysql command-line client
delimiter $$
create procedure p_test(pa int(11))
begin
    -- max_num holds the current row count; the table is capped at 100000 rows
    declare max_num int(11) default 100000;
    declare i int default 0;
    declare rand_num int;
    select count(id) into max_num from test_test;
    while i < pa do
        if max_num < 100000 then
            -- random integer from 0 to 100
            select cast(rand()*100 as unsigned) into rand_num;
            insert into test_test(num) values(rand_num);
            -- keep the counter in step with the table so the cap holds
            set max_num = max_num + 1;
        end if;
        set i = i + 1;
    end while;
end$$
delimiter ;

Call the stored procedure to insert the data

call p_test(100000);
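
Before timing anything, it may be worth checking that the procedure filled the table as intended. This check is a suggestion and not part of the original test. Assuming the table started empty, `total_rows` should be 100000, and because `num` is generated as `cast(rand()*100 as unsigned)`, `distinct_values` can be at most 101 (0 through 100).

```sql
-- Total rows inserted and how many distinct num values they contain.
SELECT COUNT(*)            AS total_rows,
       COUNT(DISTINCT num) AS distinct_values
FROM test_test;
```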

Start testing (without an index):

select distinct num from test_test;
select num from test_test group by num;

[SQL] select distinct num from test_test;
Affected rows: 0
Time: 0.078ms

[SQL] select num from test_test group by num;
Affected rows: 0
Time: 0.031ms
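
The timings alone do not show why the two queries differ. One way to look further, sketched here and not taken from the original test, is to ask MySQL for the execution plan of each statement with EXPLAIN. The output depends on the server version, so none is reproduced here; the Extra column is the place to look for notes about temporary tables or sorting. On MySQL versions before 8.0, GROUP BY also sorted its result implicitly, and adding ORDER BY NULL suppressed that sort.

```sql
-- Execution plans for the two deduplication queries (no index on num yet).
EXPLAIN SELECT DISTINCT num FROM test_test;
EXPLAIN SELECT num FROM test_test GROUP BY num;

-- Only relevant before MySQL 8.0: GROUP BY without the implicit sort.
EXPLAIN SELECT num FROM test_test GROUP BY num ORDER BY NULL;
```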

2. Create an index on the num column

ALTER TABLE `test_test` ADD INDEX `num_index` (`num`) ;

Run the queries again

select distinct num from test_test;
select num from test_test group by num;

[SQL] select distinct num from test_test;
Affected rows: 0
Time: 0.000ms

[SQL] select num from test_test group by num;
Affected rows: 0
Time: 0.000ms
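
To confirm that the index exists and that the optimizer actually picks it for these queries, something like the following could be run. This is an added check, not part of the original test; if the index is used, the `key` column of the EXPLAIN output should name `num_index`.

```sql
-- List the indexes on the test table; num_index should appear.
SHOW INDEX FROM test_test;

-- Check which index, if any, each query uses.
EXPLAIN SELECT DISTINCT num FROM test_test;
EXPLAIN SELECT num FROM test_test GROUP BY num;
```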

At this point the elapsed time is too small to read: a reported 0.000 seconds is not precise enough to compare.

So we switch to the command line to test

mysql> set profiling=1;
mysql> select distinct(num) from test_test;
mysql> select num from test_test group by num;
mysql> show profiles;
+----------+------------+----------------------------------------+
| Query_ID | Duration   | Query                                  |
+----------+------------+----------------------------------------+
|        1 | 0.00072550 | select distinct(num) from test_test    |
|        2 | 0.00071650 | select num from test_test group by num |
+----------+------------+----------------------------------------+
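
For a per-stage breakdown of where each query spent its time, the profiler can be queried by Query_ID. The IDs 1 and 2 below are the ones listed by `show profiles` in this session and would differ in another session. Note that SHOW PROFILE and SHOW PROFILES are deprecated in current MySQL releases in favor of the Performance Schema, so this sketch assumes a server where they are still available.

```sql
-- Stage-by-stage timing for the DISTINCT query (Query_ID 1 in this session).
SHOW PROFILE FOR QUERY 1;

-- Stage-by-stage timing for the GROUP BY query (Query_ID 2 in this session).
SHOW PROFILE FOR QUERY 2;

-- Turn profiling off again once the measurements are done.
SET profiling = 0;
```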

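With the index, DISTINCT ran about 107 times faster than without it (0.078 versus 0.00072550, reading the first tool's figures as seconds despite its "ms" label).

With the index, GROUP BY ran about 43 times faster than without it (0.031 versus 0.00071650).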
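
These factors come from dividing each query's time without the index by its profiled time with the index, treating the earlier 0.078 and 0.031 figures as seconds. As a quick sketch, the arithmetic can be redone in MySQL itself (the column aliases are made up for readability):

```sql
-- Speedup = time without index / time with index, rounded to one decimal.
SELECT ROUND(0.078 / 0.00072550, 1) AS distinct_speedup,
       ROUND(0.031 / 0.00071650, 1) AS group_by_speedup;
```

The two ratios come out at roughly 107.5 and 43.3.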

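Now compare DISTINCT with GROUP BY:

With or without the index, GROUP BY was faster than DISTINCT in this test, so GROUP BY is the recommended choice. Without the index the gap is large (0.031 versus 0.078); with the index it shrinks to about 1% (0.00071650 versus 0.00072550).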

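In Hive, by default, DISTINCT is translated into a single global reduce task that performs the deduplication, so its degree of parallelism is 1. GROUP BY, by contrast, is translated into a grouped aggregation handled by multiple reduce tasks in parallel, each of which aggregates (deduplicates) the groups it receives.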

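From the two experiments above we can draw this conclusion: on a table with a high proportion of duplicates, DISTINCT can noticeably improve query efficiency, while on a table with few duplicates DISTINCT can severely reduce it. So DISTINCT does not always hurt performance, but you do need to judge the amount of duplication in advance. The test table here, with about 100 distinct values across 100,000 rows, is a high-duplication case.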
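
One way to judge the amount of duplication in advance, and to repeat the comparison on a low-duplication table, is sketched below. The table name `test_low_dup` is hypothetical; it is filled from `test_test` so that every `num` value is unique. CREATE TABLE ... LIKE also copies any index that exists on `test_test` at that moment.

```sql
-- A ratio near 0 means heavy duplication; 1 means every value is unique.
SELECT COUNT(DISTINCT num) / COUNT(*) AS distinct_ratio FROM test_test;

-- Hypothetical low-duplication copy: num takes the unique id values.
CREATE TABLE test_low_dup LIKE test_test;
INSERT INTO test_low_dup (num) SELECT id FROM test_test;

-- Repeat the two queries under test on the low-duplication table.
SELECT DISTINCT num FROM test_low_dup;
SELECT num FROM test_low_dup GROUP BY num;
```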
