From: Peter Rabbitson Date: Fri, 24 Jan 2014 05:26:43 +0000 (+0100) Subject: Augment the logic from aca094b4d, add deprecation to settle the distinct drama X-Git-Tag: v0.08260~10 X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?p=dbsrgits%2FDBIx-Class.git;a=commitdiff_plain;h=722274d02c729c50a01db38b0d62042fbc48b5e7 Augment the logic from aca094b4d, add deprecation to settle the distinct drama Details in attached IRC log magnet#dbic-cabal_20131108.log ==== [19:01:38] using aggregates and distinct seems like it should be an error [19:02:12] since it makes no sense to ask for "GROUP BY everything we select" and then select aggregates AFAICS [19:03:20] so, yes, what we did before was ... wtf [19:04:44] mst: the point is *sometimes* a select in a group-by may make sense is my gut feeling, I basically wanted to "send the nonsense" along, and have the rdbms be the judge [19:05:52] well ... what we did before was definitely completely wrong [19:06:09] what we're doing now ... I don't want to release it without thinking it through some more, but it's certainly less wrong [19:07:40] but I mean ... I dunno, I think group_over_selection makes sense for subquery wrapping [19:08:04] but in the case of 'distinct => 1' I think "can't do that with an aggregate in select, pass a real group by you fucking pansy" would be pretty good too [19:08:14] since I'm really not convinced it ever can make sense [19:08:27] mst: and I am not convinced it never makes sense [19:08:36] mst: so bubbling it up doesn't seem like a loss [19:09:09] hmmm [19:09:44] mst: also I do not want to *entirely* discourage mysql/sqlite-centric grouppage - if a person wants to do that, they should be able to [19:09:56] hence why the explicit \'' rule [19:10:03] (that is back from 2009-ish actually) [19:10:05] actually, looking at this [19:10:23] the sql looks like what the user asked for [19:10:38] how so? [19:10:59] distinct is "final selection grouper" not "first selection" [19:11:00] well, they created a resultset of [19:11:04] i.e. it is inherently lazy [19:11:29] right, it does not take effect on the current selection, didn't from before I took over [19:11:53] SELECT year, COUNT(me.cdid) AS cnt .... [19:11:57] and then grabbed the 'cnt' column [19:13:06] ribasushi: ResultSet originally (0.05000) did '$attrs->{group_by} ||= $attrs->{select} if delete $attrs->{distinct};' in new() [19:13:15] so, yes, it takes effect on the current selection [19:13:36] so sql is correct according to the original behaviour as implemented [19:13:36] mst: but it did this in *_resolve_attrs* [19:13:55] which is after the selection is modified by RSETColumn [19:14:15] or to put it diff. - RSetColumn does not do a subquery, just like a ->search doesn't [19:14:30] it didn't originally, it did it immediately you supplied the distinct attribute to a search [19:14:40] anything else is a bug; fix that and we don't need this crap [19:14:52] well - when I came around it was lazified via the late resolver [19:15:08] and I had bugs reported when that behavior changed later on when I was doin prefetches [19:15:17] so this is crap in production for years [19:15:41] mst: https://github.com/dbsrgits/dbix-class/blob/master/t/search/distinct.t behaviors on non-aggregates [19:15:55] it accreted stuff obviously, the ages of tests are different [19:17:01] mst: basically this boils to "is distinct => 1 lazy or eager" and I am afraid this ship sailed around 0.07 [19:17:17] "fixing" it will serve nothing but bring breakage [19:17:48] yeah, it looks like it got broken when _resolved_attrs was factored out of new() [19:17:53] nod [19:18:28] so - if we assume "distinct is lazy" - you see why I would go the way I did with the latest fix [19:19:22] ribasushi: distinct is lazy works fine, you just have to disallow aggregates in the select list in that case [19:19:30] and make them do an explicit group_by for that [19:20:31] mst: aggregates via the {} syntax have been supported correctly forever - that is distinct knows to ignore them [19:20:45] mst: this test is more or less "aggregate in literal" (due to being an alias) [19:20:47] even as far as 08000 it was straight up [19:20:51] so I can't disallow it [19:20:56] because it is... a literal [19:20:59] $attrs->{group_by} ||= $attrs->{select} if delete $attrs->{distinct}; [19:21:11] so if we started allowing them via the {} syntax that wasn't me [19:21:17] mst: I came around 0.08010 ;) [19:22:01] ribasushi: $attrs->{group_by} ||= $attrs->{select} if delete $attrs->{distinct}; [19:22:07] ribasushi: in 08011 which is the first release you did [19:22:34] mst: it is very possibkle that I broke it when I wasn't knowing wtf is on [19:22:46] I think you made lazy distinct work "better" [19:22:51] when it shouldn't've been lazy [19:22:59] mst: in any case - it's old shit, so even if I am responsible it still boils to "people depend on that" [19:23:13] mst: nod, may very well be my rookie mistake ;) [19:23:15] sure. at this point it's "I fucked it up, then you made it worse, we'll have to deprecate it" [19:23:19] most likely [19:23:34] and the only question is damage control so we don't break current usages while isolating the code [19:24:48] deprecating it - I wouldn't go that far, I use distinct a lot myself because it does the group construction for me without me thinking [19:25:01] juggling selection/ordering is hard enough ;) --- diff --git a/251_TODO b/251_TODO deleted file mode 100644 index 4d77269..0000000 --- a/251_TODO +++ /dev/null @@ -1,5 +0,0 @@ -List of things riba needs to clear out before next official in order -of importance: -(Keep Getty happy) - -- Clarify/warn on the distinct over multiple columns get_column() diff --git a/Changes b/Changes index 8529a2e..3f77630 100644 --- a/Changes +++ b/Changes @@ -22,6 +22,8 @@ Revision history for DBIx::Class an exception object with broken string overloading - Clarify ambiguous behavior of distinct when used with ResultSetColumn i.e. $rs->search({}, { distinct => 1 })->get_column (...) + - Explicitly deprecate combination of distinct and selecting a + non-column via $rs->get_column() - Setting quote_names propagates to SQL::Translator when producing SQLite DDL (it is one of the few producers *NOT* quoting by default) - Fix incorrect binding of large integers on old versions of diff --git a/lib/DBIx/Class/ResultSetColumn.pm b/lib/DBIx/Class/ResultSetColumn.pm index 3b1f875..1e2a0eb 100644 --- a/lib/DBIx/Class/ResultSetColumn.pm +++ b/lib/DBIx/Class/ResultSetColumn.pm @@ -110,14 +110,11 @@ sub new { } } - # collapse the selector to a literal so that it survives a possible distinct parse - # if it turns out to be an aggregate - at least the user will get a proper exception - # instead of silent drop of the group_by altogether - my $new = bless { - _select => \ $rsrc->storage->sql_maker->_recurse_fields($select), + return bless { + _select => $select, _as => $column, - _parent_resultset => $new_parent_rs }, $class; - return $new; + _parent_resultset => $new_parent_rs + }, $class; } =head2 as_query @@ -478,12 +475,33 @@ sub throw_exception { sub _resultset { my $self = shift; - return $self->{_resultset} ||= $self->{_parent_resultset}->search(undef, - { - select => [$self->{_select}], - as => [$self->{_as}] + return $self->{_resultset} ||= do { + + my $select = $self->{_select}; + + if ($self->{_parent_resultset}{attrs}{distinct}) { + my $alias = $self->{_parent_resultset}->current_source_alias; + my $rsrc = $self->{_parent_resultset}->result_source; + my %cols = map { $_ => 1, "$alias.$_" => 1 } $rsrc->columns; + + unless( $cols{$select} ) { + carp_unique( + 'Use of distinct => 1 while selecting anything other than a column ' + . 'declared on the primary ResultSource is deprecated - please supply ' + . 'an explicit group_by instead' + ); + + # collapse the selector to a literal so that it survives the distinct parse + # if it turns out to be an aggregate - at least the user will get a proper exception + # instead of silent drop of the group_by altogether + $select = \ $rsrc->storage->sql_maker->_recurse_fields($select); + } } - ); + + $self->{_parent_resultset}->search(undef, { + columns => { $self->{_as} => $select } + }); + }; } 1; diff --git a/t/71mysql.t b/t/71mysql.t index 1e31e8c..242989e 100644 --- a/t/71mysql.t +++ b/t/71mysql.t @@ -3,6 +3,7 @@ use warnings; use Test::More; use Test::Exception; +use Test::Warn; use DBI::Const::GetInfoType; use Scalar::Util qw/weaken/; @@ -347,16 +348,22 @@ ZEROINSEARCH: { select => [ \ 'YEAR(year)' ], as => ['y'], distinct => 1, }); - is_deeply ( - [ sort ($rs->get_column ('y')->all) ], + my $y_rs = $rs->get_column ('y'); + + warnings_exist { is_deeply ( + [ sort ($y_rs->all) ], [ sort keys %$cds_per_year ], 'Years group successfully', - ); + ) } qr/ + \QUse of distinct => 1 while selecting anything other than a column \E + \Qdeclared on the primary ResultSource is deprecated\E + /x, 'deprecation warning'; + $rs->create ({ artist => 1, year => '0-1-1', title => 'Jesus Rap' }); is_deeply ( - [ sort $rs->get_column ('y')->all ], + [ sort $y_rs->all ], [ 0, sort keys %$cds_per_year ], 'Zero-year groups successfully', ); @@ -367,11 +374,14 @@ ZEROINSEARCH: { year => { '!=', undef } ]}); - is_deeply ( + warnings_exist { is_deeply ( [ $restrict_rs->get_column('y')->all ], - [ $rs->get_column ('y')->all ], + [ $y_rs->all ], 'Zero year was correctly excluded from resultset', - ); + ) } qr/ + \QUse of distinct => 1 while selecting anything other than a column \E + \Qdeclared on the primary ResultSource is deprecated\E + /x, 'deprecation warning'; } # make sure find hooks determine driver diff --git a/t/88result_set_column.t b/t/88result_set_column.t index cc6f68b..b1c1e96 100644 --- a/t/88result_set_column.t +++ b/t/88result_set_column.t @@ -277,7 +277,7 @@ is_same_sql_bind ( # would silently drop the group_by entirely, likely ending up with nonsensival results # With the current behavior the user will at least get a nice fat exception from the # RDBMS (or maybe the RDBMS will even decide to handle the situation sensibly...) - is_same_sql_bind( + warnings_exist { is_same_sql_bind( $rstypes->{'implicitly grouped'}->get_column('cnt')->as_query, '( SELECT COUNT( me.cdid ) @@ -294,7 +294,10 @@ is_same_sql_bind ( => 'evancarrol' ] ], 'Expected (though nonsensical) SQL generated on rscol-with-distinct-over-function', - ); + ) } qr/ + \QUse of distinct => 1 while selecting anything other than a column \E + \Qdeclared on the primary ResultSource is deprecated\E + /x, 'deprecation warning'; { local $TODO = 'multiplying join leaks through to the count aggregate... this may never actually work'; diff --git a/t/search/subquery.t b/t/search/subquery.t index 382a46d..a281fe9 100644 --- a/t/search/subquery.t +++ b/t/search/subquery.t @@ -46,7 +46,7 @@ my @tests = ( ], }, sqlbind => \[ - "( SELECT (SELECT id FROM cd me LIMIT ?) FROM artist me )", + "( SELECT (SELECT me.id FROM cd me LIMIT ?) FROM artist me )", [ $ROWS => 1 ], ], }, @@ -59,7 +59,7 @@ my @tests = ( ], }, sqlbind => \[ - "( SELECT me.artistid, me.name, me.rank, me.charfield, (SELECT id FROM cd me LIMIT ?) FROM artist me )", + "( SELECT me.artistid, me.name, me.rank, me.charfield, (SELECT me.id FROM cd me LIMIT ?) FROM artist me )", [ $ROWS => 1 ], ], },