From: Peter Rabbitson Date: Thu, 28 Jul 2016 12:59:15 +0000 (+0200) Subject: Merge the ResultSource diagnostics rework X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?p=dbsrgits%2FDBIx-Class.git;a=commitdiff_plain;h=1cf609901b03af9464242860f0b870f0056538fc Merge the ResultSource diagnostics rework ...And this is what the products that we make do: these are the consequences. They either empower people, or they steal bits of their lives. Because experiences are all we have in life: if you think about them as grains of sand in an hour glass, once those grains are gone – they are gone. And experiences with people and experiences with things: they use up the same grains. That's why we have a profound responsibility to respect the experiences of the people that we build for... -- Aral Balkan: Free is a Lie TNW 2014 https://youtu.be/upu0gwGi4FE?t=1548 This set of commits is unusual - the 2+kloc of changes (in lib/ alone) do not add any new runtime functionality, nor do these changes alter significantly any aspect of DBIC's runtime operation. Instead this is a culmination of a nearly 4 months long death-march [1] ensuring the increasingly complex and more frequent ( courtesy of rising use of Moo(se) ) failure modes can be reasoned about and acted upon by ordinary users, without the need to reach out to a support channel. The changeset has been extensively tested against 247 downstream CPAN dists (as described at the end of commit 12e7015a) and against several darkpan test suites. As of this merge there are no known issues except RT#114440 [2] and a number of dists (enumerated in 12e7015a) now emitting *REALLY LOUD* though warranted and actionable, diagnostic messages. The diagnostic is emitted directly on STDERR - this was a deliberate choice designed to: 1) prevent various test suites from failing due to unexpected warnings 2) make the warnings *harder* to silence by a well meaning but often too eager-yet-not-sufficiently-dilligent staffer, before the warnings had a chance to reach a senior developer What follows is a little bit of gory technical details on the commit series, as the work is both generic/interesting enough to be applied to other large scale systems, and is "clever" enough to not be easily reasoned about without a summary. Think of this as a blog post within an unusual medium ;) === BACKGROUND ============== Some necessary history: DBIC as a project is rather old [3][4]. When it got started Moose wasn't a thing. Neither (for perspective) was jQuery or even Tw(i)tt(e)r. The software it was modeled on (Class::DBI) has "single-level" metadata: you have one class per table, and columns/accessor were defined on that class and that was it. At the time mst made the brilliant decision to keep the original class-based API (so that the CDBI test suite can be reused almost verbatim, see ea2e61bf) while at the same time moving the metadata to a "metaclass instance" of sorts. The way this worked was for each level of: - Individual Result Class (class itself, not instance) - Result Class attached to a Schema class - Result Class attached to a Schema instance to have a separate copy-on-the-spot created metadata instance object of DBIx::Class::ResultSource. One can easily see this by executing: ~/dbic_checkout$ perl -Ilib -It/lib -MDBICTest -MData::Dumper -e ' my $s = DBICTest->init_schema; $Data::Dumper::Maxdepth = 1; warn Dumper [ DBICTest::Schema::Artist->result_source_instance, DBICTest::Schema->source("Artist"), $s->source("Artist"), ] ' The technique (and ingenious design) worked great. The downside was that nobody ever really audited the entire stack past the original implementation. The codebase grew, and mistakes started to seep in: sometimes modifications (add_columns, etc) would happen on a derivative metadata instance, while the getters would still be invoked on the "parent" (which at this point was oblivious of its "child" existence, and vice versa). In addition there was a weird accessor split: given a result instance one could reach *different* metadata instances via either result_source() or result_source_instance(). To add insult to the injury the latter method is never defined anywhere, and was always dynamically brought to life at runtime via an accessor maker call on each individual class [5]. If that wasn't bad enough, some (but crucially *not* all) routines used to manipulate resultsource metadata were proxied [6] to the main Result classes, also aiming at allowing the reuse of the existing Class::DBI test suite, and to provide a more familiar environment to Class::DBI converts. The complete map of current metadata manipulation methods and their visibility from a typical ResultClass can be seen at the end of commit message 28ef9468. The downside was that to an outsider it would seem only natural that if in order to make something metadata-related happen, one normally calls: SomeResultClass->set_primary_key then it makes sense that one should be able to override it via: sub SomeResultClass::set_primary_key { my $ret = shift->next::method(@_); { do extra stuff } } That thinking has been applied to pretty much all straight-pass-through getters in the wild, with the expectation that DBIC will respect them throughout, like e.g. [7]. In reality this never happened - half of DBIC would never even look at the Result class and instead simply called the needed method on the result source instance directly. As noted in 28ef9468: the overwhelmingly common practice is to hook a method in a Result class and to "hope for the best". A rare example of "doing it right" would be DBIx::Class::ResultSource::MultipleTableInheritance [8], but as can be seen from its SYNOPSIS the API is rather counterintuitive ( what is table_class() anyway?! ) and more importantly - the earlier example seems "just right". Another innovation (remember: pre-Moose) was the use of the just-in-time implemented [9] alternative C3 method resolution order (MRO)[10] right on top of the default perl DFS MRO. While DBIC used multiple inheritance (MI) from the start, with all the corresponding problems [11][12][13] and non-scalable "solutions" [14], it wasn't until C3 MRO became available that the true potential of the resulting plugin system became clear. To this day (mid-2016) MI, as used within the DBIC ecosystem, remains the single most flexible (and thus superior given the problem domain) plugin-system on CPAN, easily surpassing rigid delegation, and having an upper hand on role-based solutions as promoted by the Moo(se) ecosystem. It must be noted that delegation and/or roles are not without uses - they are an excellent (and frankly should be a default) choice for many application-level systems. It is the mid-level to low-level libraries like DBIC, where the stateless nature of a predictable yet non-coordinated call-order resolution truly begins to shine. === PROBLEM(S) ============== Things stayed undisturbed for a while, until around 2012~2013 folks started showing up with more and more complaints which all traced to Moo(se)-based subclassing. Originally the C3 MRO composition worked just fine, because almost invariably a ->load_components() call (which explicitly switches the callER MRO) would have happened early enough in the life of any end-user Result/ResultSource class. But when extends()/with() got more prominent this was lost. The more complex the inheritance chain - the more likely that the topmost leaf class is in fact stuck under DFS mro with everything going sideways from there. Sometimes with truly mindbending failure cases as [15]. There was no clear solution at the time, and aside from some toothless documentation warnings [16] nothing was done to address this (in fact even the doc-patch itself is incomplete as noted in [17]). The inconsistencies, and the resulting mistakes, however, were all localized, and even though the problems were often major, each instance was sufficiently different (and bizarre) that each individual deployment could neither report them properly, nor find the time to reason through the layers of history in order to arrive at a solution they fully understand. Yet the original design which solidified towards the end of 2007 was *just* good enough to keep being kicked down the road. But people kept writing more and more MOP-inspired stuff. Given the general tendency of perl code to get "all over the place", the desire was only natural to standardize on "one true way" of doing OO throughout an entire end-user project/app. And there were more and more ways in the wild to combine/abstract individual Result classes and ResultSet components. The comprehensive DBIx::Class::Helpers [18] are just the tip of the heap of all possible permutations DBIC is exposed to. Towards mid-2015 it became utterly untenable to brush off problems with "meh, just don't do that and all will be be fine". On the personal front I first ran into the baroque jenga tower head-on when I tried to make sense of the ResultSource subsystem in an airport lounge pre-YAPC::EU 2011 (Riga). I honestly do not remember *why* I started digging in this direction but the result of that attempt (and the later effort to revive it) got immortalized in my local tree [19]. Enough said. Next was the dash to implement sane relationship resolution semantics in 03f6d1f7, and then in 350e8d57 (which was actually needed to allow for d0cefd99 to take place... sigh). During that journey 4006691d made a subtle but fatal in the long run change - it upset the balance of which source instance object we looked at during *some* (but not all) codepaths. The really sad part is that I had the feeling that something is not right, and even made a record of it as the last paragraph of 350e8d57. But light testing did not reveal anything, and I irresponsibly shipped everything as-is a bit later. It wasn't until Oct 2015 that someone noticed this being an actual problem [20]. Early attempts to fix it quickly demonstrated just how deep the rabbit hole goes, and were the main reason the entirety of this work was undertaken: the accumulated debt simply did not leave any room for a half-way solution :/ === SOLUTION(S) =============== The writeup below describes only the final set of commits: it does not cover driving into and backing out of at least 3 dead-ends, nor does it cover the 5 distinct rewrites and re-shuffles of the entire stack as more and more involved testing revealed more and more involved failure modes. I must stress that if you plan to undertake a similar crusade against another projects architectural debt you are in for a rough (but *not* impossible!) ride. The height of the "tenacity-bar" necessary to pull off such work is not reflected in any way within the seemingly effortless walkthrough that follows. It is also worth acknowledging that the code at times is incredibly terse and hard to follow: this was a deliberate choice as the extra diagnostic sites that are enabled during runtime had to be implemented as "close to the VM", so to speak, as possible. In isolation none of the contortions are warranted, but because I ended up with so many of them the result does pay off. See comments within individual commit messages for various performance impacts for more info. As first order of business some mechanism was needed to track the logical relationship between the 3 levels of ResultSource instances as shown earlier in this writeup. Luckily, the user-unfriendly nature of the metadata stack meant there are very few spots on CPAN (and to the best of my knowledge on DarkPAN) that do anything exotic with the subsystem. This means the simplest thing would in fact work and was implemented as 534aff61: corral all instantiations of ResultSource objects (and Schema objects while we are at it) [21]. This code ensured that nothing in the stack will create an instance of either class-type without our knowledge. With that in place, we also provide an explicit clone method [22] encouraging folks to use that whenever possible. The switch of all relevant callsites within DBIC itself was verified through another check within new [23], guarded by the same compile-time assertion constant (which in turn was provided by both the CI and the local smoke-script from 5b87fc0f) With the above in place, ensuring 99.99% of the ResultSource "derivative" instances were obtained via $rsrc->clone, it was time for 0ff33686. A simple private registry hash with object addresses as keys and this hash as values: { derivatives => { addr_derived_rsrc_1 => $reference_to_infohash_of_derived_rsrc_1, addr_derived_rsrc_2 => $reference_to_infohash_of_derived_rsrc_2, ... }, weakref => $weak_reference_of_self, } As necessary for any structure holding addresses of object references, a CLONE "renumbering" routine takes care of keeping everything in sync on iThread spawns (if you believe that iThreads are evil and one shouldn't go through the trouble: be reminded that any call of fork() within a Win32 perl is effectively an iThread, and fork() can and *is* being called by some CPAN modules [24] implicitly). Now that we had a good handle on "what came from where", the first major diagnostic milestone 73f54e27 could be covered. As can be seen in the table of methods in commit 28ef9468 there are only a handful of attributes on an actual ResultSource class. A couple new Class::Accessor::Grouped method types were added, which would behave just like the 'simple' and 'component_class' they were replacing, but with a twist: - any setter-based change would record its callsite in any derivative that was being tracked by 0ff33686, effectively marking that derivative stale - any getter call would consult its own entry in the metadata instance "stale log", and complain that things have moved on based on the callsite the setter left earlier The result is the exact warning as described in commit message 73f54e27. Of course there are some extra considerations - some high-level setters (e.g. remove_columns) do call a getter underneath to do their job. These cases had to be short-circuited by using a local()-based "setter callstack" mark. But in general the changeset has been surprisingly non-invasive: once the proper hook points were identified the rest was a breeze. There was also a brief scratching of heads when the last stages of DarkPAN tests emitted errors which I myself could not explain for a while, until the reason (and trivial solution) were identified in d56e05c7 and [25]. As a brief detour, I considered switching ResultSource to a proper Moo class, but quickly abandoned this idea as there are no provision for clean get-time triggers. Nevertheless the attempt was a useful demonstration what does it take to switch a low-level class (which means many somewhat questionable uses by consumers in the wild) to Moo(se) with zero loss of functionality. The result is preserved for posterity as 8ae83f0e [26]. While working on the above and f064a2ab (the solution to RT#107462 [20]), it occurred to me that the confusion of having both result_source_instance() and result_source() can be reduced further by forcing all "getter" calls to go through result_source() which is defined in Row.pm and is thus always available. The result was the improved diagnostic as described in the commit message of e570488a, but also a useful set of assertions that were used to weed out many of the wrinkles [27]. The next major step was to resolve once and for all the fallout from incorrect inheritance composition. The highly dynamic nature of all Perl's programs, an "eternal compile/execute/compile/execute... cycle", meant that just "fixing things" as DBIC sees them would not work - calling set_mro() could do little when called late enough. This led to the revert of the originally-promising "forced c3-fication" of the stack 7648acb5. Instead the practical design turned out to be "let the user know and carry on". The first part of getting there was to devise a way to precisely and very quickly tell "what does a class look like right now?" I have been brooding over how to do this since mid-February, but it wasn't until I noticed the excellent App::Isa::Splain [28] by @kentfredric, that the final interface came into focus: 296248c3 (with several minor fixups later on). Here I want to take a moment to apologize to @kentfredric, as he was led on a several week long wild-goose chase due to a misguided comment of mine [29] :( Amusingly while implementing this I hit a wall related to perl 5.8 (for the first time in 6+ years): As stated in the timings at the end of commit message 296248c3 and as elaborated in [30] - the non-core MRO is just too expensive to work with. This resulted in a 1.5 week long detour to try to squeeze every last ounce of performance. Amusingly I ran into a lot of "interesting" stuff along the way [31][32][33] The result was not only a semi-usable 5.8 implementation, but even running on 5.10+ was sped up about 2 times in the end, which translated into tangible gains in the end: the number cited as 16% in 12e7015a was originally 28%(!). The moral of this story? - gerontoperlia [34] makes your modern foundation code better. With a reliable way to tell what each methods "variant stack" looks like, it was trivial to implement the 'valid_c3_composition' part of ::SanityChecker - one would simply check a class' MRO, and in case of 'dfs' compare all stacks to what they would look like if the MRO were 'c3' [35]. In parallel but unrelated to the above the ever increasing tightening of various DBIC internal callpaths ( e5053694, d99f2db7, 3b020224 ) had to be addressed in some way. The urgency truly "hit home" when testing revealed RT#114440 [2] - it was nothing short of a miracle this code survived that long without being utterly broken by other components. The solution came out of crossing the work on describe_class_methods (296248c3) with the concept of the fail_on_internal_call guard (77c3a5dc). We already have a list of method "shadowing stacks" (to borrow @kentfredric's terminology) - if we find a way to annotate methods in a way that we can tell when a "non-overrideable" method was in fact overridden - we will be able to report this to the user. The somewhat fallen out of favor subsystem of function attributes was chosen to carry out the "annotation" task. It must be noted that this is one of the few uses of attributes on CPAN that is architecturally consistent with how attributes were originally implemented. An attribute is meant to attach to a specific reference ( in our case a code reference ), instead of a name. This is also why the FETCH/MODIFY_type_ATTRIBUTE API operates strictly with references. As an illustration why tracking attributes by name is fraught with peril consider the following: perl -e ' use Data::Dumper; use Moose; use MooseX::MethodAttributes; sub somemethod :Method_expected_to_always_returns_true { return 1 }; around somemethod => sub { return 0 }; warn Dumper { attributes => __PACKAGE__->meta->get_method("somemethod")->attributes, result => __PACKAGE__->somemethod }; ' It should also be noted that as of this merge describe_class_methods lacks a mechanism to "see" code references captured by around-type modifiers, and by extension the "around-ed" function's attributes will not appear in the "shadowed stack". A future modification of Class::Method::Modifiers, allowing minimal introspection of what was done to which coderef should alleviate most of this problem. Once all relevant methods were tagged with a 'DBIC_method_is_indirect_sugar' attribute in 1b822bd3, it was trivial to implement the schema sanity check no_indirect_method_overrides which simply ensures no user-provided method "shadows" a superclass method with the 'sugar' attribute set [36]. The success of the attribute-based approach prompted a pass of annotating all the methods DBIC generates for one reason or another: 09d8fb4a. Aside from enabling the last improvement, it also allowed to replicate a part of the DBIx::Class::IntrospectableM2M functionality in core, without elevating the status of the m2m sugar methods in any way (the historic labeling of these helpers as relationships is a long standing source of confusion). See the commit message of 09d8fb4a for a couple use-cases. The last piece of the puzzle 28ef9468 addressed the "override and hope for the best" duality of ResultSource proxied methods as described at the start of this writeup and at [37]. What we essentially do is add an around() [38] for every method in ResultSource, which then checks whether it was called via ResultSourceProxy (inherited from DBIx::Class::Core), or directly via the ResultSource instance: i.e. MySchema::Result::Foo->proxied vs $rsrc->proxied IFF we are called directly and there *is* an override of the same method on the currently-used $rsrc->result_class we either follow one of the options as given by an attribute annotation [37], or we emit a diag message so that the user can do something about it. That was easy wasn't it? === FINAL THOUGHTS ================== This work took about 50 person-days to carry out, and for obvious reasons expanded to cover a much larger period of actual wall-time. While I am by far not the most efficient developer that I have met, I am pretty sure that the process of planning, designing, implementing and testing all of this could not have been significantly accelerated. Even at the (laughable) rate of $50/h The Perl Foundation is willing to pay for unique talent [39] this endeavor would cost at least $20,000 USD - way beyond the scope (and aim?) of a TPF grant. On the other hand it would be surprising if this work can be qualified as unnecessary. I personally estimate that the savings due to the proper diagnostics alone will "make up" for the effort within the first month of wide deployment of these improvements. Time will tell of course, as the stream of questions is only about to start come the first days of August. In any case - this project is by far not the only one in dire need of such "humane" overhaul. Moo, Catalyst, various pieces of the toolchain, and other staples of what is known as "modern perl5" are in similar or worse shape: a situation which can *not* be rectified simply by "writing patches" without a concerted effort directed by a single [40] dedicated individual. I yet again strongly urge the "powers of Perl" to reconsider their hands-off approach to funding the consistently shrinking pool of maintainers. *PLEASE* consider stealing (in the spirit of tradition) the proven successful model of RubyTogether [41] before you end up losing even more maintainers like myself. Peter "ribasushi" Rabbitson Outgoing maintainer of a cornerstone Perl5 ecosystem project ( in the future you may be needing https://archive.org/web/ to see some of these ) [1] https://gist.github.com/ribasushi/6ea33c921927c7571f02e5c8b09688ef [2] https://rt.cpan.org/Ticket/Display.html?id=114440#txn-1627249 [3] http://static.spanner.org/lists/cdbi/2005/07/25/90c9f5f1.html [4] http://lists.digitalcraftsmen.net/pipermail/classdbi/2005-August/000039.html [5] https://metacpan.org/source/RIBASUSHI/DBIx-Class-0.082840/lib/DBIx/Class/ResultSourceProxy/Table.pm#L17-21 [6] https://metacpan.org/source/RIBASUSHI/DBIx-Class-0.082840/lib/DBIx/Class/ResultSourceProxy.pm#L53-87 [7] https://metacpan.org/source/VANSTYN/RapidApp-1.2000/lib/RapidApp/DBIC/Component/VirtualColumnsExt.pm#L52-67 [8] https://metacpan.org/pod/DBIx::Class::ResultSource::MultipleTableInheritance#SYNOPSIS [9] https://twitter.com/hashtag/dammitstevan [10] https://en.wikipedia.org/wiki/C3_linearization [11] https://en.wikipedia.org/wiki/Multiple_inheritance#The_diamond_problem [12] http://static.spanner.org/lists/cdbi/2005/07/25/caf44f84.html [13] http://static.spanner.org/lists/cdbi/2005/07/26/e593c147.html [14] http://static.spanner.org/lists/cdbi/2005/07/26/ea509a6a.html (... tell people "Be Careful!" ) [15] https://blog.afoolishmanifesto.com/posts/mros-and-you/ [16] https://metacpan.org/pod/DBIx::Class::ResultSet#ResultSet-subclassing-with-Moose-and-similar-constructor-providers [17] https://github.com/dbsrgits/dbix-class/pull/49#issuecomment-47637403 [18] https://metacpan.org/release/DBIx-Class-Helpers [19] http://i.imgur.com/A3acsCD.png [20] https://rt.cpan.org/Ticket/Display.html?id=107462 [21] https://github.com/dbsrgits/dbix-class/blob/534aff61/lib/DBIx/Class/_Util.pm#L1082-L1135 [22] https://github.com/dbsrgits/dbix-class/blob/534aff61/lib/DBIx/Class/ResultSource.pm#L160-L184 [23] https://github.com/dbsrgits/dbix-class/blob/534aff61/lib/DBIx/Class/ResultSource.pm#L126-L143 [24] http://grep.cpan.me/?q=my+%5C%24pid%3D+fork+dist%3DXML-Twig [25] https://github.com/ctrlo/GADS/pull/9/files [26] http://git.shadowcat.co.uk/gitweb/gitweb.cgi?p=dbsrgits/DBIx-Class-Historic.git;a=commitdiff;h=8ae83f0e [27] https://github.com/dbsrgits/dbix-class/blob/e570488a/t/lib/DBICTest/BaseSchema.pm#L379-L528 [28] https://metacpan.org/pod/App::Isa::Splain#SYNOPSIS [29] https://github.com/kentnl/Devel-Isa-Explainer/issues/1#issuecomment-212248379 [30] https://github.com/dbsrgits/dbix-class/blob/12e7015a/lib/DBIx/Class/Schema/SanityChecker.pm#L92-L102 [31] https://twitter.com/ribasushi/status/753678208076242944 [32] https://github.com/dbsrgits/dbix-class/commit/296248c3#diff-c13797cc2e5864c4a1d6a92ba65871b6R801 [33] https://github.com/dbsrgits/dbix-class/blob/1cf2ad8b/lib/DBIx/Class/_Util.pm#L663-L664 [34] https://youtu.be/2Ln0YHtKgaI?t=3731 [35] https://github.com/dbsrgits/dbix-class/blob/12e7015a/lib/DBIx/Class/Schema/SanityChecker.pm#L484-L505 [36] https://github.com/dbsrgits/dbix-class/blob/12e7015a/lib/DBIx/Class/Schema/SanityChecker.pm#L359-L394 [37] https://github.com/dbsrgits/dbix-class/blob/28ef9468/lib/DBIx/Class/MethodAttributes.pm#L242-L298 [38] https://github.com/dbsrgits/dbix-class/blob/28ef9468/lib/DBIx/Class/ResultSourceProxy.pm#L137-L333 [39] http://news.perlfoundation.org/2016/02/grant-proposal-perl-6-performa.html#comment-38362169 [40] http://queue.acm.org/detail.cfm?id=2349257 ( ... quality happens only if somebody has the responsibility for it, and that "somebody" can be no more than one single person ) [41] https://rubytogether.org/roadmap --- 1cf609901b03af9464242860f0b870f0056538fc