Why is Apache2::RequestRec in the index?

Open thaljef opened this issue 12 years ago • 11 comments

I'm trying to figure out how Apache2::RequestRec (from mod_perl-2.0.8) got into the 02packages index. The distribution does not contain a file named Apache2/RequestRec.pm, nor is any such package listed in the META.

However, there is an Apache2::RequestRec package declared in a DummyVersions.pm file. This code suggests that if a package has ever been a simile in the past, then PAUSE will always put it in the index, even if the package name doesn't actually match the file name any more. But I looked through BackPAN and could not find any authorized release containing a real (i.e. simile) Apache2::RequestRec module.

So I'm stumped. Can you help me understand what PAUSE is doing here?

Thanks!

thaljef, Apr 18 '13 04:04

AFAIK, packages are indexed by PAUSE if they're declared (on a single line!) in any source file that isn't explicitly excluded. Packages aren't required to be in a .pm of their own name.

The 'package Apache2::RequestRec;' line in DummyVersions.pm is enough to include it in the index. Why do you think it should be ignored?
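A rough Python sketch of the scanning rule described here. It assumes a simplified regex (not PAUSE's actual one) under which a package counts only when `package NAME` and its terminating `;` or `{` sit on the same line. Splitting the declaration across two lines, a common trick for hiding a package from the indexer, is therefore not picked up. The sample file contents are hypothetical.

```python
import re

# Simplified, assumed pattern (not PAUSE's real one): "package NAME",
# an optional version, then ";" or "{", all on a single line.
PACKAGE_LINE = re.compile(
    r"^\s*package\s+([A-Za-z_]\w*(?:::\w+)*)(?:\s+v?[\d._]+)?\s*[;{]"
)


def declared_packages(source: str) -> list[str]:
    """Return the package names declared on a single line of `source`."""
    found = []
    for line in source.splitlines():
        match = PACKAGE_LINE.match(line)
        if match:
            found.append(match.group(1))
    return found


# Hypothetical contents of a DummyVersions.pm-style file.
sample = """
package Apache2::RequestRec;

package
    Hidden::From::Indexer;
"""

print(declared_packages(sample))  # ['Apache2::RequestRec']
```

Only the one-line declaration is reported; the split one is invisible to a line-oriented scan like this.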

Regardless, there is also a more meaty package definition here: https://metacpan.org/source/PHRED/mod_perl-2.0.8/lib/Apache2/compat.pm#L454

tsibley, Apr 18 '13 05:04

> The 'package Apache2::RequestRec;' line in DummyVersions.pm is enough to include it in the index.

That's what I thought too. But somehow I had convinced myself that the simile method invoked here effectively excludes packages from the index if the package name doesn't actually match the file name.

Now that I look around, I see lots of packages in the index that don't match the file name. So my original premise is obviously wrong. But now I'm completely confused about what the simile thing means.

thaljef, Apr 18 '13 05:04

Here's the discussion I had with Andreas about simile() a while back:

Hi Andreas-

The indexing of POE-Test-Loops shows that it contains two packages: POE::Test::Loops and POE::Test::DondeEstan. But if you look inside the tarball, it contains several other .pm files which have other package declarations (notably POE::Kernel) like this:

https://metacpan.org/source/RCAPUTO/POE-Test-Loops-1.351/lib/POE/Test/Loops/comp_tcp.pm#L21

The META for the dist does not specify "provides" nor does it specify "noindex", so PAUSE must have scanned files for the packages. I don't think the author intended for those packages to be indexed, so PAUSE actually did the right thing. But my question is: why?

simile()

A file with a name "foo.pm" can only be indexed for a package name /.*\bfoo$/

Other package names are ignored. That's not scientifically correct but so far it has turned out to be good enough.
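The quoted rule can be sketched in Python as follows. This is a minimal illustration of the regex alone (the `.*` prefix is implicit in `re.search`); the real simile() may handle further cases that are not modelled here, and the file paths are illustrative.

```python
import re
from pathlib import PurePosixPath


def simile(file: str, package: str) -> bool:
    r"""True if `package` matches /.*\bSTEM$/, where STEM is the file's
    base name without ".pm" (a sketch of the rule quoted above)."""
    stem = PurePosixPath(file).name
    if stem.endswith(".pm"):
        stem = stem[: -len(".pm")]
    # \b stops "Bar.pm" from matching a package such as "Baz::FooBar".
    return re.search(r"\b" + re.escape(stem) + r"$", package) is not None


print(simile("lib/POE/Test/Loops.pm", "POE::Test::Loops"))           # True
print(simile("lib/POE/Test/Loops/comp_tcp.pm", "POE::Kernel"))       # False
print(simile("lib/Apache2/DummyVersions.pm", "Apache2::RequestRec")) # False
```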

thaljef, Apr 18 '13 06:04

Read the code and comments of simile() again. It doesn't exclude packages/files from indexing; it flags whether to index them differently. That is, it indicates if the package "Baz::Bat::Foo" is found in a "Foo.pm" file (either under a Baz/Bat/ directory hierarchy or not). If it isn't, then it's an inner package declaration and it handles setting the version differently (see this else block). Notably, the META.* data isn't consulted and if a version was already found for the package it doesn't overwrite it.

This behaviour lets an inner package such as Apache2::RequestRec be indexed just fine without ever having its own file.
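A hypothetical Python model of the two branches described above. It assumes, as the contrast implies, that a simile package simply takes the version found in its file (any META.* lookup in that branch is omitted), while an inner package never overwrites a version already recorded. `record` is an invented name and the version strings are made up.

```python
import re
from pathlib import PurePosixPath


def simile(file: str, package: str) -> bool:
    stem = PurePosixPath(file).name
    if stem.endswith(".pm"):
        stem = stem[: -len(".pm")]
    return re.search(r"\b" + re.escape(stem) + r"$", package) is not None


def record(index: dict[str, str], file: str, package: str, version: str) -> None:
    """Record `package` in `index`; which branch runs depends on simile()."""
    if simile(file, package):
        # Primary package: the version found in this file is taken
        # (assumption: the real code may also consult META.* here).
        index[package] = version
    else:
        # Inner package: META.* is not consulted, and a version that was
        # already recorded for this package is kept.
        index.setdefault(package, version)


index: dict[str, str] = {}
record(index, "lib/Apache2/DummyVersions.pm", "Apache2::RequestRec", "1.00")
record(index, "lib/Apache2/compat.pm", "Apache2::RequestRec", "0.50")
print(index)  # {'Apache2::RequestRec': '1.00'}
```

Neither declaration is a simile, yet the package still lands in the index; the inner-package branch only changes how its version is chosen.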

tsibley, Apr 18 '13 18:04

I have tried, but I find the PAUSE code rather hard to understand. And so far, my "understandings" have all proven to be wrong.

I thought it might help if I stepped through it with a debugger. But merely getting the code to run has been a challenge. There is a Makefile.PL but I gave up when it insisted on having a mod_perl from around 1998. But that is a separate gripe.

Are you aware of any non-code specification of how PAUSE indexes a distribution? Or if you understand the code well enough, could I persuade/bribe you to write such a specification?

thaljef, Apr 18 '13 20:04

I am not aware of any such specifications. PAUSE grew organically. I understand parts of the code and indexing process, the ones I've had reason to examine while working on rt.cpan.org or metacpan. Sorry, but I doubt you could bribe me to dig in and turn that into a spec. :) There are better people to proposition...

tsibley, Apr 18 '13 21:04

The indexing process now has some passable test coverage, at least in the rjbs case-fix branch.

 $ prove -l t/mldistwatch.t

Presumably, you could run t/mldistwatch.t with a debugger and watch that.

— Reply to this email directly or view it on GitHubhttps://github.com/andk/pause/issues/42#issuecomment-16609375 .

David Golden, Twitter/IRC: @xdg

dagolden, Apr 18 '13 21:04

So I have finally figured out the simile thing in PAUSE, which explains why Apache2::RequestRec is indexed, but other packages that don't match the file name are not. I believe the basic rule is this:

PAUSE will index a package that does not match the filename but only if that package name has never been seen before in a filename that it did match.
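The proposed rule can be written down as a small Python model. This encodes the hypothesis exactly as stated here, not confirmed PAUSE behaviour (it is disputed later in the thread). `HypotheticalIndexer` and its `file_matches` flag are invented for illustration, with `file_matches` standing in for the result of simile(), and the example assumes POE::Kernel was earlier indexed from a POE/Kernel.pm.

```python
class HypotheticalIndexer:
    """Model of the hypothesised rule: a non-matching declaration is
    indexed only if the package was never seen in a matching file."""

    def __init__(self) -> None:
        self.seen_in_matching_file: set[str] = set()

    def should_index(self, package: str, file_matches: bool) -> bool:
        if file_matches:
            # Remember every package ever found in a file it matches.
            self.seen_in_matching_file.add(package)
            return True
        return package not in self.seen_in_matching_file


pause = HypotheticalIndexer()
print(pause.should_index("POE::Kernel", True))           # True: POE/Kernel.pm
print(pause.should_index("POE::Kernel", False))          # False: comp_tcp.pm
print(pause.should_index("Apache2::RequestRec", False))  # True: never matched
```

Under this model, POE::Kernel in comp_tcp.pm is skipped while Apache2::RequestRec in DummyVersions.pm is not, which is the asymmetry being explained.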

I think I understand the motivation, but this feels really icky to me. It means the contents of the PAUSE index depend on every module it has ever seen (not just the ones that are actually indexed). That could make it hard for DarkPANs and PAUSE to co-exist peacefully, since only PAUSE really knows how a distribution should be indexed.

I really wish the indexing logic was pushed further down the toolchain into the authoring tools (e.g. the provides metadata).

thaljef, May 24 '13 08:05

Jeffrey Ryan Thalhammer writes:

> So I have finally figured out the simile thing in PAUSE, which explains why Apache2::RequestRec is indexed, but other packages that don't match the file name are not. I believe the basic rule is this:

> PAUSE will index a package that does not match the filename but only if that package name has never been seen before in a filename that it did match.

I can't confirm that this logic is in place, although I can't really deny it either; it may be a bug.

> I think I understand the motivation, but this feels really icky to me.

Agree.

> It means the contents of the PAUSE index depend on every module it has ever seen (not just the ones that are actually indexed). That could make it hard for DarkPANs and PAUSE to co-exist peacefully, since only PAUSE really knows how a distribution should be indexed.

I hope you're just misreading things.

> I really wish the indexing logic was pushed further down the toolchain into the authoring tools (e.g. the provides metadata).

It has long been pushed down to "provides", but it's optional for the author. People who use a valid provides (for some values of valid) get indexed exactly what they say they want indexed.
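A Python sketch of the behaviour Andreas describes: when META carries a `provides` map, the packages listed there are what gets indexed, and file scanning only matters otherwise. The `provides` layout follows the CPAN::Meta::Spec form (package name mapped to `file` and an optional `version`). The validity checks PAUSE applies are not modelled, `packages_to_index` is a hypothetical name, and the example pretends POE-Test-Loops had declared `provides` (per the thread, it did not).

```python
def packages_to_index(meta: dict, scanned: dict[str, str]) -> dict[str, str]:
    """Prefer META `provides`; fall back to packages found by scanning."""
    provides = meta.get("provides")
    if provides:
        # Index exactly what the author declared ("undef" if no version).
        return {
            package: str(info.get("version", "undef"))
            for package, info in provides.items()
        }
    # No provides: use whatever the file scan found.
    return dict(scanned)


scanned = {"POE::Test::Loops": "1.351", "POE::Kernel": "undef"}
meta = {
    "provides": {
        "POE::Test::Loops": {"file": "lib/POE/Test/Loops.pm", "version": "1.351"}
    }
}
print(packages_to_index(meta, scanned))  # {'POE::Test::Loops': '1.351'}
```

With `provides` present, the stray POE::Kernel declaration never comes into play, so no simile() heuristic is needed.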

andreas

andk, May 29 '13 05:05

>> It means the contents of the PAUSE index depend on every module it has ever seen

> I hope you're just misreading things.

I may have overstated things a bit. It means the indexing of any given package is dependent on whether that package was a simile the last time it was indexed.

But the basic problem remains: indexing is not an isolated phenomenon because it depends on history. So only PAUSE is truly capable of creating the index we have now.
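To make the history dependence concrete, here is a hypothetical Python model of the refined reading above: a non-simile declaration is indexed unless the package was a simile the last time it was indexed. Two indexers that see the same upload but start from different histories (say, PAUSE and a fresh DarkPAN) then disagree. This models thaljef's interpretation, not verified PAUSE code, and the package name `Foo::Inner` is invented.

```python
from typing import Optional


class IndexState:
    """Remembers, per package, whether it was a simile when last indexed."""

    def __init__(self, last_was_simile: Optional[dict[str, bool]] = None) -> None:
        self.last_was_simile = dict(last_was_simile or {})

    def index(self, package: str, is_simile: bool) -> bool:
        if is_simile:
            self.last_was_simile[package] = True
            return True
        if self.last_was_simile.get(package, False):
            return False  # skipped; the stored history is unchanged
        self.last_was_simile[package] = False
        return True


# Same upload, declaring Foo::Inner in a file it does not match.
pause = IndexState({"Foo::Inner": True})  # once shipped as Foo/Inner.pm
darkpan = IndexState()                    # no history at all
print(pause.index("Foo::Inner", is_simile=False))    # False
print(darkpan.index("Foo::Inner", is_simile=False))  # True
```

The divergence comes entirely from the stored state: replaying the upload alone is not enough to reproduce the index.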

I'm 85% certain that my interpretation of the code is correct. And I don't think it is a bug -- it is a deliberate feature (or misfeature, depending on your point of view).

It isn't the end of the world. But @dagolden and @rjbs should watch out for this as they continue hacking on PAUSE.

thaljef, May 29 '13 09:05

Jeffrey Ryan Thalhammer writes:

> I'm 85% certain that my interpretation of the code is correct.

Would you mind bumping that to 99%, maybe by posting some evidence of where the code that does this wrong sits?

andreas

andk, May 30 '13 03:05