Possibly a performance improvement
In the plain_modpow function, there are a lot of bitshifts and no comments, combined with the fact that I am more a mathematician than a programmer, I don't really know if this method of optimization is already implemented there, because I don't completely understand the code, so if it is, just close this issue.
Here is the optimization: Since we are looking at the residue class ring (Z/(m), +, *) where m refers to the modulo inside the equivalence classes, we say that an integer a in Z/(m) to the power of a natural number b is defined as the repeated multiplication of a b-times to itself so a*a*a*..*a b-times = a^b. Thus if we find the smallest natural number c <= b which has the property that a^c = 1, the neutral element of multiplication (this is especially fast with small sets that form the ring and large exponents). We firstly find out the result of b mod c =: d. Now we declare an integer e such that c * e +d = b. We can calculate this integer with the following formula e := floor(b/c) where "/" refers to division in the rationals and floor to the largest integer smaller than (or equal to) b/c with the classical order relation on the rationals. We arrive at the following identity: a^b = (a^c)^e * a^d = (1)^e * a^d = a^d since b = c*e+d.
Quick proof for b=c*e+d: e is defined as e = floor(b/c), thus floor(b/c) <= b/c with the property that floor(b/c) is the largest such integer. floor(b/c) <= b/c is equivalent to floor(b/c)*c <= b = b/c * c. It follows from the definition of floor(), that this floor(b/c)*c is the largest integer less than or equal to b that can be divided by c. From the definition of modulus we know that d (which is b mod c) is the residue of the division of b with c. It follows from those two properties that b = c*e+d. Q.E.D.
For this method to be useful the exponents have to be sufficiently large and the identity that has to be computed has to be known beforehand (especially efficient if we have rings that are often used) or the ring has to be sufficiently small. Note that here "sufficient small" can be considered quite huge numbers for humans.
In the plain_modpow function, there are a lot of bitshifts and no comments, combined with the fact that I am more a mathematician than a programmer, I don't really know if this method of optimization is already implemented there, because I don't completely understand the code, so if it is, just close this issue.
It's basically just exponentiation by squaring, so the bitshifting is extracting powers of 2 from the exponent. This gives us O(log e) mul-mod operations for exponent e.
Thus if we find the smallest natural number c <= b which has the property that a^c = 1, [...]
OK, sure, this looks like Euler's theroem.
For this method to be useful the exponents have to be sufficiently large and the identity that has to be computed has to be known beforehand (especially efficient if we have rings that are often used) or the ring has to be sufficiently small. Note that here "sufficient small" can be considered quite huge numbers for humans.
This is where I'm not sure it's an optimization we should directly implement. Knowing beforehand implies that we're caching known values, which I don't think we should do implicitly. Calculating it on the fly would only make sense if you can do this faster than what you'll save in that O(log e) complexity.
Maybe this exponent reduction should just be suggested in documentation? That if you're repeating modpow many times with the same modulus, calculating that identity may be worth it to reduce the overall computation.
This is where I'm not sure it's an optimization we should directly implement. Knowing beforehand implies that we're caching known values, which I don't think we should do implicitly. Calculating it on the fly would only make sense if you can do this faster than what you'll save in that O(log e) complexity.
That is true, maybe an option would be to cache it explicitly or provide a function to do the reduction manually. I.e. for my applications, I don't often find myself switching to a different ring.
This requires calculating φ(m), Euler's totient function. That is in general a much harder problem than exponentiating, since it requires factoring m into primes. So it's not really an optimization in general, even if you're calculating with the same m many times, except in some special cases.
maybe an option would be to cache it explicitly or provide a function to do the reduction manually.
There already is a way to do this manually: if you know that gcd(a, m) = 1 and you know what φ(m) is, calculate b % φ(m).