EJTP-lib-python icon indicating copy to clipboard operation
EJTP-lib-python copied to clipboard

Improve handling of jacks ($200)

Open MoritzS opened this issue 12 years ago • 51 comments

I think jacks should get a dynamic registering structure like I explained for Frames in #65.

Furthermore it would make sense to be able to make Jacks read or write only. For example a tls-jack will only need a ca-certificate if it's read only but will also need a proper certificate and key if it's acting as a server.

MoritzS avatar Feb 02 '13 23:02 MoritzS

Indeed. While I'd like to focus on Frames first, I think both would benefit from this kind of overhaul.

MaddieM4 avatar Feb 03 '13 07:02 MaddieM4

Dealing with read-write disparity

We do want very cleanly separated classes for server and client where it makes sense, but for most persistent connection transports, it's wasteful not to use the same pipe where already available. The question is how to make a system that works in a satisfactory way despite the wildly varying needs and models of different transports.

Well, it took awhile, but I think I can finally answer that question.

Connection class

Generic among the Jack subsystem, i.e., not subclassed by individual jacks. Initialized with two callback properties, read and write. write is called by the router to send outgoing messages. read is called by whatever jack receives an incoming message for that connection.

>>> Connection(router, read, write, address_local, address_remote)

Reader class

Class that handles incoming connections. Creates a new Connection object when connected to.

Each module exposes this as reader_class.

>>> reader = ejtp.jacks.tcp.reader_class(router, address)

Writer class

Class that handles outgoing connections. Creates a new Connection object when a frame is being sent somewhere we don't already have a connection for.

Each module exposes this as writer_class. May be equivalent to reader_class to signal that a single class performs both duties. The writer_ignores_interface module variable tells the router to do 2 things if True: initialize the class without a second parameter, and use (class, None) as its key in the router's jack table. Needless to say, since reader_class requires an interface to bind to, it's a really bad idea to set both reader_class == writer_class && writer_ignores_interface.

>>> ejtp.jacks.tcp.writer_ignores_address # Interface is ignored by some writer classes, but not this one
False
>>> writer = ejtp.jacks.tcp.writer_class(router, interface)
>>> writer.connect(address)
<ejtp.jacks.connection.Connection object at ...>

Router responsibilities

Manage jacks and connections. Connections register as they are created. Always use an existing connection where available, otherwise create a new one with the appropriate jack module's writer class.

Jack reuse is an important topic - what makes a jack distinct? Class + address truncation (2 elements long). So we can use (class, addr) tuples as keys in the router's jack cache. This seamlessly handles the logic in situations where there are not separate Reader and Writer classes, but a single dual-purpose class.

make_jack

"Create on need" is pretty straightforward with the Writer class. Clients can probably get away with just caring about setting up a Reader so they can take incoming data from the network. So we'll go with a hybrid solution along that line - create Reader on Client init, if make_jack = True, and create Writers on the fly as needed.

Registration

Let's model it based on how we do it in frames - a global registration module in ejtp.jacks, with a registerJack decorator and a createJack(address, is_reader) function.

While we're at it, make one of the attributes of the BaseJack the "router key", AKA (class, address) tuple. Maybe BaseJack.router_key, nice and simple.

State of this proposal

Edited to be less shaky and incorporate stuff decided in discussion. Can hopefully be worked on in parallel with Router changes (which may not be totally independent of these changes, but we can do some temporary router bandaging in the very short term while waiting for a proper router reimplementation).

MaddieM4 avatar Feb 19 '13 08:02 MaddieM4

These are some really well thought proposals! But I have some questions and suggestions:

  • What exactly does writer_ignore_interface do? Does that just mean that the reader/writer gets called without an interface argument?
  • I really like the idea with the readers and writers. Does that also mean, that we theoretically could use a Connection with a tls reader and an irc reader? (by the way: I would name them just reader and writer. In my opinion that's just a bit clearer.
  • The make_jack should work similar to the new createFrame, i.e. load the correct Jack out of a list of previously registered ones.
  • the "create on need" idea could be implemented with some sort of a "lazy class" that wraps or inherits from a reader/writer class and initializes them as late as possible. Another alternative is to make this behaviour standard for all reader/writer classes.

MoritzS avatar Feb 19 '13 11:02 MoritzS

Heh, thanks, it was really late in my timezone when I wrote them, and I wasn't totally sure which side of the straddled hazy/genius line this would ultimately fall on :)

  • writer_ignores_interface is used for protocols where it doesn't make sense to bind the writer to any address. This keeps the router from falsely spawning redundant writer jacks.
  • A connection is associated with a single external jack address (which actually would be good to include in the Connection constructor). So I can't imagine a situation where a connection spanning protocols would make sense, although you can certainly do it by invoking the constructor directly and laughing maniacally.
  • reader and writer probably are better names - less redundant, at least. Sure, let's go with those.
  • Agreed about create_jack, I guess I thought that went without saying :) We might want to defer the actual creation responsibility to the router, if we think we might want to make jacktype registration a router config matter, instead of global registration.
  • Wrap is better, I think. You can use it on any jack without modifying the jacks. I'm trying to decide if this is overengineering, though - the router can definitely handle this in a functional/callback way - so let's wait on that decision until the code around that idea is somewhat fleshed out, and we can make a better-educated decision about whether the idea fits or not.

Sorry if there's chaos in your email notifications, I had a bit of minor catastrophe trying to fix the formatting after making this reply via email.

MaddieM4 avatar Feb 19 '13 15:02 MaddieM4

Bounty is $100.

http://www.freedomsponsors.org/core/issue/197/improve-handling-of-jacks


(Copied from acceptance criteria)

Work with me and possibly other developers to create a registered jack system, according to designs determined in conversation in the issue tracker.

Feature may or may not ship in stable at the end of this month, depending on its readiness. If it doesn't land in 0.9.3, we'll just push it to 0.9.4.

MaddieM4 avatar Feb 21 '13 06:02 MaddieM4

I don't totally grasp the (old) router system, yet. Why exactly is it necessary to distinct between Clients and Jacks? Or is this just a thing that will become obsolete if we have the new Connection class with readers and writers?

MoritzS avatar Feb 21 '13 16:02 MoritzS

Historically, it's because client addresses are longer, with that third "local identifier" element. So for outbound messages, you have to check a trimmed version of the frame address against available jack addresses.

I think we'll still want to keep the two distinct, even if we can make address comparison somewhat agnostic to the above problem, because the Connection class would be pointless overhead for local message-passing.

MaddieM4 avatar Feb 21 '13 17:02 MaddieM4

I have some more questions:

  • what should is_reader in createJack affect? Perhaps making Connections read only?
  • shall the Connection class replace the Jack classes? As in the jacks now define a reader and writer instead of a Jack class
  • what is router_key for? For storing the connection in the Router's cache?

MoritzS avatar Feb 21 '13 18:02 MoritzS

  • It's to tell createJack whether to return an instance of the module's reader class (if True) or its writer class (if False).
  • Woah, not at all! Jacks are still responsible for "owning" the sockets and such in a persistent fashion. Connections are just tiny, consistent little package containing a bit of metadata, and some arbitrary callbacks created by the Jack.
class MyWriterJack(...):

   def connect(self, address):
       read = lambda conn, frame: conn.router.recv(frame)
       write = lambda conn, frame: self.send_with_arbitrary_function(frame)
       return Connection(self.router, read, write, self.address, address)

Connections are practically glorified tuples, and are created on-the-fly by Jacks. Writer Jacks create them when you want to send a frame somewhere, and a Connection for it doesn't exist yet. Reader Jacks create them when you receive a message from the outside world, and don't have a Connection for that remote host yet.

  • Correct. Cache logic in the router is a lot simpler when the jacks are responsible for their own key computation.

MaddieM4 avatar Feb 21 '13 19:02 MaddieM4

But now I don't get what the reader and writer classes are needed for. What do they do if the Jacks create the Connections on the fly without using them?

MoritzS avatar Feb 21 '13 20:02 MoritzS

Jacks manufacture Connections for the router to use. If the router doesn't have a connection, it has the writer jack make one. Incoming messages through the reader jack also create connections and register them with the router, so replies can use the same existing socket connection.

MaddieM4 avatar Feb 21 '13 20:02 MaddieM4

When you are talking about writer and reader jacks do you mean the same as writer and reader classes in jack modules?

MoritzS avatar Feb 21 '13 21:02 MoritzS

Yes, exactly.

I think I may need to draw and scan a diagram to get this across unambiguously, but it sounds like we're approaching the same page :)

MaddieM4 avatar Feb 21 '13 22:02 MaddieM4

Yeah, I think a sketch will help a lot to understand this!

MoritzS avatar Feb 21 '13 22:02 MoritzS

Okay, I ended up doing something a bit clearer. It's not done, but thanks to the nature of Google Docs, you can watch me work!

https://docs.google.com/presentation/d/1lcLTDPlcjuWvgxmYTOgX9W5xqLuPBYYA_c8vE13BGOg/edit?usp=sharing

EDIT: Done. Now I kinda want to clean up the Connection spec, since I figured out in the process of graphing everything out that the read function is really roundabout and pointless. Not sure how much conversation will sound like nonsense if I do that, though.

MaddieM4 avatar Feb 21 '13 23:02 MaddieM4

Well first I have to understand it as it is know ;-) So basically we split the old jack classes in a class that handles writing connections and another that does reading? And connections are client and address dependent, just like TCP connection sockets?

MoritzS avatar Feb 22 '13 00:02 MoritzS

Right. I mean in reality it's less a read/write paradigm, and really a server/client paradigm (respectively) for 90% of all possible jacks, so I'm okay with making that more semantically straightforward and changing the names in this design to match that reality (you'll notice in the presentation, the jack class names were TCPServerJack and TCPClientJack).

And again, you get it. Connections are address-dependent, and each address is 2-element (jack style) instead of 3-element (client style). The router can just find the first connection with the right remote_address and call its write(frame) function to ship a frame there.

MaddieM4 avatar Feb 22 '13 06:02 MaddieM4

I had a first go at MoritzS@f162725d703ce243b49826fa79b6dcad491ba348 Basically I did not implement the reader/writer thing but RegisterJack checks if the class is subclass of ReaderJack or WriterJack. In the end it's the same but I think it's easier to understand like this. Also the writer_ignores_interface is now WriterJack.nobind and WriterJack.router_key acts accordingly. My idea of a basic implementation of a Jack is something like this:

# module MyJack
@RegisterJack('myproto')
class MyReaderJack(ReaderJack):
    def run(self):
        # [...]
        self.recv(data)
        # [...]

@RegisterJack('myproto')
class MyWriterJack(WriterJack):
    nobind = False

    def send(self, data):
        # [...]
        some_send_method(data)
        # [...]

The thing with a single class for reader and writer also works:

@RegisterJack('myproto')
class MyReaderJack(ReaderJack, WriterJack):
    # all needed methods here

Please tell me if that's what you thought of or if I'm completely on the wrong track!

MoritzS avatar Feb 22 '13 13:02 MoritzS

Brilliant! Same concepts, much clearer implementation. You're very much on the right track, @Moritzs!

MaddieM4 avatar Feb 22 '13 15:02 MaddieM4

@MoritzS just checking in to see how things are going. I'm totally willing to pitch in if you're burned out, or the real world has been too demanding of your time, or whatever. Just want a rough idea where we're at, and when we can expect to see a pull request, since most other paid issues block on this.

MaddieM4 avatar Mar 12 '13 19:03 MaddieM4

I just don't have enough time at the moment, I'll try to do this either this or next week.

MoritzS avatar Mar 12 '13 21:03 MoritzS

That's cool. And I'll try too keep adding paid issues that are orthogonal to this one, like #118 and #102, so that there's still stuff people can contribute on.

MaddieM4 avatar Mar 12 '13 21:03 MaddieM4

In order to keep this project moving, I'm opening it back up to everybody. Whoever has time and inclination to do it, can do it based on the discussion here. I'd love to give @MoritzS the time he needs to work in development time around the rest of his life, but not at the expense of the issues blocking on this one.

MaddieM4 avatar Mar 31 '13 06:03 MaddieM4

I never wanted other people to stop working on this, it's totally fine and even encouraging to have other people do this!

MoritzS avatar Apr 07 '13 17:04 MoritzS

@MoritzS Agreed :) Unfortunately, we haven't seen much interest in getting this done, other than yours, so I've cranked up the bounty to $200. Hopefully I won't have to go higher than that and break the bank/trigger some godawful Paypal fraud detection thing.

MaddieM4 avatar Apr 08 '13 18:04 MaddieM4

Maybe I can start working on this again next week, we'll see.

MoritzS avatar Apr 10 '13 08:04 MoritzS

Sounds great to me! Let me know when/if you have something to look over. On Apr 10, 2013 1:27 AM, "Moritz Sichert" [email protected] wrote:

Maybe I can start working on this again next week, we'll see.

— Reply to this email directly or view it on GitHubhttps://github.com/campadrenalin/EJTP-lib-python/issues/66#issuecomment-16161443 .

MaddieM4 avatar Apr 11 '13 04:04 MaddieM4

I'm working on this right now. Give me a little time to go through all our thoughts! ;-)

MoritzS avatar May 07 '13 18:05 MoritzS

Sweet! I've threatened to do this myself, but I've been pretty caught up in love-webplayer and getting a proof of concept for DJDNS (which turns out to be much harder than I thought, thanks to some things I didn't realize about pymds, forcing me to make a significant fork before even being able to use it in DJDNS). You'll have plenty of time, believe me, though one way or another this will be done by the end of the month.

On Tue, May 7, 2013 at 11:38 AM, Moritz Sichert [email protected]:

I'm working on this right now. Give me a little time to go through all our thoughts! ;-)

— Reply to this email directly or view it on GitHubhttps://github.com/campadrenalin/EJTP-lib-python/issues/66#issuecomment-17561450 .

MaddieM4 avatar May 07 '13 19:05 MaddieM4

I made two new commits, that I'll explain here.

MoritzS@a390e6a7d641e16470a3bca06f4b92e7ad48f8cf:

  • I renamed the nobind thing to bind and moved it to BaseJack, therefore the property router_key now also lays in BaseJack
  • BaseJack got the methods link and unlink, and I changed the methods for loading jacks in the router a bit. They now use jack.router_key for example. See the docstrings for more information

MoritzS@38776b6c6a061ecd47e059a47c43d547279c4b9f: In this commit I basically just implemented the listening for data and threading stuff for the ReaderJack.

Please tell me if that goes along with your ideas!

MoritzS avatar May 09 '13 10:05 MoritzS

This looks really good so far. Clean and well-documented, and goes along with our design (at least as far as I've been able to refresh my memory, it's a long conversation :) ). Excellent work!

MaddieM4 avatar May 09 '13 16:05 MaddieM4

I'm currently trying to implement the new style tcp jack and got some more questions. I suppose a WriterJack should normally have set bind=False, right? For example a single tcp (or tcp6) socket can connect to multiple addresses. But should these connections be (openend and) closed right (before and) after sending a frame over it or should the stay somewhat longer to make transmission faster in some cases? Right now I don't have any idea of how to implement the persistent connection thing, so if that is what you want, I'll appreciate some input!

MoritzS avatar May 15 '13 15:05 MoritzS

Jacks should hold onto the underlying sockets until they break. How they are stored and tracked is an implementation detail that's up to jack authors - in this case, you. But for the sake of example, you could just have a dict keyed on address tuple, with holds the socket objects as values.

class SomeWriterJack(...):

    def __init__(self, ...):
        # ... call parent init and such
        self.sockets = {}

    def register_socket(self, s):
        self.sockets[s.getpeername()] = s

That's actually probably a common enough storage pattern to justify implementing it in some base class, rather than redoing it over and over in the subclasses.

MaddieM4 avatar May 16 '13 02:05 MaddieM4

But now it seems a bit redundant to not have the WriterJack to bind to an address on the one hand but still managing different addresses within the jack on the other hand. In my opinion it would be more consistent to set bind=True in the TcpWriterJack. The router will create a new one for each address he needs, anyway. Setting bind=False makes more sense in protocols like UDP.

MoritzS avatar May 16 '13 14:05 MoritzS

My understanding, which I admit may be rusty and wrong, is that bind should be true when there is a consistent local address to the connections, and false when it's not. So for TCP Writer, the local end is just going to be have some random port that was free (thus bind=False), whereas the TCP Reader binds to a specific addr/port as a server (therefor bind=True).

This is totally independent of whether a Jack keeps track of the sockets it owns. For TCP Reader, new sockets will come from remote processes connecting in. For TCP writer, new sockets will come from the router wanting to be connected to remote processes. Either way, you keep a dict of sockets, and provide the Router with Connection objects that use those sockets.

MaddieM4 avatar May 16 '13 14:05 MaddieM4

This might be true for ReaderJack (it also has set bind=True as default), but in my opinion it doesn't really make sense to talk about local addresses when sending data. Besides we (will) also have jacks that work totally different in comparison to IP protocols, so we should find an unified definition of what "bind" means. For ReaderJack, I assume it's kind of obvious, since the analogy to TCP server sockets makes sense even with other types of jacks. (e.g. a Sqlite jack will have to bind to a database to read incoming frames) For WriterJack it's a good idea to look at what the attribute bind was initially supposed to cause. As far as I remember its main use was to distinguish whether to return the full jack address in BaseJack.router_key or set the last field of the address to None. This will help the router to decide whether to create a new WriterJack on the fly with a new address or use a existing one. Therefore there already is a place where different addresses of the same jack type are handled, so I don't see why we should add this logic to the jacks themselves.

MoritzS avatar May 16 '13 17:05 MoritzS

I think this might be a good motivation to rename the "bind" variable to something more unambiguous and descriptive, like "has_local_address". That'll help us work out the rest of this mess with less cognitive tangle.

MaddieM4 avatar May 16 '13 17:05 MaddieM4

Yeah that's a good idea. Apart from that, do you agree with my suggestions?

MoritzS avatar May 17 '13 22:05 MoritzS

It seems like it's probably mostly right, if I understand what you're saying, but there really only needs to be one WriterJack instance, maximum, per protocol. Both kinds of jacks basically exist to create Connection objects to feed the router. While you will need a ReaderJack per address you serve on, you only ever need one WriterJack to initiate connections to the outside world. The Connection objects abstract away the transmission details for the router, so it can treat each of them as a bidirectional pipe. Does that make sense?

MaddieM4 avatar May 18 '13 06:05 MaddieM4

Also, sorry for the high latency in getting back to you on this stuff. I really want to dig into the code with you in a collaborative environment, so I can show you what I'm talking about, instead of trying to mold it into words like sun-baked playdoh, or procrastinating about replying in the hope that it'll be easier to turn the ideas into words later (which it never is).

I think the closest we could get is to both work on the same branch in parallel (moritz/new-jacks and campadrenalin/new-jacks, for example), so that when one of us is stuck or confused, the other can express the solution as mergeable code. Let me know if that sounds good to you too.

MaddieM4 avatar May 18 '13 06:05 MaddieM4

Yeah that's a good idea! It's better to talk about written code rather than just ideas!

MoritzS avatar May 22 '13 09:05 MoritzS

Alright, cool! Happy to help.

I'll be in Philadelphia until Friday afternoon, though, and as a west-coaster, a lot of that will be on a plane (or at least will feel that way) so I'll get on this when I get back, but I'll be working on pymads during my small amounts of free time. Due to external, still-undiagnosed regressions in the build or test system, EJTP no longer builds at all on my laptop, so it's just as well that my head is in other projects while I'm living laptop-only! On May 22, 2013 2:48 AM, "Moritz Sichert" [email protected] wrote:

Yeah that's a good idea! It's better to talk about written code rather than just ideas!

— Reply to this email directly or view it on GitHubhttps://github.com/campadrenalin/EJTP-lib-python/issues/66#issuecomment-18268097 .

MaddieM4 avatar May 22 '13 13:05 MaddieM4

I finally got some of my Connection class ideas into the code, as well as a bunch of general cleanup, though it's still gonna be a godforsaken mess for awhile. Refer to #135 to see what I'm talking about.

MaddieM4 avatar May 30 '13 05:05 MaddieM4

I won't have time to look over it in detail until Sunday, but I had a quick look and I can definitely say that I understand your ideas way better now!

MoritzS avatar May 31 '13 21:05 MoritzS

I know, right? Code is always clearer!

Still need to figure out how to deal with the colliding taxonomies of IPv4/IPv6, Reader/Writer, etc, especially with regards to registration. I've got half a mind to make Reader vs Writer an initialization parameter, rather than a class thing. So there's a ways to go before this is useful, and I don't think it will make this EJTP release either. But the next one? Definitely.

MaddieM4 avatar May 31 '13 22:05 MaddieM4

So I have some questions now. Why did you move the send and recv methods out of ReaderJack and WriterJack into BaseJack? Are Connections supposed to be client-wise (like a TCP-client) only, or should they also be able to be incoming connections (like a TCP-server), or even both at once? Besides, I really like the idea of having the router handle Connections instead of Jacks!

MoritzS avatar Jun 05 '13 14:06 MoritzS

It's been a busy day, so I haven't had a good opportunity to give these good solid thinking, but I'll try to hammer out some thoughts now.

  • It was simpler that way. The difference between the jack types is not 'can send data vs can receive data', it's 'creates connections vs connected to from elsewhere'. Both types need to be able to send and receive data, given a Connection. My thinking is that the socket objects will be stored and passed around in the Connection objects.
  • Both. In the example of TCP, a ReaderJack acts as a server, and a WriterJack acts as a client, but either way, they're manufacturing Connection objects based on those sockets, which are initialization-agnostic from the router point of view.
  • You are completely right. I'm still figuring out how to avoid making jacks responsible for Connections after creating them. Tragic as the real-life analogy is, the ideal scenario is that the jacks act like delinquent parents that pop out babies (Connections) and let the state (Router) handle their care.

MaddieM4 avatar Jun 06 '13 07:06 MaddieM4

As I understand your suggestions, you still mix up the concept of Jacks, as a gateway to the internet (similar to an ethernet interface) but without further knowledge of addresses and such, and Connections, as a layer on top of the Jack that is aware of addresses, clients, servers, etc. Specifically, this means the following:

  • a Jack instance should "know" how to establish connections, so
    • WriterJacks know how to establish connections, that send data
    • ReaderJacks know how to establish (i.e. listen for) connections, that read data
  • the Connection class actually cares about addresses, senders and receivers

For the router this means, that it will never instantiate a Jack class directly but use the (yet to be implemented) registration system, that will look at the address, instantiate a Jack class and return a connection.

MoritzS avatar Jun 06 '13 20:06 MoritzS

Correct. If I recall correctly, I'm at the point in development where I'm actually trying to register and use jacks. Not sure if I pushed that or just committed it.

And also, you're right, I'm confident about the role of each piece in the system but the terminology is fucked :) At least for jacks. I don't know how to improve it though, so I'm open to suggestion/wild brainstorming. On Jun 6, 2013 1:26 PM, "Moritz Sichert" [email protected] wrote:

As I understand your suggestions, you still mix up the concept of Jacks, as a gateway to the internet (similar to an ethernet interface) but without further knowledge of addresses and such, and Connections, as a layer on top of the Jack that is aware of addresses, clients, servers, etc. Specifically, this means the following:

  • a Jack instance should "know" how to establish connections, so
    • WriterJacks know how to establish connections, that send data
    • ReaderJacks know how to establish (i.e. listen for) connections, that read data
      • the Connection class actually cares about addresses, senders and receivers For the router this means, that it will never instantiate a Jack class directly but use the (yet to be implemented) registration system, that will look at the address, instantiate a Jack class and return a connection.

— Reply to this email directly or view it on GitHubhttps://github.com/campadrenalin/EJTP-lib-python/issues/66#issuecomment-19071776 .

MaddieM4 avatar Jun 06 '13 23:06 MaddieM4

Tell me, what you think about this mock-up!

MoritzS avatar Jun 08 '13 15:06 MoritzS

@MoritzS Commented on the commit itself, in case GH doesn't automatically notify you. Looks very promising, and I think we may have found a good terminology solution, but there are parts I don't quite get yet.

MaddieM4 avatar Jun 08 '13 20:06 MaddieM4