XMLDoc (C#) support for input

Open Jofairden opened this issue 7 years ago • 7 comments

File: https://files.gitter.im/mosra/m.css/gEIP/Terraria.xml

Not using @brief, but <summary> etc. (XML tags). Visual Studio can produce the above XMLDoc (provided the checkbox is ticked in the project settings). Anyone who has the XMLDoc file in the same place as the referenced exe (with the same name) will get to see our documentation in the editor.

Our current documentation: (look how shit the search bar is): http://blushiemagic.github.io/tModLoader/html/index.html

Jofairden, Dec 07 '18 17:12

Hi!

A bit of progress on this, finally: http://tmp.mosra.cz/terraria-docs/Terraria.html At the moment it's extracting just namespaces and types, not function arguments, properties or anything like that yet, so there's way less shown than there should be. The XML also leaves quite a lot to be desired, so for undocumented things I have to guess whether a given name is a namespace or a class. But that's okay I think.

Once I'm done with extracting other kinds of information to have this more complete, I'll hook up the search. The above link will get gradually updated with new versions.

Just for comparison to the Doxygen theme: getting here took me maybe six hours and 300 lines of code (and I assume I'll be done with most of it in a few more days). The Doxygen parser, on the other hand, is at the moment 3k lines, and more than half of that is workarounds for all kinds of crazy stuff it's doing.

mosra, Jan 04 '19 02:01

Very cool, thanks for keeping us updated

Jofairden, Jan 04 '19 09:01

Random thoughts -- while I like the simplicity and clarity of the XMLDoc representation, I am not sure about a few things. Since I last worked with C# in 2010, I am not well versed in the language anymore, so maybe you could help me with some of them:

  • As far as I can see, it's not possible to know if a type is a struct or a class (the XML file doesn't contain any such information). Is it important? It's very important for C++ because (on MSVC, at least) there are different mangling rules. But for C# I have no idea.
  • Classes (and members) that don't have any documentation don't appear in the XML at all. In that case, if I find a member of an undocumented class, I can guess that the parent symbol was a class, but not in all cases -- sometimes it might have been a namespace.
  • While I can probably guess what types are enums, it's also not really possible to guess which of these is an interface.
  • Similarly, the difference between a property and a member function is also blurred, since the XML doesn't include () for parameter-less methods, so they appear like properties.
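
To make the guessing described in the points above concrete, here is a minimal Python sketch. It is not the actual m.css code. It assumes the standard XMLDoc ID prefixes (T: for types; F:, P:, M: and E: for members), and the helper names and sample IDs are made up for illustration.

```python
def parent_of(member_id):
    """Return the dotted parent name of an XMLDoc ID such as 'M:A.B.C(System.Int32)'."""
    name = member_id.split(":", 1)[1]
    name = name.split("(", 1)[0]  # drop the parameter list, if any
    return name.rpartition(".")[0]


def undocumented_parents(member_ids):
    """Split undocumented parent names into certain types and ambiguous names."""
    documented = {i[2:] for i in member_ids if i.startswith("T:")}
    types, ambiguous = set(), set()
    for member_id in member_ids:
        parent = parent_of(member_id)
        if not parent or parent in documented:
            continue
        if member_id[0] in "FPME":
            # Fields, properties, methods and events always live in a type.
            types.add(parent)
        else:
            # The parent of a type is a namespace or an enclosing type.
            ambiguous.add(parent)
    return types, ambiguous - types


# Hypothetical IDs, as they would appear in <member name="...">.
ids = [
    "T:Game.Modding.Mod",
    "M:Game.Modding.Mod.Load",
    "F:Game.World.maxTiles",
    "T:Game.UI.Element",
]
types, ambiguous = undocumented_parents(ids)
print(sorted(types))      # names that must be types
print(sorted(ambiguous))  # names that could be namespaces or enclosing types
```

Only the second set needs real guessing (or reflection); the first follows from the member prefixes alone.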

Since all docs from Microsoft appear to use XMLDoc (and they themselves suggest using Sandcastle to build docs out of these), I'm thinking that there has to be some additional (XML?) file that describes what is what -- instead of requiring the doc generator to get the one-way ticket to hell and try to parse the C# sources. Or maybe such information could be appended to the current XML file by specifying additional flags and it's just not done by default?

I'm a total newb to C# and to how it's being built or introspected or anything, so I don't even know what I should google :sweat_smile:

mosra, Jan 04 '19 17:01

Hm.

So after digging a bit more, I found that apparently the only way to know whether a type is a class, a struct, an interface or an enum is to perform reflection on the DLL (e.g. as here in monodoc). The generated XML file contains everything except the stuff that's available through reflection.

There's also the alternative monodoc tool, but that forces you to first generate XML stubs from your C# code and then edit those XMLs in order to document them. Those XMLs seem to finally contain at least some type information, but such a workflow seems a bit excessive to me. Way too much XML editing. Even with that, the "ECMA documentation" format seems to be Mono-specific, so I guess there's no way of getting that to work with Visual Studio.

To perform reflection on the DLL without needing to implement a C# tool, I was looking at Python's ctypes, but that only provides very basic support for calling functions, not any introspection tools. Then I realized there's IronPython and that I could probably use that for introspecting the DLL directly from Python code. It works on Linux through Mono, so it's probably doable for me as well.

But ... with the reflection stuff it exploded into way more work than I initially thought and I'm not sure anymore if this is the right thing to do at this point, since I have other top-priority work to do. So, while this could be a fun thing to do, I'm afraid I have to put this aside for now. Sorry.

I'm adding the help wanted label here in case somebody would want to help, the csharp branch contains the current work-in-progress state that was used to generate the docs linked above.


Just to be clear: yes, I know the other way would be to try with Doxygen again, but seeing the current docs it generates, starting from scratch would take almost the same amount of time as just trying to undo the horrible stuff it did because it treated C# as some weird C++ dialect (the virtual keywords and other crazy stuff). Since C# has proper tools for reflection and the XMLDoc format is well-documented, I think that's the way forward, not increasing this project's reliance on Doxygen even more. Especially since I want to make my Doxygen replacement anyway. ;)

mosra, Jan 04 '19 19:01

Yeah so, you'll definitely need to reflect on the assembly to get all useful information. The generated XML file seems nothing more than a mapping to map X doc to Y thing, but it has no other real information. I don't really know what the letter before the actual name means, T seems to be some kind of definition and F a member of that definition.

The reflection however, is super easy since the member name provided in the XML file is all you'd need. You can do something like

Assembly.GetExecutingAssembly().GetType("the string in member name= without the T: or F: part");

You can then check various things really easily, such as:

  • whether it's a class: type.IsClass
  • whether it's abstract: type.IsAbstract
  • whether it's an enum: type.IsEnum

And so forth. So I suppose the entire script to parse the assembly is just something like:

foreach (Type type in Assembly.GetExecutingAssembly().GetTypes())
{
    // logic on what to do with type
}

This is all in C# at least, I don't know your preferred solution.
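
If the rest of the tool stays in Python, the same checks could be sketched roughly as below. This assumes the reflected objects expose the usual System.Type properties (IsEnum, IsInterface, IsValueType, IsClass), for example when obtained through IronPython. The classify and fake_type helpers are hypothetical, and the stand-in objects exist only so the sketch runs without a .NET runtime.

```python
from types import SimpleNamespace


def classify(t):
    """Map the boolean properties of a System.Type-like object to a kind name."""
    if t.IsEnum:
        return "enum"       # checked first, because enums are value types too
    if t.IsInterface:
        return "interface"
    if t.IsValueType:
        return "struct"
    if t.IsClass:
        return "class"      # note: delegates also report IsClass
    return "unknown"


def fake_type(**flags):
    """Hypothetical stand-in for a reflected type; all flags default to False."""
    defaults = dict(IsEnum=False, IsInterface=False, IsValueType=False, IsClass=False)
    defaults.update(flags)
    return SimpleNamespace(**defaults)


kinds = [
    classify(fake_type(IsEnum=True, IsValueType=True)),
    classify(fake_type(IsInterface=True)),
    classify(fake_type(IsValueType=True)),
    classify(fake_type(IsClass=True)),
]
print(kinds)
```

The order of the checks matters: an enum reports both IsEnum and IsValueType, so testing IsValueType first would label it a struct.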

To parse the XML document, I found:

XDocument doc = XDocument.Load("test.xml"); // Or whatever
var allElements = doc.Descendants();

So to find all elements with a particular attribute, for example:

var matchingElements = doc.Descendants()
                          .Where(x => x.Attribute("foo") != null);

(Source: https://stackoverflow.com/questions/7467496/c-sharp-getting-all-nodes-of-xml-doc)

These two things can probably be easily combined to generate the docs.
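
For a Python-based generator, a rough equivalent of the XDocument snippets can be written with the standard library's xml.etree.ElementTree. The sample XML below is invented for illustration and only mimics the general shape of a compiler-generated XMLDoc file.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of an XMLDoc file.
SAMPLE = """<doc>
  <assembly><name>Example</name></assembly>
  <members>
    <member name="T:Example.Widget">
      <summary>A documented type.</summary>
    </member>
    <member name="M:Example.Widget.Resize(System.Int32)">
      <summary>Resizes the widget.</summary>
      <param name="size">New size.</param>
    </member>
  </members>
</doc>
"""

root = ET.fromstring(SAMPLE)  # use ET.parse("test.xml").getroot() for a file

# Roughly doc.Descendants(): every element in document order.
all_elements = list(root.iter())

# Roughly .Where(x => x.Attribute("name") != null).
with_name = [e for e in root.iter() if "name" in e.attrib]

# Map each member ID to its summary text.
summaries = {
    m.get("name"): m.findtext("summary", default="").strip()
    for m in root.iter("member")
}
print(summaries)
```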

You can also compile an XML doc for a single file: csc XMLsample.cs /doc:XMLsample.xml (source: https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/xmldoc/how-to-use-the-xml-documentation-features). This may be useful to you if you'd like to handle files individually, though that might be an unnecessarily complex approach.

To generate the entire XMLDoc file, I don't think there's an easy command to do so. I believe you need to run the compiler with the doc option to have it generated.

As for other languages, I don't know, but you can probably invoke the C# reflection library and do it in your desired language. I've got no experience with that though.

Jofairden, Jan 05 '19 13:01

Oh, thanks! Didn't expect the reflection to be this straightforward :)

The letter at the front (and also everything else) is fortunately very well documented. Such a big improvement from the opaque Doxygen-generated XMLs, hehe :)
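
As a small illustration of those documented prefixes, the lookup could be sketched in Python like this. The prefix letters follow Microsoft's XMLDoc ID string format; the ID_KINDS table and the split_id helper are illustrative names, not part of m.css.

```python
# Prefix letters of XMLDoc ID strings and what they identify.
ID_KINDS = {
    "N": "namespace",
    "T": "type",      # class, struct, interface, enum or delegate
    "F": "field",
    "P": "property",  # including indexers
    "M": "method",    # including constructors and operators
    "E": "event",
    "!": "error",     # the compiler could not resolve a reference
}


def split_id(member_id):
    """Split 'T:Some.Name' into its kind and fully qualified name."""
    prefix, _, name = member_id.partition(":")
    return ID_KINDS.get(prefix, "unknown"), name


print(split_id("T:Example.Widget"))
print(split_id("M:Example.Widget.#ctor"))
```

Note that the T prefix still says nothing about whether the type is a class, struct, interface or enum, which is why reflection is needed on top.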

Yesterday I also managed to find the equivalent command for Mono (it's mcs /doc) and generate a dll+xml, which I assume I would later be able to reflect on to extract the type information. Because I have the search implemented in Python already (and also all the templating, math rendering, code highlighting, ...), I think going with IronPython (as mentioned above) for the reflection instead of rewriting the rest in C# would make more sense. That, however, in turn means you'd need to have IronPython installed in order to do the DLL introspection on your side when generating the docs.

I may be able to get back to this later this month (it sure looks fun and I would learn a bunch of new things, which is never bad), but probably not in time for your release, as I had to give priority to other things now.

mosra, Jan 05 '19 13:01

"Which, however, in turn means you'd need to have IronPython installed in order to do the DLL introspection on your side when generating the docs."

This is a completely fine requirement if you ask me, especially if it suits your development needs.

"I assume I would be later able to reflect and extract the type information from there"

Yeah, you can. You can load the DLL and access the assembly from there, doing similar stuff to the above.

Jofairden, Jan 05 '19 15:01