Source Generators in C# - BC. ONDŘEJ SLIMÁK Master's Thesis - IS MUNI
FACULTY OF INFORMATICS Source Generators in C# Master’s Thesis BC. ONDŘEJ SLIMÁK Advisor: RNDr. Jaroslav Pelikán, Ph.D. Department of Computer Systems and Communications Brno, Spring 2022
Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Bc. Ondřej Slimák

Advisor: RNDr. Jaroslav Pelikán, Ph.D.
Acknowledgements

I would like to thank my boss, Tomáš Herceg, for encouraging me to dedicate my thesis to this topic, my supervisor, RNDr. Jaroslav Pelikán, Ph.D., for the valuable feedback while developing the text of the thesis, and, of course, my dear family, friends, and colleagues for their support.
Abstract

Source Generators are a recent .NET feature with considerable potential to improve several aspects of software development, such as performance or development time. This thesis aims to provide an in-depth guide to this feature. After reading it, a reader should understand Source Generators, their purpose and use cases, and how they differ from the other common code generation options in C#, and should be able to use them in practice.

Additionally, the thesis analyses the potential use of Source Generators in enterprise applications. The currently available packages based on Source Generators have a niche purpose. In contrast, the thesis proposes a more complex framework, which could generate common boilerplate code for an enterprise application and thus improve development efficiency. The thesis delivers a proof of concept of the framework. This framework, called HUGs, generates several constructs based on simple domain model schemas. However, the thesis and the framework also highlight the immaturity of Source Generators and their sometimes cumbersome integration into existing .NET tools, such as Visual Studio and IntelliSense.

Keywords

Source Generators, C#, .NET, program analysis, source code generation, enterprise applications
Contents

1 Introduction
2 Other code generation options in C#
  2.1 Code snippets
    2.1.1 Code snippet example
  2.2 CodeDOM
    2.2.1 CodeDOM Example
  2.3 T4 Templates
    2.3.1 T4 Template example
  2.4 Reflection
    2.4.1 Reflection example
  2.5 Summary
3 Source Generators
  3.1 Purpose
  3.2 Implementation
    3.2.1 Initialization
    3.2.2 Execution
  3.3 Fundamental concepts
    3.3.1 Adding source code to compilation
    3.3.2 Additional files
    3.3.3 Syntax receivers
    3.3.4 Diagnostics
  3.4 Configuration and setup
    3.4.1 Packaging
    3.4.2 Output files
    3.4.3 Testing
  3.5 Comparison with other code generation options
  3.6 Current state
    3.6.1 Performance problems
4 Source Generators in enterprise applications
  4.1 Common characteristics of enterprise applications
  4.2 Domain-Driven Design
    4.2.1 Relevant concepts
  4.3 Utilization of Source Generators
5 Developing Source Generator framework
  5.1 Design of the framework
    5.1.1 Additional files structure
    5.1.2 Code generation process
  5.2 Implementation of the framework
    5.2.1 Framework code generation flow
    5.2.2 Testing
  5.3 Results
  5.4 Future development
6 Conclusion
A Attachments
B Installation and Use
Bibliography
List of Figures

2.1 Output of the prop Code snippet
2.2 Add new T4 Template
2.3 Run-time T4 Template file properties
3.1 Schema of compilation with Source Generators phase [19]
3.2 Generated files in Solution Explorer (taken from the Microsoft documentation)
5.1 Framework's code generation process
5.2 Framework's input to source code transformation
1 Introduction

We invented computers to ease our lives, and we expect software developers to prepare the computers to do so. It is a difficult task that requires much work; however, good developers are lazy developers. They do not want to write the same code more than once; they do not want to repeat themselves; they do not want to do extra work [1, 2]. In the best case, they do not want to write the code at all.

Consequently, developers build various tools to ease their job. For instance, IDEs often provide syntax highlighting or a visually appealing GUI for source code management and debugging. For standard pieces of code, the IDE usually also supports macros, code snippets, or other shortcuts for writing code. Whenever the developer uses one of the available shortcuts, the IDE can autocomplete the code or, in other words, generate it. Therefore, the developer often does not need to remember the correct syntax; regarding syntax, the IDE knows better than the developer. For this reason, source code generation is a hot topic among developers.

Code shortcuts require the developer to type in an IDE manually. However, other code generation tools might not require any manual typing. The tool could be a console application that generates source code based on its input arguments. For instance, an API client can be generated from an API specification. Furthermore, source code can be generated based on existing code. This approach is more complex, as it usually requires deeper knowledge of code analysis. It is enabled by a recent C# feature, Source Generators.

This thesis focuses mainly on Source Generators, but it also discusses other ways to generate code on the .NET platform, namely in C# and its most common IDE, Visual Studio (for some topics, the JetBrains products for .NET, Rider and ReSharper, are also considered).

C# developers can use several mechanisms to generate code, and Source Generators are the latest one. The feature was released with .NET 5 at the end of 2020, and its goal is to enable compile-time metaprogramming. In other words, it enables developers to create code dynamically during compilation, based on the existing code. However, Source Generators are still a recent feature, and no in-depth guides are available. Moreover, the existing guides often describe the same, or very similar, simple Source Generator samples.

Therefore, this thesis aims to provide the missing in-depth guide to Source Generators. It should compare Source Generators with other code generation mechanisms in C#, explain the distinct role of Source Generators, and prepare the reader to use this feature in practice. Additionally, since C# and .NET target primarily large enterprise applications, this thesis should propose the use of Source Generators in the development of such applications. Overall, this thesis should serve as complete learning material for C# developers interested in Source Generators.

This thesis has the following structure. The second chapter covers other code generation options in C#. It briefly describes the common options and discusses their advantages, drawbacks, and use cases. A simple example is presented for each option.

The third chapter introduces Source Generators. It explains the principles, capabilities, limitations, and purpose of this feature. The chapter describes in detail how Source Generators operate, the fundamental concepts of working with them, and their configuration. After these concepts are explained, the chapter compares Source Generators with the other code generation options from the second chapter. Lastly, this chapter discusses the current state of Source Generators.

The fourth chapter analyses the potential use of Source Generators during the development of enterprise applications. It defines this kind of application and describes its traits and requirements. Then, it introduces Domain-Driven Design (DDD) and explains the relevant DDD concepts. Moreover, this chapter justifies the use of Source Generators in enterprise application development and proposes a Source Generator framework that would improve the development of enterprise applications. This chapter concludes the theoretical part of the thesis.

The practical part of the thesis is described in the fifth chapter. This chapter begins with a description of the framework design, followed by a description of its implementation. Lastly, the developed framework is evaluated, and its value is stated. The chapter ends by describing possible future development of the framework.
2 Other code generation options in C#

C# offers multiple code generation options. Although no comprehensive list of them exists, the common ones are Code snippets, CodeDOM, T4 Templates, reflection, and Source Generators. Their approaches to code generation differ, and each has its advantages and drawbacks. Before introducing Source Generators, it is essential to describe the other options in order to understand how they differ and where Source Generators fit.

2.1 Code snippets

Code snippets [3, 4] (in the .NET environment also known as Templates, Code templates, or Live templates) aim to generate frequently used code blocks. Essentially, Code snippets are shortcuts that an IDE can recognize and replace with the corresponding source code, which is usually much longer than the shortcut. Therefore, they require the developer to manually type a shortcut known to their IDE and then accept the proposed substitution. The standard IDEs used by C# programmers (Visual Studio, Visual Studio for Mac, Visual Studio Code, and Rider) can insert small blocks of code, but also whole methods, members, and classes. Some of the commonly used Code snippets are prop, if, ctor, and class (the complete list of default Code snippets in Visual Studio can be found in its documentation).

Typically, the generated code blocks require parameters. For instance, the prop Code snippet requires the developer to specify the type and name of the property. Therefore, Code snippets support replacement parameters, which are placeholders for the required values, and the developer must replace them manually.

[Figure 2.1: Output of the prop Code snippet]

Code snippets can increase the productivity of developers because developers do not need to write the code in full. As a side effect, they also make the generated code consistent: it always has the same structure. Moreover, the developer can typically define and import custom Code snippets in their IDE. Therefore, developers can share the same set of Code snippets and improve the overall consistency of their application's source code.

However, Code snippets require manual work from the developer. Furthermore, their output is not updated automatically. If the generated code needs to change, the developer must either update it manually or recreate it with the Code snippet.

2.1.1 Code snippet example

Code snippets and their management differ between IDEs. Nonetheless, the differences are not significant. Thus, for the sake of brevity, the example is provided for the Visual Studio environment only.

In Visual Studio, the tool for Code snippet management is called Code Snippet Manager (documented in the Microsoft documentation and on the RIP Tutorial portal), and it uses XML files for Code snippet definitions. For a basic Code snippet, this file contains a definition of a header and of the code. Optionally, the file can also contain declarations of replacement parameters or namespace dependencies.

For manual performance diagnosis, a custom Code snippet for a timer might be helpful. A short version of a file with a Timer Code snippet is presented in Listing 2.1. A complete version of the file is available in the attachments (folder CsharpCodeGenerationDemos).

This file contains a single Code snippet definition inside the CodeSnippet node. The snippet starts with a Header, within which Title, Author, Description, and Shortcut are defined. The Snippet node contains the Code, Declarations, and Imports nodes. The Code node holds the source code generated by the snippet, and it has a required attribute Language, since Code snippets support both C# and Visual Basic. The source code itself is held inside a CDATA element, which allows the use of symbols that are problematic in XML, such as <, >, or &.

Replacement parameters are declared within the Declarations section, and each of them must contain ID and Default. The ID identifies the replacement parameter inside the source code in the Code node, where the identifier must be surrounded by $ symbols. The Default states the default value of the parameter.

Imports declare the namespaces required by the source code that the snippet generates. When the snippet is inserted, using directives are added for the namespaces listed in the Imports node.

[Listing 2.1: TimerCodeSnippet.snippet file; the listing body is not preserved in this transcription]
2.2 CodeDOM

Code Document Object Model [5], or CodeDOM for short, is a .NET Framework specific tool that enables source code generation at run time. CodeDOM generates the code from a single model that contains CodeDOM elements linked in a graph structure. This structure is independent of any programming language, and various code providers can use it to generate code in various languages (.NET Framework includes CodeDOM generation and compilation for C#, VB, and JScript).

CodeDOM was released with the first versions of .NET Framework, and it is still supported. Nowadays, the tool can be described as obsolete; nevertheless, it was one of the best .NET code generation tools available during the early stages of .NET Framework.

The CodeDOM model is a tree with a CodeCompileUnit instance as its root, and every subsequent element of the source code model must be nested within this root. Listing 2.2 shows an example of the element linking process. Other elements can be linked together similarly, for example a class element to a namespace element or a member element to a class element.

Listing 2.2: Linking code elements into the graph

    // Create CodeCompileUnit instance - graph
    var compileUnit = new CodeCompileUnit();
    // Create namespace element
    var codeNamespace = new CodeNamespace("MyNamespace");
    // Link the element to the graph
    compileUnit.Namespaces.Add(codeNamespace);

The model can be transformed into source code of a particular programming language by the corresponding code provider. A basic version of the transformation can be seen in Listing 2.3. The process uses the CSharpCodeProvider to generate C# code that is written to an output file HelloWorld.cs. Code generation can be further specified by CodeGeneratorOptions, which contain additional instructions for the code provider, such as the bracing style.
Listing 2.3: CodeDOM code generation example

    using System.CodeDom;
    using System.CodeDom.Compiler;
    using System.IO;
    using Microsoft.CSharp;

    var compileUnit = new CodeCompileUnit();
    // create a model and link it to the unit

    // generate source code based on the model
    var provider = new CSharpCodeProvider();
    var options = new CodeGeneratorOptions();
    // FileExtension may be reported without a leading dot, so the dot is added explicitly
    var sourceFile = $"HelloWorld.{provider.FileExtension.TrimStart('.')}";
    using var sw = new StreamWriter(sourceFile);
    provider.GenerateCodeFromCompileUnit(compileUnit, sw, options);

CodeDOM enables developers to define a source code model with most of the features of the supported languages; however, because it is language independent, it was not designed to support all of their features. The missing features can be generated by a CodeDOM mechanism called Snippets; however, CodeDOM cannot translate these structures to other languages automatically, and it cannot verify their correctness. Therefore, the Snippet mechanism is not recommended unless necessary.

2.2.1 CodeDOM Example

CodeDOM is suitable for the generation of CLR (Common Language Runtime) classes, and this section demonstrates such a use case. The goal is to create a CodeDOM model representing a class with a field, a property, and a method, since these are common class members in C#.

Listing 2.4 is a high-level example of the CodeDOM model creation process. A namespace element and a class element are built, linked, and then added to a compilation unit. The method CreateNamespace prepares the namespace element similarly to Listing 2.2, and the class creation is abstracted by the method CreateClass.
Listing 2.4: Create a CodeDOM model

    CodeCompileUnit CreateCodeDomModel()
    {
        var codeNamespace = CreateNamespace();
        var codeClass = CreateClass();
        var compilationUnit = new CodeCompileUnit();
        codeNamespace.Types.Add(codeClass);
        compilationUnit.Namespaces.Add(codeNamespace);
        return compilationUnit;
    }

The CreateClass method can be seen in Listing 2.5. This method creates a declaration of a public class Squirrel. The class declaration is populated with a field, a property, and a method by correspondingly named methods.

Listing 2.5: Create a class with CodeDOM

    CodeTypeDeclaration CreateClass()
    {
        var codeClass = new CodeTypeDeclaration("Squirrel")
        {
            IsClass = true,
            TypeAttributes = TypeAttributes.Public
        };
        AddNameField(codeClass);
        AddNameProperty(codeClass);
        AddHelloMethod(codeClass);
        return codeClass;
    }

The creation of the field element is demonstrated in Listing 2.6. The method AddNameField(CodeTypeDeclaration) adds a private field name, which is initialized with a primitive string literal, Chip.
Listing 2.6: Create a field with CodeDOM

    void AddNameField(CodeTypeDeclaration codeClass)
    {
        var nameField = new CodeMemberField(
            new CodeTypeReference(typeof(string)), "name")
        {
            Attributes = MemberAttributes.Private,
            InitExpression = new CodePrimitiveExpression("Chip")
        };
        codeClass.Members.Add(nameField);
    }

The structure of a property element is more complex, since CodeDOM does not support automatic properties (properties with a backing field generated by the compiler) by default. They can be added with CodeSnippetTypeMember, but this bypasses the validation provided by CodeDOM. Listing 2.7 shows how to create a property with a backing field without the use of CodeSnippetTypeMember. It creates a public string property called Name with a getter that returns the value of the previously created field name.

Listing 2.7: Create a property with CodeDOM

    void AddNameProperty(CodeTypeDeclaration codeClass)
    {
        var nameProperty = new CodeMemberProperty
        {
            Type = new CodeTypeReference(typeof(string)),
            Name = "Name",
            Attributes = MemberAttributes.Public,
            HasGet = true
        };
        nameProperty.GetStatements.Add(
            new CodeMethodReturnStatement(
                new CodeFieldReferenceExpression(
                    new CodeThisReferenceExpression(), "name")));
        codeClass.Members.Add(nameProperty);
    }

Listing 2.8 provides an example of a method declaration in CodeDOM. It builds a public method SayHello(string), which takes a string parameter called greetee. The body of this method is declared by a CodeSnippetExpression that contains C# code in string form.

Listing 2.8: Create a method with CodeDOM

    void AddHelloMethod(CodeTypeDeclaration codeClass)
    {
        var helloMethod = new CodeMemberMethod
        {
            Attributes = MemberAttributes.Public,
            Name = "SayHello"
        };
        var parameter = new CodeParameterDeclarationExpression(typeof(string), "greetee");
        helloMethod.Parameters.Add(parameter);
        var greetSnippet = new CodeSnippetExpression(
            "Console.WriteLine($\"Squirrel named '{Name}' is greeting '{greetee}'\")");
        var greetStatement = new CodeExpressionStatement(greetSnippet);
        helloMethod.Statements.Add(greetStatement);
        codeClass.Members.Add(helloMethod);
    }

The model defined in the previous Listings represents a class with a field, a property, and a method. It can be used to generate C# code similarly to Listing 2.3, and the generated file is located in the bin/debug folder. The complete code of this example is available in the attachments (folder CsharpCodeGenerationDemos).

2.3 T4 Templates

T4 Templates [6] (T4 stands for "Text Template Transformation Toolkit") is a framework used to automate text generation in Visual Studio; Rider and MonoDevelop added support for this framework later. T4 files are denoted by the file extension .tt, and they consist of literal text blocks and T4 controls, which are combined to generate the output text. T4 controls use .NET code (C# or Visual Basic) to alter the structure of the final text; nonetheless, the output file can be of any kind, for example an HTML file, a file with source code in any language, a resource file, an e-mail, or a regular .txt file.

There are two kinds of T4 Templates: Run-time and Design-time. They have different purposes, and their implementation differs slightly.

The Run-time templates [7] produce the final output at the run time of the application. More precisely, when created, they prepare a class with a method TransformText, which is used to produce their output. Once the TransformText method is available, Visual Studio is not needed for the code generation. Consequently, Run-time templates are useful for generating textual outputs of applications, such as e-mail texts.

The Design-time templates [8] generate raw files (without running the application) that become part of the application source code. Frequently, these templates use a model that represents a data source; the data source could be a database, but also a JSON file or a CSV file. The template uses this model to read the data and to generate output files based on it. The produced files become part of the code base of the project, and they are available at design time. Therefore, these templates are typically used to keep the code in sync with the data from the model, for example to generate POCO classes based on a DB schema.

T4 Templates use three kinds of elements:

• Directives – instructions for template processing (programming language used, output file extension, namespace imports, etc.)
• Control blocks – variable substitution, conditional control, or cycles
• Text blocks – string literals that are copied directly to the output

Typically, the first elements of the template file are directives [9], as they direct the subsequent processing of the template. Their general syntax can be seen in Listing 2.9, and a specific example is provided by Listing 2.10, in which the directive instructs the T4 Template to use an HTML format for the output file.

[Listing 2.9: General directive syntax; the listing body is not preserved in this transcription]

[Listing 2.10: Output format directive; the listing body is not preserved in this transcription]

Control blocks [10] allow developers to use .NET code to alter the generated text. These controls fall into three kinds:

• Standard control blocks – statements
• Expression control blocks – expressions
• Class feature control blocks – methods, properties, or fields

Standard control blocks provide control over the templating logic. They allow developers to use programming language constructs such as if or for cycles in a template definition to make the template more dynamic. Listing 2.11 provides an example of a standard control block that instructs the template to output Hello World! three times. In this example, the statements are surrounded by <# #> tags, which mark the standard control blocks. Additionally, the example has a class feature control block that contains a method called TrimMessage. This control block is surrounded by <#+ #> tags. Expression control blocks are denoted by <#= #> tags. They instruct the T4 Template to write the value of a .NET expression to the output, such as the message variable in this example.
[Listing 2.11: Control blocks; the listing body is not preserved in this transcription]

Combining the available elements offers a clear yet powerful way to generate code. However, a template essentially only concatenates strings, so its output is not syntax-safe. This approach therefore places more responsibility on developers, who must ensure that the generated text is valid.

2.3.1 T4 Template example

In Visual Studio, a Run-time T4 Template can be created by adding a new item of the type Runtime Text Template (see Figure 2.2). The example is provided for Visual Studio because T4 Templates were initially created for this particular IDE. Notice that the added .tt file has its Custom Tool property set to TextTemplatingFilePreprocessor (see Figure 2.3). To change the T4 Template to a Design-time one, the Custom Tool must be set to TextTemplatingFileGenerator.

[Figure 2.2: Add new T4 Template]

[Figure 2.3: Run-time T4 Template file properties]

The Run-time templates [7] are helpful for producing the output text of the application, for tasks such as building e-mail messages. A T4 Template that handles this task can be seen in Listing 2.12. This Listing contains a template that produces an HTML e-mail structure with the parameters Name and SentDate. The T4 Template automatically generates a code-behind .cs file, which contains a class with the TransformText method that returns the text defined by the template. This class is partial, which makes it easy to extend; for example, a hand-written part can add the Name and SentDate parameters to the template. Finally, the transformation of the template to text is presented in Listing 2.13.
Listing 2.12: T4 Template producing e-mail messages

    <html>
    <head>
      <title>Welcome e-mail</title>
    </head>
    <body>
      Welcome <#= Name #>!<br /><br />
      Date sent: <#= this.SentDate.ToString("d") #>
    </body>
    </html>

Listing 2.13: Transformation of the e-mail template to string

    var sentDate = new DateTime(2010, 11, 22);
    var emailTemplate = new EmailRuntimeTextTemplate("Joe", sentDate);
    var emailText = emailTemplate.TransformText();

As mentioned at the beginning of this section, the Design-time templates [8] are useful for generating POCO files. A simple example is presented in the attachments (folder CsharpCodeGenerationDemos). This example generates a class with the properties listed in a data source file properties.txt. The generated class is immediately available in the application source code, as Design-time templates run in the following cases:

• the template is edited and focus is changed to a different window of Visual Studio
• the template file is saved
• the user explicitly invokes execution of the template by clicking the Transform All Templates or Run Custom Tool buttons in Visual Studio

For the sake of brevity, the Design-time template example is not further explained in the text of this thesis.
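Section 2.3.1 notes that the class generated for a Run-time template is partial and can be extended with the Name and SentDate parameters. A minimal sketch of such a hand-written part might look as follows. It assumes that the template file is named EmailRuntimeTextTemplate.tt (as the class name in Listing 2.13 suggests) and that this part is declared in the same namespace as the generated part; the constructor shape is inferred from Listing 2.13, and the null check is an added assumption.

```csharp
using System;

// Hand-written part of the partial class; the other part (with TransformText)
// is assumed to be generated by the Run-time T4 Template.
public partial class EmailRuntimeTextTemplate
{
    // Values referenced by the expression control blocks in the template.
    public string Name { get; }
    public DateTime SentDate { get; }

    public EmailRuntimeTextTemplate(string name, DateTime sentDate)
    {
        Name = name ?? throw new ArgumentNullException(nameof(name));
        SentDate = sentDate;
    }
}
```

Because the generated part and this part are merged by the compiler, the expressions in Listing 2.12 can read Name and SentDate directly, and the template file itself does not need to declare them.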
2.4 Reflection

Reflection [11, 12, 13] is a programming language's ability to inspect and manipulate code entities without knowing their identification or formal structure. The inspection entails analysis of objects and types to gather structured information about their definition and behavior with little or no prior knowledge about them. Manipulation uses information obtained by the analysis to dynamically invoke code, instantiate the types, or even change the structure of types and objects at runtime. Therefore, it is a crucial mechanism that allows a program to work with the rest of the code, even with code that might not have been written yet.

Reflection requires the use of a flexible object-oriented language (such as C#) to express the abstract structure of the application. In .NET, a compiler transforms the source code into IL with metadata that contains information about the types and the abstract structure. When the compiled code runs under the CLR, the .NET library System.Reflection can consume the metadata to provide the capabilities of reflection. This library encapsulates many runtime concepts, such as assemblies, modules, types, methods, or fields.

The inspection and manipulation of loaded code focus mainly on System.Type, which represents the runtime type of the targeted object. The Type instance can be accessed via the method Object.GetType, and it offers various methods to explore the metadata of the type further. For example, the method Type.GetMethod can be used to obtain an instance of System.Reflection.MethodInfo, and this instance contains the method's metadata. Numerous similar methods are available, and they provide the capability to obtain comprehensive descriptive information about the type; in other words, to inspect the type. Dynamic execution of the methods or creation of new objects by invoking a constructor can be used to manipulate the object.
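The manipulation described above, creating an object through a constructor and then executing one of its methods dynamically, can be sketched as follows. This is an illustrative sketch only: the Greeter class and the ReflectionInvocationDemo helper are hypothetical names introduced for this example and do not come from the thesis attachments.

```csharp
using System;
using System.Reflection;

// Hypothetical sample type that is only accessed through reflection below.
public sealed class Greeter
{
    private readonly string _prefix;

    public Greeter(string prefix) => _prefix = prefix;

    public string Greet(string name) => $"{_prefix}, {name}!";
}

public static class ReflectionInvocationDemo
{
    // Instantiates the given type via its (string) constructor and calls
    // its Greet(string) method, using only metadata obtained at runtime.
    public static string GreetDynamically(Type type, string prefix, string name)
    {
        ConstructorInfo ctor = type.GetConstructor(new[] { typeof(string) })
            ?? throw new MissingMethodException(type.FullName, ".ctor");
        object instance = ctor.Invoke(new object[] { prefix });

        MethodInfo greet = type.GetMethod("Greet", new[] { typeof(string) })
            ?? throw new MissingMethodException(type.FullName, "Greet");
        return (string)greet.Invoke(instance, new object[] { name });
    }
}
```

Calling ReflectionInvocationDemo.GreetDynamically(typeof(Greeter), "Hello", "Chip") returns "Hello, Chip!". The caller only needs a Type instance, which is why the same approach also works for types that are discovered at runtime.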
Moreover, reflection also enables generating code at runtime via the tools provided by the System.Reflection.Emit [14, 15] namespace. This namespace contains types such as MethodBuilder or EventBuilder that allow generating the corresponding code constructs. These elements can be added to a dynamic assembly represented by an instance of AssemblyBuilder. Dynamic assemblies can be saved to disk as portable executable files, typically .dll files. Alternatively, they can be emitted directly to memory for immediate use at the expense of persistence.

Inspection and manipulation of types and objects at runtime are slower than the identical operations expressed statically in the source code; in other words, reflection comes with a performance penalty. Performance is traded for the dynamic nature of reflection, and this is commonly the main issue when working with it.

For example, one of the common uses of reflection is dynamic serialization (or deserialization) of objects. Reflection can obtain descriptive information about the given type (such as its properties) and serialize the object. Without reflection, a custom serializer for each type might be necessary for this task; with reflection, a single method to serialize objects is enough. However, this is where the performance penalty occurs, as the method needs to discover the type's metadata before performing the serialization.

2.4.1 Reflection example

The core use cases of reflection in C# are gathering descriptive information about types [16] and then using that information for code manipulation [17]. This section explains the fundamental reflection operations that enable inspection and manipulation.

Object introspection is presented in Listing 2.14. First, the type of the targeted object (here an instance called sampleClass) is obtained via the method Object.GetType, and this Type instance is used to gather the descriptive information. In this case, the methods of the type are listed; nevertheless, more information can be obtained from the Type instance, such as its fields, properties, or constructors.

Listing 2.14: Type introspection with reflection

    Type type = sampleClass.GetType();
    Console.WriteLine($"Type: {type.Name}");
    MethodInfo[] methods = type.GetMethods();
    foreach (var methodInfo in methods)
    {
        Console.WriteLine($"Method: {methodInfo.Name}");
    }
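Listing 2.14 enumerates methods; the same inspection applied to properties yields the single, type-agnostic serialization method mentioned in Section 2.4. The following is a minimal sketch under stated assumptions: ReflectionSerializer, its Name=Value output format, and the Squirrel sample type are hypothetical and serve only as an illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Reflection;

public static class ReflectionSerializer
{
    // Serializes the public instance properties of any object into
    // "TypeName{Prop=Value;...}" without knowing the type in advance.
    public static string Serialize(object instance)
    {
        if (instance is null) throw new ArgumentNullException(nameof(instance));

        Type type = instance.GetType();

        // The metadata lookup runs on every call; this is the performance
        // cost of reflection discussed in Section 2.4.
        IEnumerable<string> pairs = type
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .OrderBy(p => p.Name, StringComparer.Ordinal) // stable output order
            .Select(p => p.Name + "=" +
                Convert.ToString(p.GetValue(instance), CultureInfo.InvariantCulture));

        return type.Name + "{" + string.Join(";", pairs) + "}";
    }
}

// Hypothetical sample type used to exercise the serializer.
public sealed class Squirrel
{
    public string Name { get; set; } = "Chip";
    public int Age { get; set; } = 3;
}
```

Calling ReflectionSerializer.Serialize(new Squirrel()) returns "Squirrel{Age=3;Name=Chip}". The properties are sorted by name because the order returned by GetProperties is not guaranteed.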
Additional analysis of a method can be seen in Listing 2.15. The listing uses type.GetMethod to acquire descriptive information about the given method in the form of a MethodInfo object. This object provides members to explore the method, such as its return type, name, or parameters. In particular, Listing 2.15 uses the GetParameters method, which returns an array of ParameterInfo objects that represent the parameters of the method. This information may be useful when trying to invoke the method dynamically.

Listing 2.15: Method introspection with reflection

    MethodInfo method = type.GetMethod(nameof(SampleClass.MyMethod));
    Console.WriteLine($"Method name: {method.Name}, " +
        $"Type: {method.ReturnType.Name}");
    var parameters = method.GetParameters()
        .Select(p => $"{p.ParameterType.Name}:{p.Name}");
    Console.WriteLine($"Parameters: {string.Join(", ", parameters)}");
    // Dynamic invocation of the method on the sampleClass instance
    method.Invoke(sampleClass, new object[] { "Hello World!" });

Reflection can scan a whole assembly to get a list of types with desired properties, such as all classes in the assembly. Listing 2.16 demonstrates this ability with code that scans the types of the executing assembly. The scan gathers all class types that implement a specific interface, ISampleInterface in this case. The ability to scan and filter types is often used in the development of frameworks and libraries¹⁶.

Listing 2.16: Scanning assembly with reflection

    List<Type> assemblyClassTypes = Assembly.GetExecutingAssembly()
        .GetTypes()
        .Where(t => t.IsClass)
        .ToList();
    var classesOfSampleInterface = assemblyClassTypes
        .Where(t => t.GetInterfaces().Contains(typeof(ISampleInterface)));

16. For instance, a common library for object mapping, AutoMapper.
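Listing 2.15 relies on a SampleClass type and a sampleClass instance that are defined elsewhere. A self-contained variant of the same idea might look as follows; the body of SampleClass.MyMethod and the MethodIntrospectionDemo helper are assumptions made for this sketch, not code from the thesis.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for the SampleClass used by the listings.
public class SampleClass
{
    public string MyMethod(string message) => "Received: " + message;
}

// Hypothetical helper that wraps the calls shown in Listing 2.15.
public static class MethodIntrospectionDemo
{
    // Describes a method as "Name(ParamType:paramName, ...) -> ReturnType".
    public static string Describe(MethodInfo method)
    {
        var parameters = method.GetParameters()
            .Select(p => p.ParameterType.Name + ":" + p.Name);
        return method.Name + "(" + string.Join(", ", parameters) + ") -> "
            + method.ReturnType.Name;
    }

    // Looks up a public method by name and invokes it dynamically.
    public static object? InvokeByName(
        object target, string methodName, params object[] arguments)
    {
        MethodInfo? method = target.GetType().GetMethod(methodName);
        if (method is null)
        {
            throw new MissingMethodException(target.GetType().Name, methodName);
        }

        return method.Invoke(target, arguments);
    }
}
```

With these definitions, Describe applied to the MethodInfo of MyMethod yields MyMethod(String:message) -> String, and InvokeByName(new SampleClass(), "MyMethod", "Hello World!") returns Received: Hello World!. The explicit null check matters because GetMethod returns null rather than throwing when no matching method exists.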
When developing frameworks, another valuable ability might be instantiating the desired classes (the classes obtained by the scan). This objective can be achieved via the Activator class [18], which enables developers to instantiate types dynamically. The use of Activator is demonstrated in Listing 2.17.

Activator offers methods to create instances, such as Activator.CreateInstance. This method requires the type that is supposed to be instantiated; if the type has a parameterless constructor, this information is enough to create a new instance. Otherwise, additional information must be supplied, such as the arguments for the constructor. In this example, SampleClass has a parameterless constructor and a constructor that requires an int as its first parameter and a string as its second parameter. In the listing, the first use of the Activator.CreateInstance method creates a SampleClass instance via the parameterless constructor. In the second use, the method is invoked with the additional arguments 1 and "myText"; Activator will try to find the constructor that best matches these arguments.

Listing 2.17: Create new instance with reflection

    object sampleClass = Activator.CreateInstance(typeof(SampleClass));
    SampleClass sampleClass2 = (SampleClass)Activator.CreateInstance(
        typeof(SampleClass), 1, "myText");

For the sake of brevity, an example of System.Reflection.Emit usage is omitted in this thesis. However, an exhaustive set of examples is available in the Microsoft documentation.

2.5 Summary

C# provides various approaches to code generation; each has its advantages, disadvantages, and specific use cases. This chapter explained four common mechanisms whose understanding helps to gain a general idea of the code generation options in C# and prepares the subsequent introduction of Source Generators.

Code snippets serve as code shortcuts, and they are a common feature of IDEs. They accelerate the manual coding process and improve code consistency. However, the code generation is static, and the generated code is not updated automatically.

CodeDOM is a tool that can generate source code based on a language-independent model. This ability allows the tool to use a single model to generate code in multiple languages. However, the structure of the model can be complicated since most of the language features are not supported by default. Moreover, CodeDOM runs only on .NET Framework, which has been superseded by .NET Core and its successors, .NET 5 and .NET 6.

T4 Templates provide an option to combine .NET code with string literals to produce complex strings that are either textual outputs or a part of the program’s codebase. However, T4 Templates are not syntax safe since their purpose is to work with strings, and they do not understand the output they produce.

Reflection is a programming language’s ability to inspect and manipulate code at runtime with no prior knowledge about the code. This mechanism allows developers to write code that can dynamically react to changes in the source code structure. However, the dynamic nature of reflection and its runtime operations also bring a performance penalty.
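To make the reflection trade-off summarized above more concrete, the following sketch combines the assembly scan with Activator in a tiny plugin loader. The IGreeter contract, its two implementations, and the GreeterLoader helper are hypothetical names chosen for this illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical plugin contract and implementations.
public interface IGreeter
{
    string Greet();
}

public class EnglishGreeter : IGreeter
{
    public string Greet() => "Hello";
}

public class NamedGreeter : IGreeter
{
    private readonly string _name;

    public NamedGreeter() : this("World") { }

    public NamedGreeter(string name) => _name = name;

    public string Greet() => "Hello, " + _name;
}

// Hypothetical loader built on the scan and Activator calls from this chapter.
public static class GreeterLoader
{
    // Finds every concrete class in the assembly that implements IGreeter
    // and instantiates it through its parameterless constructor.
    public static List<IGreeter> LoadAll(Assembly assembly)
    {
        return assembly.GetTypes()
            .Where(t => t.IsClass && !t.IsAbstract
                && typeof(IGreeter).IsAssignableFrom(t))
            // Sort by name so the result order is deterministic.
            .OrderBy(t => t.Name, StringComparer.Ordinal)
            .Select(t => (IGreeter)Activator.CreateInstance(t)!)
            .ToList();
    }
}
```

Calling GreeterLoader.LoadAll(typeof(IGreeter).Assembly) returns an EnglishGreeter followed by a NamedGreeter, and a new implementation added to the assembly is picked up without changing the loader. The cost is that the scan and the constructor lookup run every time the application starts.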
3 Source Generators

Source Generators [19, 20, 21] are a C# mechanism for compile-time metaprogramming that is independent of IDE and compiler. They are a part of the compilation process (illustrated in Figure 3.1), which enables them to access metadata of the code that is being compiled. During the compilation, Source Generators can inspect and analyze the code, use the obtained information to generate more code, and add the generated code to the user’s codebase.

Besides the user’s code, a Source Generator can consume Additional files, which are resource files explicitly linked to a Source Generator. These files may contain any textual information, and a Source Generator may be designed to parse and process them in any desired manner. The combination of the capabilities to inspect the existing code and to process additional information outside the code makes Source Generators worthwhile.

Figure 3.1: Schema of compilation with Source Generators phase [19]

Source Generators (from now on also referred to as the generators or a generator) were designed to enrich the C# code generation options in specific scenarios, and they possess the following properties [22]:

• use strings to build and represent the new code
• code generation is additive-only by design
• support for diagnostics during the code generation process
• access to Additional files
• un-ordered run of generators
• added to a program via assemblies

The string representation of the C# code makes the code being built easy to follow during the generation process. A generator can be constructed to generate code in a similar way as T4 text templates do. Such an approach contrasts with the approach of the CodeDOM framework, whose output is difficult to predict before the generation is completed because the framework utilizes a graph model to represent the code.

Source Generators are additive-only by design [23], which means a developer can use them to generate new code, not to modify existing code. Code modification is blocked mainly by complex design choices and by Visual Studio tooling. It would require significant resources to resolve such a block, and, most importantly, it is not clear how to design the code modification behavior in some circumstances. IntelliSense, code navigation, and debugging are examples of tools with challenging design choices. It is unclear how IntelliSense should support code modified by a generator, how to design code navigation of dynamically modified code, and how to debug code that a generator actively changes.

Source Generators have a built-in error reporting capability through diagnostics. Diagnostics are the standard form for reporting compilation errors, and a generator can use this feature to report any issues that caused a failure in the code generation. Diagnostics are suitable for presenting helpful information about the cause of the error and for guiding a developer to correct the problem.

Additional files are text files of any form that are linked explicitly to a generator. The generator can access and employ these files during its code generation. For example, an Additional file can be a .csv file with multiple records, and the generator aims to transform the file’s records into a C# object collection.
To achieve this goal, the generator would read the header of the .csv file to prepare a C# model class with matching properties and then generate a C# collection with an element for each record in the file.

The un-ordered run of generators is primarily a design choice for performance [24]. Since generators are additive-only, the compilation process can supply the same input compilation object [25] to all generators, and their code generation would not cause conflicts, as they cannot modify the existing code in the compilation. If an ordered run were allowed, the compilation would have to wait for some generators to finish their run, then process their output compilation, and deliver it to the other generators that depend on the previous ones.

Source Generators are built into an assembly that is configured similarly to analyzers, and an application utilizes the generators via a package reference or a project reference. If a ProjectReference is used, it must explicitly state that it references a Source Generator (for example, by setting OutputItemType="Analyzer" on the reference) so that the compiler connects the generators to its compilation process.

3.1 Purpose

Newtonsoft.Json, AutoMapper, Autofac, and Dapper are among the most downloaded packages on NuGet¹, and they share a characteristic: they utilize code analysis and code manipulation to improve the coding experience of developers. These packages tend to employ a significant amount of runtime procedures. Typically, their runtime procedures are the following [26]:

• Given a Type: check the library cache of known types. If the Type is unknown, run reflection code to understand the Type and to generate a strategy implementing the features of the library on the given Type. Finally, expose an API that can be used to invoke this strategy.

Source Generators primarily serve as a more performant solution for some of the tasks currently solved by reflection [20, 26]. Examples of justified and appropriate areas for Source Generators are the following:

• AOT compilation
  Ahead-of-time compilation (or AOT compilation) is the act of compiling a programming language into a different programming language before executing it, commonly at build-time².

1. The statistics are available at the NuGet website.
2. Often a higher-level language into a lower-level language, for instance, CIL code into native machine code.
  The use of reflection may severely harm the ability of the AOT compilation to optimize the code. Moreover, AOT may restrict runtime code generation³, which might forbid emitting optimized code.
  Source Generators emit code during compilation; therefore, all of the necessary code physically exists in the assemblies. No runtime overhead is created, and the AOT platforms’ runtime restrictions do not apply.

• Linkers
  Linkers commonly involve trimming an assembly, that is, removing its unnecessary, unused parts. However, the trimming might cause errors when combined with reflection. The linker might decide to remove a package reference that is not necessary for the present code, because it cannot foresee that the package is required by reflection at runtime. Therefore, the developer must configure the linker to preserve the required package and simultaneously to remove everything unnecessary.
  The source code produced by Source Generators is available at compile-time; therefore, the linker can correctly recognize the required dependencies, which should not be trimmed.

• Cold start
  The cold start of an application means the application has no initialized resources; it needs to rebuild its fundamental infrastructure. Some applications run for days or even weeks and are only restarted when a new version is deployed. The cold start has a negligible impact on such applications. However, applications invoked for a brief time to perform a short and simple task, such as Azure Functions, are directly affected by the cold start delay.
  Source Generators transfer the reflection operations from runtime to compile-time. This removes the application startup overhead caused by the reflection code, which builds constructs such as dependency containers, mappers, or serializers when the application starts.

3. For example, Xamarin.iOS.
• Runtime error discovery
  Reflection operates only at runtime; therefore, the errors are not visible, and the developers do not receive feedback until they run the application.
  Invalid code produced by Source Generators causes build failures. Therefore, the error discovery happens at compile-time. Moreover, Source Generators support diagnostics, so they can raise warnings and inform the developer about problems.

• String-typed APIs
  Currently, some mechanisms rely on strings to operate correctly; for instance, the routing between Razor Pages and controllers in ASP.NET Core is based on strings. However, string-typed APIs are fragile, and a typographical error leads to malfunctioning behavior.
  Source Generators can dynamically generate a strongly typed mechanism (such as an enumeration) to replace the unsafe strings.

3.2 Implementation

Essentially, Source Generators are classes annotated with the Microsoft.CodeAnalysis.GeneratorAttribute⁴ attribute and implementing the ISourceGenerator interface [20, 22]. The core structure of a Source Generator is presented in Listing 3.1.

The ISourceGenerator interface requires a generator class to implement two methods, Initialize and Execute. As these names suggest, Source Generators operate in two phases: the initialization phase and the execution phase.

4. This attribute can also specify the languages (C#, F#, or Visual Basic) to which the Source Generator applies.
Listing 3.1: Core structure of a Source Generator

    using Microsoft.CodeAnalysis;

    [Generator]
    public class MyGenerator : ISourceGenerator
    {
        public void Initialize(GeneratorInitializationContext context)
        {
            // Initialize generator
        }

        public void Execute(GeneratorExecutionContext context)
        {
            // Generate code
        }
    }

3.2.1 Initialization

The method Initialize represents the initialization phase, and the method is executed exactly once prior to any generator run. This method expects a GeneratorInitializationContext parameter that a generator can use to register callbacks that help the generator to accomplish efficient and successful code generation.

The GeneratorInitializationContext type contains several members with various purposes; the type’s structure is illustrated in Listing 3.2. The first member, CancellationToken, has a common purpose in the C# environment: it can be used to cancel a process (in this case, the initialization of a generator) to prevent a waste of processing resources.

The RegisterForSyntaxNotifications methods enable the generator to process the user’s code efficiently. Before each generation, the delegate SyntaxReceiverCreator or SyntaxContextReceiverCreator is invoked to create an instance of ISyntaxReceiver or ISyntaxContextReceiver, respectively. The syntax receiver has its OnVisitSyntaxNode method invoked for each syntax node [25] in the compilation, which allows the syntax receiver to gather specific information about the code available in the compilation before the code generation (the execution phase) occurs.
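The registration described above can be sketched as follows. The example assumes a generator project that references the Microsoft.CodeAnalysis.CSharp package; ClassCollector, ClassListGenerator, and the generated ClassNames type are hypothetical names, and the generator is meant as an illustration of the receiver mechanism rather than a recommended design.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

// Hypothetical receiver: remembers every class declaration it is shown.
internal sealed class ClassCollector : ISyntaxReceiver
{
    public List<ClassDeclarationSyntax> Classes { get; } =
        new List<ClassDeclarationSyntax>();

    // Invoked for each syntax node in the compilation.
    public void OnVisitSyntaxNode(SyntaxNode syntaxNode)
    {
        if (syntaxNode is ClassDeclarationSyntax classDeclaration)
        {
            Classes.Add(classDeclaration);
        }
    }
}

[Generator]
public sealed class ClassListGenerator : ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context)
    {
        // The delegate creates a fresh receiver before each generation.
        context.RegisterForSyntaxNotifications(() => new ClassCollector());
    }

    public void Execute(GeneratorExecutionContext context)
    {
        // SyntaxReceiver is null when no receiver was registered.
        if (!(context.SyntaxReceiver is ClassCollector collector))
        {
            return;
        }

        // Emit the collected class names as a string array.
        var names = collector.Classes
            .Select(c => "\"" + c.Identifier.Text + "\"");
        string source =
            "namespace Generated { public static class ClassNames { " +
            "public static readonly string[] All = { " +
            string.Join(", ", names) + " }; } }";

        context.AddSource("ClassNames.g.cs",
            SourceText.From(source, Encoding.UTF8));
    }
}
```

The receiver only filters syntax nodes; anything that needs type information would use the ISyntaxContextReceiver variant instead, because a plain ISyntaxReceiver is given no semantic model.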
Listing 3.2: GeneratorInitializationContext

    public struct GeneratorInitializationContext
    {
        public readonly CancellationToken CancellationToken { get; }
        public void RegisterForSyntaxNotifications(
            SyntaxReceiverCreator receiverCreator);
        public void RegisterForSyntaxNotifications(
            SyntaxContextReceiverCreator receiverCreator);
        public void RegisterForPostInitialization(
            Action<GeneratorPostInitializationContext> callback);
    }

The two syntax receiver types differ in the parameter type of their visit method, and only one of them should be used⁵. If the syntax receiver is an ISyntaxReceiver, its OnVisitSyntaxNode method requires only the syntax node as a parameter. If an ISyntaxContextReceiver is used, the method has a compound parameter type that contains the syntax node and a semantic model [25] (hence the "Context" in its name).

In the execution phase, the instance of ISyntaxReceiver or ISyntaxContextReceiver is available, and any information collected by the syntax receiver can be used to generate additional code⁶.

The method RegisterForPostInitialization allows a generator to add an additional phase called PostInitialization. This phase runs after the initialization, and it can be used to alter the compilation provided to the subsequent phases of the generator. For example, a generator may use this phase to add additional sources to the compilation. They are added before the execution phase; therefore, a syntax receiver can visit these sources, and these sources are available for semantic analysis as a part of the GeneratorExecutionContext instance in the execution phase.

5. Source Generators issue a warning when both of them are used.
6. A new instance of the syntax receiver is created per generation; therefore, there is no need to manage its lifetime.
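A common use of the PostInitialization phase is to contribute a marker attribute that user code can then apply. The following sketch assumes a generator project referencing the Microsoft.CodeAnalysis.CSharp package; the GeneratedMarkers namespace, the GenerateAttribute type, and the MarkerAttributeGenerator class are hypothetical names used only for illustration.

```csharp
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

[Generator]
public sealed class MarkerAttributeGenerator : ISourceGenerator
{
    // Hypothetical marker attribute contributed to every consuming project.
    private const string AttributeSource = @"
namespace GeneratedMarkers
{
    [System.AttributeUsage(System.AttributeTargets.Class)]
    internal sealed class GenerateAttribute : System.Attribute { }
}";

    public void Initialize(GeneratorInitializationContext context)
    {
        // The callback runs after initialization; the added source becomes
        // part of the compilation that is passed to the execution phase.
        context.RegisterForPostInitialization(postInit =>
            postInit.AddSource("GenerateAttribute.g.cs",
                SourceText.From(AttributeSource, Encoding.UTF8)));
    }

    public void Execute(GeneratorExecutionContext context)
    {
        // The attribute added above can be resolved through the compilation.
        INamedTypeSymbol attribute = context.Compilation
            .GetTypeByMetadataName("GeneratedMarkers.GenerateAttribute");
        if (attribute is null)
        {
            return;
        }

        // A real generator would now look for classes marked with the
        // attribute and call context.AddSource for each of them.
    }
}
```

Declaring the attribute as internal keeps it private to each consuming assembly, which is one way to avoid clashes when several projects in a solution use the same generator.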
3.2.2 Execution

The execution phase is responsible for the code generation itself, and the phase is represented by the method Execute. This method requires a GeneratorExecutionContext parameter that contains the necessary information for the code generation logic. Moreover, the parameter provides methods that enable the generator to add new code to the compilation or report an error. Listing 3.3 illustrates the noteworthy content of the GeneratorExecutionContext type.

Listing 3.3: GeneratorExecutionContext

    public struct GeneratorExecutionContext
    {
        public CancellationToken CancellationToken { get; }
        public AnalyzerConfigOptionsProvider AnalyzerConfigOptions { get; }
        public Compilation Compilation { get; }
        public ImmutableArray<AdditionalText> AdditionalFiles { get; }
        public ISyntaxReceiver? SyntaxReceiver { get; }
        public ISyntaxContextReceiver? SyntaxContextReceiver { get; }
        public void AddSource(string hintName, string source);
        public void AddSource(string hintName, SourceText sourceText);
        public void ReportDiagnostic(Diagnostic diagnostic);
    }

CancellationToken has the same purpose as in GeneratorInitializationContext: it can be used to cancel the execution of this phase.

A generator can access MSBuild properties and metadata, and use them to customize the code generation. The properties that are supposed to be accessible by the generator must be specified for MSBuild, so it can translate them into the AnalyzerConfigOptions property of the generator execution context.

The properties Compilation and AdditionalFiles contain the information a generator can consume to produce code. Compilation represents the compilation at the time of execution. It contains the code supplied by a user and the code added during PostInitialization; other generated code is not available. The AdditionalFiles property is a collection of the Additional files linked to the generator.

Typically, the immense amount of user’s code metadata carried by Compilation cannot be processed effectively by hand. Hence, GeneratorExecutionContext also has the properties SyntaxReceiver and SyntaxContextReceiver, which contain the syntax receiver instance registered during the initialization of the generator, or null when no syntax receiver was registered.

The AddSource methods are used to add source code to the compilation. The hintName parameter serves as a readable identification of the produced source code⁷, and the source or sourceText parameter contains the code prepared to be added to the compilation.

The method ReportDiagnostic enables the generator to add a diagnostic to the compilation. Depending on its severity, the issued diagnostic might cause a compilation failure.

3.3 Fundamental concepts

This section describes the fundamental concepts of Source Generators and their standard use cases. The use cases and solutions are inspired by and mostly taken from the official samples in the Roslyn documentation because the documentation suitably clarifies the fundamental concepts⁸.

3.3.1 Adding source code to compilation

An elementary example of a generator that populates the compilation with generated source code is demonstrated in Listing 3.4. In this example, the Execute method adds an empty class called GeneratedClass to the compilation and hence makes the class available to the rest of the user’s code.

7. However, the generator might add a prefix or a suffix in case of a duplicated identification.
8. A set of Source Generator examples is available in the attachments (folder SourceGeneratorDemos).
Listing 3.4: Adding source code with a Source Generator

    using System.Text;
    using Microsoft.CodeAnalysis;
    using Microsoft.CodeAnalysis.Text;

    [Generator]
    public class CustomGenerator : ISourceGenerator
    {
        public void Initialize(GeneratorInitializationContext context) { }

        public void Execute(GeneratorExecutionContext context)
        {
            string sourceCode = @"
    namespace GeneratedNamespace
    {
        public class GeneratedClass
        {
            // generated code
        }
    }";
            context.AddSource("myGeneratedFile.cs",
                SourceText.From(sourceCode, Encoding.UTF8));
        }
    }

3.3.2 Additional files

The AdditionalFiles property of the execution context abstracts the files linked to the generator, and it can be used to process these files. Listing 3.5 illustrates a simplified procedure in which a generator utilizes Additional files during its execution phase.

Listing 3.5: Execution phase utilizing Additional files

    public void Execute(GeneratorExecutionContext context)
    {
        var myFiles = context.AdditionalFiles
            .Where(at => at.Path.EndsWith(".json"));
        foreach (var file in myFiles)
        {
            SourceText content = file.GetText();
            string code = MyJsonToCsharpTransformation(content);
            // Only the file name is used as the hint name, since a full path
            // may contain characters that are not accepted in hint names
            string name = Path.GetFileNameWithoutExtension(file.Path);
            context.AddSource($"{name}Generated.cs", code);
        }
    }
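The conversion helper called in Listing 3.5 is not shown in the thesis. One possible shape is sketched below, under several assumptions: the helper receives the file content as a string (for example, content.ToString()), the JSON follows a hypothetical schema with a className and a properties object, and System.Text.Json is available to the generator, which for a generator project targeting .NET Standard 2.0 would probably require an additional package reference. The JsonToCsharp class and its Transform method are illustrative names only.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical stand-in for the MyJsonToCsharpTransformation helper.
public static class JsonToCsharp
{
    // Assumed input shape (not defined by the thesis):
    // { "className": "Person", "properties": { "Name": "string", "Age": "int" } }
    public static string Transform(string json)
    {
        using JsonDocument document = JsonDocument.Parse(json);
        JsonElement root = document.RootElement;

        string className = root.GetProperty("className").GetString()
            ?? throw new FormatException("className must be a string.");

        var builder = new StringBuilder();
        builder.AppendLine("namespace GeneratedNamespace");
        builder.AppendLine("{");
        builder.AppendLine("    public class " + className);
        builder.AppendLine("    {");

        foreach (JsonProperty property in
            root.GetProperty("properties").EnumerateObject())
        {
            // The JSON value is taken verbatim as the C# type name.
            builder.AppendLine("        public " + property.Value.GetString()
                + " " + property.Name + " { get; set; }");
        }

        builder.AppendLine("    }");
        builder.AppendLine("}");
        return builder.ToString();
    }
}
```

For the sample input in the comment, the result contains a Person class with the properties public string Name { get; set; } and public int Age { get; set; }. Because the output is built from strings, malformed input only surfaces as a compilation error in the generated file, which is a reason to validate the JSON and report a diagnostic in a real generator.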
The procedure takes the content of all JSON Additional files, transforms it into source code, and adds it to the compilation. The method MyJsonToCsharpTransformation abstracts the JSON to source code conversion logic.

3.3.3 Syntax receivers

Processing a user’s code usually requires a means of code annotation so the generator can effectively locate the parts of the codebase to augment. A syntax receiver can use any information to locate the targets for the generator. A common annotation approach uses custom attributes as the "markers" in the user’s code and then instructs a syntax receiver to search explicitly for the parts of the code marked with these attributes. Listing 3.6 demonstrates such an approach.

Listing 3.6: Syntax receiver example

    class MySyntaxReceiver : ISyntaxContextReceiver
    {
        public List<ITypeSymbol> Classes { get; } = new();

        public void OnVisitSyntaxNode(GeneratorSyntaxContext context)
        {
            // Only class declarations with at least one attribute list are relevant
            if (context.Node is not ClassDeclarationSyntax
                { AttributeLists: { Count: > 0 } } classDeclaration)
                return;

            ITypeSymbol classSymbol = context.SemanticModel
                .GetDeclaredSymbol(classDeclaration) as ITypeSymbol;
            // The symbol or the attribute class may be null for incomplete code
            if (classSymbol is null)
                return;

            if (classSymbol.GetAttributes().Any(attrData =>
                attrData.AttributeClass?.ToDisplayString() == "MyAttribute"))
            {
                Classes.Add(classSymbol);
            }
        }
    }

The example uses an OnVisitSyntaxNode method with a GeneratorSyntaxContext parameter, which provides access to the visited