Creating a number token
Description
Learn to create your very own C compiler from scratch. In this course we develop a compiler that compiles a subset of the C Programming Language. By the time you finish all modules of this course you will be able to compile C programs that use pointers, structures, unions, arrays, functions, for loops, while loops, do while loops, if statements, switches and much more! This course includes all course modules!
Our compiler also has a preprocessor macro system allowing you to include header files and create definitions just like you would in any C file.
Your compiler is advanced enough to use the GCC standard library so we are able to call C functions from our compiler. Your compiler will be able to compile the C programming language.
This course does not rely on any frameworks; we do everything from scratch to ensure the best possible learning experience for students.
Module 1
In module 1 of this course we load the C source file that we wish to compile into memory. We create a lexer to perform lexical analysis on the source input, which converts the source code into a sequence of tokens that our compiler can easily understand. We then pass the tokens through a parser to produce an abstract syntax tree (AST). An AST describes the C program in a logical way that makes it easier for our compiler to understand. For example, for the expression 50 + 20 you will end up with a root expression node whose left operand is a node of value 50 and whose right operand is a node of value 20. Breaking down problems in this way makes it much easier to create compilers.
Module 2
In module 2 of this course we create a code generator that produces 32 bit Intel assembly language that can then be passed through an assembler to produce a program binary that we can run. In this module we also create a resolver system, which is responsible for taking a complicated expression such as "a->b.c.e[50] = 50" and breaking it down into simple steps and rules that our code generator can then easily follow. This abstraction is essential to ensure that the code generator does not become overly complex. With the use of a resolver system we can ensure the code base remains clean.
Module 3
In module 3 of this course we create a preprocessor and macro system. This preprocessor system allows us to include header files in our C programs and also use a variety of macro keywords such as "#define", "#ifdef", "sizeof" and many more.
Module 4
In module 4 we build a semantic validator which validates our C code. A semantic validator ensures that we are not setting variables that do not exist or accessing structures that aren't there.
This is the only video course in the world that shows you how to create a C compiler. Come and learn today!
Requirements
You must have basic experience with assembly language.
Who This Course is For
People with an interest in compiler design
People who are interested in assembly language
People who are interested in the C Programming language
What You Will Learn
How to build a C compiler from scratch
Full understanding of stack frames and how assembly language is generated for a C source file
Complete Understanding of lexical analysis and parsing
Stronger Assembly language skills will be gained
Compiler Design
Dragon Zap Instructor
Daniel McCarthy is a seasoned software engineer, boasting an impressive career spanning over 14 years in the industry. Holding a Master's Degree in Advanced Computer Science from Cardiff Metropolitan University, his broad spectrum of experience encompasses everything from web development to complex compiler and interpreter development. Daniel has honed his skills in bootloader and kernel development. In testament to his proficiency in the field, he has designed two proprietary programming languages: Craft, a general-purpose language, and Marble, a web-focused language akin to PHP. Moreover, he has successfully developed compilers for the C programming language. A testament to his versatility, Daniel demonstrates proficiency in an extensive list of programming languages that includes C, C++, Java, x86 Assembly language, PIC assembly, SQL, PHP, HTML5, JavaScript, CSS, and of course, his own creations, Craft and Marble. His professional portfolio also includes the development of Linux kernel modules, a task he has executed with proficiency in a professional context. Currently, Daniel is channeling his wealth of experience and expertise into the education sector, with the aim of nurturing the next generation of professional software engineers.
Questions (2)
Arthur
1 year ago
When using the function compile_process_next_char or compile_process_peek_char, you access the compile_process struct inside the lex_process struct to get the struct pos. Why? The lex_process struct already has this property, so I'm a little bit confused.
Daniel McCarthy
1 year ago
Hi, please define your question better, as it's too broad for me to understand your concern. Please list what you're confused about in as much detail as possible so I can clarify. Please also only ask questions on dragonzap.
Arthur
1 year ago
For example:

char compile_process_next_char(struct lex_process* lex_process)
{
    struct compile_process* compiler = lex_process->compiler;
    compiler->pos.col += 1;
    char c = getc(compiler->cfile.fp);
    if (c == '\n')
    {
        compiler->pos.line += 1;
        compiler->pos.col = 1;
    }
    return c;
}

This function accesses the struct 'pos' from the struct 'compile_process' even though the struct 'pos' is also present inside the struct 'lex_process'. Why not access it directly instead of going through the field inside 'compile_process'? Why did you declare the same property on two different structs? That's my question.
Daniel McCarthy
1 year ago
I see you made a duplicate question; I deleted it. Please note that sending me emails or creating new questions will not make me answer faster. I try to answer all questions within 24 hours, but I also have commitments in my life. When you ask a question I will always respond. Please be patient; I will always help you and my students. Regarding your question, it's much clearer to me now. Off the top of my head I don't remember the reason I did it, but it was likely to do with keeping abstraction. The position might be different for the compiler than for the lexer if we later decided to provide other means of reading the file. You could just use the lexer position, but I suppose I wanted to be more exact. I don't remember my reasoning, but I am sure I had a good reason to do this. You'll probably find out as you continue the course. I can't remember the reason for everything I do because I have so many courses and they take many hours to develop, but it was most likely to keep them abstracted from each other for a compatibility reason, where the values could differ in the future.
Thomas H.
11 months ago
When I execute my code (and also the code of your repository at the time of this commit), "everything compiled ..." is not printed to the console, but "Unexpected token, ..." is. Is it possible that the code of your repository and what you showed in your video is missing something to get the same output in the console?
Daniel McCarthy
11 months ago
Hello, I don't think so. Did you do a full clone or just copy and paste the changes? I recommend a full clone from that point in time so that you can check, because since it's working in the videos and the commits are the same you shouldn't be having that problem. It could be possible you forgot to clean your object files as well.
Thomas H.
11 months ago
I did a full clone and then I did a "git reset --hard [hash of the commit]"
Thomas H.
11 months ago
I called "make clean" and after that I called "make"
Thomas H.
11 months ago
I debugged the code. The problem is that in my case peekc() inside the method "read_next_token" (lexer.c) returns 255 when it is at the end of the file and so it does not run into the case EOF, but into the default case of the switch-statement. That means compiler_error(...)
Thomas H.
11 months ago
I found the problem. It was a problem regarding processor architecture. I'm using a virtual machine on my Mac (Apple chip => Arm architecture => Ubuntu arm). I tested it on another machine with amd64 (virtual machine with ubuntu arm64) and there it works. I would like to work on my Mac (arm). I hope I found a workaround for the Ubuntu arm. Maybe I should specify all char more specifically as signed char. On Ubuntu with Arm architecture, char is an unsigned char (hence the problem).
Thomas H.
11 months ago
* Of course I meant "(virtual machine with ubuntu amd64)" within the second parentheses.
Thomas H.
11 months ago
Now I found the perfect solution for my problem. I changed all gcc calls inside the makefile and added the compiler option "--signed-char"
Daniel McCarthy
11 months ago
Hello, glad you solved your problem, and feel free to reach out for anything else. Thanks, Dan