GHAS - How to use CodeQL custom queries?
Over the last two years, I have seen a growing awareness of zero trust computing. When organizations look at zero trust, the first thing that comes to mind is securing the infrastructure: assuming breach and applying multiple layers of security. As an engineer, I love seeing this growing awareness. What a lot of organizations seem to forget, however, is that perhaps the most important part of securing their data starts with analyzing the source code of their proprietary software.
Nowadays, most software depends on third-party (open source) components, which can have vulnerabilities as well. Furthermore, the code that is developed within the organization itself can contain serious security flaws. GitHub Advanced Security (GHAS) can help these organizations with powerful tools like Dependabot (which scans dependencies for known vulnerabilities), Secret scanning (which checks whether any secrets are embedded in source code and configuration files) and, last but not least, Code scanning.
One of the cool things about GHAS Code scanning is that we can add custom queries to it, to scan our code for specific code constructs (such as empty blocks or unintended public methods). Getting your own custom CodeQL queries into the code analysis workflow can be very cumbersome. At least it was for me, because I could not find clear documentation and examples on this. It cost me almost a day to get up and running. In this blog post, I will describe how I got my first custom CodeQL query to work.
Getting up and running
Enabling Code scanning is straightforward. In the repository Settings, choose the option “Code security and analysis” under Security. Scroll down to Code scanning and choose to set up an Advanced workflow.
A new screen appears, showing a new workflow called “codeql.yml”. This workflow runs code scanning automatically on every push to the main branch and on every pull request targeting it, and it also runs periodically on a cron schedule.
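For reference, the trigger section of the generated codeql.yml looks roughly like the sketch below. This is a hedged reconstruction rather than a copy of the template: the branch name assumes the default branch is main, and the cron expression is a placeholder, because GitHub generates its own schedule value.

```yaml
# Sketch of the trigger section of .github/workflows/codeql.yml.
# Assumptions: the default branch is "main"; the cron value is a placeholder.
on:
  push:
    branches: [ "main" ]    # scan every push to main
  pull_request:
    branches: [ "main" ]    # scan every pull request that targets main
  schedule:
    - cron: "30 4 * * 1"    # periodic scan, here every Monday at 04:30 UTC
```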
In this workflow we can include our own CodeQL queries. Before we get to that part, we need to add a query definition to our repository.
Add my own custom query
Within my repository I created the folder structure .github/codeql/custom-queries.
Within the custom-queries folder, I store all the custom queries that I want to run in addition to the extensive query library GitHub provides. These queries are tailored to my own use cases; in my case, I added a check for empty code blocks in my source code.
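/**
 * @id codeql/custom-queries/redundant-if-statements
 * @name Bas's empty blocks
 * @description Find my empty block statements.
 * @kind problem
 * @tags empty
 *       bas
 */

import csharp

// Consider every block statement ({ ... }) in the analyzed C# code
from BlockStmt blk
// Keep only blocks that contain no statements
where blk.isEmpty()
// Report each empty block together with a message
select blk, "This 'if' statement is redundant."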
In essence, a CodeQL query looks a bit like a SQL query, with from, where and select clauses. I’ll explain it in more depth.
At the top of the file there is a comment header with the query metadata. This header is essential and cannot be omitted: the fields @id, @name and @kind need to be set. The @kind field indicates how Code scanning should handle the results of the query; problem means that each result is reported as an alert at a single location in the code.
The import csharp statement specifies which programming language is going to be analyzed. It imports the CodeQL libraries for C# that are used in the background to do the analysis. GitHub provides a detailed description of the C# CodeQL library.
In the example above, from BlockStmt blk makes the query look at every block statement in the source code, that is, every piece of code enclosed in braces { ... }, such as the body of an if statement. For each block, “where blk.isEmpty()” checks whether it contains any statements. If it contains none, select blk, “This ‘if’ statement is redundant.” reports the empty block. Note that this matches any empty block, for example an empty loop body or catch block, not only the body of an if statement.
The “This ‘if’ statement is redundant.” description will be shown as a friendly description to indicate why this piece of code was marked as a potential problem.
Pack the query
If we want to use the query in our CodeQL workflow, we need to do one more thing: “packing” it. Packing means that the custom queries are compiled into a binary form that can be executed by the CodeQL engine. The compilation can be done upfront or dynamically while the workflow runs.
In this blog post, I follow the path of dynamic compilation, which means that I have to provide some metadata for the compiler. The metadata is provided in a file called qlpack.yml that has to be present in the folder containing the custom queries (note that a single folder can contain multiple query files).
A closer look at the qlpack.yml file:
name: basvandesande/csharp-custom-queries
version: 0.0.1
dependencies:
  codeql/csharp-all: "*"
extractor: csharp
The dependencies field maps each pack that this query pack depends on to an accepted version range; at least the language library (here codeql/csharp-all) has to be listed. In this case, “*” means that any version is accepted. In order to parse your source code correctly, the extractor needs to be set to your language, in my case csharp.
Once this is all set, commit and push the code to the repository.
Modifying the CodeQL workflow
Open the CodeQL workflow (.github/workflows/codeql.yml).
Scroll down to line 53, the queries setting of the Initialize CodeQL step, and modify this line as follows:
queries: +security-and-quality, ./.github/codeql/custom-queries
The line now contains a comma-separated list of query suites and query packs. “security-and-quality” is one of the built-in query suites GitHub provides. The plus sign (+) in front of the value tells the CodeQL action to combine these queries with any queries defined in a CodeQL configuration file, instead of overriding them.
The second entry is the relative path to the folder in which my queries (and their qlpack.yml) are stored. GitHub Code scanning is pretty picky about this relative folder structure. This cost me a lot of time, because the existing examples I found didn’t use a local relative path but referred to an external repository instead.
After changing the line, the Initialize CodeQL step in codeql.yml contains the new queries value.
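A minimal sketch of the relevant steps of codeql.yml after the change. The step names follow the generated template, but the action versions (@v3, @v4) and the matrix-based languages value are assumptions; keep whatever your own template contains for those lines. The checkout step has to run before initialization, since the local path ./.github/codeql/custom-queries points into the checked-out repository.

```yaml
# Sketch of the steps in the analyze job of .github/workflows/codeql.yml.
# Assumptions: the action versions and the matrix-based language list come
# from the generated template; adjust them to match your own file.
steps:
  - name: Checkout repository
    uses: actions/checkout@v4          # makes ./.github/codeql/custom-queries available

  - name: Initialize CodeQL
    uses: github/codeql-action/init@v3
    with:
      languages: ${{ matrix.language }}
      # Built-in suite plus the local folder containing the .ql files and qlpack.yml
      queries: +security-and-quality, ./.github/codeql/custom-queries

  - name: Autobuild
    uses: github/codeql-action/autobuild@v3

  - name: Perform CodeQL Analysis
    uses: github/codeql-action/analyze@v3
```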
Let the Code scanning begin!
Each time I push code to ‘main’ or open a pull request against it, the CodeQL workflow starts processing in the background.
Once code scanning has finished successfully, the security risks can be assessed via the “Security” tab in the menu bar. The tab shows a small shield containing the number of risks detected.
In my case, Code scanning found three locations in my code containing an empty code block.
The friendly message that I provided in the query is shown under the highlighted empty code block, giving me a clue on how to handle the alert. Each alert can be dismissed; once dismissed, that specific alert instance is no longer shown in the list.
With all mechanisms in place, it is time to start working on my own custom queries.