Coding Standards
This page is a list of standards that every scripts or programs in the cerberus-sandbox project must match. If a program file does not follow thoses standards, upgrade them.
The goal is to bind every file of the project in a global scope, with the same high-level sythax. Navigating through the source code of the sandbox should be as user-friendly as possible, and the code and it's comments must speak for themselves.
It will help to integrate new functionality latter, keep everything clear, and eventually to maintain the codebase.
-
Whenever a new script file is added to the project, it name must be the concatenation of what it is doing, keeping only the keywords. Instead of adding spaces, the first letter of each word will be in the uppercase format. i.e:
StaticEngine.py
,Disassembler.py
,PeParser.py
orFileDriver.py
. -
Python3 should be the only autorized version of Python.
-
The first line of the script must indicate the encoding. By default, we'll use the UTF-8 encoding.
# coding: utf-8
- After the encoding, the next 3 lines should be used as a header to indicate the date of creation of the original file, the name of the author and a quick description of what the script is doing, or how to use it.
# 17/10/2020
# HomardBoy
# Low-level linear disassembler based on the Capstone library.
- Everything related to the python code should match the PEP-8 standards. To quickly check if a file is following the PEP-8 rules, you can use the pep8 tool:
$ pip install pep8
$ pep8 --first Disassembler.py
Disassembler.py:222:34: W602 deprecated form of raising exception
Disassembler.py:347:31: E211 whitespace before '('
Apart from the PEP-8 standard, you are encouraged to follow those additional rules:
- For performance, try to only import the Python's modules that you need, not the full library (never import the full library as
from lib import *
):
from lib import function
- For performance reason, string formating should not be done using concatenation of variables and strings, nor by using format strings. Instead, use f-strings. They are faster to load, and very clear to read:
name = "cerberus"
f"The best sandbox is {name}-sandbox."
- Logs messages should use the Python
logger
objects. Don't print anything, log it. If your script or function support this usecase, try to pass the log level as an argument. Always keep your logs message as clear and minimal as possible.
import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)
logging.debug('I am here for information only')
logging.info('I am used to track how the program is going')
logging.warning('Something might be broken')
logging.error('Something is broken')
logging.critical('Everything is broken')
- Use f-strings inside your loggers objects:
logging.error('the file {filename} was not found')
- Configure the logs of your application to be redirect to a static logfile, and to stdout (you can skip the stdout redirect part if you don't need to; You are in fact advise not to print anything on stdout):
logging.basicConfig(filename='out.log', format='[%(asctime)s][%(name)s][%(levelname)s] %(message)s', filemode='w+')
logger = logging.getLogger('Application name') # Name of the application that is going to generate logs
logger.setLevel('DEBUG') # The static log file should have the higher log level
console = logging.StreamHandler() # 'console' identify the logs that should be print to stdout
console.setLevel(logging.getLevelName(args.log)) # The console logging is choosen by the user (without affecting the static log file level)
formatter = logging.Formatter('[%(asctime)s][%(name)s][%(levelname)s] %(message)s')
console.setFormatter(formatter) # Set the format of the stdout logs
logger.addHandler(console) # The static file will also get a copy of everything that is print to the console
- The logger output should be redirect by default to
/var/log/cerberus.log
:
parser.add_argument("-o", "--output", help = "Log file. Default=/var/log/cerberus.log", required = False, default = "/var/log/cerberus.log")
logging.basicConfig(filename=args.output, ...)
- The logs format should match this one (but you can add additionnal fields inside if you need to be more precise):
format='[%(asctime)s][%(name)s][%(levelname)s] %(message)s'
- Functions name should describe the global job of the function, with the same construct as the filename (using uppercase letter for each new word, and without any spaces). The first letter should be lowercase:
def addWinApiArguments():
return
- Variables names must be as clear as possible, without uppercase, using underscore as a "space" character:
user_input = input("How are you ?")
-
Global variables should be avoided as much as possible.
-
Comment your functions using the
Sphinx
standard.
def disassemble(entry_point, end_point, f):
'''Returns the assembly code for a bunch of given bytes
:param entry_point: start address of the function to disassemble. Only use for RVA.
:type entry_point: integer
:param end_point: address of the last instruction of the function. Only used to check if the result is still inside the CODE section.
:type end_point: integer
:param f: byte representation of the function to disassemble.
:type f: bytes-array
:returns: Disassembly version of the target function.
:rtype: list of strings
'''
-
Try to use multi-thread as much as you can when your script is doing two differents things at the same time. Each part of the sandbox should be design to be as less time-consuming as possible since the sandbox cannot affort to make the end user hang for too long. Performance is key.
-
Use OOP when needed, but don't use it for anything when it doesn't make sense. If what you are writting is going to be used in multiple instances with different values (exception given for functions with kilometers longs list of arguments), then go for OOP. If not, build simple functions or more linear code.
-
Use Python3 build-in functions when you can. It will improve the execution time of your scripts.