16Ağu
Validating file paths to satisfy GitHub CodeQL’s "Uncontrolled data used in path expression" alert
I'm writing functions for a Python package to register files from a file system to an SQL database, and GitHub's CodeQL has flagged that the file paths are a potential security risk.
I have constructed a basic validator to make sure that only file paths that exist and start with a given base path when resolved are accepted for subsequent processing. However, CodeQL still views this as being insufficient.
Is this a false positive, or can my validation check be further enhanced? Note that I use resolve
so that I can get and compare the start of the file paths.
Example Code
# Python version: 3.9.19
# OS: GNU/Linux RHEL8 4.18.0-553.5.1.el8_10.x86_64
from pathlib import Path
# The storage path is defined internally by an environment variable; here is a placeholder
storage_path = Path("/path/to/where/files/are/stored")
# Define a function to validate the file path provided
def validate_file(file: Path) -> bool:
file = Path(file) if isinstance(file, str) else file # Pre-empt accidental string inputs
file = file.resolve() # Get full path for validation
# Fail if file doesn't exist
if not file.exists():
return False
# Use path to storage location as reference
basepath = list(storage_path.parents)[-2] # This can be made stricter eventually
if str(file).startswith(str(basepath)):
return True
else:
return False
# How it's used in the script
incoming_file = Path("some_file.txt") # Can be partial path, or full path on system
if validate_file(incoming_file) is True:
"""
Run code here to store the file in the database
"""
return True
else:
raise Exception("This file failed the validation check")