Assessing the Implications of Using msoffcrypto for Open Sourcing a Medical Data Processing Pipeline
I am building a Python pipeline that reads and processes sensitive medical personal data from password-protected Excel files. The pipeline uses the msoffcrypto library, specifically the OfficeFile.load_key and OfficeFile.decrypt methods, to decrypt these files.
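For reference, here is a minimal sketch of what such a decryption step can look like. It is simplified, not the actual pipeline code. The helper names `get_password` and `decrypt_to_memory` and the `EXCEL_PASSWORD` environment variable are hypothetical placeholders. The msoffcrypto calls are the ones named above.

```python
import io
import os


def get_password(env_var="EXCEL_PASSWORD"):
    """Read the workbook password from the environment.

    The variable name is a hypothetical placeholder; the point is that
    the password never appears in the source code.
    """
    password = os.environ.get(env_var)
    if not password:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return password


def decrypt_to_memory(path, password):
    """Decrypt a password-protected Office file into an in-memory buffer."""
    # Third-party dependency (pip install msoffcrypto-tool), imported here
    # so the rest of the module can be loaded without it.
    import msoffcrypto

    decrypted = io.BytesIO()
    with open(path, "rb") as encrypted:
        office_file = msoffcrypto.OfficeFile(encrypted)
        office_file.load_key(password=password)
        office_file.decrypt(decrypted)
    decrypted.seek(0)  # rewind so a reader starts at the beginning
    return decrypted
```

Decrypting into a `BytesIO` buffer rather than a temporary file means the plaintext workbook is never written to disk by this step. The buffer can then be handed to an Excel reader of choice.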
Given the nature of the data and the importance of maintaining privacy and security, I am seeking advice on open sourcing this pipeline. Here are the details and considerations:
The pipeline's primary function is to automate the processing of sensitive medical data stored in Excel files, which are password-protected as an extra layer of security. I am aware that the actual passwords must be omitted from any code or documentation I make public.
I have researched common security practices for open-source projects and have obfuscated the parts of the code where sensitive information might otherwise appear.
Before proceeding, I want to ensure I'm not overlooking any potential security or privacy issues that could arise from making the pipeline's code available to the public. Here is what I have considered and tried so far:
- Password Management: Ensuring no hard-coded passwords are present in the codebase.
- Code Review: Conducting thorough code reviews to check for any inadvertent inclusion of sensitive information.
- Documentation: Preparing documentation that provides guidelines on how to securely use the pipeline without exposing sensitive data.
- Security Audit: Planning to perform a security audit of the code to check for vulnerabilities.
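To illustrate the password-management and code-review items above, here is a rough sketch of an automated pre-release check for hard-coded credentials. It is a heuristic under stated assumptions, not a substitute for a real audit: the regular expression and the function name `find_suspect_lines` are my own illustration, it only looks at `.py` files in the working tree, and it does not inspect version-control history.

```python
import re
from pathlib import Path

# Heuristic: a string literal assigned to (or passed as) a name that
# contains "password", "passwd", "pwd", or "secret".
SUSPECT = re.compile(
    r"""(?i)\b\w*(password|passwd|pwd|secret)\w*\s*=\s*["'][^"']+["']"""
)


def find_suspect_lines(root):
    """Return (file, line number, line text) for each suspicious line under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if SUSPECT.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    for file_name, number, line in find_suspect_lines("."):
        print(f"{file_name}:{number}: {line}")
```

A line such as `password = "hunter2"` is flagged, while `password = os.environ["EXCEL_PASSWORD"]` is not, because the right-hand side is not a string literal.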
My question is: Are there any other significant issues or best practices I should consider before open sourcing a tool of this nature? Furthermore, are there any specific aspects of the msoffcrypto library that I should be particularly cautious about in the context of open source?
Any guidance or insights on this matter would be greatly appreciated.