Description

Safety Mode is one of the more important parts of our FDIR architecture. It is designed to keep the satellite alive when a fault occurs that could affect the satellite at the subsystem or system level. Without Safety Mode, the satellite, and by extension the mission, is at high risk of failing.

The diagram below is a good summary of our fault management system:

[Diagram: fault management system]

The housekeeping service (a FreeRTOS thread) continuously polls the handlers (4 different FreeRTOS threads) to check for errors in the different subsystems. If there is an error, the housekeeping service decides whether to leave it to the unit-level safety functions (which the handlers execute automatically) or to raise the error to the Command Service. This decision matrix is based on the results of an FMECA done by Mission Operations and @Eesa Aamer, which quantifies the criticality of certain satellite faults. If the error could affect multiple subsystems or the satellite as a whole, the housekeeping service raises it to the Command Service. The Command Service (which is in charge of all satellite operations) then puts the satellite into Safety Mode.
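As a rough illustration of the decision described above, here is a minimal sketch of a table-driven lookup. Everything in it is an assumption made for the example: the error codes, the `fault_scope_t` categories, the `handler_status_t` struct and the `hk_decide` function are hypothetical names, and the real table contents have to come from the FMECA results. Treating an unknown error code as one to raise is also just a conservative choice for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scope categories; the real ones come from the FMECA. */
typedef enum {
    FAULT_SCOPE_UNIT,            /* contained within one unit/subsystem */
    FAULT_SCOPE_MULTI_SUBSYSTEM, /* could affect several subsystems */
    FAULT_SCOPE_SYSTEM           /* could affect the whole satellite */
} fault_scope_t;

typedef enum {
    HK_ACTION_NONE,             /* no error reported */
    HK_ACTION_UNIT_LEVEL,       /* leave it to the handler's safety function */
    HK_ACTION_RAISE_TO_COMMAND  /* escalate to the Command Service */
} hk_action_t;

typedef struct {
    int           error_code;
    fault_scope_t scope;
} decision_entry_t;

/* Placeholder decision matrix: the codes 1..3 are invented for this sketch. */
static const decision_entry_t decision_matrix[] = {
    { 1, FAULT_SCOPE_UNIT },
    { 2, FAULT_SCOPE_MULTI_SUBSYSTEM },
    { 3, FAULT_SCOPE_SYSTEM },
};

/* Hypothetical status a handler returns when polled; 0 means "no error". */
typedef struct {
    int error_code;
} handler_status_t;

hk_action_t hk_decide(const handler_status_t *status)
{
    assert(status != NULL);

    if (status->error_code == 0) {
        return HK_ACTION_NONE;
    }

    size_t n = sizeof decision_matrix / sizeof decision_matrix[0];
    for (size_t i = 0; i < n; i++) {
        if (decision_matrix[i].error_code == status->error_code) {
            return decision_matrix[i].scope == FAULT_SCOPE_UNIT
                       ? HK_ACTION_UNIT_LEVEL
                       : HK_ACTION_RAISE_TO_COMMAND;
        }
    }

    /* Assumption: an error we do not recognise is escalated to be safe. */
    return HK_ACTION_RAISE_TO_COMMAND;
}
```

Keeping the matrix as data means the table can be updated as the FMECA evolves without touching the decision logic.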

While the satellite is in safety mode, the flight software is in charge of doing two things:

[Diagram: the two safety mode steps]

The first step is to try to keep the satellite alive until it can get into contact with the ground station. Since the satellite is orbiting the Earth, our ground station in Toronto will only have a few minutes each day to actively communicate with it. Hence, we need to make sure the satellite can take care of itself while it's alone.

To do this, the satellite must maintain its power supply: the payload should be turned OFF, and all other subsystems must draw as little power as possible. Safety mode should also keep the satellite thermally safe by polling the thermistors on all critical components, to avoid any damage. To maintain a link to the ground, the receiver must be ON, and the flight software must verify this often. To get in contact with the ground, the satellite must also maintain a nadir-pointing attitude, so safety mode will need to regularly contact the ADCS module, and potentially command it, so that our antenna is always pointing towards the Earth. Finally, housekeeping should not stop in safety mode. We still need to poll each of the remaining subsystems for voltages, currents, temperatures, orbit data and altitude data to ensure the spacecraft is not in any immediate danger. In most cases step one should be enough, as safety mode is meant to keep the satellite alive until the operations team on the ground can get into contact with it.
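One possible way to structure step one is to separate the decision from the hardware access: take a snapshot of the relevant readings and return the set of commands that should be issued. The sketch below does this for four of the checks described above (payload, receiver, thermal, pointing). All names (`safety_inputs_t`, `safety_mode_step1`, the `SAFETY_CMD_*` flags) are hypothetical, and both limits are placeholder numbers, not mission values. On the target, a FreeRTOS task could call this once per cycle, apply the returned commands through the real drivers, and then wait with `vTaskDelay` before the next cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder limits for illustration only, NOT mission values. */
#define SAFETY_MAX_TEMP_C           60.0f
#define SAFETY_MAX_POINTING_ERR_DEG 10.0f

/* Hypothetical snapshot of the readings step one cares about. */
typedef struct {
    bool  payload_on;         /* payload power state */
    bool  receiver_on;        /* receiver power state */
    float max_temp_c;         /* hottest thermistor reading */
    float pointing_error_deg; /* angle between antenna and nadir, from ADCS */
} safety_inputs_t;

/* Commands the caller should issue, returned as a bitmask. */
enum {
    SAFETY_CMD_PAYLOAD_OFF   = 1u << 0,
    SAFETY_CMD_RECEIVER_ON   = 1u << 1,
    SAFETY_CMD_ADCS_NADIR    = 1u << 2,
    SAFETY_CMD_THERMAL_ALERT = 1u << 3
};

uint32_t safety_mode_step1(const safety_inputs_t *in)
{
    assert(in != NULL);
    uint32_t cmds = 0;

    if (in->payload_on) {
        cmds |= SAFETY_CMD_PAYLOAD_OFF;   /* payload must be OFF */
    }
    if (!in->receiver_on) {
        cmds |= SAFETY_CMD_RECEIVER_ON;   /* receiver must be ON */
    }
    if (in->pointing_error_deg > SAFETY_MAX_POINTING_ERR_DEG) {
        cmds |= SAFETY_CMD_ADCS_NADIR;    /* ask ADCS to re-point to nadir */
    }
    if (in->max_temp_c > SAFETY_MAX_TEMP_C) {
        cmds |= SAFETY_CMD_THERMAL_ALERT; /* a component is too hot */
    }
    return cmds;
}
```

Because the function has no hardware dependencies, it can be unit tested on a development machine before it ever runs on the STM32G431RB.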

However, it may be valuable to implement fixes for errors that we can predict beforehand. So step two is essentially determining whether the error raised is one that we recognize, and then calling a function that implements some sort of solution. Step two is a little open-ended, as you will need to get the list of errors from Mission Operations and then get creative in coming up with effective solutions to those errors.
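A common way to structure step two is a dispatch table that maps each recognized error code to a recovery function. The sketch below is only one possible shape for this. The two error codes and their recovery functions are invented placeholders (the real list has to come from Mission Operations), the counters exist only so the example can be observed off-target, and `safety_mode_try_recover` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented example error codes; the real list comes from Mission Operations. */
#define ERR_EXAMPLE_BUS_HANG     101
#define ERR_EXAMPLE_RECEIVER_OFF 102

typedef bool (*recovery_fn_t)(void); /* returns true if the fix succeeded */

/* Counters stand in for real hardware actions in this sketch. */
int bus_reset_count = 0;
int receiver_restart_count = 0;

bool recover_bus_hang(void)     { bus_reset_count++;        return true; }
bool recover_receiver_off(void) { receiver_restart_count++; return true; }

typedef struct {
    int           error_code;
    recovery_fn_t fix;
} recovery_entry_t;

static const recovery_entry_t recovery_table[] = {
    { ERR_EXAMPLE_BUS_HANG,     recover_bus_hang },
    { ERR_EXAMPLE_RECEIVER_OFF, recover_receiver_off },
};

typedef enum {
    RECOVERY_UNKNOWN_ERROR, /* not in the table: stay in step one and wait */
    RECOVERY_FAILED,        /* fix was attempted but did not succeed */
    RECOVERY_OK             /* fix was attempted and reported success */
} recovery_status_t;

recovery_status_t safety_mode_try_recover(int error_code)
{
    size_t n = sizeof recovery_table / sizeof recovery_table[0];
    for (size_t i = 0; i < n; i++) {
        if (recovery_table[i].error_code == error_code) {
            assert(recovery_table[i].fix != NULL);
            return recovery_table[i].fix() ? RECOVERY_OK : RECOVERY_FAILED;
        }
    }
    return RECOVERY_UNKNOWN_ERROR;
}
```

With this layout, supporting a new error means adding one function and one table row, and an unrecognized error simply falls back to the step one behaviour of keeping the satellite alive until the ground can intervene.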

Your main points of contact are:

While creating and testing your code, please use the STM32G431RB MCU. This is the MCU that will be used for our on-board computer (OBC).

Tasks

[ ] Implement a function that periodically executes the tasks listed out in Step 1
- [ ] For the housekeeping step, you may wish to simply use the housekeeping service thread instead of recreating that functionality yourself
[ ] Implement a series of functions that each apply a solution to the error that forced the satellite into safety mode
- [ ] It is up to you to decide how you want your Safety Mode function to receive information about the error, and how to use that error to decide which solution to implement
- [ ] Feel free to conduct some research and implement accepted methods of solving certain hardware/software faults on spacecraft

Testing and Verification

[ ] Develop a test plan to validate each function that you create
[ ] Write unit tests using the test plan developed earlier to verify everything is working correctly
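
One way to turn a test plan into unit tests is to write each planned case as a row in a table and run them all in a loop. The sketch below shows the idea for a small thermal limit check. The function `thermal_is_safe`, the limits and the test cases are all hypothetical and chosen only for illustration; the same pattern would apply whether the tests are run with plain C on a development machine or through a unit test framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder limits for illustration only, NOT mission values. */
#define TEMP_MIN_C (-20.0f)
#define TEMP_MAX_C (60.0f)

/* Hypothetical function under test: is a reading within the safe range? */
bool thermal_is_safe(float temp_c)
{
    return temp_c >= TEMP_MIN_C && temp_c <= TEMP_MAX_C;
}

/* One row per case in the test plan. */
typedef struct {
    const char *name;
    float       temp_c;
    bool        expected;
} thermal_case_t;

static const thermal_case_t thermal_cases[] = {
    { "too cold",       -40.0f, false },
    { "lower boundary", -20.0f, true  },
    { "nominal",         25.0f, true  },
    { "upper boundary",  60.0f, true  },
    { "too hot",         60.5f, false },
};

/* Runs every case and returns the number of failures (0 means all passed). */
int run_thermal_tests(void)
{
    int failures = 0;
    size_t n = sizeof thermal_cases / sizeof thermal_cases[0];
    for (size_t i = 0; i < n; i++) {
        assert(thermal_cases[i].name != NULL);
        if (thermal_is_safe(thermal_cases[i].temp_c) != thermal_cases[i].expected) {
            failures++;
        }
    }
    return failures;
}
```

Boundary values are worth listing explicitly in the plan, since off-by-one style mistakes in limit checks tend to show up exactly there.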