HUNGRY BIRDS

I usually put some seeds on my balcony during the winter, and I found that watching the ballet of birds coming to pick them up has a relaxing effect on me. Let’s see now if it’s the same for Mr MAX78000…

Introduction

The goal of the project is to run an object classification CNN (based on the CIFAR10 dataset) to detect whether there is a bird in the field of view of the camera on the MAX78000FTHR board. Each time a bird is detected, the captured image and the classification result are stored on the SD card installed on the board. The project uses only the MAX78000FTHR board, with an SD card installed on it.

Project details

As a base for this project, I used 3 of the examples provided in the GitHub MAX78000_SDK repository (https://github.com/MaximIntegratedAI/MAX78000_SDK) to deal with the camera acquisition (CameraIF example), the file management on the SD card (SDCard_FTHR example), and the CNN object classification (cifar-10 example). The useful functions of the first 2 examples were customized and then imported into the cifar-10 example.

The relevant files for this project are attached in the zip file:
- Main.c: core of the project, now contains the initialization of and calls to the camera acquisition and the CNN inference
- SDHC_access.c/.h: contains the functions to init/close the SD card, to create a folder at each board power-on, and to store a picture and its corresponding inference statistics when a bird is detected
- Utils.c/.h: contains some low-level functions (time and virtual COM port management), and a function to send an extract of the camera image to the attached computer (see the note about the CameraIF example below)
- Makefile: original file plus the new dependencies (SD card/camera management)
- Grab_image.py: updated Python source file that handles the image download in parallel with the display of the inference result

After testing each example project individually and merging them, here are some specific comments about the code:

Note on the CameraIF example:
The field of view of the camera acquisition doesn’t change when the resolution changes. The acquisition is therefore set to 128x128 pixels (the maximum resolution for which the example can dynamically allocate the image buffer), and only the central 32x32 pixels are transferred to the CNN memory buffer. This gives a x4 zoom compared with an acquisition done directly at a resolution of 32x32 pixels.
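
The central-window extraction described above can be sketched roughly as follows. This is only an illustration, not the code from the zip file: the function name crop_center, the row-major buffer layout and the bytes_per_pixel parameter are assumptions, since the pixel format used by the camera is not detailed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the centered dst_w x dst_h window of a
 * row-major source image into dst. bytes_per_pixel depends on the
 * camera format (assumption: 2 for RGB565, 3 for RGB888). */
void crop_center(const uint8_t *src, size_t src_w, size_t src_h,
                 uint8_t *dst, size_t dst_w, size_t dst_h,
                 size_t bytes_per_pixel)
{
    assert(dst_w <= src_w && dst_h <= src_h);

    /* Top-left corner of the central window in the source image. */
    size_t x0 = (src_w - dst_w) / 2;
    size_t y0 = (src_h - dst_h) / 2;

    for (size_t y = 0; y < dst_h; y++) {
        const uint8_t *row = src + ((y0 + y) * src_w + x0) * bytes_per_pixel;
        memcpy(dst + y * dst_w * bytes_per_pixel, row, dst_w * bytes_per_pixel);
    }
}
```

With a 128x128 source and a 32x32 destination, the window starts at pixel (48, 48), and each CNN input pixel covers a quarter of the width it would cover in a native 32x32 acquisition, hence the x4 zoom.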

Note on the SDCard_FTHR example:
1) The image transferred to the CNN is limited to 32x32 pixels (a parameter fixed by the CNN architecture used during training), but the SD card functions don’t have this limitation, so the original resolution (128x128) is used for the image storage.
2) The storage order of the camera output (from the top-left to the bottom-right pixel) differs from what the BMP file format expects (a standard BMP stores its rows from bottom to top), and the stored image appears mirrored vertically and horizontally. A function that reverses the pixel order from the beginning to the end of the buffer will correct this.
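
A minimal sketch of such a pixel reversal is given below. The name reverse_pixels and the bytes_per_pixel parameter are assumptions made for the illustration, and the function has not been tested on the board. Reversing the whole buffer pixel by pixel is equivalent to a 180° rotation, i.e. a vertical plus a horizontal mirror, as long as the rows carry no padding (which is the case for 128-pixel rows at 2 or 3 bytes per pixel, since BMP rows are padded to a multiple of 4 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reverse the order of whole pixels in place.
 * The bytes inside each pixel keep their order, only the pixel
 * positions are swapped (first <-> last, second <-> second to last...). */
void reverse_pixels(uint8_t *buf, size_t pixel_count, size_t bytes_per_pixel)
{
    assert(buf != NULL || pixel_count == 0);
    if (pixel_count < 2) {
        return;
    }

    size_t i = 0;
    size_t j = pixel_count - 1;
    while (i < j) {
        uint8_t *a = buf + i * bytes_per_pixel;
        uint8_t *b = buf + j * bytes_per_pixel;
        for (size_t k = 0; k < bytes_per_pixel; k++) {
            uint8_t tmp = a[k];
            a[k] = b[k];
            b[k] = tmp;
        }
        i++;
        j--;
    }
}
```

For the 128x128 image it would be called as reverse_pixels(image, 128 * 128, bytes_per_pixel) just before the buffer is written to the SD card.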

Is all this working?

Once the mapping of the camera output to the input of the CNN was done, it was time for some “real case” tests: the result of the inference was collected over 50 successive image acquisitions, and the last corresponding 32x32 image sent to the CNN was collected with the “Grab_image” Python script.
The following images show the average inference value obtained, the 32x32 image sent to the CNN (acquired on the last loop), and a picture of the “real object”:

1) CnnTest_clioV6_1/2.png: bad inference result. I’m sure the CNN doesn’t like blue cars.
2) CnnTest_chicken.png: poor inference result (34%), maybe because a chicken is not the most representative of the bird species we see in real life.
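
The averaging over the 50 acquisitions mentioned above can be sketched as follows. This is an assumption about how the statistics could be accumulated, not an extract from the project: the type score_avg_t, the function names and the use of float percentages per class are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CLASSES 10 /* CIFAR-10 has ten classes */

/* Hypothetical accumulator for per-class scores over several inferences. */
typedef struct {
    double sum[NUM_CLASSES];
    size_t count;
} score_avg_t;

void score_avg_reset(score_avg_t *avg)
{
    assert(avg != NULL);
    for (size_t i = 0; i < NUM_CLASSES; i++) {
        avg->sum[i] = 0.0;
    }
    avg->count = 0;
}

/* Add the scores of one inference (assumption: one value per class, in %). */
void score_avg_add(score_avg_t *avg, const float scores[NUM_CLASSES])
{
    for (size_t i = 0; i < NUM_CLASSES; i++) {
        avg->sum[i] += scores[i];
    }
    avg->count++;
}

/* Mean score of one class, or 0 if nothing has been accumulated yet. */
double score_avg_get(const score_avg_t *avg, size_t cls)
{
    assert(cls < NUM_CLASSES);
    return avg->count ? avg->sum[cls] / (double)avg->count : 0.0;
}
```

The accumulator would be reset before the first acquisition, fed after each of the 50 inferences, and read once at the end.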

After the poor success of the “real life” test and the absence of birds on my balcony (they made me pay for having stopped giving them seeds at the end of the winter), I collected pictures with my favorite internet search engine, and pointed the MAX78000FTHR at my PC screen:

1) CnnTest_bird_sparrow_1…4.png: good inference result on sparrow pictures with different bird positions / background colors
2) CnnTest_bird_blue_tit.png: good inference, even with a blue bird. It may not have been the color of the clio_V6 test that disturbed the CNN…
3) CnnTest_street_car.png: good inference, the CNN definitely prefers SUVs…

Now that the CNN seems to work correctly (thanks to the Maxim guys for the pre-trained network :) ), the complete code (refer to the zip file for the relevant source files) has been updated to its final configuration, which performs the following steps:
- Image acquisition
- Central portion sent to host PC
- CNN inference
- Inference result sent to host PC
- Image/inference result storage if a bird is detected
- Loop to image acquisition
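
The storage decision in this loop can be sketched as below. It is a sketch under assumptions, not the code from Main.c: CLASS_BIRD = 2 assumes the standard CIFAR-10 label order (airplane, automobile, bird, …), the int32_t score type is arbitrary, and the min_score threshold is a hypothetical addition (the project stores the picture whenever the “bird” class is detected).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CLASSES 10
#define CLASS_BIRD  2 /* assumption: standard CIFAR-10 label order */

/* Index of the highest score (the first one wins in case of a tie). */
size_t argmax_class(const int32_t scores[NUM_CLASSES])
{
    size_t best = 0;
    for (size_t i = 1; i < NUM_CLASSES; i++) {
        if (scores[i] > scores[best]) {
            best = i;
        }
    }
    return best;
}

/* True when "bird" is the winning class and its score reaches min_score.
 * min_score is a hypothetical guard against weak detections; passing the
 * smallest possible score disables it. */
bool should_store(const int32_t scores[NUM_CLASSES], int32_t min_score)
{
    assert(scores != NULL);
    size_t best = argmax_class(scores);
    return best == CLASS_BIRD && scores[best] >= min_score;
}
```

In the loop, should_store() would be evaluated right after the inference result is sent to the host PC, and the picture and statistics would be written to the SD card only when it returns true.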

A test similar to the previous one was done, with the MAX78000FTHR camera pointed at a screen showing … bird pictures, of course. The result is that when a “bird” class is detected, it triggers the storage of the picture / inference result on the SD card (see “File_1.bmp” and “File_1.txt”, read from the SD card after removing it from the board). I noted 2 things to improve: the image needs to be reversed before storage, and the first few columns of pixels don’t seem to be in the correct place. The tickets for these bug corrections have already been submitted to the corresponding service.

Now, what’s next?

After these first results, here are the next steps:

1) Improve the effective camera resolution: As the CNN inference is very fast (< 2 ms for object detection on 32x32 pixels), it makes sense to keep the low-resolution image processing for the inference, then to switch to the full-size configuration (VGA) to stream a new image directly from the camera to the SD card. The detected bird should not move that much during the ~2 ms needed to launch a new picture. Doing this would make it easier to build a bird picture database, with the final goal of having data to train a new network for species classification.

2) Make the system autonomous: The original goal was to power the board with a battery/solar panel pair, and to connect to it through wireless communication. The energy efficiency of the chip would help to make this realistic. As a first step, making the project completely autonomous could be done simply by connecting the MAX78000FTHR to a powerbank and getting the results at the end of the acquisition session.

3) Test other networks: This is probably the first domain to explore. I’m totally new to this domain, but the MAX78000FTHR is definitely a good platform to start testing neural networks on real cases. I didn’t test the other network architectures (streaming input, RISC-V processing help) or the other interfaces, but the user-friendliness of the example I used revealed a very promising board.

Conclusion

Actually, there is no definitive conclusion. The project is now working with its base functionality, and there are many possibilities for improvement. The journey will now continue along two main axes: the training of new neural networks, which is the core usage of the MAX78000FTHR board, and the autonomy, which will involve working on the power modes of the Cortex-M4 and RISC-V cores. More updates to come in the future…

Edit 2021-07-12: the board support shown in the project picture has been added (in STL format) to the CAD file page.