I have an ESP32-CAM AI Thinker board with an OV2640 image sensor and an ST7789 240x240 TFT LCD (SPI, but without a CS pin). I am trying to get a relatively good frame rate (30+ fps) on the TFT. I am simply capturing images and pushing the images to the screen.

I have had no trouble getting both the camera and the screen to work with the board, both in separate projects and together. This project, however, requires a view on the screen that is as close to real time as possible. There are no other peripherals attached and there is no processing other than grabbing frames and pushing them to the screen. This is not over Wi-Fi, nor will anything be saved to SD or any other storage.

Comments in the code provide the connection information for how these are wired together.

I am using:

  • Arduino development environment
  • Bodmer's TFT-eSPI library
  • TJpg-Decoder library (when trying with jpeg format)

I have tried (fps, ms):

  • Capturing RGB565 images only (12.71 fps, 78.68 ms)
  • Capturing RGB565 images and just moving them to the screen (12.52 fps, 79.87 ms)
  • Capturing JPG images only (50.97 fps, 19.62 ms)
  • Capturing JPG images and decoding them (22.35 fps, 44.74 ms)
  • Capturing JPG images, decoding them and moving them to the screen (13.34 fps, 74.96 ms)

I am actually surprised that the RGB capture is so slow. Pushing it to the screen has almost no performance impact (12.52 fps versus 12.71 fps).

Capturing JPG, however, gives the frame rate I want, but the decode and draw take so much time that the two approaches end up effectively equal. Taking the numbers naively and assuming that everything is sequential, the decode takes ~25 ms and the write to the screen takes ~30 ms. This makes me think that with some decode optimization, DMA transfer over the SPI bus, use of the second core and maybe a few other tricks, the goal of 30 fps is not unreasonable.
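To make the naive breakdown explicit, here is a minimal sketch that subtracts the cumulative measurements from the list above. It assumes the stages run strictly one after another. The names `StageTimes`, `split_stages` and `pipelined_fps` are made up for illustration, and the pipelined figure is only an idealized upper bound (perfect overlap of stages, no contention), not a measurement.

```cpp
#include <algorithm>
#include <cassert>

// Per-stage times in milliseconds (hypothetical helper type for this sketch).
struct StageTimes {
    double capture_ms;
    double decode_ms;
    double draw_ms;
};

// Derive per-stage times from cumulative frame times, assuming the stages
// run sequentially so each measurement adds exactly one stage.
StageTimes split_stages(double capture_ms,
                        double capture_decode_ms,
                        double capture_decode_draw_ms) {
    assert(capture_ms > 0.0);
    assert(capture_decode_ms >= capture_ms);
    assert(capture_decode_draw_ms >= capture_decode_ms);
    return {capture_ms,
            capture_decode_ms - capture_ms,
            capture_decode_draw_ms - capture_decode_ms};
}

// Idealized frame rate if all stages overlapped perfectly: the slowest
// stage alone would then set the pace.
double pipelined_fps(const StageTimes& s) {
    double slowest = std::max({s.capture_ms, s.decode_ms, s.draw_ms});
    return 1000.0 / slowest;
}
```

Calling `split_stages(19.62, 44.74, 74.96)` gives roughly 25.1 ms for the decode and 30.2 ms for the draw. Under the perfect-overlap assumption the draw stage would cap the rate at about 33 fps, which is why shortening the draw (for example with DMA) looks at least as important as a faster decoder.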

With a 40MHz SPI bus, each image could be transferred in approximately 240x240x16/40000000 s = 23.04 ms, which would allow ~43 fps. I know that there is some overhead, but I think that the SPI bus isn't going to prevent 30 fps and the numbers above seem to support that. I believe the bus speed can be increased to 60MHz, so I don't think that the bus will be an issue. Finally, I am treating this as a code problem and will not consider changes to the hardware.
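The bus arithmetic can be written out as a small sketch. It counts only pixel data bits and ignores command bytes, address-window setup and driver overhead, so the results are upper bounds. The function names `frame_transfer_ms` and `max_fps` are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Milliseconds needed to clock one full frame of pixel data over SPI,
// ignoring all command and driver overhead.
double frame_transfer_ms(uint32_t width, uint32_t height,
                         uint32_t bits_per_pixel, double spi_clock_hz) {
    assert(spi_clock_hz > 0.0);
    double bits = static_cast<double>(width) * height * bits_per_pixel;
    return bits / spi_clock_hz * 1000.0;
}

// Upper bound on frame rate when the bus transfer is the only cost.
double max_fps(double frame_ms) {
    assert(frame_ms > 0.0);
    return 1000.0 / frame_ms;
}
```

For a 240x240 RGB565 frame, `frame_transfer_ms(240, 240, 16, 40e6)` is 23.04 ms (about 43 fps), and the same call with `60e6` gives 15.36 ms (about 65 fps), assuming the display and wiring actually tolerate that clock.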

My Questions:

  1. What can I do to speed this up? I do not care what the capture format is.

  2. Are there other suggestions for speed ups other than the ones given above?

  3. Does anyone have any idea about where the most gain could be had?

  4. Is my current code inefficient?

  5. Are the libraries I am using inefficient?

  6. Are there faster capture settings for the camera while still getting quality full color 240x240 images?

    //************************************************//
    //Pinouts
    //************************************************//
    //Camera
    //  PWDN_GPIO_NUM     32
    //  RESET_GPIO_NUM    -1
    //  XCLK_GPIO_NUM      0
    //  SIOD_GPIO_NUM     26
    //  SIOC_GPIO_NUM     27
    //  Y9_GPIO_NUM       35
    //  Y8_GPIO_NUM       34
    //  Y7_GPIO_NUM       39
    //  Y6_GPIO_NUM       36
    //  Y5_GPIO_NUM       21
    //  Y4_GPIO_NUM       19
    //  Y3_GPIO_NUM       18
    //  Y2_GPIO_NUM        5
    //  VSYNC_GPIO_NUM    25
    //  HREF_GPIO_NUM     23
    //  PCLK_GPIO_NUM     22
    //************************************************//
    //TFT
    //  TFT_MOSI           1
    //  TFT_SCLK          12
    //  TFT_CS            -1
    //  TFT_DC            13
    //  TFT_RST            3
    //************************************************//
    //SD Card (Not being used)
    //  CLK GPIO          14
    //  CMD GPIO          15
    //  DATA0 GPIO         2
    //  DATA1 GPIO         4 *Not used in 1-bit mode
    //  DATA2 GPIO        12 *Not used in 1-bit mode
    //  DATA3 GPIO        13 *Not used in 1-bit mode
    //************************************************//
    //Free Pins
    //  Flashlight/GPIO    4
    //  GPIO/U2RXD        16
    //  Red LED           33 *Not an available GPIO. Just an indicator LED
    //************************************************//
    
    #define USE_TFT
    #define USE_CAM
    #define USE_JPG
    #define MYDEBUG
    
    
    
    //************************************************//
    //Separate includes
    //************************************************//
    //Camera includes
    #ifdef USE_CAM
      #include "esp_camera.h"
      #include "camera_index.h"
      // #include "jpg_rot.h"
    #endif
    
    //LCD includes
    #ifdef USE_TFT
      #include <SPI.h>
      #ifdef USE_JPG
        #include <TJpg_Decoder.h>
        #include <JPEGDEC.h>
      #endif
      #include <TFT_eSPI.h>          // Hardware-specific library
    #endif
    
    //General includes
    #include "Arduino.h"
    #include "soc/soc.h"           // Disable brownout problems
    #include "soc/rtc_cntl_reg.h"  // Disable brownout problems
    #include "driver/rtc_io.h"     // to hold led pin state constant
    
    //************************************************//
    //Separate feature setups
    //************************************************//
    //TFT setup
    #ifdef USE_TFT
      #define WHITE  0x00FFFFFF
      #define BLACK  0x00000000
      #define RED    0x000000FF
      #define GREEN  0x0000FF00
      #define BLUE   0x00FF0000
      #define YELLOW (RED | GREEN)
      #define CYAN   (BLUE | GREEN)
      #define PURPLE (BLUE | RED)
    
      TFT_eSPI tft = TFT_eSPI();
    
    #endif
    
    //Camera setup
    #ifdef USE_CAM
      camera_fb_t * fb = NULL;
      int i = 0;
      int n = 100; //n is number of frames to average over for fps calculation
      float fps;
      static esp_err_t cam_err;
    
      // CAMERA_MODEL_AI_THINKER
      #define PWDN_GPIO_NUM     32
      #define RESET_GPIO_NUM    -1
      #define XCLK_GPIO_NUM      0
      #define SIOD_GPIO_NUM     26
      #define SIOC_GPIO_NUM     27
      #define Y9_GPIO_NUM       35
      #define Y8_GPIO_NUM       34
      #define Y7_GPIO_NUM       39
      #define Y6_GPIO_NUM       36
      #define Y5_GPIO_NUM       21
      #define Y4_GPIO_NUM       19
      #define Y3_GPIO_NUM       18
      #define Y2_GPIO_NUM        5
      #define VSYNC_GPIO_NUM    25
      #define HREF_GPIO_NUM     23
      #define PCLK_GPIO_NUM     22
    
      #ifdef USE_JPG
        #ifdef USE_TFT
          bool tft_output(int16_t x, int16_t y, uint16_t w, uint16_t h, uint16_t* bitmap){
            if ( y >= tft.height() ) return 0;
            tft.pushImage(x, y, w, h, bitmap);
            return 1;
          }
        #endif
        bool no_tft_output(int16_t x, int16_t y, uint16_t w, uint16_t h, uint16_t* bitmap){
          if ( y >= tft.height() ) return 0;
          return 1;
        }
      #endif
    
      bool setup_camera() { //returns true on success; USE_JPG selects jpg capture, otherwise rgb565 capture
        //Configure the camera
        camera_config_t config;
        config.ledc_channel = LEDC_CHANNEL_0;
        config.ledc_timer = LEDC_TIMER_0;
        config.pin_d0 = Y2_GPIO_NUM;
        config.pin_d1 = Y3_GPIO_NUM;
        config.pin_d2 = Y4_GPIO_NUM;
        config.pin_d3 = Y5_GPIO_NUM;
        config.pin_d4 = Y6_GPIO_NUM;
        config.pin_d5 = Y7_GPIO_NUM;
        config.pin_d6 = Y8_GPIO_NUM;
        config.pin_d7 = Y9_GPIO_NUM;
        config.pin_xclk = XCLK_GPIO_NUM;
        config.pin_pclk = PCLK_GPIO_NUM;
        config.pin_vsync = VSYNC_GPIO_NUM;
        config.pin_href = HREF_GPIO_NUM;
        config.pin_sscb_sda = SIOD_GPIO_NUM;
        config.pin_sscb_scl = SIOC_GPIO_NUM;
        config.pin_pwdn = PWDN_GPIO_NUM;
        config.pin_reset = RESET_GPIO_NUM;
        config.xclk_freq_hz = 20000000;
        #ifdef USE_JPG
          config.pixel_format = PIXFORMAT_JPEG;
        #else
          config.pixel_format = PIXFORMAT_RGB565;
        #endif
        //init with high specs to pre-allocate larger buffers
        if (psramFound()) {
          //debugln("psram found");
          config.frame_size = FRAMESIZE_240X240;
          config.jpeg_quality = 10;
          config.fb_count = 2;
        } else {
          //debugln("psram not found");
          config.frame_size = FRAMESIZE_240X240;
          config.jpeg_quality = 12;
          config.fb_count = 1;
        }
    
        // camera init
        cam_err = esp_camera_init(&config);
        if (cam_err != ESP_OK) {
          //debugf("Camera init failed with error 0x%x", cam_err);
          return false;
        }
    
        sensor_t * s = esp_camera_sensor_get();
        s->set_framesize(s, FRAMESIZE_240X240);
        s->set_vflip(s, 1);
        debugln("Camera initialized");
    
        #ifdef USE_JPG
          #ifdef USE_TFT
            TJpgDec.setJpgScale(1);
            TJpgDec.setSwapBytes(true);
            TJpgDec.setCallback(tft_output);
          #endif
        #endif
        return true;
      }
    
    
        #include <stddef.h>
        #include "esp_heap_caps.h"
    //#include "esp_jpg_decode.h"  //espressif's decoder
    //#include <TJpg_Decoder.h> //bodmer's jpg decoder
        #include "esp_system.h"
        #if ESP_IDF_VERSION_MAJOR >= 4 // IDF 4+
          #if CONFIG_IDF_TARGET_ESP32 // ESP32/PICO-D4
            #include "esp32/spiram.h"
          #else 
            #error Target CONFIG_IDF_TARGET is not supported
          #endif
        #else // ESP32 Before IDF 4.0
          #include "esp_spiram.h"
        #endif
    
    
    #endif
    
    
    #ifdef USE_TFT
    bool setup_lcd() {
    
      tft.begin();
      tft.setRotation(0);  // 0 & 2 Portrait. 1 & 3 landscape
      tft.fillScreen(TFT_BLUE);
      tft.setCursor(0, 0);
      // Set the font colour to be white with a black background
      tft.setTextColor(TFT_WHITE, TFT_BLACK);
      tft.setTextSize(2);
      debugln("TFT works");
      // delay(3000);
      debugln("LCD initialized");
      return true;
    }
    #endif
    
    // debug functions to send debug info to screen or serial
    #ifdef MYDEBUG
      #ifdef USE_TFT
        template <class T>
        void debug (T msg) {
          tft.print(msg);
        }
    
        template <class T, class T2>
        void debug (T msg, T2 fmt) {
          tft.print(msg, fmt);
        }
    
        template <class T>
        void debugln (T msg) {
          tft.println(msg);
        }
    
        template <class T1, class T2>
        void debugf (T1 msg, T2 var) {
          tft.printf(msg, var);
        }
    
        template <class T1, class T2, class T3>
        void debugf (T1 msg, T2 var, T3 var2) {
          tft.printf(msg, var, var2);
        }
    
        template <class T1, class T2, class T3, class T4>
        void debugf (T1 msg, T2 var, T3 var2, T4 var3) {
          tft.printf(msg, var, var2, var3);
        }
    
        void debughome () {
          tft.setCursor(0,0);
        }
      #else
    
        template <class T>
        void debug (T msg) {
          Serial.print(msg);
        }
    
        template <class T, class T2>
        void debug (T msg, T2 fmt) {
          Serial.print(msg, fmt);
        }
    
        template <class T>
        void debugln (T msg) {
          Serial.println(msg);
        }
    
        template <class T1, class T2>
        void debugf (T1 msg, T2 var) {
          Serial.printf(msg, var);
        }
    
        template <class T1, class T2, class T3>
        void debugf (T1 msg, T2 var, T3 var2) {
          Serial.printf(msg, var, var2);
        }
    
        template <class T1, class T2, class T3, class T4>
        void debugf (T1 msg, T2 var, T3 var2, T4 var3) {
          Serial.printf(msg, var, var2, var3);
        }
    
        void debughome () {
        __asm__ __volatile__ ("nop\n\t"); //no-operation inline assembly
        }
      #endif
    
    #else
      //MYDEBUG not defined: the debug helpers compile to no-ops
      template <class T>
      void debug (T msg) {
        __asm__ __volatile__ ("nop\n\t"); //no-operation inline assembly
      }
    
      template <class T>
      void debugln (T msg) {
        __asm__ __volatile__ ("nop\n\t"); //no-operation inline assembly
      }
    
      template <class T1, class T2>
      void debugf (T1 msg, T2 var) {
        __asm__ __volatile__ ("nop\n\t"); //no-operation inline assembly
      }
    
      template <class T1, class T2, class T3>
      void debugf (T1 msg, T2 var, T3 var2) {
        __asm__ __volatile__ ("nop\n\t"); //no-operation inline assembly
      }
    
      void debughome () {
        __asm__ __volatile__ ("nop\n\t"); //no-operation inline assembly
      }
    #endif
    
    void setup() {
      #ifndef USE_TFT
        Serial.begin(115200);
      #endif
      debugln("-----------------------------------");
      debugln("starting");
    
      pinMode(4, OUTPUT);// initialize io4 as an output for LED flash.
      digitalWrite(4, LOW); // flash off
      rtc_gpio_hold_en(GPIO_NUM_4); // Hold the state of the pin constant
    
      //Start the LCD
      #ifdef USE_TFT
         if (not setup_lcd()){
           return;
         }
      #endif
    
      // Configure and start the camera
      #ifdef USE_CAM
        if (not setup_camera()){
          return;
        }
      #endif
      debugln("Done configuring peripherals.");
    
      debugln("\r\nInitialisation done.");
    
      delay(2000);
    }
    
    float captureJPG(){
      long start = millis();
      for (int i = 0; i < n; i++){
        fb = esp_camera_fb_get();
        esp_camera_fb_return(fb);
      }
      long end = millis();
      return (float)n * 1000.0 /( (float)end - (float)start);
    }
    
    float captureRGB(){
      long start = millis();
      for (int i = 0; i < n; i++){
        fb = esp_camera_fb_get();
        esp_camera_fb_return(fb);
      }
      long end = millis();
      return (float)n * 1000.0 /( (float)end - (float)start);
    }
    
    #ifdef USE_JPG
      float captureDrawJPG(){
        long start = millis();
        for (int i = 0; i < n; i++){
          fb = esp_camera_fb_get();
          TJpgDec.drawJpg(0, 0, (const uint8_t*)fb->buf, fb->len);
          esp_camera_fb_return(fb);
        }
        long end = millis();
        return (float)n * 1000.0 /( (float)end - (float)start);
      }
    #endif
    
    #ifdef USE_JPG
      float captureDecodeJPG(){
        TJpgDec.setCallback(no_tft_output);
        long start = millis();
        for (int i = 0; i < n; i++){
          fb = esp_camera_fb_get();
          TJpgDec.drawJpg(0, 0, (const uint8_t*)fb->buf, fb->len);
          esp_camera_fb_return(fb);
        }
        long end = millis();
        TJpgDec.setCallback(tft_output);
        return (float)n * 1000.0 /( (float)end - (float)start);
      }
    #endif
    
    float captureDrawRGB(){
      long start = millis();
      for (int i = 0; i < n; i++){
        fb = esp_camera_fb_get();
        tft.pushImage(0, 0, 240, 240, (uint16_t*)fb->buf);
        esp_camera_fb_return(fb);
      }
      long end = millis();
      return (float)n * 1000.0 /( (float)end - (float)start);
    }
    
    void loop(){
      #ifdef USE_JPG
        float jpg_fps = captureJPG();
        float jpg_dec_fps = captureDecodeJPG();
        #ifdef USE_TFT
          float jpgDraw_fps = captureDrawJPG();
          tft.fillScreen(TFT_BLUE);
        #endif
        debughome();
        debugf("Jpg capture fps = %.2f\n", jpg_fps);
        debugf("Jpg capture and decode fps = %.2f\n", jpg_dec_fps);
        #ifdef USE_TFT
          debugf("Jpg capture and draw fps = %.2f\n", jpgDraw_fps);
        #endif
      #else
        float rgb_fps = captureRGB();
        #ifdef USE_TFT
          float rgbDraw_fps = captureDrawRGB();
          tft.fillScreen(TFT_BLUE);
        #endif
        debughome();
        debugf("RGB capture fps = %.2f\n", rgb_fps);
        #ifdef USE_TFT
          debugf("RGB capture and draw fps = %.2f\n", rgbDraw_fps);
        #endif
      #endif
      delay(100000);
    }
    
tinkertime
  • _"With a 40MHz SPI bus, each image could be transferred in approximately 240x240x16/40000000 = 23.04 ms"_ - how much time does it actually take? – Bruce Abbott Jun 15 '22 at 22:07
  • Have you looked at the SPI bus with an oscilloscope and a protocol decoder? Where is the time spent? – Kuba hasn't forgotten Monica Jun 16 '22 at 04:03
  • Thank you for your comments. I have no appropriate equipment to measure it. I am getting <30% of the potential speed of the bus, but the bus can be set to 60MHz if more is needed. There may be overhead, but it can't be taking more than 2/3 of the bandwidth. If it did, would anyone ever use the SPI bus? Also, when I capture RGB images and move them to the screen using SPI, there's no performance hit: ~12.5 fps whether or not I draw (that is, whether or not the SPI bus is used). The SPI bus isn't the issue for RGB. I will do a jpg decode without displaying to test the decode impact. – tinkertime Jun 16 '22 at 06:04
  • I was really looking for information about whether a faster library is available, whether I am doing something poorly in my code that chews up time, whether there is a configuration issue, or whether there is something else that can be fixed in code. – tinkertime Jun 16 '22 at 06:12
  • I would highly recommend picking up a cheap logic analyzer at your online marketplace of choice. They can be had for $10 and are surprisingly capable for the cost. I don't believe they would directly be suitable for reading 40MHz SPI, but they will likely catch part of the transaction and show you how well (or more likely, how poorly) you're utilizing the bus. – Jarrett Jun 16 '22 at 17:50
  • Thanks again, but your comment is basically echoing the above ones. Before going the route of getting new equipment, I would like to explore coding solutions. That may include code to test the throughput of the SPI bus. Also at this point, why does everyone start off by pointing at the SPI bus? – tinkertime Jun 16 '22 at 20:25

1 Answer

By switching to Larry Bank's JPEGDEC library, I have achieved a total throughput of 33.27 fps for capturing, decoding and displaying the images.

This library employs an optimized jpg decoder as well as DMA transfers of bundled MCUs, which reduces overhead for the SPI bus. For details about how he does this read his blog post.
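As a rough illustration of why bundling MCUs helps, the sketch below counts how many draw calls are needed per frame when blocks are pushed one at a time versus in full-width strips. Everything here is an assumption made for the example: the 16x16 MCU size (the real size depends on the chroma subsampling of the JPEG), the strip size, and especially the per-call overhead value, which is a placeholder and not a measurement. The function names are hypothetical and are not part of JPEGDEC or TFT_eSPI.

```cpp
#include <cassert>
#include <cstdint>

// Number of draw calls needed to cover a frame with blocks of
// block_w x block_h pixels (partial blocks at the edges count as one call).
uint32_t draw_calls(uint32_t frame_w, uint32_t frame_h,
                    uint32_t block_w, uint32_t block_h) {
    assert(block_w > 0 && block_h > 0);
    uint32_t cols = (frame_w + block_w - 1) / block_w;  // round up
    uint32_t rows = (frame_h + block_h - 1) / block_h;  // round up
    return cols * rows;
}

// Fixed cost per frame for a given per-call overhead in microseconds
// (address-window setup, function call cost and so on; assumed value).
double overhead_ms(uint32_t calls, double per_call_overhead_us) {
    return calls * per_call_overhead_us / 1000.0;
}
```

With these assumptions, `draw_calls(240, 240, 16, 16)` is 225 calls per frame, while pushing full 240x16 strips needs only 15. At a placeholder cost of 50 µs per call that is 11.25 ms versus 0.75 ms of fixed overhead per frame, on top of the raw pixel transfer time.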

tinkertime