• Input:

    • One counter-intuitive fact is that in almost all mainstream SNN frameworks the input is not a series of 1-D spike signals. Instead, the input shape closely resembles that of an ANN, with an additional time dimension T. The common input shape is T x N x C x H x W, where T is the number of time steps, N is the batch size, C is the number of channels, and H and W are the height and width of the input image. Apart from the first two dimensions (T, N), the remaining dimensions are flexible and can be changed as needed, so the input shape can be written more generally as T x N x X.
    • For the Loihi SNN hardware platform the input shape differs, but only by a simple permutation: N x X x T.
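The permutation between the two layouts can be sketched in a few lines of plain Python. This is only an illustration: the helper names `loihi_axes` and `permuted_shape` and the example sizes are assumptions made here, not part of any SNN framework.

```python
def loihi_axes(ndim):
    """Axis order that moves the leading time axis to the end: T, N, X... -> N, X..., T."""
    if ndim < 3:
        raise ValueError("expected at least (T, N, X) dimensions")
    return tuple(range(1, ndim)) + (0,)


def permuted_shape(shape):
    """Shape obtained after applying loihi_axes to a tensor of the given shape."""
    return tuple(shape[axis] for axis in loihi_axes(len(shape)))


# Hypothetical sizes: 16 time steps, batch of 8, 2 channels, 34 x 34 pixels.
snn_shape = (16, 8, 2, 34, 34)            # T x N x C x H x W
loihi_shape = permuted_shape(snn_shape)   # N x C x H x W x T
```

The tuple returned by `loihi_axes` is in the form accepted by `numpy.transpose` or `torch.Tensor.permute`, so the same order can be applied to real data.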
Read more »

Unsupervised Domain Adaptation Algorithms

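  • GRL: Gradient Reversal Layer. The goal is to make the distributions of the two domains indistinguishable to the feature extractor (i.e., to match the two domains so that their distributions converge).
  • MMD: Minimize the Maximum Mean Discrepancy between the target and source domains. The authors propose a metric that measures the discrepancy between domains and add a loss term to suppress it, so that the final layer of the network outputs stable, domain-independent features.
  • AFN: This one is more of a heuristic. The authors argue that performance on the target domain is poor because the feature vectors of the target domain have smaller norms than those of the source domain. They therefore progressively increase the L2 norms of the deep embeddings to address this.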
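To make the MMD item concrete, here is a minimal pure-Python sketch of a biased estimate of the squared MMD between two small sets of feature vectors. The RBF kernel, the `gamma` value, and the toy feature values are assumptions chosen for illustration. The excerpt above does not say which kernel or estimator the original method uses.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel between two feature vectors (kernel choice is an assumption)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))


# Toy features: the "target" set is shifted away from the "source" set.
source_feats = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
target_feats = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]
same = mmd_squared(source_feats, source_feats)
shifted = mmd_squared(source_feats, target_feats)
```

In a training loop this quantity would be added to the task loss so that the optimizer pulls the two feature distributions together. Here `same` is zero and `shifted` is clearly positive, which is the behaviour the loss relies on.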
Read more »


  • By method: Top-Down; Bottom-Up
  • By time dimension: Frame; Time Sequence
  • By input type: Monocular; Multi-View
  • By number of people: Single; Multiple
  • By output type: Skeleton; Mesh (SMPL, SCAPE, DensePose)
Read more »

ImmFusion: Robust mmWave-RGB Fusion for 3D Human Body Reconstruction in All Weather Conditions

  • 2022, arXiv
  • Question: How to merge mmWave radar with RGB frames to do 3D human mesh reconstruction?
  • Spec: Single person, 3D mesh, RGB + mmWave Radar.
  • Features: Robust in extreme weather/conditions like rain, smoke, low light, and occlusion.
Read more »

DGCN: Dynamic Graph Convolutional Network for Efficient Multi-Person Pose Estimation



  • Multi-person.
  • Image-based. The graph is used only in their DGCM module.
  • Bottom-Up.
Read more »

Paper's full name: A 256x256 6.3pJ/pixel-event Query-driven Dynamic Vision Sensor with Energy-conserving Row-parallel Event Scanning


This paper proposes a novel query-driven DVS (qDVS) hardware design. It combines the advantages of APS and DVS: the sensor scans all pixels at a fixed rate and queries each one on whether it is ready to fire an event. The output of the qDVS is a stream of event frames. Each pixel is responsible for fewer functions: it only needs to report whether it is ready to fire and with which polarity, since its address is fixed by its position in the generated event frame. Because each pixel does less, it can be made smaller, which gives a higher overall pixel density. Also, since the qDVS output is already framed, machine learning researchers do not need to accumulate events into frames themselves, which simplifies the processing pipeline.
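The query-driven idea can be imitated in software with a short sketch. This is a simplified analogy, not the chip's circuit: the function name `scan_frame`, the linear intensity difference, the threshold value, and the reference-reset rule are all assumptions made for illustration.

```python
def scan_frame(intensity, reference, threshold=0.2):
    """Query every pixel once; return an event frame and the updated reference.

    Each entry of the event frame is +1 (brighter), -1 (darker) or 0 (no event).
    """
    events = []
    new_reference = []
    for row_now, row_ref in zip(intensity, reference):
        event_row = []
        ref_row = []
        for now, ref in zip(row_now, row_ref):
            diff = now - ref
            if diff >= threshold:
                event_row.append(1)
                ref_row.append(now)    # a pixel that fired resets its reference
            elif diff <= -threshold:
                event_row.append(-1)
                ref_row.append(now)
            else:
                event_row.append(0)
                ref_row.append(ref)    # no event: keep the old reference
        events.append(event_row)
        new_reference.append(ref_row)
    return events, new_reference


# Toy 2 x 2 sensor: one pixel gets brighter, one gets darker, two barely change.
reference = [[0.5, 0.5], [0.5, 0.5]]
frame_in = [[0.9, 0.5], [0.1, 0.6]]
event_frame, reference = scan_frame(frame_in, reference)
```

Because every pixel is visited in a fixed order, the position in `event_frame` encodes the pixel address, so no per-event address needs to be transmitted.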

Read more »


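  • SMPL mainly has two sets of parameters: one describes the body shape, β, and the other describes the body pose, θ.
  • SMPL itself is "relative": it contains only information about the person and nothing about the environment, camera viewpoint, position, and so on. In addition, the values recorded for its mesh vertices are relative to the standard values of the template human model.
  • SMPL does not include hands, face, or clothing, but later papers gradually added the corresponding parameters.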
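The "relative to the template" point can be illustrated with a toy sketch of the shape part only: vertices are obtained by adding β-weighted offsets to a template mesh. The two-vertex template, the two shape coefficients, and the function name `apply_shape` are hypothetical and far smaller than the real model. Pose (θ) is left out entirely.

```python
def apply_shape(template, shape_dirs, betas):
    """Offset each template coordinate by a beta-weighted sum of shape directions.

    template:   list of vertices, each [x, y, z]
    shape_dirs: for each vertex and coordinate, one offset per shape coefficient
    betas:      shape coefficients (beta)
    """
    shaped = []
    for vertex, vertex_dirs in zip(template, shape_dirs):
        shaped.append([
            coord + sum(b * d for b, d in zip(betas, dirs))
            for coord, dirs in zip(vertex, vertex_dirs)
        ])
    return shaped


# Toy template with 2 vertices and 2 shape coefficients (illustrative values only).
template = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
shape_dirs = [
    [[0.5, 0.0], [0.0, 0.0], [0.0, 0.0]],     # vertex 0: beta[0] moves x
    [[0.0, 0.0], [0.25, -0.5], [0.0, 0.0]],   # vertex 1: both betas move y
]
neutral = apply_shape(template, shape_dirs, [0.0, 0.0])
shaped = apply_shape(template, shape_dirs, [1.0, -0.5])
```

With all coefficients at zero the template is returned unchanged, which matches the idea that the parameters describe a deviation from the template body rather than absolute positions in a scene.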
Read more »

