What can a first person camera tell us about me?
 
Unlike a third person camera, e.g., a surveillance camera, a first person camera tells us about me, the camera wearer. It continually captures my unscripted interactions with objects, people, and scenes from my own perspective, directly reflecting my personal and relational preferences. Further, in-situ visual measurements of these interactions provide strong cues for inferring momentary sensorimotor behaviors and intent over long-term future activities. In this tutorial, we argue that a first person camera is an ideal sensor for capturing and modeling our visual perceptual behaviors.

We address a fundamental question in learning human perceptual behaviors from first person cameras: "What can a first person camera tell us about the wearer?" We will answer this question by characterizing the interactive relationship between the camera wearer and the visual scene:
  • Personal/social attention: How does my first person video encode where I am looking? What can first person videos tell us about how we interact with others?
  • Human kinematics (object/pose/action): How is my first person image formed in relation to my body pose and the objects I interact with? What visual semantics and motion are induced by my activities?
  • Visual sensorimotor behaviors: What can first person visual sensation tell us about my active motion? What can a first person video tell us about how we control physical force/torque? What can it tell us about my future?
The tutorial will extensively cover the fundamentals of first person visual signals, with concrete examples. We will also discuss how the learned perceptual behaviors can in turn be applied to designing artificial intelligence for robots that will closely cohabit with humans.
 
   
Invited Speakers
 
   

James Rehg
GATECH

Gregory Rogez
INRIA Rhône-Alpes

Kristen Grauman
UT Austin
 
   
Program
 
   
08:30-08:45 Introduction to first person vision (slide)
  • Why is a first person camera an ideal sensor to measure human behaviors?
  • Why is a first person camera special to me?
  • What can a first person camera tell us about me?
08:45-09:25 Social attention: What can first person cameras tell us about our social interactions? (slide)
  • Joint attention: What are we looking at?
  • Social formation: What does a geometric social formation afford us?
  • Group dynamics: How will we move?
09:25-09:55 Personal attention: What can a first person camera tell us about my personal attention? (slide)
  • Speaker: James Rehg
09:55-10:25 Human kinematics I (pose): What can a first person camera tell us about the wearer's pose and interacting objects? (slide)
  • Speaker: Gregory Rogez
  • Title: Understanding everyday hands in action from a wearable RGB-D sensor
  • Abstract: I will discuss the problem of analysing functional manipulations of hand-held objects from a chest-mounted camera. For this problem setting, I will show that RGB-D sensors are particularly informative for extracting the near-field interactions of the camera wearer with his/her environment. Despite recent advances in full-body pose estimation using Kinect-like sensors, reliable monocular hand pose estimation in RGB-D images is still an unsolved problem, especially when the hands interact with objects. The problem is exacerbated with a wearable sensor and a first-person camera viewpoint: the occlusions inherent to this camera view make the problem even more difficult. I will present an efficient pipeline which 1) generates synthetic exemplars for training using a virtual chest-mounted camera, 2) exploits depth features for fast detection and coarse pose estimation of the hands, and 3) performs fine-grained grasp classification from depth and RGB data using state-of-the-art deep features (a rough sketch of this flow appears below). I will also provide an insightful analysis of the algorithm's performance on a new dataset of 12,000 RGB-D images covering 71 everyday grasps in natural interactions, illustrating the role of segmentation, object context, and 3D understanding in functional grasp analysis.
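
To make the stages in the abstract above concrete, the following is a minimal, hypothetical Python sketch of how such an egocentric detection, coarse pose estimation, and grasp classification flow could be organized. All function names, thresholds, and the placeholder classifier are illustrative assumptions, not the speaker's actual implementation; stage 1 (synthetic training exemplars) is omitted.

    # Hypothetical sketch of the detection -> coarse pose -> grasp classification
    # flow described in the abstract. All names and values are assumptions.
    import numpy as np

    def detect_hands_from_depth(depth, near=0.2, far=0.8):
        # Stage 2a (assumed): treat near-field pixels of the chest-mounted depth
        # map as candidate hand regions, since the wearer's hands dominate the
        # close range of an egocentric RGB-D view.
        return (depth > near) & (depth < far)

    def estimate_coarse_pose(depth, hand_mask):
        # Stage 2b (assumed): a crude pose summary (pixel centroid + mean depth).
        ys, xs = np.nonzero(hand_mask)
        if len(xs) == 0:
            return None
        return {"centroid": (float(xs.mean()), float(ys.mean())),
                "mean_depth": float(depth[hand_mask].mean())}

    def classify_grasp(rgb, depth, pose, num_grasps=71):
        # Stage 3 (assumed): fine-grained grasp classification over 71 classes.
        # A real system would feed RGB-D crops around the detected hand to a
        # deep classifier; here a placeholder label stands in for that step.
        if pose is None:
            return None
        return int(np.random.default_rng(0).integers(num_grasps))

    if __name__ == "__main__":
        # Fake RGB-D frame standing in for a chest-mounted sensor measurement.
        rgb = np.zeros((480, 640, 3), dtype=np.uint8)
        depth = np.full((480, 640), 2.0)   # background roughly 2 m away
        depth[200:300, 250:350] = 0.5      # a near-field "hand" region
        mask = detect_hands_from_depth(depth)
        pose = estimate_coarse_pose(depth, mask)
        print("coarse pose:", pose, "| grasp class:", classify_grasp(rgb, depth, pose))
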
10:25-11:00 Coffee break
11:00-11:40 Human kinematics II (action): What can first person cameras tell us about my actions? (slide)
  • Activity recognition: what am I doing?
  • First+Second person activity recognition: What are you doing to me?
  • Functionality: What can I do? Where can I do it?
11:40-12:10 Visual sensorimotor behaviors I: What can a first person camera tell us about my active motion? (slide)
  • Speaker: Kristen Grauman
12:10-12:20 Visual sensorimotor behaviors II: What can a first person video tell us about how we control physical force/torque? What can it tell us about my future? (slide)
  • What are the key physical factors that drive motion?
  • First person feedback control system
  • Egocentric future prediction
12:20-12:30 Conclusion (slide)
  • Summary
  • First person broader impact
 
   
Organizing Members
 
   

Hyun Soo Park
UPenn/U. of Minnesota

Kris Kitani
CMU

Jianbo Shi
UPenn