Live Demo: Upload an image and see results of consistency-based learning versus various baselines for different vision tasks.

Paper: The paper and supplementary material.

Visualization Page: Examine the results of consistency-based learning via sample videos.

Pretrained Models: Download the trained consistency models and baselines.

Get Started: The code for training and testing consistency models and baselines.
Learning with Cross-Task Consistency. The upper and lower rows show the results of consistency-based learning and the baseline (individual learning), respectively. The consistency-based model yields higher-quality predictions (especially in hard-to-predict fine-grained details) that are more consistent with each other.

1) What is consistency?
Suppose an object detector detects a chair in a particular region of an image, while a depth estimator returns a flat surface for the same region. This signals a problem: at least one of the predictions must be wrong, because they are inconsistent. More concretely, the first prediction domain (objects) and the second prediction domain (depth) are not independent, and consequently impose constraints on each other, often referred to as consistency constraints.

2) Why is it important to consider consistency in learning?
First, desired tasks in machine learning are usually predictions of different aspects of one shared reality (e.g., the scene underlying an image). Inconsistencies among the predictions therefore imply contradictions, which casts doubt on their validity and makes them inherently undesirable for downstream uses. Second, consistency constraints are informative and can be used in learning to better fit the data or lower the required sample complexity. They may also reduce the tendency of neural networks toward learning surface statistics (superficial cues), by enforcing strong, specific constraints rooted in different geometric or physical aspects of one observation.

The video below demonstrates the impact of disregarding consistency in learning, as well as the effectiveness of augmenting learning with cross-task consistency constraints. Each window shows surface normals predicted from various domains, which are themselves predicted from images (i.e., image→{prediction domain X}→surface normals). The normals in the upper row (learning without consistency) are of poor quality and inconsistent with each other for the same underlying image. The lower row shows the same setup, except that learning image→{prediction domain X} was augmented with cross-task consistency constraints. The inferred surface normals look better and more similar to each other regardless of the middle prediction domain, which demonstrates that all of the middle domains were made cross-task consistent w.r.t. normals. In the paper, we further extend this concept to many arbitrary domains with arbitrary inference path lengths, using a general and fully computational learning framework.

3) How can we design a learning system that makes consistent predictions?
This paper proposes a fully computational method that, given an arbitrary dictionary of desired tasks to solve, augments the learning objective in such a way that the predictions are explicitly encouraged to be cross-task consistent. The consistency constraints are completely learned from the data, rather than from human supervision or known analytic relationships (e.g., it is not necessary to encode that surface normals are the 3D derivative of depth, though such derivations could be used if available). Please see the figure below.
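As a rough illustration of such an augmented objective (a toy sketch under assumed names, not the paper's implementation; the linear "networks" below are stand-ins), a predictor image→Y can be trained with an extra term that projects its prediction into a second domain Z through a separately trained, frozen cross-task network and penalizes disagreement there:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for neural networks: linear maps on flattened "images".
W_xy = rng.normal(size=(8, 8)) * 0.1   # predictor being trained: image -> Y
W_yz = rng.normal(size=(8, 8)) * 0.1   # frozen cross-task network: Y -> Z

def f_xy(x):
    return W_xy @ x

def g_yz(y):
    return W_yz @ y

def consistency_augmented_loss(x, y_true, z_true, weight=1.0):
    """Supervised loss in domain Y plus a cross-task consistency term:
    the prediction is projected into domain Z by the frozen network g_yz
    and compared against the label for that domain."""
    y_pred = f_xy(x)
    direct = np.mean((y_pred - y_true) ** 2)             # image -> Y
    consistency = np.mean((g_yz(y_pred) - z_true) ** 2)  # image -> Y -> Z
    return direct + weight * consistency

x = rng.normal(size=8)
y_true = f_xy(x) + 0.01 * rng.normal(size=8)  # toy labels for illustration
z_true = g_yz(y_true)
loss = consistency_augmented_loss(x, y_true, z_true)
```

Note that the consistency term needs no extra human supervision beyond the labels already present for domain Z; the relationship between Y and Z is carried entirely by the learned network g_yz.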

Enforcing cross-task consistency. (a) shows the typical multitask setup, where predictors between domains are trained independently (or with a shared encoder and separate decoder heads). Neither variant explicitly enforces cross-task consistency among predictions, nor achieves it empirically. (b) demonstrates the consistency constraint over three domains. (c) shows how the triangle unit from (b) can be an element of a larger system of domains. Finally, (d) illustrates the generalized case, where in such a larger system of domains consistency can be enforced along any paths of arbitrary length, as long as their beginning and end points are the same. When two different paths with the same endpoints are enforced to yield similar results, none of the middle domains can have introduced conflicting information, as far as the prediction from the input endpoint to the output endpoint is concerned. This is the general concept behind path independence, or conservativeness. The triangle in (b) is the smallest unit of such paths.
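The path-independence idea in (d) can be checked computationally: compose predictors along two different paths that share the same start and end domains, and measure their disagreement (an illustrative sketch with made-up mappings, not the paper's code):

```python
import numpy as np

def path_discrepancy(x, path_a, path_b):
    """Mean squared disagreement between two chains of mappings that
    share the same input x and end in the same domain."""
    out_a = x
    for f in path_a:
        out_a = f(out_a)
    out_b = x
    for f in path_b:
        out_b = f(out_b)
    return float(np.mean((out_a - out_b) ** 2))

# Perfectly consistent toy system: "depth" and "normals" derive from the
# same underlying scene, so the two paths agree exactly.
to_depth = lambda img: img * 2.0
depth_to_normals = lambda d: d + 1.0
to_normals_direct = lambda img: img * 2.0 + 1.0

x = np.linspace(0.0, 1.0, 5)
d = path_discrepancy(x, [to_depth, depth_to_normals], [to_normals_direct])
# d is 0.0 for this consistent system; a positive value would signal that
# one of the middle domains introduced conflicting information.
```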

Enforcing cross-task consistency using perceptual losses illustrated with an example.
The top row shows the results of standard training (MSE loss). When the predicted normals of the converged model are projected onto other domains, various inaccuracies become apparent.
The middle row shows the results of training that enforces consistency with the other domains. The results are notably improved, especially in hard-to-predict fine-grained details.
The bottom row shows the ground truth. (The convention for visualizing normals: redder means facing more to the right, greener more downward, bluer more toward the camera.)

Consistency Energy

We quantify the amount of inconsistency in a prediction made for a query using an energy-based quantity called Consistency Energy. It is defined as the sum of squared pairwise inconsistencies among predictions, which is equivalent (up to a constant factor) to the sample variance of the predictions. The consistency energy is an intrinsic quantity of the system: it requires no ground truth and can be computed without any supervision. As shown below, it strongly correlates with prediction error and with shifts in the sample distribution.
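For intuition, the equivalence between the sum of squared pairwise inconsistencies and the (scaled) sample variance can be verified numerically. The sketch below is illustrative; the energy in the paper is computed over network outputs:

```python
import numpy as np

def consistency_energy(preds):
    """Sum of squared pairwise inconsistencies among n predictions of the
    same quantity (each row of `preds` is one prediction)."""
    n = len(preds)
    return sum(
        np.sum((preds[i] - preds[j]) ** 2)
        for i in range(n) for j in range(i + 1, n)
    )

rng = np.random.default_rng(0)
preds = rng.normal(size=(4, 16))  # 4 predictions of a 16-dim quantity

energy = consistency_energy(preds)
n = len(preds)
# Identity: the pairwise sum equals n * (n - 1) * sample variance,
# accumulated over the elements of the predictions.
via_variance = n * (n - 1) * np.sum(np.var(preds, axis=0, ddof=1))
assert np.isclose(energy, via_variance)
# Identical predictions carry zero energy; no ground truth is needed anywhere.
assert consistency_energy(np.ones((3, 16))) == 0.0
```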

Analysis of consistency energy. Left: The consistency energy shown over the course of training, demonstrating a successful optimization to achieve more consistent results, ending significantly lower than that of independent predictions and multi-task baselines. Middle: Energy is predictive of estimation error, with a Pearson correlation coefficient of 0.67. Right: Out-of-domain samples (blue, red) have an energy distribution significantly higher than in-distribution samples (grey), hence energy can be used as a strong unsupervised method for detecting domain shifts in the data (AUC=0.99).

Interactive Visualizations

To convey a more tangible understanding of consistency-based learning versus various baselines, we ran the models frame by frame on a YouTube video. Visit the visualization page to specify the comparison configuration of your choice and analyze the performance against several baselines.



Robust Learning Through Cross-Task Consistency.

Amir Zamir*, Alexander Sax*, Teresa Yeo, Oğuzhan Kar, Nikhil Cheerla, Rohan Suri, Zhangjie Cao, Jitendra Malik, Leonidas Guibas.

CVPR 2020 [oral]
(coming soon)

@article{zamir2020consistency,
  title={Robust Learning Through Cross-Task Consistency},
  author={Zamir, Amir and Sax, Alexander and Yeo, Teresa and Kar, Oğuzhan and Cheerla, Nikhil and Suri, Rohan and Cao, Zhangjie and Malik, Jitendra and Guibas, Leonidas},
  journal={arXiv preprint},
  year={2020}
}


Amir Zamir

EPFL, Stanford, UC Berkeley

Alexander (Sasha) Sax

UC Berkeley

Teresa Yeo


Oğuzhan Fatih Kar


Nikhil Cheerla


Rohan Suri


Zhangjie Cao


Jitendra Malik

UC Berkeley

Leonidas Guibas