Library API reference¶
The complete public API, generated from the source docstrings. Everything here is
importable from the top level (from deeperfly import ...). For task-oriented
examples see the library guide; for the array and
coordinate conventions these functions share, see
Conventions & glossary.
Configuration¶
Config ¶
A loaded, validated deeperfly run configuration.
Construct via :meth:from_toml (from a TOML file) or :meth:from_dict (a
parsed mapping). The parsed mapping is :attr:data; :attr:text is the original
TOML text when read from a file (None for a dict), used to snapshot the config
byte-for-byte.
Source code in src/deeperfly/config.py
visualization
property
¶
The raw [visualization] table (consumed by :attr:videos).
from_toml
classmethod
¶
Load a config from a TOML file (preserving its text for snapshots).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path
|
str | Path
|
Path to a config TOML file. |
required |
Returns:
| Type | Description |
|---|---|
Config
|
The loaded, validated config. |
Source code in src/deeperfly/config.py
from_dict
classmethod
¶
Wrap an already-parsed mapping (library use, tests).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
dict
|
A parsed config mapping. |
required |
Returns:
| Type | Description |
|---|---|
Config
|
The validated config wrapping |
Source code in src/deeperfly/config.py
default
classmethod
¶
read_for_run
classmethod
¶
Pick the config for one run: -c wins, then the outdir snapshot.
An explicit -c always drives the run (and refreshes the snapshot --
see :meth:save_snapshot). Without -c, the snapshot a previous run
left in <outdir>/config.toml is reused -- so both "pass a new -c"
and "edit the snapshot and re-run" work; either way the per-stage
fingerprints (:mod:deeperfly.pipeline.fingerprint) recompute exactly
the stages whose parameters changed. With neither, the packaged default
is used.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cli_config
|
str | None
|
The |
required |
outdir
|
Path
|
The run's output directory, which may already hold a |
required |
Returns:
| Type | Description |
|---|---|
Config
|
The config that drives this run. |
Source code in src/deeperfly/config.py
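The precedence described for read_for_run can be sketched without the library. This is a minimal illustration, not the actual implementation: pick_config_source is a hypothetical helper that only reports which source would win, and it assumes the snapshot is detected by the presence of <outdir>/config.toml.

```python
import tempfile
from pathlib import Path
from typing import Optional


def pick_config_source(cli_config: Optional[str], outdir: Path) -> str:
    """Report which config would drive the run (hypothetical helper)."""
    if cli_config is not None:
        return "cli"        # an explicit -c always wins
    if (outdir / "config.toml").is_file():
        return "snapshot"   # reuse the snapshot a previous run left
    return "default"        # neither given: fall back to the packaged default


with tempfile.TemporaryDirectory() as tmp:
    outdir = Path(tmp)
    first = pick_config_source(None, outdir)             # fresh outdir, no -c
    (outdir / "config.toml").write_text("[pipeline]\n")  # pretend a run happened
    second = pick_config_source(None, outdir)            # snapshot now exists
    third = pick_config_source("my.toml", outdir)        # -c overrides it

print(first, second, third)
```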
stage_flags ¶
Which stages are enabled, from the [pipeline].do_<stage> booleans.
Returns:
| Type | Description |
|---|---|
dict of str to bool
|
|
Source code in src/deeperfly/config.py
camera_group ¶
The configured camera rig ([cameras.*]).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
image_sizes
|
Optional |
None
|
Returns:
| Type | Description |
|---|---|
CameraGroup
|
The configured rig. |
Source code in src/deeperfly/config.py
skeleton ¶
The configured skeleton ([skeleton]), or the default fly skeleton.
frame_transforms ¶
Per-camera frame preprocessing (the [cameras.<name>] preprocess lists).
detection_plan ¶
The 2D detection plan ([[sources]] + [[pose2d.preprocessors]]/[[pose2d.models]]/[[pose2d.pathways]]).
Returns:
| Type | Description |
|---|---|
DetectionPlan
|
The parsed, validated plan mapping footage sources through
preprocessors and models into the skeleton (see
:class: |
Source code in src/deeperfly/config.py
camera_table ¶
Split [cameras] into the shared defaults and the per-camera specs.
Returns:
| Type | Description |
|---|---|
defaults, cameras : dict
|
The |
Source code in src/deeperfly/config.py
source_patterns ¶
Map each footage source to its glob ([[sources]] name -> filename).
Read directly from the [[sources]] table (without building the whole
detection plan) so recording discovery stays cheap. A source with no
filename key uses its own name as the glob pattern.
Returns:
| Type | Description |
|---|---|
dict of str to str
|
|
Raises:
| Type | Description |
|---|---|
ValueError
|
If a source entry has no string |
Source code in src/deeperfly/config.py
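The fallback rule for source_patterns (a source without a filename key uses its own name as the glob) might look like the sketch below. The function name source_globs and the example source names and patterns are made up for illustration; the error handling of the real method is not reproduced.

```python
def source_globs(sources):
    """Map each source name to its glob, falling back to the name itself."""
    return {
        entry["name"]: entry.get("filename", entry["name"])
        for entry in sources
    }


# Hypothetical [[sources]] entries, as they would look after TOML parsing.
sources = [
    {"name": "front", "filename": "camera_3*"},
    {"name": "left"},  # no filename key: the name doubles as the pattern
]
patterns = source_globs(sources)
print(patterns)
```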
snapshot_text ¶
The exact TOML text to snapshot.
Returns:
| Type | Description |
|---|---|
str
|
The original file text. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If this config was built from a dict (no source text to snapshot). |
Source code in src/deeperfly/config.py
save_snapshot ¶
Snapshot the run config into <outdir>/config.toml for reproducibility.
A no-op rewrite when the config already came from there (see
:meth:read_for_run); otherwise it records the -c/default config that
drives this run, so a later run without -c reuses the very same config.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
outdir
|
Path
|
The run's output directory; the snapshot is written to
|
required |
Source code in src/deeperfly/config.py
Cameras¶
Camera
dataclass
¶
A single camera: extrinsics, intrinsics, and lens distortion.
intr is always the 4-vector [fx, fy, cx, cy] (so every camera in a
group has the same intrinsic layout); dist holds OpenCV-ordered
distortion coefficients (possibly empty).
Source code in src/deeperfly/cameras.py
from_spec
classmethod
¶
from_spec(
spec: dict,
name: str | None = None,
image_size: tuple[int, int] | None = None,
transform: "FrameTransform | None" = None,
) -> Camera
Build a camera from a config dict (see :func:resolve_extrinsics).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
spec
|
dict
|
Camera spec dict (extrinsics + intrinsics keys; intrinsics in raw-footage pixels). |
required |
name
|
str | None
|
Optional camera name stored on the result. |
None
|
image_size
|
tuple[int, int] | None
|
Optional raw-footage |
None
|
transform
|
'FrameTransform | None'
|
The camera's preprocess transform; when given and non-identity, the
spec's raw-frame intrinsics are mapped through it (see
:func: |
None
|
Returns:
| Type | Description |
|---|---|
Camera
|
The constructed camera. |
Source code in src/deeperfly/cameras.py
project ¶
Project world points to this camera's image plane.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pts3d
|
Float[ndarray, '*pts 3']
|
World points of shape |
required |
Returns:
| Type | Description |
|---|---|
ndarray
|
Image points of shape |
Source code in src/deeperfly/cameras.py
backproject_ray ¶
backproject_ray(
pixel: Float[ndarray, "2"],
) -> tuple[Float[np.ndarray, "3"], Float[np.ndarray, "3"]]
The world-frame viewing ray of an image pixel through this camera.
Inverse of :meth:project: returns (origin, direction) such that
every world point origin + s * direction projects back onto
pixel. origin is the camera center. See
:func:deeperfly.geometry.backproject_ray_one.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pixel
|
Float[ndarray, '2']
|
Image point of shape |
required |
Returns:
| Type | Description |
|---|---|
origin, direction : np.ndarray
|
The camera center and the (unnormalized) world-frame ray direction,
each of shape |
Source code in src/deeperfly/cameras.py
CameraGroup ¶
An ordered collection of named :class:Camera objects.
Source code in src/deeperfly/cameras.py
dists
property
¶
Per-camera distortion, zero-padded to the group-wide max length.
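As a rough illustration of the zero-padding, the sketch below pads per-camera coefficient lists to a common length using plain Python lists. pad_dists is a hypothetical stand-in, not the property's code, and the coefficient values are invented. Padding with zeros is harmless in the OpenCV model because a zero coefficient contributes no distortion.

```python
def pad_dists(dists):
    """Zero-pad each coefficient list to the longest length in the group."""
    width = max((len(d) for d in dists), default=0)
    return [list(d) + [0.0] * (width - len(d)) for d in dists]


# Three cameras: no distortion, two coefficients, five coefficients.
padded = pad_dists([[], [-0.1, 0.01], [-0.2, 0.02, 0.0, 0.0, 0.001]])
for row in padded:
    print(row)
```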
from_config
classmethod
¶
from_config(
config: "Config",
image_sizes: dict[str, tuple[int, int]] | None = None,
) -> CameraGroup
Build a group from a config.
Reads [cameras.defaults] and [cameras.<name>]; per-camera keys
override the defaults. A camera here is a geometric view: its
intrinsics describe its source's raw footage frame, the frame a pathway
maps its detections back into (see
:mod:deeperfly.pose2d.pathways). Detector-input geometry (mirror,
crop, resize) lives in the pathways, not on the view.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
'Config'
|
A :class: |
required |
image_sizes
|
dict[str, tuple[int, int]] | None
|
Maps a view name to its source's raw footage |
None
|
Returns:
| Type | Description |
|---|---|
CameraGroup
|
The configured rig. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If the config defines no cameras. |
Source code in src/deeperfly/cameras.py
from_arrays
classmethod
¶
from_arrays(
names: list[str],
rvecs: Float[ndarray, "V 3"],
tvecs: Float[ndarray, "V 3"],
intrs: Float[ndarray, "V 4"],
dists: Float[ndarray, "V K"],
) -> CameraGroup
Build a group from stacked per-camera arrays (e.g. BA output).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
names
|
list[str]
|
Camera names, in order, labelling the leading axis of the arrays. |
required |
rvecs
|
Float[ndarray, 'V 3']
|
Stacked extrinsics of shape |
required |
tvecs
|
Float[ndarray, 'V 3']
|
Stacked extrinsics of shape |
required |
intrs
|
Float[ndarray, 'V 4']
|
Stacked packed intrinsics of shape |
required |
dists
|
Float[ndarray, 'V K']
|
Stacked distortion coefficients of shape |
required |
Returns:
| Type | Description |
|---|---|
CameraGroup
|
The rig assembled from the arrays. |
Source code in src/deeperfly/cameras.py
project ¶
Project world points through every camera.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pts3d
|
Float[ndarray, '*pts 3']
|
World points of shape |
required |
Returns:
| Type | Description |
|---|---|
ndarray
|
Image points of shape |
Source code in src/deeperfly/cameras.py
triangulate ¶
triangulate(
pts2d: Float[ndarray, "V *pts 2"],
weights: Float[ndarray, "V *pts"] | None = None,
) -> Float[np.ndarray, "*pts 3"]
Triangulate 3D points from 2D observations and this group's cameras.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pts2d
|
Float[ndarray, 'V *pts 2']
|
2D observations of shape |
required |
weights
|
Float[ndarray, 'V *pts'] | None
|
Optional per-(view, point) weights of shape |
None
|
Returns:
| Type | Description |
|---|---|
ndarray
|
Triangulated points of shape |
Source code in src/deeperfly/cameras.py
Skeleton¶
Skeleton
dataclass
¶
An ordered set of tracked points with limb/bone structure and visibility.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Identifier for the skeleton (e.g. |
point_names |
tuple[str, ...]
|
Human-readable name per tracked point, in order (length |
limb_names, limb_id, bones |
|
Limb structure derived from the config's |
palette |
dict[str, str]
|
Mapping |
Which view sees which point lives in the detection plan (the pathways' (channel, view, point) mappings), not here: an unobserved (view, point) is simply NaN in the points array.
Source code in src/deeperfly/skeleton.py
fly
classmethod
¶
The default 38-point Drosophila skeleton (DeepFly3D 7-camera rig).
from_config
classmethod
¶
Build a skeleton from a config.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
'Config'
|
A :class: |
required |
Returns:
| Type | Description |
|---|---|
Skeleton
|
The skeleton described by the config's |
Raises:
| Type | Description |
|---|---|
ValueError
|
If a |
Source code in src/deeperfly/skeleton.py
bone_index_pairs ¶
Endpoint index arrays (i, j) for vectorized bone-length maths.
Returns:
| Type | Description |
|---|---|
i, j : np.ndarray
|
The first and second endpoint index of each bone (shape |
Source code in src/deeperfly/skeleton.py
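The point of returning the endpoints as two parallel index sequences is that bone lengths become one indexed subtraction. The sketch below shows the idea with plain lists; the three points and the two-bone list are invented, and a real skeleton would supply i and j from bone_index_pairs.

```python
import math

# Hypothetical 3D points and bones given as (first, second) endpoint indices.
points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
bones = [(0, 1), (1, 2)]

# Split the bones into the two parallel endpoint index sequences.
i = [a for a, _ in bones]
j = [b for _, b in bones]

# One length per bone: the distance between its two endpoints.
lengths = [math.dist(points[a], points[b]) for a, b in zip(i, j)]
print(lengths)
```

With NumPy arrays the same computation would typically be written as np.linalg.norm(pts[..., i, :] - pts[..., j, :], axis=-1), which also broadcasts over leading frame axes.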
Results¶
PoseResult
dataclass
¶
A complete multi-view pose-estimation result for one recording.
Source code in src/deeperfly/results.py
save ¶
Write the result to an HDF5 file (overwriting path).
The library one-shot: pts2d/conf go to pose2d/ and, when a 3D
pose is present, the (possibly cleaned) 2D, 3D and reprojection error go
to triangulation/ -- so :meth:load round-trips the assembled view.
pts2d is duplicated into both groups in that case (it is small next
to the footage).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path
|
str | Path
|
Destination |
required |
Source code in src/deeperfly/results.py
load
classmethod
¶
Read the assembled :class:PoseResult back from an HDF5 file.
Assembly prefers the most-derived data present: pts2d from
triangulation, else pictorial_structures, else pose2d; pts3d /
reproj_error from triangulation, else pictorial_structures; cameras
from bundle_adjustment, else the pose2d config rig.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path
|
str | Path
|
Path to a |
required |
Returns:
| Type | Description |
|---|---|
PoseResult
|
The assembled result (cameras, skeleton, points and |
Source code in src/deeperfly/results.py
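The "most-derived data wins" rule used by load amounts to a first-match lookup over an ordered list of groups. The sketch below models the file as nested dicts rather than HDF5 groups; first_present and the placeholder string values are assumptions made for illustration only.

```python
def first_present(groups, key, order):
    """Return (group, value) for the first group in `order` that holds `key`."""
    for group in order:
        value = groups.get(group, {}).get(key)
        if value is not None:
            return group, value
    return None, None


# Preference orders as described for load().
PTS2D_ORDER = ("triangulation", "pictorial_structures", "pose2d")
PTS3D_ORDER = ("triangulation", "pictorial_structures")

# A stand-in for a file where triangulation has not run yet.
stored = {
    "pose2d": {"pts2d": "raw 2D"},
    "pictorial_structures": {"pts2d": "corrected 2D", "pts3d": "PS 3D"},
}
src2d, pts2d = first_present(stored, "pts2d", PTS2D_ORDER)
src3d, pts3d = first_present(stored, "pts3d", PTS3D_ORDER)
print(src2d, src3d)
```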
Recordings¶
Recording
dataclass
¶
One unit of work: a camera -> footage-files map and where its results go.
sources maps a camera name to its naturally-sorted footage files (a single
video, or an image sequence), already reconciled to one extension and validated
to share a file and frame count with the other cameras. Empty only for a
directory kept so a resume can reuse a cached result though its footage is
absent (see :func:resolve_recordings).
outdir is this recording's output directory (see :func:plan_outdirs) --
the run's durable identity, holding the config snapshot and cached results.h5.
The input directory is not retained; a resume re-passes the recording, which
re-resolves sources the same way.
Source code in src/deeperfly/recordings.py
resolve_recordings ¶
resolve_recordings(
inputs: list[Path], *, recursive: bool, config: Config
) -> list[tuple[Path, dict[str, list[Path]]]]
Expand the run inputs into the recordings to process.
inputs is one or more input arguments, each a literal path or a wildcard
pattern expanded against the filesystem (:func:_expand_pattern). A recording
is a directory holding footage for every configured camera, resolved to a
camera -> files map by :func:find_recording (which warns and skips a
malformed one). Output directories are resolved separately
(:func:plan_outdirs). The behaviors:
- A single literal path is taken as that one recording -- kept (with empty sources) even when it is not valid footage, so a resume from its cached result still works -- with a warning naming it when it is not a valid recording.
- Several inputs and/or a wildcard run as a batch: only the valid recordings are kept (a wildcard's incidental non-recording matches are dropped silently); finding nothing valid is an error, reported with a warning.
- With --recursive, each input is a parent directory whose subtree is walked for recordings; an empty result is an error.
The result is de-duplicated by directory (for overlapping inputs), keeping first-seen order.
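De-duplication by directory with first-seen order can be sketched with an insertion-ordered dict. dedupe_by_directory and the example paths are hypothetical; the real function also expands patterns and validates footage, which is left out here.

```python
from pathlib import Path


def dedupe_by_directory(recordings):
    """Keep the first (directory, sources) pair seen for each directory."""
    seen = {}
    for directory, sources in recordings:
        seen.setdefault(directory, sources)  # later duplicates are ignored
    return list(seen.items())


# Two overlapping inputs both resolved to data/fly1.
found = [
    (Path("data/fly1"), {"front": []}),
    (Path("data/fly2"), {"front": []}),
    (Path("data/fly1"), {"front": []}),
]
unique = dedupe_by_directory(found)
print([str(d) for d, _ in unique])
```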
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
inputs
|
list[Path]
|
One or more input arguments (literal paths or wildcard patterns). |
required |
recursive
|
bool
|
Whether each input is a parent directory whose subtree is searched. |
required |
config
|
Config
|
The discovery config (recognizes recording directories). |
required |
Returns:
| Type | Description |
|---|---|
list of (Path, dict)
|
|
Raises:
| Type | Description |
|---|---|
SystemExit
|
If no valid recording can be resolved from |
Source code in src/deeperfly/recordings.py
Bundle adjustment¶
bundle_adjust ¶
bundle_adjust(
cameras: CameraGroup,
pts2d: Float[ndarray, "V N 2"],
*,
fixed: Sequence[str] = (),
shared: Sequence[Sequence[str]] = (),
pts3d: Float[ndarray, "N 3"] | None = None,
**solver_kwargs,
) -> tuple[
OptimizeResult, CameraGroup, Float[ndarray, "N 3"]
]
Bundle-adjust a camera group against observed 2D points.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cameras
|
CameraGroup
|
Initial cameras. Their stacked parameters seed the optimization. |
required |
pts2d
|
Float[ndarray, 'V N 2']
|
Observed 2D points of shape |
required |
fixed
|
Sequence[str]
|
Parameter references to hold constant / tie together; see
:func: |
()
|
shared
|
Sequence[Sequence[str]]
|
Parameter references to hold constant / tie together; see
:func: |
()
|
pts3d
|
Float[ndarray, 'N 3'] | None
|
Initial 3D points; triangulated from |
None
|
**solver_kwargs
|
Forwarded to the core solver (e.g. |
{}
|
Returns:
| Name | Type | Description |
|---|---|---|
result |
OptimizeResult
|
The raw scipy least-squares result. |
optimized_cameras |
CameraGroup
|
A camera group carrying the refined parameters. |
pts3d |
ndarray
|
The refined 3D points of shape |
Source code in src/deeperfly/bundle_adjustment/__init__.py
bundle_adjust_from_config ¶
bundle_adjust_from_config(
config: "Config", pts2d: Float[ndarray, "V N 2"]
) -> tuple[
OptimizeResult, CameraGroup, Float[ndarray, "N 3"]
]
Run :func:bundle_adjust driven by a TOML config.
The [bundle_adjustment] section supplies fixed / shared and
the flat scipy least_squares kwargs (e.g. max_nfev / loss). The
points_to_use key (which restricts the bundle-adjustment keypoints) is a
pipeline-level concern handled by :func:deeperfly.pipeline.bundle_adjust_cameras,
not here.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
'Config'
|
A :class: |
required |
pts2d
|
Float[ndarray, 'V N 2']
|
Observed 2D points of shape |
required |
Returns:
| Name | Type | Description |
|---|---|---|
result |
OptimizeResult
|
The raw scipy least-squares result. |
optimized_cameras |
CameraGroup
|
A camera group carrying the refined parameters. |
pts3d |
ndarray
|
The refined 3D points of shape |
Source code in src/deeperfly/bundle_adjustment/__init__.py
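One way to picture how a flat [bundle_adjustment] section separates into fixed, shared and solver keyword arguments is the sketch below. It is an assumption about the shape of the split, not the library's code: split_bundle_adjustment_section is a hypothetical helper, and the parameter reference strings are placeholders whose real syntax is not shown on this page. max_nfev and loss are genuine scipy least_squares options.

```python
def split_bundle_adjustment_section(section):
    """Split a parsed [bundle_adjustment] table into (fixed, shared, kwargs)."""
    options = dict(section)             # copy so the caller's mapping is untouched
    options.pop("points_to_use", None)  # pipeline-level key, not a solver option
    fixed = tuple(options.pop("fixed", ()))
    shared = tuple(tuple(group) for group in options.pop("shared", ()))
    return fixed, shared, options       # whatever is left goes to the solver


section = {
    "fixed": ["cam0.rvec"],                  # placeholder reference syntax
    "shared": [["cam1.intr", "cam2.intr"]],  # one group of tied parameters
    "points_to_use": ["thorax"],
    "max_nfev": 50,
    "loss": "huber",
}
fixed, shared, solver_kwargs = split_bundle_adjustment_section(section)
print(fixed, shared, solver_kwargs)
```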
Pipeline¶
run_from_points2d ¶
run_from_points2d(
cameras: CameraGroup,
skeleton: Skeleton,
pts2d: Float[ndarray, "V T P 2"],
conf: Float[ndarray, "V T P"] | None = None,
*,
do_bundle_adjust: bool = True,
bundle_adjust_kwargs: dict | None = None,
triangulation: str = "ransac",
weigh_by_confidence: bool = False,
do_pictorial: bool = False,
candidates: Candidates | None = None,
ps_kwargs: dict | None = None,
ransac_threshold: float = 15.0,
min_inliers: int = 2,
reproj_threshold: float = 40.0,
max_drops: int = 5,
fps: float = 100.0,
meta: dict | None = None,
) -> PoseResult
Run the full 2D-to-3D pipeline and return a :class:PoseResult.
Steps: (optional) bundle-adjust cameras -> reconstruct 3D. Unobserved points are expected to already be NaN (the detector's pathway scatter leaves them so).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cameras
|
CameraGroup
|
The camera rig (refined in place when |
required |
skeleton
|
Skeleton
|
Skeleton used for the bone-length prior. |
required |
pts2d
|
Float[ndarray, 'V T P 2']
|
Detector 2D observations of shape |
required |
conf
|
Float[ndarray, 'V T P'] | None
|
Per-observation confidences |
None
|
do_bundle_adjust
|
bool
|
Whether to refine the cameras with bundle adjustment first. |
True
|
bundle_adjust_kwargs
|
dict | None
|
Extra keyword arguments forwarded to :func: |
None
|
triangulation
|
str
|
Reconstruction strategy: |
'ransac'
|
weigh_by_confidence
|
bool
|
When |
False
|
do_pictorial
|
bool
|
When |
False
|
candidates
|
Candidates | None
|
The detector's top-K candidate peaks; required when |
None
|
ps_kwargs
|
dict | None
|
Extra keyword arguments forwarded to the pictorial-structures corrector. |
None
|
ransac_threshold
|
float
|
Per-strategy triangulation knobs (see |
15.0
|
min_inliers
|
int
|
Per-strategy triangulation knobs (see |
2
|
reproj_threshold
|
float
|
Per-strategy triangulation knobs (see |
40.0
|
max_drops
|
int
|
Per-strategy triangulation knobs (see |
5
|
fps
|
float
|
The recording's frame rate, recorded in the result |
100.0
|
meta
|
dict | None
|
Extra key/value pairs merged into the result |
None
|
Returns:
| Type | Description |
|---|---|
PoseResult
|
The bundle-adjusted cameras, committed 2D, triangulated 3D and diagnostics. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Source code in src/deeperfly/pipeline/core.py
run_recording ¶
run_recording(
config_path: str | None,
outdir: Path,
*,
sources: dict[str, list[Path]] | None = None,
input=None,
overwrite: list[str] | None = None,
progress=None,
) -> None
Run the config's enabled stages for a single recording, reusing cache.
The config is resolved against outdir (see :meth:Config.read_for_run) and
its [pipeline].do_<stage> toggles decide which stages run
(:meth:Config.stage_flags). An enabled stage reuses its cached result when
its parameters are unchanged and its output is present; editing the config
recomputes exactly the affected stages (and everything downstream). The
pose2d cache always feeds downstream; a derived stage's output feeds
downstream only while that stage is enabled.
A stage runs only if its input is available -- footage for pose2d, a 2D
pose for bundle_adjustment / triangulation, cached candidates for
pictorial_structures, a result for visualization; a stage whose
input is missing is skipped with the reason logged.
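The caching rule (a stage is recomputed when its parameters changed, and so is everything downstream of it) can be illustrated with a toy fingerprint. Everything here is an assumption made for the sketch: the real fingerprint lives in deeperfly.pipeline.fingerprint and may hash different inputs, the stages are treated as a simple linear chain, and the parameter names are invented.

```python
import hashlib
import json

# Illustrative linear stage order; the real dependency graph may differ.
STAGES = ["pose2d", "bundle_adjustment", "triangulation"]


def fingerprint(params):
    """Stable hash of one stage's parameters (toy stand-in)."""
    blob = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def stages_to_recompute(config, cached):
    """List the stages whose fingerprint changed, plus everything after them."""
    stale = False
    todo = []
    for stage in STAGES:
        if fingerprint(config.get(stage, {})) != cached.get(stage):
            stale = True        # this stage's parameters changed
        if stale:
            todo.append(stage)  # it and all later stages must rerun
    return todo


old = {
    "pose2d": {"batch_size": 8},
    "bundle_adjustment": {"max_nfev": 50},
    "triangulation": {"ransac_threshold": 15.0},
}
cached = {stage: fingerprint(old[stage]) for stage in STAGES}
new = dict(old, bundle_adjustment={"max_nfev": 100})  # edit one stage only
todo = stages_to_recompute(new, cached)
print(todo)
```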
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config_path
|
str | None
|
The |
required |
outdir
|
Path
|
The recording's output directory (config snapshot + cached results). |
required |
sources
|
dict[str, list[Path]] | None
|
The recording's footage (see :func: |
None
|
input
|
dict[str, list[Path]] | None
|
The recording's footage (see :func: |
None
|
overwrite
|
list[str] | None
|
Stage names to force-recompute (see
:func: |
None
|
progress
|
Optional progress factory threaded into the detector and the compositor. |
None
|
Source code in src/deeperfly/pipeline/run.py
2D detection¶
load_detector ¶
Load the PyTorch detector, optionally from checkpoint (a .pth).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
checkpoint
|
Path to a |
None
|
|
**kwargs
|
Forwarded to :func: |
{}
|
Returns:
| Type | Description |
|---|---|
| | The loaded detector model. |
Source code in src/deeperfly/pose2d/detector.py
detect_2d ¶
detect_2d(
config: Config,
plan,
models: dict,
*,
sources: dict[str, list[Path]] | None = None,
input=None,
want_candidates,
k,
progress=None,
)
Stream 2D detection over decode blocks -> (pts2d, conf, candidates).
Decodes each source in one continuous forward pass (CPU), handing the
detector one [pose2d] batch_size-frame block at a time and
freeing it before the next, so peak frame memory is bounded by the decode
buffer, not the recording length. Each block feeds every pathway on that
source (the front source is decoded once, read by both its pathways).
Per-block results are concatenated along time. End-of-file comes from the
decoder (a short or exhausted block), so it doesn't depend on
:meth:deeperfly.io.FrameReader.count being exact -- that is only the
progress-bar total.
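The block-streaming pattern described above, where end-of-file is inferred from a short or empty block rather than from a frame count, can be sketched with a generator. iter_blocks and fake_detect are hypothetical stand-ins: integers play the role of frames and doubling them plays the role of detection.

```python
from itertools import islice


def iter_blocks(frames, batch_size):
    """Yield lists of up to `batch_size` frames until the source runs dry."""
    it = iter(frames)
    while True:
        block = list(islice(it, batch_size))
        if not block:
            return      # exhausted block: nothing left to decode
        yield block
        if len(block) < batch_size:
            return      # short block: that was the tail of the footage


def fake_detect(block):
    """Placeholder for running the detector on one block."""
    return [frame * 2 for frame in block]


results = []
for block in iter_blocks(range(10), batch_size=4):
    results.extend(fake_detect(block))  # per-block results joined along time
    # `block` is rebound on the next iteration, so only one block is alive

print(results)
```

Only one block is held at a time, so peak memory scales with batch_size rather than with the length of the recording.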
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
Config
|
The run config (I/O backends, batch size, decode buffer). |
required |
plan
|
The detection plan (:class: |
required | |
models
|
dict
|
|
required |
sources
|
dict[str, list[Path]] | None
|
The footage to detect over (see
:func: |
None
|
input
|
dict[str, list[Path]] | None
|
The footage to detect over (see
:func: |
None
|
want_candidates
|
Whether to also extract the top-K candidate peaks (for pictorial structures, which are not cached). |
required | |
k
|
Number of candidate peaks per joint when |
required | |
progress
|
Optional progress factory |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
pts2d |
ndarray
|
Detected 2D of shape |
conf |
ndarray
|
Per-point confidence of shape |
candidates |
Candidates or None
|
The top-K candidate set when |
Raises:
| Type | Description |
|---|---|
SystemExit
|
If the detector received no frames. |
Source code in src/deeperfly/pose2d/stream.py
Geometry primitives¶
geometry ¶
Multi-view geometry primitives with JAX.
Conventions:
- Points carry their dimensionality in the last axis: pts3d has shape (..., 3); pts2d has shape (..., 2).
- Camera extrinsics are an axis-angle rotation vector rvec and a translation vector tvec. A 3D world point X is mapped to camera coordinates as R(rvec) @ X + tvec.
- Camera intrinsics intr are packed as [fx, fy, cx, cy] (or [f, cx, cy] with fx = fy = f) (see :func:intr_to_kmat); distortion coefficients dists follow OpenCV's ordering [k1, k2, p1, p2, k3, k4, k5, k6, s1, s2, s3, s4].
- A projection matrix pmat is the 3x4 product K @ [R | t].
Functions take their primary operand (pts3d, pts2d) first and camera
parameters after, in the canonical order rvecs, tvecs, intrs, dists.
All are JIT- and grad-friendly.
The batched public functions are :func:jax.jit-wrapped thin :func:jax.vmap
wrappers around the *_one single-observation variants, which compose with
:func:jax.vmap / :func:jax.jacfwd for bundle adjustment. deeperfly installs
only CPU JAX -- the tiny arrays don't benefit from a GPU.
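To make the conventions concrete, the sketch below projects one world point using an axis-angle rvec, a tvec and intrinsics packed as [fx, fy, cx, cy]. It is written in plain Python rather than JAX to stay dependency-free, it ignores distortion (equivalent to an empty dist), and the function names and numbers are illustrative rather than the module's own.

```python
import math


def rvec_to_rmat(rvec):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(v * v for v in rvec))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (v / theta for v in rvec)  # unit rotation axis
    ct, st = math.cos(theta), math.sin(theta)
    vt = 1.0 - ct
    return [
        [ct + kx * kx * vt, kx * ky * vt - kz * st, kx * kz * vt + ky * st],
        [ky * kx * vt + kz * st, ct + ky * ky * vt, ky * kz * vt - kx * st],
        [kz * kx * vt - ky * st, kz * ky * vt + kx * st, ct + kz * kz * vt],
    ]


def project(pt3d, rvec, tvec, intr):
    """Pinhole projection: camera coords are R(rvec) @ X + tvec."""
    rmat = rvec_to_rmat(rvec)
    xc, yc, zc = (
        sum(r * p for r, p in zip(row, pt3d)) + t for row, t in zip(rmat, tvec)
    )
    fx, fy, cx, cy = intr
    return (fx * xc / zc + cx, fy * yc / zc + cy)


# A 90-degree rotation about z maps world (1, 0, 0) to camera (0, 1, 0);
# the translation then pushes it 5 units in front of the camera.
u, v = project(
    (1.0, 0.0, 0.0),
    rvec=(0.0, 0.0, math.pi / 2),
    tvec=(0.0, 0.0, 5.0),
    intr=(1000.0, 1000.0, 320.0, 240.0),
)
print(u, v)
```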
rvec_to_rmat_one ¶
Rodrigues' rotation for a single rotation vector.
Single-instance variant of :func:rvec_to_rmat for use under
:func:jax.vmap and :func:jax.jacfwd.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
rvec
|
Float[Array, '3']
|
Axis-angle rotation vector of shape |
required |
Returns:
| Type | Description |
|---|---|
| | Rotation matrix of shape (3, 3). |
Source code in src/deeperfly/geometry.py
rmat_to_rvec_one ¶
Rotation matrix to axis-angle vector for a single rotation.
Single-instance variant of :func:rmat_to_rvec.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
rmat
|
Float[Array, '3 3']
|
Rotation matrix of shape |
required |
Returns:
| Type | Description |
|---|---|
| | Axis-angle rotation vector of shape (3,). |
Source code in src/deeperfly/geometry.py
distort_one ¶
Distortion model applied to a single 2D point.
Single-instance variant of :func:distort.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
xy
|
Float[Array, '2']
|
Normalized 2D coordinate of shape |
required |
dist
|
Float[Array, 'K']
|
Distortion coefficients of shape |
required |
Returns:
| Type | Description |
|---|---|
| | Distorted 2D coordinate of shape (2,). |
Source code in src/deeperfly/geometry.py
undistort_one ¶
Invert :func:distort_one: distorted normalized coords -> undistorted.
Recovers the undistorted normalized coordinate (x, y) whose
:func:distort_one image is xy_dist by OpenCV's fixed-point iteration
(the same scheme as cv2.undistortPoints): starting from xy_dist,
repeatedly apply x <- (x_d - tangential) * den / num with the radial
factor num / den and the tangential / thin-prism terms evaluated at the
current estimate. The locus of 3D points projecting to a fixed pixel is a
ray through the camera center regardless of distortion (distortion is a
function of the normalized direction only), so this is the step that turns a
clicked pixel into a back-projection direction (see
:func:backproject_ray_one). An empty dist is the identity.
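The fixed-point scheme can be sketched for a reduced model. The code below keeps only the first four coefficients of the documented ordering (k1, k2, p1, p2), so the radial denominator is 1 and the update becomes x <- (x_d - tangential) / radial. The iteration count of 20 and the coefficient values are choices made for this sketch, not values taken from the library.

```python
def _terms(x, y, dist):
    """Radial factor and tangential offsets at the estimate (x, y)."""
    k1, k2, p1, p2 = dist
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return radial, dx, dy


def distort(xy, dist):
    """Forward model: undistorted normalized coords -> distorted."""
    x, y = xy
    radial, dx, dy = _terms(x, y, dist)
    return (x * radial + dx, y * radial + dy)


def undistort(xy_dist, dist, iterations=20):
    """Invert `distort` by fixed-point iteration, starting from xy_dist."""
    xd, yd = xy_dist
    x, y = xd, yd
    for _ in range(iterations):
        # Evaluate the distortion terms at the current estimate, then solve
        # the forward model for (x, y) as if those terms were constant.
        radial, dx, dy = _terms(x, y, dist)
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return (x, y)


dist = (-0.2, 0.05, 0.001, -0.002)  # illustrative k1, k2, p1, p2
xy = (0.3, -0.2)
recovered = undistort(distort(xy, dist), dist)
print(recovered)
```

The iteration converges quickly for mild distortion near the image center; strong distortion far from the center can converge slowly or not at all, which is a known limitation of this kind of scheme.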
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
xy_dist
|
Float[Array, '2']
|
Distorted normalized 2D coordinate of shape |
required |
dist
|
Float[Array, 'K']
|
Distortion coefficients of shape |
required |
Returns:
| Type | Description |
|---|---|
| | Undistorted normalized 2D coordinate of shape (2,). |
Source code in src/deeperfly/geometry.py
project_full_one ¶
project_full_one(
pt3d: Float[Array, "3"],
rvec: Float[Array, "3"],
tvec: Float[Array, "3"],
intr: Float[Array, "P"],
dist: Float[Array, "K"],
) -> Float[Array, "2"]
Project a single 3D point through a single camera.
Single-instance variant of :func:project_full designed to be composed
with :func:jax.vmap over observations and :func:jax.jacfwd over the
camera parameters and the 3D point.
Argument order matches :func:project_full: operand (pt3d) first,
then camera parameters in the canonical order
rvec, tvec, intr, dist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pt3d
|
Float[Array, '3']
|
3D world point of shape |
required |
rvec
|
Float[Array, '3']
|
Axis-angle rotation vector of shape |
required |
tvec
|
Float[Array, '3']
|
Translation vector of shape |
required |
intr
|
Float[Array, 'P']
|
Packed intrinsics of shape |
required |
dist
|
Float[Array, 'K']
|
Distortion coefficients of shape |
required |
Returns:
| Type | Description |
|---|---|
Projected 2D image point of shape ``(2,)``.
|
|
Source code in src/deeperfly/geometry.py
backproject_ray_one ¶
backproject_ray_one(
pixel: Float[Array, "2"],
rvec: Float[Array, "3"],
tvec: Float[Array, "3"],
intr: Float[Array, "P"],
dist: Float[Array, "K"],
) -> tuple[Float[Array, "3"], Float[Array, "3"]]
Back-project an image pixel to its viewing ray in world coordinates.
Inverts :func:project_full_one: strips the intrinsics
(xy_d = ((u - cx) / fx, (v - cy) / fy)), undistorts via
:func:undistort_one to the normalized direction xy, lifts it to the
camera-frame ray direction [x, y, 1] and rotates it into the world. The
ray origin + s * direction (s >= 0) is exactly the set of world
points that :func:project_full_one maps back to pixel through this
camera. Pair with :func:closest_point_on_ray to move a triangulated 3D
point onto the ray while staying as close as possible to its old location.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pixel
|
Float[Array, '2']
|
Image point of shape |
required |
rvec
|
Float[Array, '3']
|
Axis-angle rotation vector of shape |
required |
tvec
|
Float[Array, '3']
|
Translation vector of shape |
required |
intr
|
Float[Array, 'P']
|
Packed intrinsics of shape |
required |
dist
|
Float[Array, 'K']
|
Distortion coefficients of shape |
required |
Returns:
| Type | Description |
|---|---|
origin, direction : Float[Array, "3"]
|
The camera center |
Source code in src/deeperfly/geometry.py
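The back-projection steps described for backproject_ray_one can be illustrated with a small NumPy sketch. It makes several simplifying assumptions that are not part of the library API: the rotation is passed as a 3x3 matrix instead of an axis-angle vector, the intrinsics are passed as separate `fx, fy, cx, cy` arguments (the packed `intr` layout is not shown here), and distortion is taken as empty so the undistortion step is the identity.

```python
import numpy as np

def project(pt3d, rmat, tvec, fx, fy, cx, cy):
    """Pinhole projection X_cam = R X + t, then affine intrinsics."""
    x_cam = rmat @ pt3d + tvec
    return np.array([fx * x_cam[0] / x_cam[2] + cx,
                     fy * x_cam[1] / x_cam[2] + cy])

def backproject(pixel, rmat, tvec, fx, fy, cx, cy):
    """Pixel -> (origin, direction) of its viewing ray in world coordinates."""
    # Strip the intrinsics to get the normalized direction.
    x = (pixel[0] - cx) / fx
    y = (pixel[1] - cy) / fy
    # Lift to the camera-frame ray [x, y, 1] and rotate into the world.
    direction = rmat.T @ np.array([x, y, 1.0])
    # The camera center is the world point that maps to the camera origin.
    origin = -rmat.T @ tvec
    return origin, direction

rmat = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
tvec = np.array([0.1, -0.2, 3.0])
intr = dict(fx=800.0, fy=820.0, cx=320.0, cy=240.0)
pixel = np.array([400.0, 300.0])
origin, direction = backproject(pixel, rmat, tvec, **intr)
```

Any point `origin + s * direction` with `s > 0` should reproject onto `pixel`, which is the property the docstring states.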
closest_point_on_ray ¶
closest_point_on_ray(
origin: Float[Array, "3"],
direction: Float[Array, "3"],
target: Float[Array, "3"],
) -> Float[Array, "3"]
The point on a ray nearest a target point (orthogonal projection).
Returns origin + s * direction with
s = (target - origin) . direction / (direction . direction) -- the
unique closest point on the infinite line through origin along
direction. Used by the 3D-correction drag: direction / origin
come from :func:backproject_ray_one for the pixel the user dragged to, and
target is the point's pre-drag 3D position, so the result reprojects
exactly onto the dragged pixel while moving the least in 3D.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
origin
|
Float[Array, '3']
|
A point on the ray, shape |
required |
direction
|
Float[Array, '3']
|
The ray direction, shape |
required |
target
|
Float[Array, '3']
|
The point to approach, shape |
required |
Returns:
| Type | Description |
|---|---|
The closest point on the ray, shape ``(3,)``.
|
|
Source code in src/deeperfly/geometry.py
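The formula for closest_point_on_ray is short enough to write out directly. The sketch below is a NumPy illustration with a hypothetical function name; like the formula in the docstring, it does not clamp `s` to be non-negative.

```python
import numpy as np

def closest_point(origin, direction, target):
    """Orthogonal projection of target onto the line origin + s * direction."""
    s = np.dot(target - origin, direction) / np.dot(direction, direction)
    return origin + s * direction

origin = np.zeros(3)
direction = np.array([0.0, 0.0, 2.0])  # need not be unit length
target = np.array([1.0, 1.0, 3.0])
nearest = closest_point(origin, direction, target)
```

The residual `target - nearest` is perpendicular to `direction`, which is what makes the result the least-movement correction in 3D.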
intr_to_kmat ¶
Build 3x3 camera intrinsic matrices from packed intrinsic parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
intr
|
Float[Array, '*batch P']
|
Packed intrinsic parameters of shape |
required |
Returns:
| Type | Description |
|---|---|
Camera matrices ``K`` of shape ``(..., 3, 3)``.
|
|
Source code in src/deeperfly/geometry.py
rvec_to_rmat ¶
Convert axis-angle rotation vectors to rotation matrices (Rodrigues).
Implements R = I + a * W + b * W^2 with W = skew(rvec),
a = sin(theta) / theta and b = (1 - cos(theta)) / theta^2. Working
on the unnormalized axis avoids 0/0 at theta = 0; evaluating a
and b from their Taylor expansions for small theta sidesteps the
catastrophic cancellation in 1 - cos(theta), keeping the result
orthogonal to machine precision even for tiny rotations. W^2 is
expanded as rvec . rvec^T - theta^2 * I to avoid a batched matmul.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
rvec
|
Float[Array, '*batch 3']
|
Axis-angle rotation vectors of shape |
required |
Returns:
| Type | Description |
|---|---|
Rotation matrices of shape ``(..., 3, 3)``.
|
|
Source code in src/deeperfly/geometry.py
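A single-vector NumPy sketch of the Rodrigues formula described for rvec_to_rmat follows. The function name, the small-angle switch threshold `eps`, and the number of Taylor terms are assumptions chosen for illustration; the library's batched implementation may differ.

```python
import numpy as np

def rodrigues(rvec, eps=1e-4):
    """Axis-angle vector (3,) -> rotation matrix (3, 3)."""
    rvec = np.asarray(rvec, dtype=float)
    theta2 = rvec @ rvec
    theta = np.sqrt(theta2)
    if theta < eps:
        # Taylor expansions avoid 0/0 and the cancellation in 1 - cos(theta).
        a = 1 - theta2 / 6 + theta2**2 / 120
        b = 0.5 - theta2 / 24 + theta2**2 / 720
    else:
        a = np.sin(theta) / theta
        b = (1 - np.cos(theta)) / theta2
    x, y, z = rvec
    W = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    # W @ W written without a matmul: r r^T - theta^2 I.
    W2 = np.outer(rvec, rvec) - theta2 * np.eye(3)
    return np.eye(3) + a * W + b * W2

R = rodrigues([0.0, 0.0, np.pi / 2])  # quarter turn about the z axis
```

A quarter turn about z maps the x axis onto the y axis, so the first column of `R` should be `[0, 1, 0]`.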
rmat_to_rvec ¶
Convert rotation matrices to axis-angle rotation vectors.
Vectorized port of OpenCV's Rodrigues (matrix -> vector). The axis is
read off the antisymmetric part R - R^T in the generic case, but that
part vanishes at theta = pi (R becomes symmetric), so near
theta = pi the axis is instead recovered from the symmetric part
(R + I) / 2 with the signs disambiguated from the off-diagonal entries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
rmat
|
Float[Array, '*batch 3 3']
|
Rotation matrices of shape |
required |
Returns:
| Type | Description |
|---|---|
Axis-angle rotation vectors of shape ``(..., 3)``.
|
|
Source code in src/deeperfly/geometry.py
project_pmat ¶
project_pmat(
pts3d: Float[Array, "*pts 3"],
pmats: Float[Array, "*cams 3 4"],
) -> Float[Array, "*cams *pts 2"]
Project 3D world points to 2D image points using 3x4 projection matrices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pts3d
|
Float[Array, '*pts 3']
|
3D points of shape |
required |
pmats
|
Float[Array, '*cams 3 4']
|
Projection matrices of shape |
required |
Returns:
| Type | Description |
|---|---|
2D image points of shape ``(*cams, *pts, 2)``.
|
|
Source code in src/deeperfly/geometry.py
triangulate_dlt ¶
triangulate_dlt(
pts2d: Float[Array, "V *pts 2"],
pmats: Float[Array, "V 3 4"],
weights: Float[Array, "V *pts"] | None = None,
) -> Float[Array, "*pts 3"]
Triangulate 3D points by direct linear transformation (DLT).
For each point, stacks two rows per view of the linear system
[x * p3 - p1; y * p3 - p2] @ [X, Y, Z, 1]^T = 0 (where pi is the
i-th row of pmat) and solves for the homogeneous coordinates as the
right-singular vector for the smallest singular value. That vector equals the
eigenvector of A^T A for its smallest eigenvalue, and a symmetric
eigendecomposition of the 4x4 matrix A^T A is faster than an SVD of A. NaN
observations are zeroed out; points with fewer than two valid views are
returned as NaN.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pts2d
|
Float[Array, 'V *pts 2']
|
2D observations of shape |
required |
pmats
|
Float[Array, 'V 3 4']
|
Projection matrices of shape |
required |
weights
|
Float[Array, 'V *pts'] | None
|
Optional per-(view, point) weights of shape |
None
|
Returns:
| Type | Description |
|---|---|
Triangulated 3D points of shape ``(*pts, 3)``.
|
|
Source code in src/deeperfly/geometry.py
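The DLT construction described for triangulate_dlt can be sketched for a single point in NumPy. This is an illustration only: the function name is hypothetical, the optional weights are omitted, and the library version is batched.

```python
import numpy as np

def triangulate_point(pts2d, pmats):
    """pts2d (V, 2) with NaN for unseen views, pmats (V, 3, 4) -> (3,)."""
    valid = np.isfinite(pts2d).all(axis=-1)
    if valid.sum() < 2:
        return np.full(3, np.nan)
    xy = np.where(valid[:, None], pts2d, 0.0)  # zero out NaN observations
    rows_x = xy[:, 0:1] * pmats[:, 2] - pmats[:, 0]  # x * p3 - p1, (V, 4)
    rows_y = xy[:, 1:2] * pmats[:, 2] - pmats[:, 1]  # y * p3 - p2, (V, 4)
    A = np.concatenate([rows_x, rows_y]) * np.tile(valid, 2)[:, None]
    # eigh returns eigenvalues in ascending order, so column 0 is the solution.
    _, vecs = np.linalg.eigh(A.T @ A)
    X = vecs[:, 0]
    return X[:3] / X[3]

# Three translated pinhole cameras P = [I | -c] observing one point.
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
pmats = np.stack([np.hstack([np.eye(3), -c[:, None]]) for c in centers])
X_true = np.array([0.2, 0.1, 2.0])
h = pmats @ np.append(X_true, 1.0)
pts2d = h[:, :2] / h[:, 2:3]
X_est = triangulate_point(pts2d, pmats)
```

With noise-free observations the smallest eigenvalue is numerically zero and `X_est` should match `X_true`; marking one view as NaN still leaves two valid views, while marking two leaves the point undefined.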
distort ¶
Apply OpenCV-style radial + tangential + thin-prism distortion.
For normalized image coordinates (x, y) with r^2 = x^2 + y^2, the
distortion model with up to 12 coefficients
[k1, k2, p1, p2, k3, k4, k5, k6, s1, s2, s3, s4] is
```text
x_d = x * (1 + k1 r^2 + k2 r^4 + k3 r^6)
        / (1 + k4 r^2 + k5 r^4 + k6 r^6)
      + 2 p1 x y + p2 (r^2 + 2 x^2) + s1 r^2 + s2 r^4
y_d = y * (...) / (...)
      + p1 (r^2 + 2 y^2) + 2 p2 x y + s3 r^2 + s4 r^4
```
Coefficients beyond K are taken as zero, matching cv2.projectPoints.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pts2d
|
Float[Array, 'V *pts 2']
|
Normalized 2D coordinates of shape |
required |
dists
|
Float[Array, 'V K']
|
Distortion coefficients of shape |
required |
Returns:
| Type | Description |
|---|---|
Distorted 2D coordinates of shape ``(V, *pts, 2)``.
|
|
Source code in src/deeperfly/geometry.py
project_full ¶
project_full(
pts3d: Float[Array, "*pts 3"],
rvecs: Float[Array, "V 3"],
tvecs: Float[Array, "V 3"],
intrs: Float[Array, "V P"] | Float[Array, "P"],
dists: Float[Array, "V K"] | Float[Array, "K"],
) -> Float[Array, "V *pts 2"]
Project 3D world points to 2D image points through full camera models.
Composes the pinhole projection X_cam = R(rvec) X + tvec, perspective
division xy = X_cam[:2] / X_cam[2], distortion via :func:distort, and
the affine intrinsics x_pix = fx * x + cx (and analogously for y).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pts3d
|
Float[Array, '*pts 3']
|
3D world points of shape |
required |
rvecs
|
Float[Array, 'V 3']
|
Axis-angle rotation vectors of shape |
required |
tvecs
|
Float[Array, 'V 3']
|
Translation vectors of shape |
required |
intrs
|
Float[Array, 'V P'] | Float[Array, 'P']
|
Packed intrinsics of shape |
required |
dists
|
Float[Array, 'V K'] | Float[Array, 'K']
|
Distortion coefficients of shape |
required |
Returns:
| Type | Description |
|---|---|
Projected 2D image points of shape ``(V, *pts, 2)``.
|
|
Source code in src/deeperfly/geometry.py
Triangulation helpers¶
triangulation ¶
Skeleton-aware triangulation helpers over a :class:CameraGroup.
Thin NumPy-facing wrappers around :mod:deeperfly.geometry and
:class:deeperfly.cameras.CameraGroup. The contract with the geometry layer is
the NaN convention: a 2D observation of NaN means "this camera did not
(or cannot) see this point", so visibility is expressed purely as NaNs -- no
separate mask array travels downstream.
All functions use the view-leading layout: pts2d has shape (V, *pts, 2)
((V, P, 2) for one frame, (V, T, P, 2) for a sequence); triangulated
points come back as (*pts, 3).
triangulate ¶
triangulate(
cameras: CameraGroup,
pts2d: Float[ndarray, "V *pts 2"],
weights: Float[ndarray, "V *pts"] | None = None,
) -> Float[np.ndarray, "*pts 3"]
Triangulate 3D points from 2D observations (NaN-aware DLT).
Points seen by fewer than two cameras come back as NaN. Forwards to
:meth:CameraGroup.triangulate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cameras
|
CameraGroup
|
The camera rig. |
required |
pts2d
|
Float[ndarray, 'V *pts 2']
|
2D observations of shape |
required |
weights
|
Float[ndarray, 'V *pts'] | None
|
Optional per-(view, point) weights of shape |
None
|
Returns:
| Type | Description |
|---|---|
ndarray
|
Triangulated 3D points of shape |
Source code in src/deeperfly/triangulation.py
reprojection_error ¶
reprojection_error(
cameras: CameraGroup,
pts3d: Float[ndarray, "*pts 3"],
pts2d: Float[ndarray, "V *pts 2"],
) -> Float[np.ndarray, "V *pts"]
Per-(view, point) reprojection error in pixels.
Projects pts3d through every camera and takes the Euclidean distance to
pts2d. Entries are NaN wherever the observation or the 3D point is
NaN (unobserved / un-triangulated), so callers can ignore them with
np.nanmean / np.nanmax.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cameras
|
CameraGroup
|
The camera rig. |
required |
pts3d
|
Float[ndarray, '*pts 3']
|
Triangulated 3D points of shape |
required |
pts2d
|
Float[ndarray, 'V *pts 2']
|
2D observations of shape |
required |
Returns:
| Type | Description |
|---|---|
ndarray
|
Reprojection error of shape |
Source code in src/deeperfly/triangulation.py
triangulate_ransac ¶
triangulate_ransac(
cameras: CameraGroup,
pts2d: Float[ndarray, "V *pts 2"],
*,
threshold: float = 15.0,
min_inliers: int = 2,
weights: Float[ndarray, "V *pts"] | None = None,
) -> tuple[
Float[np.ndarray, "*pts 3"], Bool[np.ndarray, "V *pts"]
]
Robustly triangulate 3D points, rejecting gross 2D outliers (RANSAC).
A single badly mislocated detection drags a plain least-squares fit
(:func:triangulate) and inflates every view's reprojection error, hiding
which view was wrong. RANSAC instead searches for the largest set of mutually
consistent views.
The rigs deeperfly targets have only a handful of cameras, so rather than
sampling, this exhaustively enumerates all C(V, 2) two-view hypotheses
(the deterministic limit of RANSAC). For each pair it triangulates a candidate
and counts views reprojecting within threshold pixels (NaN views never
count). The largest consensus wins (ties broken by smaller total inlier
error), and the point is re-triangulated from all its inlier views. Points
with fewer than min_inliers agreeing views come back NaN.
Operates per point over any leading layout ((V, P, 2), (V, T, P, 2)).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cameras
|
CameraGroup
|
The camera rig. |
required |
pts2d
|
Float[ndarray, 'V *pts 2']
|
2D observations of shape |
required |
threshold
|
float
|
Inlier reprojection-error cutoff in pixels (the greedy
:func: |
15.0
|
min_inliers
|
int
|
Minimum agreeing views required to accept a point (>= 2). |
2
|
weights
|
Float[ndarray, 'V *pts'] | None
|
Optional per-(view, point) weights of shape |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
pts3d |
ndarray
|
Triangulated points of shape |
inliers |
ndarray
|
Boolean |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Source code in src/deeperfly/triangulation.py
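The exhaustive two-view search described for triangulate_ransac can be sketched for one point as follows. This is a simplified illustration, not the library implementation: it works on raw 3x4 projection matrices instead of a CameraGroup, ignores distortion and weights, and all function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def dlt(pts2d, pmats):
    """Plain DLT from fully observed views: (V, 2), (V, 3, 4) -> (3,)."""
    A = np.concatenate([
        pts2d[:, 0:1] * pmats[:, 2] - pmats[:, 0],
        pts2d[:, 1:2] * pmats[:, 2] - pmats[:, 1],
    ])
    X = np.linalg.eigh(A.T @ A)[1][:, 0]
    return X[:3] / X[3]

def project(pt3d, pmats):
    h = pmats @ np.append(pt3d, 1.0)  # (V, 3)
    return h[:, :2] / h[:, 2:3]

def ransac_point(pts2d, pmats, threshold=15.0, min_inliers=2):
    seen = np.isfinite(pts2d).all(axis=-1)
    best_key, best_inliers = None, None
    for i, j in combinations(np.flatnonzero(seen), 2):
        with np.errstate(divide="ignore", invalid="ignore"):
            cand = dlt(pts2d[[i, j]], pmats[[i, j]])
            err = np.linalg.norm(project(cand, pmats) - pts2d, axis=-1)
        err = np.where(seen, err, np.inf)  # NaN views never count
        inliers = err <= threshold
        # Largest consensus wins; ties go to the smaller total inlier error.
        key = (inliers.sum(), -err[inliers].sum())
        if best_key is None or key > best_key:
            best_key, best_inliers = key, inliers
    if best_inliers is None or best_inliers.sum() < min_inliers:
        return np.full(3, np.nan), np.zeros(len(pmats), dtype=bool)
    # Refit from every inlier view, not just the winning pair.
    return dlt(pts2d[best_inliers], pmats[best_inliers]), best_inliers

# Four cameras; the last view carries a gross outlier.
centers = np.array([[0.0, 0, 0], [1.0, 0, 0], [0.0, 1, 0], [1.0, 1, 0]])
K = np.diag([1000.0, 1000.0, 1.0])
pmats = np.stack([K @ np.hstack([np.eye(3), -c[:, None]]) for c in centers])
X_true = np.array([0.3, 0.4, 5.0])
pts2d = project(X_true, pmats)
pts2d[3] += [120.0, 90.0]
pt, inliers = ransac_point(pts2d, pmats)
```

In this synthetic rig the three clean views agree with each other, so the outlier view should be flagged `False` in `inliers` and the refit should recover `X_true`.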
Pictorial structures¶
pictorial ¶
Pictorial-structures (PS) multi-view 2D->3D correction (DeepFly3D-style).
The optional, accuracy-oriented alternative to the default reprojection-outlier
rejection in :func:deeperfly.pipeline.reconstruct. Where that path can only
veto a bad detection, PS can recover the correct joint when the detector's
arg-max landed on the wrong heatmap peak (self-occlusion, crossing legs,
left/right confusion).
Following Gunel et al. (DeepFly3D, 2019):
- Keep the top-K candidate peaks per (view, joint), not just the arg-max (:func:extract_candidates).
- Per joint, build a pool of multi-view-consistent 3D hypotheses by triangulating candidate pairs across views, refitting from inlier views, and scoring by summed heatmap confidence (batched per frame in :func:solve_frame).
- Choose one hypothesis per joint by exact dynamic programming along each limb (:func:_chain_dp). The fly skeleton's 2D bones form a forest of simple chains, so the MAP over the bone-length-coupled model is exact -- no loopy belief propagation. An optional temporal term penalizes 3D jumps.
Everything is plain NumPy over a :class:~deeperfly.cameras.CameraGroup and
:class:~deeperfly.skeleton.Skeleton. The detector forward and heatmap decode
happen upstream; this module consumes only candidate peaks + bundle-adjusted cameras.
Candidates
dataclass
¶
Top-K detector peaks per (view, point) for a sequence, in image pixels.
xy is (V, T, P, K, 2) and score is (V, T, P, K); padded /
invisible / sub-threshold slots are NaN (xy) and 0 (score).
The arg-max (index 0 along the K axis) reproduces the single-peak detection, so bundle adjustment
can still use the plain 2D path while PS consumes the full candidate set.
Source code in src/deeperfly/pictorial.py
frame ¶
peak_candidates ¶
peak_candidates(
heatmaps: Float[ndarray, "*chan H_out W_out"],
k: int = DEFAULT_K,
*,
radius: int = DEFAULT_PEAK_RADIUS,
threshold: float = DEFAULT_PEAK_THRESHOLD,
method: str = DEFAULT_SUBPIXEL,
) -> tuple[
Float[np.ndarray, "*chan K 2"],
Float[np.ndarray, "*chan K"],
]
Top-k local-maxima peaks per heatmap channel (normalized (x, y) + score).
A pixel is a peak if it is the maximum of its (2*radius+1) neighborhood and
exceeds threshold; the strongest k are returned, score-ordered, padded
with NaN / 0 when fewer exist. Each is refined to sub-pixel by
method (the same :func:~deeperfly.pose2d.inference.refine_peaks the
single-peak decoder uses), so candidates carry the arg-max's localization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
heatmaps
|
Float[ndarray, '*chan H_out W_out']
|
Heatmaps of shape |
required |
k
|
int
|
Number of peaks to keep per channel. |
DEFAULT_K
|
radius
|
int
|
NMS / sub-pixel-window half-width, in heatmap pixels. |
DEFAULT_PEAK_RADIUS
|
threshold
|
float
|
Ignore peaks weaker than this. |
DEFAULT_PEAK_THRESHOLD
|
method
|
str
|
Sub-pixel refinement: |
DEFAULT_SUBPIXEL
|
Returns:
| Name | Type | Description |
|---|---|---|
xy |
ndarray
|
Peak coordinates of shape |
score |
ndarray
|
Raw peak values of shape |
Source code in src/deeperfly/pictorial.py
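The peak rule described for peak_candidates (a pixel that is the maximum of its `(2*radius+1)` neighborhood and exceeds `threshold`) can be sketched for a single channel in NumPy. This is a simplified illustration with a hypothetical function name: it returns integer heatmap-pixel coordinates and skips both the sub-pixel refinement and the coordinate normalization that the documented function performs.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def top_k_peaks(heatmap, k=3, radius=2, threshold=0.1):
    """heatmap (H, W) float -> xy (k, 2) as (x, y), score (k,)."""
    size = 2 * radius + 1
    padded = np.pad(heatmap, radius, constant_values=-np.inf)
    # Maximum over each pixel's (size x size) neighborhood.
    local_max = sliding_window_view(padded, (size, size)).max(axis=(-2, -1))
    is_peak = (heatmap >= local_max) & (heatmap > threshold)
    ys, xs = np.nonzero(is_peak)
    order = np.argsort(-heatmap[ys, xs], kind="stable")[:k]
    # Pad with NaN / 0 when fewer than k peaks exist.
    xy = np.full((k, 2), np.nan)
    score = np.zeros(k)
    xy[: len(order)] = np.stack([xs[order], ys[order]], axis=-1)
    score[: len(order)] = heatmap[ys[order], xs[order]]
    return xy, score

hm = np.zeros((8, 8))
hm[2, 3], hm[6, 1], hm[2, 4] = 0.9, 0.5, 0.4  # the 0.4 sits next to the 0.9
xy, score = top_k_peaks(hm, k=3, radius=2, threshold=0.1)
```

The 0.4 response is suppressed because a stronger value lies inside its neighborhood, so only two peaks survive and the third slot is padding.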
bone_length_targets ¶
bone_length_targets(
cameras: CameraGroup,
pts2d: Float[ndarray, "V F P 2"],
skeleton: Skeleton,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]
Median bone length per skeleton bone, from an initial triangulation.
Shared by bundle adjustment
(:func:deeperfly.pipeline._bone_prior) and PS so the two agree on the
anatomical prior.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cameras
|
CameraGroup
|
The rig used for the initial triangulation. |
required |
pts2d
|
Float[ndarray, 'V F P 2']
|
2D observations of shape |
required |
skeleton
|
Skeleton
|
Skeleton supplying the bone (edge) list. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
i, j : np.ndarray
|
Bone endpoint index arrays (the columns of :attr: |
|
targets |
ndarray
|
Per-bone median target length of shape |
Source code in src/deeperfly/pictorial.py
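The median-length statistic behind bone_length_targets can be illustrated on already-triangulated points. The sketch below assumes a `(F, P, 3)` array of 3D points and a `(B, 2)` array of bone endpoint indices; the function name is hypothetical and the triangulation step itself is left out.

```python
import numpy as np

def median_bone_lengths(pts3d, bones):
    """pts3d (F, P, 3), bones (B, 2) -> per-bone median length (B,)."""
    i, j = bones[:, 0], bones[:, 1]
    lengths = np.linalg.norm(pts3d[:, i] - pts3d[:, j], axis=-1)  # (F, B)
    # NaN frames (un-triangulated joints) are ignored per bone.
    return np.nanmedian(lengths, axis=0)

pts3d = np.array([
    [[0, 0, 0], [1, 0, 0], [1, 2, 0]],
    [[0, 0, 0], [3, 0, 0], [3, 2, 0]],
    [[0, 0, 0], [2, 0, 0], [np.nan, 0, 0]],
], dtype=float)
bones = np.array([[0, 1], [1, 2]])
targets = median_bone_lengths(pts3d, bones)
```

A median is a natural choice here because a few badly triangulated frames do not shift it the way they would shift a mean.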
skeleton_chains ¶
Decompose the 2D bones into ordered simple chains (paths).
Each connected component of :attr:Skeleton.bones is a path (max degree 2),
returned as an ordered joint list walked from an endpoint; isolated points come
back as singletons. :func:_chain_dp runs exact Viterbi over this ordering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
skeleton
|
Skeleton
|
Skeleton whose bones are decomposed. |
required |
Returns:
| Type | Description |
|---|---|
list of list of int
|
Ordered joint-index chains (singletons for isolated points). |
Source code in src/deeperfly/pictorial.py
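The chain decomposition described for skeleton_chains can be sketched in pure Python. The function name and the `(n_points, bones)` inputs are hypothetical stand-ins for the Skeleton object, and the sketch assumes what the docstring states: every connected component is a path (maximum degree 2, no cycles).

```python
from collections import defaultdict

def chains_from_bones(n_points, bones):
    """Return ordered joint-index chains; isolated points become singletons."""
    neighbors = defaultdict(list)
    for i, j in bones:
        neighbors[i].append(j)
        neighbors[j].append(i)
    chains, visited = [], set()
    for start in range(n_points):
        # Start only at chain endpoints (degree 1) or isolated points (degree 0).
        if start in visited or len(neighbors[start]) > 1:
            continue
        chain, prev, cur = [], None, start
        while cur is not None:
            chain.append(cur)
            visited.add(cur)
            nxt = [n for n in neighbors[cur] if n != prev and n not in visited]
            prev, cur = cur, (nxt[0] if nxt else None)
        chains.append(chain)
    return chains

chains = chains_from_bones(6, [(0, 1), (1, 2), (4, 3)])
```

Each chain is walked from its lowest-numbered endpoint, so the bone `(4, 3)` comes back as `[3, 4]` and the unconnected point 5 as `[5]`.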
solve_frame ¶
solve_frame(
cameras: CameraGroup,
skeleton: Skeleton,
cand_xy: Float[ndarray, "V P K 2"],
cand_score: Float[ndarray, "V P K"],
target_map: dict[tuple[int, int], float],
chains: list[list[int]],
*,
scale: float,
max_hyp: int = DEFAULT_MAX_HYP,
inlier_px: float = DEFAULT_INLIER_PX,
lam: float = DEFAULT_LAMBDA,
huber: float = DEFAULT_HUBER,
mu: float = DEFAULT_MU,
prev_pts3d: Float[ndarray, "P 3"] | None = None,
) -> tuple[
Float[np.ndarray, "P 3"], Float[np.ndarray, "V P 2"]
]
Pictorial-structures correction for one multi-camera frame.
Generates per-joint 3D hypotheses, prunes them, and runs exact chain DP with
the bone-length prior (and an optional temporal term against prev_pts3d).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cameras
|
CameraGroup
|
The bundle-adjusted rig. |
required |
skeleton
|
Skeleton
|
Skeleton (kept for symmetry with the sequence call). |
required |
cand_xy
|
Float[ndarray, 'V P K 2']
|
Per-frame candidates of shape |
required |
cand_score
|
Float[ndarray, 'V P K']
|
Per-frame candidates of shape |
required |
target_map
|
dict[tuple[int, int], float]
|
|
required |
chains
|
list[list[int]]
|
Pre-computed skeleton chains (:func: |
required |
scale
|
float
|
Characteristic bone length scaling the prior and NMS radius. |
required |
max_hyp
|
int
|
Pruning and cost knobs (see the module defaults). |
DEFAULT_MAX_HYP
|
inlier_px
|
float
|
Pruning and cost knobs (see the module defaults). |
DEFAULT_INLIER_PX
|
lam
|
float
|
Pruning and cost knobs (see the module defaults). |
DEFAULT_LAMBDA
|
huber
|
float
|
Pruning and cost knobs (see the module defaults). |
DEFAULT_HUBER
|
mu
|
float
|
Pruning and cost knobs (see the module defaults). |
DEFAULT_MU
|
prev_pts3d
|
Float[ndarray, 'P 3'] | None
|
Previous frame's 3D for the temporal term, or |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
pts3d |
ndarray
|
Chosen 3D points of shape |
obs |
ndarray
|
Per-view 2D observations PS committed to |
Source code in src/deeperfly/pictorial.py
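The exact chain DP that solve_frame relies on can be sketched as a small Viterbi pass over one chain. Everything below is an illustrative assumption, not the library's cost model: the function name is hypothetical, the bone-length penalty is a plain squared deviation weighted by `lam` (no Huber loss, no temporal term), and unary costs are passed in directly with lower meaning better.

```python
import numpy as np

def chain_viterbi(hyps, unary, targets, lam=1.0):
    """Pick one hypothesis per joint along a chain.

    hyps: list of (H_n, 3) arrays, unary: list of (H_n,) costs,
    targets: bone lengths between consecutive joints.
    """
    cost = np.asarray(unary[0], dtype=float)
    back = []
    for n in range(1, len(hyps)):
        # Bone length for every (previous, current) hypothesis pair.
        length = np.linalg.norm(hyps[n - 1][:, None] - hyps[n][None], axis=-1)
        total = cost[:, None] + lam * (length - targets[n - 1]) ** 2
        back.append(total.argmin(axis=0))  # best predecessor per hypothesis
        cost = total.min(axis=0) + unary[n]
    # Backtrack from the cheapest final hypothesis.
    path = [int(cost.argmin())]
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    return path[::-1]

hyps = [np.array([[0.0, 0, 0]]),
        np.array([[1.0, 0, 0], [3.0, 0, 0]]),
        np.array([[2.0, 0, 0]])]
unary = [np.array([0.0]), np.array([0.1, 0.0]), np.array([0.0])]
chosen = chain_viterbi(hyps, unary, targets=[1.0, 1.0])
```

In the example the middle joint's second hypothesis has the better unary cost but implies a bone three units long, so the bone-length term steers the solution to the first hypothesis. Setting `lam=0.0` removes that coupling and the unary winner is picked instead.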
reconstruct ¶
reconstruct(
cameras: CameraGroup,
skeleton: Skeleton,
candidates: Candidates,
pts2d_argmax: Float[ndarray, "V T P 2"],
*,
bone_max_frames: int | None = 100,
temporal: bool = False,
max_hyp: int = DEFAULT_MAX_HYP,
inlier_px: float = DEFAULT_INLIER_PX,
lam: float = DEFAULT_LAMBDA,
huber: float = DEFAULT_HUBER,
mu: float = DEFAULT_MU,
) -> tuple[
Float[np.ndarray, "T P 3"],
Float[np.ndarray, "V T P 2"],
Float[np.ndarray, "V T P"],
]
Run PS correction over a whole sequence.
The bone-length prior is estimated once from an arg-max triangulation of up to
bone_max_frames frames; PS then runs per frame (optionally threading the
previous frame's 3D for the temporal term). Same shapes/contract as
:func:deeperfly.pipeline.reconstruct.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cameras
|
CameraGroup
|
The bundle-adjusted rig. |
required |
skeleton
|
Skeleton
|
Skeleton supplying chains, visibility and the bone-length prior. |
required |
candidates
|
Candidates
|
The detector's top-K candidate peaks for the sequence. |
required |
pts2d_argmax
|
Float[ndarray, 'V T P 2']
|
Arg-max 2D of shape |
required |
bone_max_frames
|
int | None
|
Frames subsampled to estimate the prior ( |
100
|
temporal
|
bool
|
Whether to add the inter-frame temporal term. |
False
|
max_hyp
|
int
|
Per-frame pruning and cost knobs. |
DEFAULT_MAX_HYP
|
inlier_px
|
float
|
Per-frame pruning and cost knobs. |
DEFAULT_INLIER_PX
|
lam
|
float
|
Per-frame pruning and cost knobs. |
DEFAULT_LAMBDA
|
huber
|
float
|
Per-frame pruning and cost knobs. |
DEFAULT_HUBER
|
mu
|
float
|
Per-frame pruning and cost knobs. |
DEFAULT_MU
|
Returns:
| Name | Type | Description |
|---|---|---|
pts3d |
ndarray
|
Corrected 3D of shape |
pts2d |
ndarray
|
Committed per-view 2D of shape |
reproj |
ndarray
|
Reprojection error of shape |
Source code in src/deeperfly/pictorial.py
Frame I/O¶
io ¶
Frame I/O: reader classes for video files and image sequences, plus MP4 writing.
Footage is read through a small class hierarchy rooted at
:class:~deeperfly.io.base.FrameReader:
- :class:~deeperfly.io.video.VideoReader -- frame-accurate decode of a video file to (T, H, W, 3) uint8 RGB NumPy (PyAV, in-process FFmpeg, CPU).
- :class:~deeperfly.io.images.ImageSequenceReader -- parallel decode of an image sequence (OpenCV).
:func:open_reader resolves a source (video file, image directory/glob, or explicit
footage file list) to the right reader once; callers then index (reader[:],
reader[i], reader[[0,3,5]]) or stream (stream_frames / stream_blocks)
against that object. :class:~deeperfly.io.video.VideoWriter encodes frames to H.264,
one frame or one array at a time.
```python
from deeperfly import io

reader = io.open_reader("clip.mp4")
frames = reader[:]  # (T, H, W, 3) uint8, host NumPy
with io.VideoWriter("out.mp4", fps=30) as writer:
    writer.write_frames(frames)  # a batch or iterable; or write_frame()
```
Pose overlays and 3D reconstructions are rendered to MP4 by
:mod:deeperfly.visualization.compose (the OpenCV panel compositor), which builds
on these read/write primitives.
FrameReader ¶
Bases: ABC
Reads (T, H, W, 3) uint8 RGB frames from one footage source.
The two concrete readers -- :class:~deeperfly.io.video.VideoReader (PyAV) and
:class:~deeperfly.io.images.ImageSequenceReader (OpenCV) -- resolve their
source kind once, at construction, rather than on every read.
:func:~deeperfly.io.open_reader is the factory that picks the subclass.
All decoding runs on the CPU and yields host (T, H, W, 3) uint8 RGB NumPy.
Index with reader[key] to decode frames into an array:
- reader[5] -- single frame, (H, W, 3)
- reader[[0, 3, 5]] -- explicit indices (random-access), (T, H, W, 3)
- reader[2:8:2] -- sequential slice, (T, H, W, 3)
- reader[:] -- full decode, (T, H, W, 3)
Use :meth:stream_frames / :meth:stream_blocks for lazy forward iteration.
Readers can be used as context managers (symmetric with
:class:~deeperfly.io.video.VideoWriter); :meth:close releases any held
resources and is a no-op for the stateless readers, which open and close the
underlying file per operation.
Source code in src/deeperfly/io/base.py
close ¶
stream_frames
abstractmethod
¶
stream_frames(
*,
start: int = 0,
stop: int | None = None,
step: int = 1,
) -> Iterator[Float[np.ndarray, "H W 3"]]
Yield individual (H, W, 3) uint8 RGB frames from one forward pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
start
|
int
|
Frame range, like |
0
|
stop
|
int | None
|
Frame range, like |
None
|
step
|
int
|
Frame range, like |
1
|
Source code in src/deeperfly/io/base.py
stream_blocks
abstractmethod
¶
stream_blocks(
*,
start: int = 0,
stop: int | None = None,
step: int = 1,
block_size: int = 64,
) -> Iterator[Float[np.ndarray, "T H W 3"]]
Yield (T, H, W, 3) uint8 RGB blocks from one forward pass.
Instead of decoding a fixed [start, stop) slice, walk the source forward
and emit frames in groups of up to block_size. A whole recording is
therefore one linear decode -- no per-window re-open or re-seek.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
start
|
int
|
Frame range, like |
0
|
stop
|
int | None
|
Frame range, like |
None
|
step
|
int
|
Frame range, like |
1
|
block_size
|
int
|
Maximum frames per yielded block. |
64
|
Yields:
| Type | Description |
|---|---|
ndarray
|
|
Source code in src/deeperfly/io/base.py
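The block-wise forward pass that stream_blocks describes amounts to grouping a frame iterator into stacked arrays. The helper below is a hypothetical sketch of that grouping over any iterable of frames; it is not the reader's implementation and does no decoding itself.

```python
from itertools import islice
import numpy as np

def blocks_from_frames(frames, block_size=64):
    """Group an iterable of (H, W, 3) frames into (T, H, W, 3) blocks."""
    it = iter(frames)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return  # source exhausted
        yield np.stack(block)  # the last block may hold fewer frames

# Five tiny synthetic frames, grouped two at a time.
frames = (np.full((2, 3, 3), i, dtype=np.uint8) for i in range(5))
blocks = list(blocks_from_frames(frames, block_size=2))
```

Because the source is consumed once, front to back, memory use is bounded by `block_size` frames regardless of how long the recording is.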
count
abstractmethod
¶
Best-effort frame count -- None when unknown.
A hint for a progress-bar total only: callers stream frames and detect
end-of-file from the decoder itself, so an off-by-a-few count or None
never affects correctness.
Source code in src/deeperfly/io/base.py
fps ¶
Frame rate in frames/sec, or None when unknown.
Image sequences carry no intrinsic frame rate, so the base implementation
returns None; :class:~deeperfly.io.video.VideoReader overrides it.
Source code in src/deeperfly/io/base.py
ImageSequenceReader ¶
Bases: FrameReader
Reads an ordered image sequence (a directory, glob, or explicit file list).
Decode-thread count is fixed at construction. Frames are decoded in parallel
across threads via OpenCV; the result is host (T, H, W, 3) uint8 RGB.
Image sequences carry no frame rate, so :meth:fps is the inherited None.
Source code in src/deeperfly/io/images.py
from_pattern
classmethod
¶
Build a reader for a directory or glob, listing/sorting its files by name.
Source code in src/deeperfly/io/images.py
VideoReader ¶
Bases: FrameReader
Frame-accurate decode of a single video file via PyAV.
Sequential reads walk the file forward; indexing with a list seeks per target
frame (keyframe + decode forward). count / fps read container metadata
-- both cheap, no full pixel decode.
Source code in src/deeperfly/io/video.py
count ¶
Frame count from the container header, or None if it is absent.
Some containers (raw / transport streams, some MKV) omit nb_frames;
an exact count then needs a full decode, so this returns None rather
than a duration * fps estimate.
Source code in src/deeperfly/io/video.py
fps ¶
Average frame rate from the container header, or None if unavailable.
Source code in src/deeperfly/io/video.py
VideoWriter ¶
Incremental H.264 (libx264) MP4 encoder, backed by PyAV.
Open it, feed frames, close it (or use it as a context manager).
:meth:write_frame appends one (H, W, 3) frame; :meth:write_frames
appends a whole (T, H, W, 3) array or any iterable of frames / blocks -- so
a long clip can be encoded as it is produced, without ever holding every frame
in memory:
```python
with VideoWriter("out.mp4", fps=30) as writer:
    for frame in render():  # a (H, W, 3) frame
        writer.write_frame(frame)
```
The container and stream are opened lazily on the first frame (its size sets the
encode dimensions, rounded down to even for yuv420p subsampling); later
frames are cropped to match. Non-uint8 input is clipped to [0, 255].
Source code in src/deeperfly/io/video.py
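The frame handling described above (even encode dimensions taken from the first frame, later frames cropped to match, non-uint8 input clipped) can be sketched as a standalone NumPy helper. The function name and the `(frame, size)` return convention are assumptions for illustration, as is the cast to uint8 after clipping; the writer's actual internals are not shown in this reference.

```python
import numpy as np

def prepare_frame(frame, size=None):
    """Return (uint8 frame cropped to the encode size, (H, W) size used)."""
    if size is None:
        # First frame: round each dimension down to an even number.
        h, w = frame.shape[:2]
        size = (h - h % 2, w - w % 2)
    frame = frame[: size[0], : size[1]]  # crop later frames to match
    if frame.dtype != np.uint8:
        frame = np.clip(frame, 0, 255).astype(np.uint8)
    return frame, size

first = np.full((5, 7, 3), 300.0)
first[0, 0] = -5.0
out, size = prepare_frame(first)
later, _ = prepare_frame(np.zeros((6, 8, 3), dtype=np.uint8), size)
```

Even dimensions matter because yuv420p stores chroma at half resolution in both directions, so odd widths or heights cannot be subsampled cleanly.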
write_frame ¶
Append a single (H, W, 3) frame (non-uint8 is clipped to [0, 255]).
The first frame's size sets the encode dimensions (rounded down to even for
yuv420p subsampling); later frames are cropped to match.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
frame
|
One |
required |
Source code in src/deeperfly/io/video.py
write_frames ¶
Append many frames: a (T, H, W, 3) batch, or any iterable of frames.
Accepts a NumPy array (each frame along axis 0), a torch / DLPack batch, or any iterable of frames or blocks (e.g. a generator) -- so frames can be encoded as they arrive, without holding the whole clip in memory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
frames
|
A batch, or an iterable of frames / batches (non- |
required |
Source code in src/deeperfly/io/video.py
close ¶
Flush the encoder and close the file (idempotent).
Source code in src/deeperfly/io/video.py
is_video_file ¶
Whether path is an existing video file (decoded as a video, not an image
directory/glob/sequence).
to_numpy ¶
Collapse decoded frames (NumPy / torch tensor) to a NumPy array.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
frames
|
A NumPy array or a torch tensor (or any array-like). |
required |
Returns:
| Type | Description |
|---|---|
ndarray
|
The frames as a host NumPy array. |
Source code in src/deeperfly/io/base.py
to_torch ¶
Hand frames to torch, zero-copy where possible.
A torch.Tensor passes through untouched, any other DLPack-capable array is
wrapped via the DLPack protocol, and NumPy input (what the PyAV reader returns)
is wrapped on the host via zero-copy torch.from_numpy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
frames
|
A torch tensor, a DLPack-capable array, or a NumPy array. |
required |
Returns:
| Type | Description |
|---|---|
Tensor
|
The frames as a torch tensor (zero-copy where possible). |
Source code in src/deeperfly/io/base.py
list_image_files ¶
Sorted image files for a directory or glob pattern (by name).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pattern
|
str | Path
|
A directory of images, or a glob pattern. |
required |
Returns:
| Type | Description |
|---|---|
list of Path
|
The matching image files, sorted by name. |
Raises:
| Type | Description |
|---|---|
FileNotFoundError
|
If nothing matches |
Source code in src/deeperfly/io/images.py
open_reader ¶
Open the right :class:FrameReader for a footage source.
Dispatches on source (the one place this dispatch lives):
- a single video file (.mp4 / .avi / .mov ...) -> a :class:~deeperfly.io.video.VideoReader (PyAV);
- a directory or glob of images -> an :class:~deeperfly.io.images.ImageSequenceReader (OpenCV, workers sets decode parallelism);
- an explicit list of footage files -- one video file, or an ordered image sequence the caller has already resolved (deeperfly run resolves each camera's files up front, naturally sorted) -- is read in the given order without re-listing the directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
source
|
str | Path | list[Path]
|
A video file, an image directory/glob, or an explicit list of footage files (one video, or an ordered image sequence). |
required |
workers
|
int | None
|
Decode thread count for image sequences. |
None
|
Returns:
| Type | Description |
|---|---|
FrameReader
|
A :class: |
Raises:
| Type | Description |
|---|---|
ValueError
|
If an explicit file list is empty. |