Flush support while creating large images
Hi, I would like to ask if there is any way in pyvips to flush a pipeline during writing, for two reasons:
- write with streaming input
- update image based on already written parts.
(From what I found, it seems that only a single write at the end of the pipeline is supported.)
What I'm trying to do is create a very large TIFF image (about 165000 x 465000) made of several smaller images. The problem is that the images arrive as a stream, and each image has to be deleted before the next one is available. Another problem is that to write the next part of the image, I need to use data from what should already have been written (which is unavailable, since the pipeline hasn't started yet), but it appears that once a pipeline in pyvips is written it cannot be read again.
So I'd like to ask: is there any way to flush the image, or to run the pipeline step by step? And if so, is there also a way to read from the flushed data?
Many thanks for this library and for your support.
Hi @BLooperZ,
This is supposed to be automatic, I think. Could you give some more details? How many images are you trying to join? What resource is being exhausted?
If you can make a small sample program that shows the problem, that would be great.
Thank you so much for the quick response. There are about 3500 images of size 3600 x 2400.
Here is the sample code I came up with for demonstration:
import os
import pyvips
import numpy as np

INPUT = 'input.jpg'

# from: https://github.com/libvips/pyvips/blob/master/examples/pil-numpy-pyvips.py
from npvips import vips2numpy, numpy2vips


def generate_image(x, y):
    im = pyvips.Image.black(100, 100)
    im = im.colourspace(pyvips.Interpretation.SRGB)
    im = im.draw_circle([(x * 129 + y * 31 + 190) % 256, (x * 39) % 256, (y * 12) % 256], 50, 50, 50, fill=True)
    # for demonstration: the file is overwritten so the data is not available on the next iteration
    im.write_to_file(INPUT)
    return INPUT, 100, 100


def transform_tile(im, tile, x, y):
    # assume this was an external computation which uses numpy
    if x == 0:
        return numpy2vips(vips2numpy(tile))
    left = vips2numpy(im.crop(x - 50, y, 50, 100))
    right = vips2numpy(tile.crop(0, 0, 50, 100))
    arr = np.hstack([right, left])
    return numpy2vips(arr)


def generate_images(cols, rows):
    for row in range(rows):
        for col in range(cols):
            im_file, width, height = generate_image(row, col)
            # tiles are not actually aligned in the general use case; this just simplifies things in this example
            yield im_file, (col * width, row * height)


# is this how to create a new image?
im = pyvips.Image.black(1000, 1000)
im = im.colourspace(pyvips.Interpretation.SRGB)

for im_file, (x, y) in generate_images(10, 10):
    tile = pyvips.Image.new_from_file(im_file, access=pyvips.Access.SEQUENTIAL)

    # this and the next block are mutually exclusive; if both are executed the image is completely black
    transformed = transform_tile(im, tile, x, y)
    im = im.insert(transformed, x, y, expand=True, background=[255.0, 255.0, 255.0])

    # EDIT: this line is what I meant by "next block"
    # im = im.insert(tile, x, y, expand=True, background=[255.0, 255.0, 255.0])

    # # if uncommenting this, the circles will also become completely black
    # # console prints: vips warning: linecache: error reading tile 0x88: VipsJpeg: out of order read at line 100
    # im.write_to_file(
    #     'im.tif',
    #     compression='jpeg',
    #     tile=True, tile_width=512, tile_height=512,
    #     pyramid=True, bigtiff=True
    # )

im.write_to_file(
    'im.tif',
    compression='jpeg',
    tile=True, tile_width=512, tile_height=512,
    pyramid=True, bigtiff=True
)
While writing the example I also made a related discovery: if the image is read (vips2numpy reads a buffer), the pipeline does make progress (resulting in circles of different colours), but then the image must be created again or it becomes unusable (doing an insert makes the image black).
If the image has not been read, all circles have the same (last) colour.
I think the solution is therefore to read a cropped image to make the pipeline flush, but I can't say I understand why it works.
If you have images coming from numpy, they will all need to be in RAM. That's a lot of pixels :( 3600 x 2400 x 3500 is about 30 billion pixels, so roughly 30GB of memory per band.
You'll probably get a stack overflow as well if you do 3500 x = x.insert(y ..) in a row. That's making a very deep pipeline.
I'd guess you'll need to go via the filesystem -- write your 3500 images to a set of files and then assemble them, perhaps in sections.
Thank you. The input itself is indeed on the filesystem, but it gets overwritten; do you mean it has to be copied? (I would like to avoid that.) My question is: is it possible to use the sink itself, or some other temporary file(s), as a "working image" (to read the current image state and to write the changes incrementally), and then write the final state of the image to the target file if it wasn't written already?
(Or perhaps that is what you meant by going through the filesystem?)
If you're using numpy, you will need all the pixels in memory, there's no way around that.
You could perhaps swap numpy for pyvips, that should help a bit. What kind of processing are you doing?
Sorry, but I don't know exactly what is happening during processing, might be some kind of blending. I'd rather ignore the processing itself for now. Perhaps it is possible to save the results to temporary files (as sections?) and join them afterwards? I see there is support for temporary files, but do not understand when and how to use them, and could not find an example. Thank you.
Yes, I would sort your tiles by y and write the image in a series of sections. You'll need to add an alpha so you know which parts are unused. Save the intermediates in vips format (.v), then use composite to join them up.
I made you a tiny demo:
#!/usr/bin/python3

import sys
import random
import pyvips

# we generate the image in a series of sections, then join the sections up in a
# second pass

image_width = 150000
image_height = 500000
n_images = 10000
tile_height = 2048

# the top and height of this section come from argv
section_top = int(sys.argv[3])
section_height = int(sys.argv[4])

# the [x, y] position of each tile
# you'll have a fixed set of positions you reuse, we just use a fixed seed here
random.seed(0)
positions = [[random.randint(0, image_width), random.randint(0, image_height)]
             for _ in range(n_images)]

# find the set of tiles we paint for this section
clipped_tiles = list(filter(lambda p:
                            p[1] > section_top - tile_height and
                            p[1] < section_top + section_height,
                            positions))

print(f'generating {len(clipped_tiles)} tiles ...')

# now you know the tiles that need painting for this section, use numpy to
# generate just that subset

# we reuse a single tile for simplicity
tile = pyvips.Image.new_from_file(sys.argv[1])

print('building pipeline ...')

# the background we paint into
image = pyvips.Image.black(image_width, section_height, bands=3)
image = image.copy(interpretation='srgb')
for x, y in clipped_tiles:
    image = image.insert(tile, x, y - section_top)

previous_progress = -1

def eval_handler(image, progress):
    global previous_progress
    if progress.percent > previous_progress:
        print(f'{progress.percent}%, {progress.eta}s to go')
        previous_progress = progress.percent

# feedback on eval progress
image.set_progress(True)
image.signal_connect('eval', eval_handler)

print(f'writing {sys.argv[2]} ...')
image.write_to_file(sys.argv[2])
Run this repeatedly to generate a series of sections:
./sectionjoin.py ~/pics/k2.jpg s01.v 0 10000
./sectionjoin.py ~/pics/k2.jpg s02.v 10000 10000
./sectionjoin.py ~/pics/k2.jpg s03.v 20000 10000
...
Then join the sections up:
vips arrayjoin "$(echo s*.v)" output.tif
You could do it all in one python file, but it's probably safer to just run it several times. You'll need a lot of disc space. You can probably make fewer and taller sections, it depends how much RAM you have. I'd use a bash loop (obviously) for section generation.
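The loop might look something like this (a sketch; it just prints the commands so you can check the section arithmetic before running anything):

```shell
#!/bin/bash

# print one sectionjoin.py invocation per 10000-line section of the
# 500000-line image; drop the echo to actually run them
section_height=10000
image_height=500000

i=1
for ((top = 0; top < image_height; top += section_height)); do
    echo "./sectionjoin.py ~/pics/k2.jpg $(printf 's%02d.v' "$i") $top $section_height"
    i=$((i + 1))
done
```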
Thank you, I'll try it