Fix regression using e30b4ee7
This PR addresses 2 things:
- 13b39f7b: remove the different PyMuPDF reading modes and keep only the most generic one, i.e. the one able to generate a TOC from a PDF with a TOC (with or without links) as well as from a PDF without a TOC. In `pdfstruct/pdf_processor/utils.py`, the margin values [1, 2] have been changed to be closer to the original BBOX given by PyMuPDF. These values are also smaller and therefore require fewer CPU resources. Nevertheless, keep in mind that these values might have to be adapted for some PDF files.
- Fix the regression detected while running on e30b4ee7: this regression appeared when moving `pdfstruct` to static types. The culprit was this line: https://git.lab.sspcloud.fr/liriae/pdfstruct/-/blob/main/pdfstruct/page.py?ref_type=heads#L559 caused by https://git.lab.sspcloud.fr/liriae/pdfstruct/-/blob/v1.0.0/pdfstruct/page.py?ref_type=tags#L551; in fact, `collected` changes type there.
Therefore I went back to the initial implementation and added only a minimal set of type annotations.
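
For reviewers unfamiliar with this kind of bug, here is a minimal sketch of the pattern, not the actual `pdfstruct` code. The function `collect` and its parameters are hypothetical; only the name `collected` is borrowed from the description above. It shows a variable that is rebound from one type to another, and how a `Union` annotation can describe that without changing the runtime behaviour.

```python
from typing import List, Union


def collect(tokens: List[str], merge: bool) -> Union[List[str], str]:
    """Hypothetical helper: gather non-empty tokens, optionally merged."""
    # Strip whitespace and drop tokens that end up empty.
    parts: List[str] = [t.strip() for t in tokens if t.strip()]

    # `collected` may hold a list or a string depending on `merge`.
    # Annotating it as List[str] only would push a refactoring towards
    # forcing a single type, which is how behaviour can silently change.
    collected: Union[List[str], str] = parts
    if merge:
        collected = " ".join(parts)  # rebinding: list -> str
    return collected
```

Keeping the original dynamic logic and describing it with the widest honest annotation is usually less risky than rewriting the logic to satisfy a narrower type.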