node-word-extractor issues

Add a way to detect Numbering indicator/ Bullet point

12

Thanks for making such a great lib. I wonder whether there is a way to know that some text is prefixed with a numbering indicator or bullet point? For example: I...

tungduonghgg123

chore(deps): bump minimist from 1.2.5 to 1.2.6

Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.

Commits:
- 7efb22a 1.2.6
- ef88b93 security notice for additional prototype pollution issue
- c2b9819 isConstructorOrProto adapted from PR
- bc8ecee test from prototype pollution PR

See full...

dependabot[bot]

dependencies

chore(deps): bump tmpl from 1.0.4 to 1.0.5

Bumps [tmpl](https://github.com/daaku/nodejs-tmpl) from 1.0.4 to 1.0.5.

Commits: see full diff in compare view.

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=tmpl&package-manager=npm_and_yarn&previous-version=1.0.4&new-version=1.0.5)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter...

dependabot[bot]

dependencies

Add method to read form data

2

A second common use case, added by request in `Text::Extract::Word`, was to read form data from protected Word files. Again, the code for this still exists in the Perl component,...

morungos

enhancement

Broken multi-byte letters at the borders of 4096-byte chunks

3

The **handleEntry()** function in the **open-office-extractor.js** file has an instruction to read streams by 4096-byte chunks: `const chunk = readStream.read(0x1000);` Given the text in the *.docx file is not in...

Oliverity
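
The boundary problem described in this report can be reproduced in a few lines. The sketch below is not the library's code: `decodeChunks` and `decodeChunksNaively` are hypothetical helper names, and UTF-8 is assumed as the encoding. It uses Node's built-in `StringDecoder`, which holds back an incomplete trailing byte sequence until the next chunk arrives, and compares it with decoding each chunk on its own.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: decode a sequence of Buffer chunks without corrupting
// characters that straddle a chunk boundary. UTF-8 is an assumption here.
function decodeChunks(chunks, encoding = 'utf8') {
  const decoder = new StringDecoder(encoding);
  let text = '';
  for (const chunk of chunks) {
    // write() keeps any incomplete trailing bytes until the next call.
    text += decoder.write(chunk);
  }
  // end() flushes whatever is still buffered.
  return text + decoder.end();
}

// Naive approach for comparison: each chunk is decoded independently, so a
// character split across two chunks turns into replacement characters.
function decodeChunksNaively(chunks, encoding = 'utf8') {
  return chunks.map((chunk) => chunk.toString(encoding)).join('');
}

// "é" is two bytes in UTF-8 (0xC3 0xA9); split it across two chunks.
const source = Buffer.from('café', 'utf8');
const chunks = [source.subarray(0, 4), source.subarray(4)];

console.log(decodeChunks(chunks) === 'café');        // boundary-safe decoding
console.log(decodeChunksNaively(chunks) === 'café'); // per-chunk decoding
```

An alternative would be to collect all chunks with `Buffer.concat` and decode once at the end, at the cost of holding the whole entry in memory.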

Trying to avoid a multi-byte character breaking.

2

Regarding issue https://github.com/morungos/node-word-extractor/issues/54: unfortunately, it seems we can only get the encoding from the XML header by reading the stream, so we had better assume an encoding before that...

Oliverity

Add way to iterate, fetch, and count pages

1

Hello! I was wondering if it would be possible to add some paging functionality. This issue could serve as three related requests: 1. A way to iterate through pages 2....

a1icja

chore(deps): bump ws from 7.4.6 to 7.5.10

Bumps [ws](https://github.com/websockets/ws) from 7.4.6 to 7.5.10.

Release notes (sourced from ws's releases):

7.5.10 bug fixes: Backported e55e5106 to the 7.x release line (22c28763).

7.5.9 bug fixes: Backported bc8bd34e to the...

dependabot[bot]

dependencies

Cannot read properties of null (reading 'open') while using Uint8Array as params for extract method.

1

If I pass Uint8Array data to the function, as in the example below: `const extracted = await extractor.extract(data);` I get this error: `TypeError: Cannot read properties of null (reading 'open')`

weponary
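
The report does not state the cause of the error. One possible workaround, assuming `extract()` handles a Node `Buffer` but does not recognize a plain `Uint8Array`, is to wrap the bytes in a `Buffer` before calling it. `toBuffer` below is a hypothetical helper name, not part of the library.

```javascript
// Hypothetical helper: wrap a Uint8Array in a Buffer without copying bytes.
function toBuffer(data) {
  if (Buffer.isBuffer(data)) {
    return data;
  }
  // Pass byteOffset and byteLength so that a view covering only part of a
  // larger ArrayBuffer is not widened to the whole underlying buffer.
  return Buffer.from(data.buffer, data.byteOffset, data.byteLength);
}

// Usage sketch (assumes `extractor` and `data` are the values from the report):
// const extracted = await extractor.extract(toBuffer(data));
```

Because the resulting `Buffer` shares memory with the original array, later changes to `data` are visible through it; `Buffer.from(data)` makes a copy instead if that matters.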

How do I extract HTML

How do I extract a Word document to HTML?

kawlaw
