maven-wagon icon indicating copy to clipboard operation
maven-wagon copied to clipboard

[MNG-6975] Wagon-HTTP, set content-type when determinable from file extension

Open chrisbeckey opened this issue 5 years ago • 20 comments

Set the HTTP content-type header,

  • using the file extension to determine value, when uploading a file.

  • using the header when uploading a Stream

chrisbeckey avatar Aug 06 '20 14:08 chrisbeckey

regarding: "Is it really 1.11? If yes, then this is a bug in Maven." Just to be sure I tested it with "1.11" and with "11" and they both worked. I then tried with "12" (I'm running JDK 11) and it failed with the expected missing package. I think it works either way.

regarding: "per-server config rather than global." I agree, is there an example you can point me to that I can start from?

On Thu, Aug 6, 2020 at 11:17 AM Michael Osipov [email protected] wrote:

This email was sent from an external source so please treat with caution.

@michael-o commented on this pull request.


In wagon-providers/pom.xml https://github.com/apache/maven-wagon/pull/72#discussion_r466472750 :

@@ -78,4 +78,20 @@ under the License. test

  •  <id>jdk11</id>
    
  •  <activation>
    
  •    <jdk>[1.11,)</jdk>
    

Is it really 1.11? If yes, then this is a bug in Maven.

In wagon-providers/wagon-http-shared/src/main/java/org/apache/maven/wagon/shared/http/AbstractHttpClientWagon.java https://github.com/apache/maven-wagon/pull/72#discussion_r466474091 :

@@ -279,6 +296,13 @@ public boolean isStreaming() private static final boolean SSL_ALLOW_ALL = Boolean.valueOf( System.getProperty( "maven.wagon.http.ssl.allowall", "false" ) );

  • /**
  • * If enabled, then the content-type HTTP header will be set using the file extension to determine the type,
    
  • * <b>disabled by default</b>
    
  • * This flag is only effective when uploading from a File, not directly from a Stream.
    
  • */
    
  • private static final boolean SET_CONTENT_TYPE_FROM_FILE_EXTENSION =

I wonder whether this should be a per-server config rather than global. In general, sys props are either globally or for a technical reason. WDYT?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/apache/maven-wagon/pull/72#pullrequestreview-462598973, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMYZJ7EINPE3U7SSVUIEOODR7LCITANCNFSM4PWVNA4A .

About Ascential plc: Ascential (LSE:ASCL.L) is a specialist information, data and analytics company that helps the world’s most ambitious businesses win in the digital economy. Our information, insights, connections, data and digital tools solve customer problems in three disciplines: Product Design via global trend forecasting service WGSN; Marketing via global benchmark for creative excellence and effectiveness Cannes Lions and WARC and strategic advisory firm MediaLink; and Sales via ecommerce-driven data, insights and advisory service Edge by Ascential, leading managed services provider for Amazon Flywheel Digital, the world's premier payments and Fin Tech congress Money20/20, global retail industry summit World Retail Congress and retail news outlet Retail Week. Ascential also powers political, construction and environmental intelligence brands DeHavilland, Glenigan and Groundsure. The information in or attached to this email is confidential and may be legally privileged. If you are not the intended recipient of this message, any use, disclosure, copying, distribution or any action taken in reliance on it is prohibited and may be unlawful. If you have received this message in error, please notify the sender immediately by return email and delete this message and any copies from your computer and network. Ascential does not warrant that this email and any attachments are free from viruses and accepts no liability for any loss resulting from infected email transmissions. Ascential reserves the right to monitor all email through its networks. Any view expressed may be those of the originator and not necessarily of Ascential plc. Please be advised all phone calls may be recorded for training and quality purposes and by accepting and/or making calls to us you acknowledge and agree to calls being recorded. Ascential plc, number 9934451 (England and Wales). Registered Office: The Prow, 1 Wilder Walk, London W1B 5AP.

chrisbeckey avatar Aug 06 '20 15:08 chrisbeckey

@chrisbeckey

Just to be sure I tested it with "1.11" and with "11" and they both worked.

Yeah, but why it did?

I think that 1.11 works only because when

[DEBUG] Detected Java String: '11.0.8'
[DEBUG] Normalized Java String: '11.0.8'

normalized string is tested against range [1.11,) - it is accepted due to 11 (of 11.0.8) being greater than 1 of (1.11).

I am pretty sure it shall be [11,), as you see, with [1.11,) for example JDK 10 is also accepted (as 10 > 1).

pzygielo avatar Aug 06 '20 16:08 pzygielo

Regarding the version 1.11 vs 11: in following up on the suggestion by @elharo to use URLConnection.guess... the need for javax.activation has been negated. Update to the PR is coming soon.

chrisbeckey avatar Aug 06 '20 16:08 chrisbeckey

Updated to use URLConnection.guess***, which also allows this to work with streaming. Added a second property to control behavior when an error in content identification occurs. Default to creating content-type and not failing if determining content-type fails. Thanks to @elharo for the suggestion, this is better than before.

chrisbeckey avatar Aug 06 '20 16:08 chrisbeckey

The stream guessing is almost pointless, look at its code: https://github.com/battleblow/openjdk-jdk8u/blob/bsd-port/jdk/src/share/classes/java/net/URLConnection.java

Path based approach looks better:

#sun.net.www MIME content-types table
#
# Property fields:
#
#   <description> ::= 'description' '=' <descriptive string>
#    <extensions> ::= 'file_extensions' '=' <comma-delimited list, include '.'>
#         <image> ::= 'icon' '=' <filename of icon image>
#        <action> ::= 'browser' | 'application' | 'save' | 'unknown'
#   <application> ::= 'application' '=' <command line template>
#

#
# The "we don't know anything about this data" type(s).
# Used internally to mark unrecognized types.
#
content/unknown: description=Unknown Content
unknown/unknown: description=Unknown Data Type

#
# The template we should use for temporary files when launching an application
# to view a document of given type.
#
temp.file.template: c:\\temp\\%s

#
# The "real" types.
#
application/octet-stream: \
	description=Generic Binary Stream;\
	file_extensions=.saveme,.dump,.hqx,.arc,.obj,.lib,.bin,.exe,.zip,.gz

application/oda: \
	description=ODA Document;\
	file_extensions=.oda

application/pdf: \
	description=Adobe PDF Format;\
	file_extensions=.pdf

application/postscript: \
	description=Postscript File;\
	file_extensions=.eps,.ai,.ps;\
	icon=ps

application/rtf: \
	description=Wordpad Document;\
	file_extensions=.rtf;\
	action=application;\
	application=wordpad.exe %s

application/x-dvi: \
	description=TeX DVI File;\
	file_extensions=.dvi

application/x-hdf: \
	description=Hierarchical Data Format;\
	file_extensions=.hdf;\
	action=save

application/x-latex: \
	description=LaTeX Source;\
	file_extensions=.latex

application/x-netcdf: \
	description=Unidata netCDF Data Format;\
	file_extensions=.nc,.cdf;\
	action=save

application/x-tex: \
	description=TeX Source;\
	file_extensions=.tex

application/x-texinfo: \
	description=Gnu Texinfo;\
	file_extensions=.texinfo,.texi

application/x-troff: \
	description=Troff Source;\
	file_extensions=.t,.tr,.roff

application/x-troff-man: \
	description=Troff Manpage Source;\
	file_extensions=.man

application/x-troff-me: \
	description=Troff ME Macros;\
	file_extensions=.me

application/x-troff-ms: \
	description=Troff MS Macros;\
	file_extensions=.ms

application/x-wais-source: \
	description=Wais Source;\
	file_extensions=.src,.wsrc

application/zip: \
	description=Zip File;\
	file_extensions=.zip;\
	icon=zip;\
	action=save

application/x-bcpio: \
	description=Old Binary CPIO Archive;\
	file_extensions=.bcpio;\
	action=save

application/x-cpio: \
	description=Unix CPIO Archive;\
	file_extensions=.cpio;\
	action=save

application/x-gtar: \
	description=Gnu Tar Archive;\
	file_extensions=.gtar;\
	icon=tar;\
	action=save

application/x-shar: \
	description=Shell Archive;\
	file_extensions=.sh,.shar;\
	action=save

application/x-sv4cpio: \
	description=SVR4 CPIO Archive;\
	file_extensions=.sv4cpio;\
	action=save

application/x-sv4crc: \
	description=SVR4 CPIO with CRC;\
	file_extensions=.sv4crc;\
	action=save

application/x-tar: \
	description=Tar Archive;\
	file_extensions=.tar;\
	icon=tar;\
	action=save

application/x-ustar: \
	description=US Tar Archive;\
	file_extensions=.ustar;\
	action=save

audio/basic: \
	description=Basic Audio;\
	file_extensions=.snd,.au;\
	icon=audio

audio/x-aiff: \
	description=Audio Interchange Format File;\
	file_extensions=.aifc,.aif,.aiff;\
	icon=aiff

audio/x-wav: \
	description=Wav Audio;\
	file_extensions=.wav;\
	icon=wav;\
	action=application;\
	application=mplayer.exe %s

image/gif: \
	description=GIF Image;\
	file_extensions=.gif;\
	icon=gif;\
	action=browser

image/ief: \
	description=Image Exchange Format;\
	file_extensions=.ief

image/jpeg: \
	description=JPEG Image;\
	file_extensions=.jfif,.jfif-tbnl,.jpe,.jpg,.jpeg;\
	icon=jpeg;\
	action=browser

image/tiff: \
	description=TIFF Image;\
	file_extensions=.tif,.tiff;\
	icon=tiff

image/vnd.fpx: \
	description=FlashPix Image;\
	file_extensions=.fpx,.fpix

image/x-cmu-rast: \
	description=CMU Raster Image;\
	file_extensions=.ras

image/x-portable-anymap: \
	description=PBM Anymap Image;\
	file_extensions=.pnm

image/x-portable-bitmap: \
	description=PBM Bitmap Image;\
	file_extensions=.pbm

image/x-portable-graymap: \
	description=PBM Graymap Image;\
	file_extensions=.pgm

image/x-portable-pixmap: \
	description=PBM Pixmap Image;\
	file_extensions=.ppm

image/x-rgb: \
	description=RGB Image;\
	file_extensions=.rgb

image/x-xbitmap: \
	description=X Bitmap Image;\
	file_extensions=.xbm,.xpm

image/x-xwindowdump: \
	description=X Window Dump Image;\
	file_extensions=.xwd

image/png: \
	description=PNG Image;\
	file_extensions=.png;\
	icon=png;\
	action=browser

image/bmp: \
	description=Bitmap Image;\
	file_extensions=.bmp;

text/html: \
	description=HTML Document;\
	file_extensions=.htm,.html;\
	icon=html

text/plain: \
	description=Plain Text;\
	file_extensions=.text,.c,.cc,.c++,.h,.pl,.txt,.java,.el;\
	icon=text;\
	action=browser

text/tab-separated-values: \
	description=Tab Separated Values Text;\
	file_extensions=.tsv

text/x-setext: \
	description=Structure Enhanced Text;\
	file_extensions=.etx

video/mpeg: \
	description=MPEG Video Clip;\
	file_extensions=.mpg,.mpe,.mpeg;\
	icon=mpeg

video/quicktime: \
	description=QuickTime Video Clip;\
	file_extensions=.mov,.qt

application/x-troff-msvideo: \
	description=AVI Video;\
	file_extensions=.avi;\
	icon=avi;\
	action=application;\
	application=mplayer.exe %s

video/x-sgi-movie: \
	description=SGI Movie;\
	file_extensions=.movie,.mv

message/rfc822: \
	description=Internet Email Message;\
	file_extensions=.mime

application/xml: \
	description=XML document;\
	file_extensions=.xml

You can even supply a custom file with -Dcontent.types.user.table=.....

michael-o avatar Aug 06 '20 17:08 michael-o

@chrisbeckey What is your stance on my previous comment?

michael-o avatar Aug 06 '20 20:08 michael-o

@michael-o I'm not sure I understand the comment. "guessContentTypeFromStream" is invoked if the source is an InputStream, not when reading from a File, there is no extension available to determine the content from.

chrisbeckey avatar Aug 07 '20 12:08 chrisbeckey

@chrisbeckey I was referring to this block: https://github.com/battleblow/openjdk-jdk8u/blob/bsd-port/jdk/src/share/classes/java/net/URLConnection.java#L1434-L1562

michael-o avatar Aug 07 '20 12:08 michael-o

@michael-o Is there a better way to determine content from a Stream (not from a file name, which is handled separately)?

chrisbeckey avatar Aug 07 '20 13:08 chrisbeckey

@chrisbeckey Not with external libraries.

michael-o avatar Aug 07 '20 13:08 michael-o

@michael-o assuming you meant "Not WITHOUT external libraries." ? IMHO, URLConnection functions over the activation library has the benefit of simplicity. Changing the POM to add the JDK11 profile (and the activation within that) has maintenance implications that are not commensurate with the benefit, considering that this feature has not been previously requested. In effect using URLConnection will address the majority of use cases with about a dozen lines of code and little expected maintenance. Of course, reasonable opinions may differ ...

chrisbeckey avatar Aug 07 '20 14:08 chrisbeckey

I will pick this up by the end of the week.

michael-o avatar Aug 12 '20 16:08 michael-o

There are still statements which have not been responded to. Until then this PR will remain open.

michael-o avatar Aug 17 '20 09:08 michael-o

Still interested in completing this?

michael-o avatar Nov 19 '20 09:11 michael-o

As a note: My (minor) issue could have benefited from this: https://issues.apache.org/jira/browse/MDEPLOY-289

The summary is our enterprise (amazon) used custom maven repository proxies. It would be nice to not default/guess content type when missing.

booniepepper avatar Dec 08 '21 19:12 booniepepper

Closed by accident

michael-o avatar Dec 09 '21 10:12 michael-o

I am still interested in seen this land.

In our case the WAF complained about the absence of a content-type header when uploading a JAR. I guess it would have been fine seeing application/octet-stream - had there been a header at all. (sorry, if there's a separate issue for that, didn't find one)

[DEBUG] http-outgoing-0 >> PUT /artifactory/the/rest/is/redacted.jar HTTP/1.1
[DEBUG] http-outgoing-0 >> Cache-control: no-cache
[DEBUG] http-outgoing-0 >> Pragma: no-cache
[DEBUG] http-outgoing-0 >> User-Agent: Apache-Maven/3.8.5 (Java 11.0.9.1; Mac OS X 10.16)
[DEBUG] http-outgoing-0 >> Content-Length: 17856154
[DEBUG] http-outgoing-0 >> Host: my.artifactory.host
[DEBUG] http-outgoing-0 >> Connection: Keep-Alive
[DEBUG] http-outgoing-0 >> Expect: 100-continue
[DEBUG] http-outgoing-0 >> Cookie: SCDID_S=redacted$$
[DEBUG] http-outgoing-0 >> Accept-Encoding: gzip,deflate
[DEBUG] http-outgoing-0 >> Authorization: Basic *redacted*
[DEBUG] http-outgoing-0 << HTTP/1.1 100 Continue

P.S. Maybe the description of this PR should mention that it'll fix https://issues.apache.org/jira/browse/WAGON-599.

marcelstoer avatar Sep 21 '22 15:09 marcelstoer

This is a bug in your WAF: https://www.rfc-editor.org/rfc/rfc9110.html#name-content-type Nothing wrong with the request

michael-o avatar Sep 21 '22 16:09 michael-o

I didn't claim there be anything "wrong" with the request. Of course we can adjust the security profile in the WAF such that it allows PUT requests without content-type (which we did). Still, I see this PR as a useful addition to wagon.

marcelstoer avatar Sep 21 '22 17:09 marcelstoer