Package com.xmlmind.util
Class LoadText
- java.lang.Object
  - com.xmlmind.util.LoadText
public final class LoadText extends Object
A utility class for loading a text file, for example a CSS stylesheet that starts with a BOM, contains an @charset rule, or has no encoding specification at all.
Unlike FileUtil.loadString(java.io.File) and URLUtil.loadString(java.net.URL), this utility class detects the encoding of the file. Note that encoding detection always succeeds because it uses a fallback value.
-
-
Nested Class Summary
Nested Classes (modifier and type, class, description)
- static class LoadText.EmacsStyleDetector: Detects an encoding by parsing -*- coding: ENCODING -*-.
- static class LoadText.Encoding: Encoding returned by guessEncoding(byte[], int, int).
- static interface LoadText.EncodingDetector: Detects an encoding by parsing an ASCII encoding specification (example: @charset "UTF-8";).
- static class LoadText.EncodingDetectorBase: A base class which checks the validity of the encoding returned by LoadText.EncodingDetectorBase.doDetectEncoding(java.lang.String).
- static class LoadText.HTMLCharsetDetector: Detects an encoding by parsing <meta charset="ENCODING" > or <meta http-equiv="Content-Type" content="text/html; charset=ENCODING">.
- static class LoadText.KeywordBasedDetector: Detects an encoding by parsing KEYWORD "ENCODING";, for example @charset "ENCODING";.
- static class LoadText.XMLEncodingDetector: Detects an encoding by parsing <?xml encoding="ENCODING"?>.
-
Field Summary
Fields (modifier and type, field, description)
- static LoadText.EncodingDetector[] ALL_ENCODING_DETECTORS: A ready-to-use array containing all LoadText.EncodingDetectors.
- static byte[] BOM_UTF16_BE: The UTF-16BE BOM (Byte Order Mark).
- static byte[] BOM_UTF16_LE: The UTF-16LE BOM (Byte Order Mark).
- static byte[] BOM_UTF8: The UTF-8 BOM (Byte Order Mark).
- static LoadText.KeywordBasedDetector CSS_CHARSET_DETECTOR: A ready-to-use instance of KeywordBasedDetector("@charset") (CSS stylesheets).
- static LoadText.EmacsStyleDetector EMACS_STYLE_DETECTOR: A ready-to-use instance of LoadText.EmacsStyleDetector.
- static LoadText.HTMLCharsetDetector HTML_CHARSET_DETECTOR: A ready-to-use instance of LoadText.HTMLCharsetDetector.
- static LoadText.XMLEncodingDetector XML_ENCODING_DETECTOR: A ready-to-use instance of LoadText.XMLEncodingDetector.
-
Method Summary
Methods (modifier and type, method, description)
- static String checkEncoding(String encoding): Returns the canonical name of encoding if valid; null otherwise.
- static Reader createReader(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors): Creates a reader which can be used to read the contents of specified text source.
- static String detectEncoding(byte[] bytes, int byteCount, int[] bomLength, LoadText.EncodingDetector... detectors): Detect encoding by examining specified bytes which have been read at the very start of a text file.
- static LoadText.Encoding guessEncoding(byte[] bytes, int offset, int length): Guess the encoding of a text file by examining its first few bytes.
- static String loadChars(Reader in): Load the characters contained in specified source.
- static String loadText(File file, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors)
- static String loadText(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors): Loads the contents of specified text source.
- static String loadText(URL url, boolean followRedirects, int timeout, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors): Loads the contents of specified text file.
- static String loadText(URL url, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors)
-
-
-
Field Detail
-
XML_ENCODING_DETECTOR
public static final LoadText.XMLEncodingDetector XML_ENCODING_DETECTOR
A ready-to-use instance of LoadText.XMLEncodingDetector.
-
CSS_CHARSET_DETECTOR
public static final LoadText.KeywordBasedDetector CSS_CHARSET_DETECTOR
A ready-to-use instance of KeywordBasedDetector("@charset") (CSS stylesheets).
-
HTML_CHARSET_DETECTOR
public static final LoadText.HTMLCharsetDetector HTML_CHARSET_DETECTOR
A ready-to-use instance of LoadText.HTMLCharsetDetector.
-
EMACS_STYLE_DETECTOR
public static final LoadText.EmacsStyleDetector EMACS_STYLE_DETECTOR
A ready-to-use instance of LoadText.EmacsStyleDetector.
-
ALL_ENCODING_DETECTORS
public static final LoadText.EncodingDetector[] ALL_ENCODING_DETECTORS
A ready-to-use array containing all LoadText.EncodingDetectors.
-
BOM_UTF16_BE
public static final byte[] BOM_UTF16_BE
The UTF-16BE BOM (Byte Order Mark).
-
BOM_UTF16_LE
public static final byte[] BOM_UTF16_LE
The UTF-16LE BOM (Byte Order Mark).
-
BOM_UTF8
public static final byte[] BOM_UTF8
The UTF-8 BOM (Byte Order Mark).
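To illustrate what the three BOM fields are for, here is a minimal, JDK-only sketch of BOM matching. The class BomSketch and its methods are hypothetical and are not part of com.xmlmind.util. The byte values are the standard Unicode byte order marks, which the fields above presumably hold. UTF-32 BOMs are deliberately ignored in this sketch.

```java
/** Hypothetical helper for illustration; not part of com.xmlmind.util. */
final class BomSketch {
    // Standard Unicode byte order marks (assumed to match the LoadText.BOM_* fields).
    static final byte[] BOM_UTF8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    static final byte[] BOM_UTF16_BE = {(byte) 0xFE, (byte) 0xFF};
    static final byte[] BOM_UTF16_LE = {(byte) 0xFF, (byte) 0xFE};

    private BomSketch() {}

    /** Returns true if the first byteCount bytes of bytes begin with bom. */
    static boolean startsWith(byte[] bytes, int byteCount, byte[] bom) {
        if (byteCount < bom.length) {
            return false;
        }
        for (int i = 0; i < bom.length; i++) {
            if (bytes[i] != bom[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns the encoding implied by a leading BOM, or null if there is no BOM. */
    static String encodingFromBom(byte[] bytes, int byteCount) {
        if (startsWith(bytes, byteCount, BOM_UTF8)) {
            return "UTF-8";
        }
        if (startsWith(bytes, byteCount, BOM_UTF16_BE)) {
            return "UTF-16BE";
        }
        if (startsWith(bytes, byteCount, BOM_UTF16_LE)) {
            return "UTF-16LE";
        }
        return null;
    }
}
```

The length of the matched BOM is what a caller would skip before decoding the remaining bytes.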
-
-
Method Detail
-
loadText
public static String loadText(File file, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
- Throws:
IOException
-
loadText
public static String loadText(URL url, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
- Throws:
IOException
-
loadText
public static String loadText(URL url, boolean followRedirects, int timeout, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
Loads the contents of specified text file.
- Parameters:
url - the location of the text file.
followRedirects - if true, follow redirections ("301: Moved Permanently", "302: Temporary Redirect"), including very common http to https ones. No effect unless url is an http/https URL.
timeout - specifies both connect and read timeout values in milliseconds. 0 means: infinite timeout. A negative value means: default value.
fallbackEncoding - the fallback encoding. May be null, in which case a sensible value (generally SystemUtil.defaultEncoding) is automatically determined.
encoding - the encoding actually used to load the text is copied there. May be null.
detectors - unless a BOM is found, specified encoding detectors are used to parse the first lines of the file, possibly containing an encoding specification like @charset "UTF-8";.
- Returns:
- the contents of the text file
- Throws:
IOException - if there is an I/O problem
-
loadText
public static String loadText(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
Loads the contents of specified text source.
This method implements the detection of the encoding. Note that the detection of the encoding always works because it uses a fallback value.
- Parameters:
in - the text source.
fallbackEncoding - the fallback encoding. May be null, in which case a sensible value (generally SystemUtil.defaultEncoding) is automatically determined.
encoding - the encoding actually used to load the text is copied there. May be null.
detectors - unless a BOM is found, specified encoding detectors are used to parse the first lines of the file, possibly containing an encoding specification like @charset "UTF-8";.
- Returns:
- the contents of the text source
- Throws:
IOException - if there is an I/O problem
-
loadChars
public static String loadChars(Reader in) throws IOException
Load the characters contained in specified source.
- Parameters:
in - the character source
- Returns:
- the contents of the character source
- Throws:
IOException - if there is an I/O problem
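As a rough picture of what loading all characters from a Reader involves, here is a JDK-only sketch. The class LoadCharsSketch is hypothetical; it is not the LoadText implementation, and whether the real method closes the reader is not stated here, so this sketch leaves the reader open.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper for illustration; not part of com.xmlmind.util. */
final class LoadCharsSketch {
    private LoadCharsSketch() {}

    /** Reads the reader to its end and returns everything read as one string. */
    static String loadChars(Reader in) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[8192];
        int count;
        // read() returns -1 at end of stream; otherwise the number of chars read.
        while ((count = in.read(buffer, 0, buffer.length)) != -1) {
            text.append(buffer, 0, count);
        }
        return text.toString();
    }
}
```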
-
createReader
public static Reader createReader(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException
Creates a reader which can be used to read the contents of specified text source.
- Parameters:
in - the text source.
fallbackEncoding - the fallback encoding. May be null, in which case a sensible value (generally SystemUtil.defaultEncoding) is automatically determined.
encoding - the encoding actually used to load the text is copied there. May be null.
detectors - unless a BOM is found, specified encoding detectors are used to parse the first lines of the file, possibly containing an encoding specification like @charset "UTF-8";.
- Returns:
- a reader for the contents of the text source. This reader automatically skips the BOM, if any.
- Throws:
IOException - if there is an I/O problem
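The BOM-skipping behavior described above can be sketched with the JDK alone. The class BomSkippingReaderSketch below is hypothetical and much simpler than the documented method: it only looks for a BOM, takes no detectors, and substitutes the JVM default charset when the fallback is null, which is an assumption rather than the documented SystemUtil.defaultEncoding behavior.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;

/** Hypothetical helper for illustration; not part of com.xmlmind.util. */
final class BomSkippingReaderSketch {
    private BomSkippingReaderSketch() {}

    static Reader createReader(InputStream in, String fallbackEncoding) throws IOException {
        // Room to push back up to 3 bytes, the length of the longest BOM handled here.
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int count = 0;
        // A single read() may return fewer bytes than requested, so loop until full or EOF.
        while (count < head.length) {
            int n = pushback.read(head, count, head.length - count);
            if (n < 0) {
                break;
            }
            count += n;
        }

        // Assumption: a null fallback means "use the JVM default charset".
        String encoding = (fallbackEncoding != null)
            ? fallbackEncoding : Charset.defaultCharset().name();
        int bomLength = 0;
        if (count >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            encoding = "UTF-8";
            bomLength = 3;
        } else if (count >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            encoding = "UTF-16BE";
            bomLength = 2;
        } else if (count >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            encoding = "UTF-16LE";
            bomLength = 2;
        }

        // Push back every byte read past the BOM so the reader starts at real content.
        if (count > bomLength) {
            pushback.unread(head, bomLength, count - bomLength);
        }
        return new InputStreamReader(pushback, encoding);
    }
}
```

Pushing back only the bytes after the BOM is what makes the returned reader start at the first real character.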
-
checkEncoding
public static final String checkEncoding(String encoding)
Returns the canonical name of encoding if valid; null otherwise.
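One plausible way to get this "canonical name or null" behavior with the JDK is java.nio.charset.Charset. The sketch below is an assumption about how such a check could be written, not the actual LoadText code; CheckEncodingSketch is a hypothetical name.

```java
import java.nio.charset.Charset;

/** Hypothetical helper for illustration; not part of com.xmlmind.util. */
final class CheckEncodingSketch {
    private CheckEncodingSketch() {}

    /** Returns the canonical charset name for encoding, or null if it is not usable. */
    static String checkEncoding(String encoding) {
        try {
            // Charset.name() is the canonical name, whichever alias was passed in.
            return Charset.forName(encoding).name();
        } catch (IllegalArgumentException e) {
            // Covers a null name, IllegalCharsetNameException and UnsupportedCharsetException.
            return null;
        }
    }
}
```

With this sketch, an alias such as "utf-8" maps to "UTF-8", while an unknown name yields null instead of an exception.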
-
detectEncoding
public static String detectEncoding(byte[] bytes, int byteCount, int[] bomLength, LoadText.EncodingDetector... detectors)
Detects the encoding by examining the specified bytes, which have been read at the very start of a text file.
- Parameters:
bytes - bytes read at the beginning of a text file.
byteCount - number of bytes read at the beginning of a text file.
bomLength - the length of the BOM is stored as the first element of this array, which makes it possible to skip the BOM. May be null.
detectors - unless a BOM is found, specified encoding detectors are used to parse the first lines of the file, possibly containing an encoding specification like @charset "UTF-8";.
- Returns:
- the encoding if detected; null otherwise
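To make the detector step concrete, here is a JDK-only sketch of parsing a CSS-style @charset "ENCODING"; declaration out of the first bytes of a file. KeywordSniffSketch and its regular expression are hypothetical; the real LoadText.KeywordBasedDetector may be stricter or more lenient about where and how the declaration appears.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of com.xmlmind.util. */
final class KeywordSniffSketch {
    // Matches: @charset "ENCODING";  (whitespace handling is an assumption)
    private static final Pattern CSS_CHARSET =
        Pattern.compile("@charset\\s+\"([^\"]+)\"\\s*;");

    private KeywordSniffSketch() {}

    /** Returns the declared encoding name as written, or null if none is found. */
    static String detect(byte[] bytes, int byteCount) {
        // The declaration itself is ASCII, so ISO-8859-1 (one char per byte,
        // no decoding errors) is enough to search for it.
        String head = new String(bytes, 0, byteCount, StandardCharsets.ISO_8859_1);
        Matcher matcher = CSS_CHARSET.matcher(head);
        return matcher.find() ? matcher.group(1) : null;
    }
}
```

The returned name is unvalidated; a caller would still need to check that it names a supported charset before using it.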
-
guessEncoding
public static LoadText.Encoding guessEncoding(byte[] bytes, int offset, int length)
Guess the encoding of a text file by examining its first few bytes.
- Parameters:
bytes - byte buffer.
offset - byte buffer offset.
length - byte buffer length. At least 4 for this function to work.
- Returns:
- encoding if detected; Encoding.UNKNOWN otherwise
-
-