{"id":336,"date":"2008-07-12T21:29:00","date_gmt":"2008-07-12T19:29:00","guid":{"rendered":"https:\/\/bob-team.de\/wordpress\/?p=336"},"modified":"2008-07-12T21:38:37","modified_gmt":"2008-07-12T19:38:37","slug":"nutch-und-utf-8","status":"publish","type":"post","link":"https:\/\/bob-team.de\/wordpress\/2008\/07\/12\/nutch-und-utf-8\/","title":{"rendered":"Nutch and UTF-8"},"content":{"rendered":"<p><a href='https:\/\/bob-team.de\/wordpress\/wp-content\/uploads\/2008\/07\/nutch_utf8.png'><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/bob-team.de\/wordpress\/wp-content\/uploads\/2008\/07\/nutch_utf8.png\" alt=\"\" title=\"nutch_utf8\" width=\"425\" height=\"131\" class=\"alignnone size-full wp-image-337\" srcset=\"https:\/\/bob-team.de\/wordpress\/wp-content\/uploads\/2008\/07\/nutch_utf8.png 425w, https:\/\/bob-team.de\/wordpress\/wp-content\/uploads\/2008\/07\/nutch_utf8-300x92.png 300w\" sizes=\"auto, (max-width: 425px) 100vw, 425px\" \/><\/a><\/p>\n<p>The <em>Nutch<\/em> web interface uses the GET method to send the search query to the server. By default, <em>Tomcat<\/em> decodes the URL as ISO-8859-1. As a result, German umlauts, among other characters, are lost.<\/p>\n<p><!--more--><\/p>\n<p>When running <em>Nutch<\/em> in a UTF-8 environment, the <em>Connector<\/em> entry in the file <em>$TOMCAT\/conf\/server.xml<\/em> must be adjusted as follows. 
<\/p>\n<p><code>&lt;Connector<br \/>\n&nbsp;&nbsp;&nbsp;port=&quot;8080&quot;<br \/>\n&nbsp;&nbsp;&nbsp;redirectPort=&quot;8443&quot;<br \/>\n&nbsp;&nbsp;&nbsp;minSpareThreads=&quot;25&quot;<br \/>\n&nbsp;&nbsp;&nbsp;connectionTimeout=&quot;20000&quot;<br \/>\n&nbsp;&nbsp;&nbsp;maxSpareThreads=&quot;75&quot;<br \/>\n&nbsp;&nbsp;&nbsp;maxThreads=&quot;150&quot;<br \/>\n&nbsp;&nbsp;&nbsp;<font color=\"green\">URIEncoding=&quot;UTF-8&quot;<\/font>&gt;<br \/>\n&lt;\/Connector&gt;<\/code><\/p>\n<p>More information is available in the <a href=\"http:\/\/wiki.apache.org\/nutch\/GettingNutchRunningWithUtf8\"><em>Nutch<\/em><\/a> and <a href=\"http:\/\/wiki.apache.org\/tomcat\/FAQ\/Connectors#Q8\"><em>Tomcat<\/em><\/a> wikis.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Nutch web interface uses the GET method to send the search query to the server. By default, Tomcat decodes the URL as ISO-8859-1. As a result, German umlauts, among other characters, are lost.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[35,21],"class_list":["post-336","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-nutch","tag-tomcat","entry"],"_links":{"self":[{"href":"https:\/\/bob-team.de\/wordpress\/wp-json\/wp\/v2\/posts\/336","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bob-team.de\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bob-team.de\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bob-team.de\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/bob-team.de\/wordpress\/wp-json\/wp\/v2\/comments?post=336"}],"version-history":[{"count":0,"href":"https:\/\/bob-team.de\/wordpress\/wp-json\/wp\/v2\/posts\/336\/revisions"}],"wp:attachment":[{"href":"https:\/
\/bob-team.de\/wordpress\/wp-json\/wp\/v2\/media?parent=336"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bob-team.de\/wordpress\/wp-json\/wp\/v2\/categories?post=336"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bob-team.de\/wordpress\/wp-json\/wp\/v2\/tags?post=336"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}