<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Java on ln --help</title>
    <link>https://blog.mei-home.net/tags/java/</link>
    <description>Recent content in Java on ln --help</description>
    <generator>Hugo -- 0.147.2</generator>
    <language>en</language>
    <lastBuildDate>Sun, 07 Jun 2026 00:25:36 +0200</lastBuildDate>
    <atom:link href="https://blog.mei-home.net/tags/java/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Yacy Part 1: Deployment</title>
      <link>https://blog.mei-home.net/posts/yacy-part-1-deployment/</link>
      <pubDate>Sun, 07 Jun 2026 00:25:36 +0200</pubDate>
      <guid>https://blog.mei-home.net/posts/yacy-part-1-deployment/</guid>
      <description>Deploying the YaCy search engine</description>
      <content:encoded><![CDATA[<p>Welcome to the newest rabbit hole I&rsquo;ve found myself in. This post starts a new
series where I&rsquo;m taking a look at the <a href="https://yacy.net/">YaCy</a>
self-hosted, distributed peer-to-peer search engine, and probably at web crawling
and search ranking algorithms along the way.</p>
<p>In this post, I will concentrate on how I deployed YaCy into my Kubernetes
cluster, plus a few notes on my first steps with it. You won&rsquo;t find answers
to questions like &ldquo;how good is it as a Google replacement?&rdquo; here. There&rsquo;s
a lot more work ahead of me before I can actually make that judgment.</p>
<p>You can find this post and any future ones in the series under the <a href="https://blog.mei-home.net/tags/series-yacy/">YaCy tag</a>.</p>
<p>A fair warning before I continue: The project accepts slopcoded contributions.</p>
<p>It currently doesn&rsquo;t look like there&rsquo;s a large team behind it, but there&rsquo;s a
<a href="https://community.searchlab.eu/">community forum</a> with some activity, although
new signups are broken at the moment due to an issue with the mail server.
The last release was in March, and PRs are regularly reviewed and merged.</p>
<h2 id="what-is-yacy">What is YaCy?</h2>
<p>While I&rsquo;ve said this post is mostly about deployment, it&rsquo;s probably a good idea
to tell you all a bit about what YaCy is, so you know whether you actually want
to read on.</p>
<p>YaCy is a self-hosted, peer-to-peer search engine. It maintains its own index
and does not rely on the likes of Bing or Google. Its main page presents
a simple search form:</p>
<figure>
    <img loading="lazy" src="search-root.png"
         alt="A screenshot of YaCy&#39;s homepage. In the middle, it shows the YaCy logo, and beneath it an individualized greeting reading &#39;Meiers Search&#39;, which I gave it. Below that is the search mask, with an input field and a search button next to it. Below it are two radio buttons, one for text, one for images, and a link to &#39;more options&#39;. In the top left corner of the page is a login button. In the top right button is a dropdown called &#39;Search Interfaces&#39;, a button with a question mark containing links to the docs and bug tracker and finally a button labeled &#39;Administration&#39; which gets us to the admin pages."/> <figcaption>
            <p>YaCy&rsquo;s main search page.</p>
        </figcaption>
</figure>

<p>When searching, the results page should also be pretty familiar:
<figure>
    <img loading="lazy" src="search-example.png"
         alt="A screenshot of YaCy&#39;s search results page. At the very top is search field, showing the query &#39;migrating from nomad to k8s&#39;, with a &#39;search again&#39; button next to it. On the left side is a separate area with options for the search. At the top, the user can switch between &#39;Peer-to-Peer&#39; and &#39;Privacy&#39;, and below it are a few options for the search ranking. Current selected are &#39;Context Ranking&#39; and &#39;Documents&#39;. Other options are &#39;Sort by Date&#39;, &#39;Images&#39;, and a choice to filter for only &#39;http&#39; or only &#39;https&#39; pages. Below that is a small word cloud, containing related words which don&#39;t show up in the search query yet but are related, like &#39;github&#39;, &#39;ubuntu&#39; or &#39;cncf&#39;. Then follow some more filtering options in dropdowns. The first one is &#39;Domain&#39;, which allows filtering by specific domains to search. Next is &#39;Authors&#39;, then &#39;Filetype&#39; and finally Language. Next back to the main area, which at the top contains some general infos about the search. It shows 178k results for the search, with 178k from local and 74 from remote sources, specifically 13 YaCy peers. Then there&#39;s finally the search results themselves. They very much look like Google many, many years ago. First comes the title of the page, followed by the full link. Then comes the last modified date of the page and a link to citations. Only ten results are shown on the page, but at the bottom are buttons to show the next pages of results. The results themselves mostly show posts to this blog&#39;s Nomad to k8s series, see link in the main text. Besides that are also articles from dev.to about migrating applications from VMs to k8s as well as an Ubuntu docs page with the title &#39;Migrating From The Livepatch Machine Charm to the K8s Charm&#39;."/> <figcaption>
            <p>YaCy search result example</p>
        </figcaption>
</figure>
</p>
<p>This result was a bit unexpected. I hadn&rsquo;t actually crawled my own blog
yet, and it still found my posts. Looks like somebody pointed a YaCy
crawl at it at some point. So this is what I would call a good-ish search
result. It mostly found a series of blog posts about exactly what I was interested
in - <a href="https://blog.mei-home.net/tags/k8s-migration/">migrating from Nomad to Kubernetes</a>,
plus a few other results also related to migrating from something to Kubernetes.</p>
<p>YaCy holds its index locally in an embedded
<a href="https://solr.apache.org/">Apache Solr</a> instance, which also serves the searches.
The instance fills this index by crawling websites itself. I will
go into more detail on crawling in a future post. By the way, if you&rsquo;ve got any
good blog posts or articles which explain how web crawling works these days, what
to look out for and how to behave properly, I&rsquo;d be very happy to hear about them,
for example <a href="@mmeier@social.mei-home.net">via the Fediverse</a>.</p>
<p>The P2P aspect of YaCy is used in two different ways. The first one is during
searching. In the screenshot above, on the left side, you can choose between
&lsquo;Peer-to-Peer&rsquo; and &lsquo;Privacy&rsquo; mode. Privacy mode means searching only the local
instance&rsquo;s index. The Peer-to-Peer mode searches the local index and also sends
the query to other instances for a remote search. The second way is a constant gossip
protocol which exchanges pieces of the local index with other instances, both
sending and receiving. This always runs in the background, without user
intervention. This way, you end up with a lot more entries in your local
index than just what you crawled yourself, and the remote search adds to that
on top.</p>
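<p>For illustration, here is a minimal Python sketch of how the two search modes might be requested programmatically. It assumes YaCy&rsquo;s JSON search endpoint <code>yacysearch.json</code> and that its <code>resource</code> parameter selects between <code>global</code> (Peer-to-Peer) and <code>local</code> (Privacy). That mapping to the UI buttons, the <code>maximumRecords</code> parameter and the <code>localhost:8090</code> base URL are my assumptions, not something I have verified against the code.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from urllib.parse import urlencode

# Hypothetical address of a YaCy instance; 8090 is YaCy's default HTTP port.
BASE_URL = "http://localhost:8090"


def yacy_search_url(query, peer_to_peer=True, count=10, base_url=BASE_URL):
    """Build a URL for YaCy's JSON search API (assumed endpoint and parameters).

    resource=global is assumed to match the 'Peer-to-Peer' mode (local index
    plus remote peers), resource=local the 'Privacy' mode (local index only).
    """
    params = {
        "query": query,
        "resource": "global" if peer_to_peer else "local",
        "maximumRecords": count,
    }
    return base_url + "/yacysearch.json?" + urlencode(params)


# Privacy mode: only the local Solr index is queried.
print(yacy_search_url("migrating from nomad to k8s", peer_to_peer=False))
# Peer-to-Peer mode: remote peers are asked as well.
print(yacy_search_url("migrating from nomad to k8s"))
</code></pre></div>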
<p>I&rsquo;m also of a mind to look into this more deeply, because the official docs
aren&rsquo;t very detailed about what exactly is exchanged, and I want to look at the code
a bit more.</p>
<p>Let&rsquo;s end this section with a bit of a general vibe: The project does work. I
do get search results from pages I&rsquo;ve crawled myself, and I&rsquo;m also getting results
for pages which I definitely have not crawled myself. The network is active,
showing about 600 peers seen over the past week, and I&rsquo;m getting quite a few
remote searches in. I&rsquo;ve also had some success with a few of my searches. The
example above was a pleasant surprise, getting served my own blog for a relevant
search query. But there have also been other queries which were not too useful.
For example, I just did a quick search for <a href="https://cloudnative-pg.io/">CloudNativePG</a>.
This did show CNPG&rsquo;s GitHub page, but the project&rsquo;s home page was not in the index at all.</p>
<p>There are two main areas I want to research more deeply. The first is
crawling. Most important to me is making sure that YaCy&rsquo;s crawler really
follows all the rules around web crawling, like honoring robots.txt and
keeping the per-site request rates low. I think it already does that, but I will
need to test it. Then there are the questions of what to crawl and how deep to crawl.
What&rsquo;s the right way to get a breadth-first crawl going, instead of just indexing
pages I already know, but without filling the index with too much garbage?</p>
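<p>To make those two rules concrete, here is a generic Python sketch of the checks a polite crawler performs before each request: consult the host&rsquo;s robots.txt, and keep a minimum delay between requests to the same host, honoring a <code>Crawl-delay</code> if the site sets a longer one. This is not YaCy&rsquo;s crawler. The class name, the two-second default delay and the sample robots.txt are hypothetical, and the sketch only relies on the standard library&rsquo;s <code>urllib.robotparser</code>.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt; a real crawler fetches /robots.txt from each host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


class PoliteFetchPlanner:
    """Decide whether a URL may be fetched and how long to wait before doing so.

    Generic sketch, not YaCy's crawler: one parsed robots.txt per host and a
    minimum delay between two requests to the same host.
    """

    def __init__(self, user_agent, default_delay=2.0):
        self.user_agent = user_agent
        self.default_delay = default_delay  # seconds between requests per host
        self.robots = {}        # host name mapped to its RobotFileParser
        self.last_request = {}  # host name mapped to the time of the last request

    def add_robots(self, host, robots_txt):
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self.robots[host] = parser

    def allowed(self, url):
        parser = self.robots.get(urlsplit(url).netloc)
        # Without a parsed robots.txt for the host, err on the safe side.
        return parser is not None and parser.can_fetch(self.user_agent, url)

    def wait_time(self, url, now=None):
        host = urlsplit(url).netloc
        now = time.monotonic() if now is None else now
        last = self.last_request.get(host)
        if last is None:
            return 0.0
        parser = self.robots.get(host)
        crawl_delay = parser.crawl_delay(self.user_agent) if parser is not None else None
        # The site's Crawl-delay wins if it is longer than our own default.
        delay = max(self.default_delay, float(crawl_delay or 0))
        return max(0.0, last + delay - now)

    def record_request(self, url, now=None):
        host = urlsplit(url).netloc
        self.last_request[host] = time.monotonic() if now is None else now


planner = PoliteFetchPlanner("ExampleBot")
planner.add_robots("example.org", ROBOTS_TXT)
print(planner.allowed("https://example.org/blog/post"))     # True
print(planner.allowed("https://example.org/private/page"))  # False
</code></pre></div>
<p>Refusing hosts without a parsed robots.txt is a deliberately conservative choice for the sketch. A real crawler would fetch the file first and treat a missing one according to the robots.txt conventions.</p>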
<p>The second area is search ranking. It doesn&rsquo;t show in the example search
above, but the ranking is sometimes really not great. It is highly configurable, though.
And there is an extended ranking called CitationRank, similar to Google&rsquo;s
PageRank. I really want to dig into that and how it&rsquo;s implemented.</p>
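<p>As a reference point for that dig, here is the classic PageRank power iteration as a Python sketch. To be clear, this is textbook PageRank, not YaCy&rsquo;s CitationRank, whose details I haven&rsquo;t looked at yet. The damping factor of 0.85, the iteration count and the tiny example link graph are assumptions for illustration.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">def pagerank(links, damping=0.85, iterations=100):
    """Classic PageRank via power iteration.

    links maps each page to the list of pages it links to. Pages without
    outgoing links (dangling pages) spread their rank evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by dangling pages is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            # Each page passes its rank on in equal shares to its link targets.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: c is linked from a, b and d.
web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
</code></pre></div>
<p>The ranks always sum to one, and a page with no inbound links, like <code>d</code> here, only keeps the &ldquo;random jump&rdquo; share of <code>(1 - damping) / n</code>.</p>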
<p>One nice thing to note: YaCy implements the necessary APIs to be used as a
search provider in Firefox.</p>
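<p>As far as I know, Firefox adds custom search engines through OpenSearch description documents. Their URL template contains a <code>{searchTerms}</code> placeholder, and the browser fills it in with the URL-encoded query, roughly as in this Python sketch. The host and the exact template are hypothetical; I haven&rsquo;t checked what YaCy&rsquo;s own description advertises.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">from urllib.parse import quote

# Hypothetical OpenSearch URL template, as a browser would read it from a
# search engine's OpenSearch description document.
TEMPLATE = "https://search.example.org/yacysearch.html?query={searchTerms}"


def expand_template(template, terms):
    """Substitute the URL-encoded search terms into an OpenSearch URL template."""
    return template.replace("{searchTerms}", quote(terms, safe=""))


print(expand_template(TEMPLATE, "cloudnative pg"))
</code></pre></div>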
<h2 id="deploying-yacy">Deploying YaCy</h2>
<p>YaCy is a Java application and comes with multiple ways of deploying it, both
with and without Docker. It has a few warts, though.</p>
<p>Before I get to my Kubernetes deployment, a quick note: You can also run it
locally on your own desktop machine. It works perfectly well there, even without
an externally reachable port. You won&rsquo;t be fully participating in the P2P network,
but you will be able to do remote searches. And when you trigger crawls,
the resulting index will even be shared with other peers. But your instance
won&rsquo;t be serving other peers&rsquo; remote searches, and you won&rsquo;t be able to receive
index updates from other peers via the background gossip protocol.</p>
<p>Let&rsquo;s start with the Docker images. At the moment, the newest versioned
Docker images <a href="https://hub.docker.com/r/yacy/yacy_search_server/tags">on Docker Hub</a>
are from 12 months ago, even though there was a YaCy release in March. The only
current images are under the <code>latest</code> tag, which I don&rsquo;t really like. So my first
step was building the YaCy image myself. Here is the Containerfile:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-Dockerfile" data-lang="Dockerfile"><span style="display:flex;"><span><span style="color:#75715e">## builder image</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">ARG</span> alpine_ver<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">ARG</span> jdk_ver<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">ARG</span> wkhtmltopdf_ver<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">FROM</span><span style="color:#e6db74"> eclipse-temurin:${jdk_ver}-jdk-alpine-${alpine_ver} AS builder</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#75715e"># Install needed packages not in base image</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> apk add --no-cache curl git apache-ant<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#75715e"># set current working dir &amp; copy sources</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">WORKDIR</span><span style="color:#e6db74"> /opt</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">COPY</span> . /opt/yacy_search_server/<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> ant compile -f /opt/yacy_search_server/build.xml <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    <span style="color:#f92672">&amp;&amp;</span> rm -fr /opt/yacy_search_server/.git<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#75715e"># Set initial admin password: &#34;yacy&#34; (encoded with custom yacy md5 function net.yacy.cora.order.Digest.encodeMD5Hex())</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> sed -i <span style="color:#e6db74">&#34;/adminAccountBase64MD5=/c\adminAccountBase64MD5=MD5:8cffbc0d66567a0987a4aba1ec46d63c&#34;</span> /opt/yacy_search_server/defaults/yacy.init <span style="color:#f92672">&amp;&amp;</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	sed -i <span style="color:#e6db74">&#34;/adminAccountForLocalhost=/c\adminAccountForLocalhost=false&#34;</span> /opt/yacy_search_server/defaults/yacy.init <span style="color:#f92672">&amp;&amp;</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	sed -i <span style="color:#e6db74">&#34;/server.https=false/c\server.https=true&#34;</span> /opt/yacy_search_server/defaults/yacy.init<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#75715e">## build final image</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">FROM</span><span style="color:#e6db74"> surnet/alpine-wkhtmltopdf:${wkhtmltopdf_ver} AS wkhtmltopdf</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">FROM</span><span style="color:#e6db74"> eclipse-temurin:${jdk_ver}-jre-alpine-${alpine_ver} AS app</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> apk add --no-cache <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	imagemagick <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	xvfb <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	ghostscript <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	<span style="color:#75715e"># Install dependencies for wkhtmltopdf</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>	libstdc++ <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	libx11 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	libxrender <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	libxext <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	libssl3 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	ca-certificates <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	fontconfig <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	freetype <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	ttf-dejavu <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	ttf-droid <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	ttf-freefont <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	ttf-liberation <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	<span style="color:#75715e"># more fonts</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>	<span style="color:#f92672">&amp;&amp;</span> apk add --no-cache --virtual .build-deps <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	msttcorefonts-installer <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	<span style="color:#75715e"># Install microsoft fonts</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>	<span style="color:#f92672">&amp;&amp;</span> update-ms-fonts <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	<span style="color:#f92672">&amp;&amp;</span> fc-cache -f <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	<span style="color:#75715e"># Clean up when done</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>	<span style="color:#f92672">&amp;&amp;</span> rm -rf /tmp/* <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>	<span style="color:#f92672">&amp;&amp;</span> apk del .build-deps<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#75715e"># Copy wkhtmltopdf files from docker-wkhtmltopdf image</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">COPY</span> --from<span style="color:#f92672">=</span>wkhtmltopdf /bin/wkhtmltopdf /bin/wkhtmltopdf<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#75715e"># copy YaCy to app image</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">RUN</span> addgroup yacy <span style="color:#f92672">&amp;&amp;</span> adduser -S -G yacy -H -D yacy<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">WORKDIR</span><span style="color:#e6db74"> /opt</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">COPY</span> --chown<span style="color:#f92672">=</span>yacy:yacy --from<span style="color:#f92672">=</span>builder /opt/yacy_search_server /opt/yacy_search_server<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#75715e"># Expose HTTP and HTTPS default ports</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">EXPOSE</span><span style="color:#e6db74"> 8090 8443</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#75715e"># Set data volume: yacy data and configuration will persist even after container stop or destruction</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">VOLUME</span> [<span style="color:#e6db74">&#34;/opt/yacy_search_server/DATA&#34;</span>]<span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#75715e"># Next commands run as yacy as non-root user for improved security</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">USER</span><span style="color:#e6db74"> yacy</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#75715e"># Start yacy as a foreground process (-f) to display console logs and to wait for yacy process</span><span style="color:#960050;background-color:#1e0010">
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span><span style="color:#66d9ef">CMD</span> [<span style="color:#e6db74">&#34;/bin/sh&#34;</span>,<span style="color:#e6db74">&#34;/opt/yacy_search_server/startYACY.sh&#34;</span>,<span style="color:#e6db74">&#34;-f&#34;</span>]<span style="color:#960050;background-color:#1e0010">
</span></span></span></code></pre></div><p>It&rsquo;s a light adaption of the <a href="https://github.com/yacy/yacy_search_server/blob/master/docker/Dockerfile.alpine">official Alpine Dockerfile</a>,
with the only change being that I introduced configurable versions for the JDK,
Alpine and other tooling. I build this via my internal pipeline. If you&rsquo;re
interested, have a look at <a href="https://blog.mei-home.net/posts/improving-container-image-build-perf-with-buildah/">this post</a>.</p>
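<p>One thing to keep in mind with this Containerfile: the three <code>ARG</code>s have no defaults. A build without them expands each one to an empty string, producing an invalid base image reference like <code>eclipse-temurin:-jdk-alpine-</code>. A small Python sketch of a pipeline helper that refuses to build unless all of them are set might look like this. The function name, the <code>buildah build</code> invocation, the image tag and the version values are placeholders, not my actual pipeline.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python">import shlex

# The ARGs declared at the top of the Containerfile, none of which has a default.
REQUIRED_ARGS = ("alpine_ver", "jdk_ver", "wkhtmltopdf_ver")


def build_command(versions, tag, containerfile="Containerfile", context="."):
    """Assemble a 'buildah build' invocation that passes every required ARG."""
    missing = [name for name in REQUIRED_ARGS if not versions.get(name)]
    if missing:
        raise ValueError("missing build args: " + ", ".join(missing))
    cmd = ["buildah", "build", "-f", containerfile, "-t", tag]
    for name in REQUIRED_ARGS:
        cmd += ["--build-arg", name + "=" + versions[name]]
    cmd.append(context)
    return cmd


# Made-up placeholder versions; check the registries for real tags.
example_versions = {"alpine_ver": "3.21", "jdk_ver": "21", "wkhtmltopdf_ver": "3.21-0.12.6"}
print(shlex.join(build_command(example_versions, "containers.homelab.example/homelab/yacy:dev")))
</code></pre></div>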
<p>With that done, I could create the Kubernetes deployment:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">apiVersion</span>: <span style="color:#ae81ff">apps/v1</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">kind</span>: <span style="color:#ae81ff">Deployment</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">name</span>: <span style="color:#ae81ff">yacy</span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">replicas</span>: <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">selector</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">matchLabels</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">yacy</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">strategy</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">type</span>: <span style="color:#e6db74">&#34;Recreate&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">template</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">metadata</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">labels</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">homelab/app</span>: <span style="color:#ae81ff">yacy</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">spec</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">automountServiceAccountToken</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">fsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">runAsNonRoot</span>: <span style="color:#66d9ef">true</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">runAsUser</span>: <span style="color:#ae81ff">100</span>
</span></span><span style="display:flex;"><span>        <span style="color:#f92672">runAsGroup</span>: <span style="color:#ae81ff">1000</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">containers</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">yacy</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">securityContext</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">allowPrivilegeEscalation</span>: <span style="color:#66d9ef">false</span>
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># Can&#39;t be done because htroot/ is written to and is outside DATA/ dir</span>
</span></span><span style="display:flex;"><span>            <span style="color:#75715e"># At the same time, this dir contains files already, so can&#39;t just be remapped</span>
</span></span><span style="display:flex;"><span>            <span style="color:#75715e">#readOnlyRootFilesystem: true</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">capabilities</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">drop</span>:
</span></span><span style="display:flex;"><span>                - <span style="color:#ae81ff">ALL</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">image</span>: <span style="color:#ae81ff">containers.homelab.example/homelab/yacy:{{ .Values.appVersion }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">volumeMounts</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">mountPath</span>: {{ <span style="color:#ae81ff">.Values.mountDir }}</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">resources</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">limits</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">2000m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">3200M</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">requests</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">cpu</span>: <span style="color:#ae81ff">2000m</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">memory</span>: <span style="color:#ae81ff">3200M</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">env</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_PORT</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;{{ .Values.ports.bind }}&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_PORT_PUBLIC</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;{{ .Values.ports.public }}&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_STATICIP</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;{{ .Values.domain }}&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_JAVASTART_XMX</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;Xmx3000m&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_UPNP_ENABLED</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_SERVER_HTTPS</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_NETWORK_UNIT_PROTOCOL_HTTPS_PREFERRED</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;true&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_UPDATE_PROCESS</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;manual&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_PROMOTESEARCHPAGEGREETING</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;Meier&#39;s Search&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_PROXYCLIENT</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_ADMINACCOUNTFORLOCALHOST</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_SCAN_ENABLED</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_BROWSERPOPUPTRIGGER</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_TRAY_ICON_ENABLED</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;false&#34;</span>
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">YACY_NETWORK_UNIT_AGENT</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">value</span>: <span style="color:#e6db74">&#34;mei-home-search&#34;</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">livenessProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.bind }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">30</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">startupProbe</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">httpGet</span>:
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">port</span>: {{ <span style="color:#ae81ff">.Values.ports.bind }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">path</span>: <span style="color:#e6db74">&#34;/&#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">periodSeconds</span>: <span style="color:#ae81ff">10</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">failureThreshold</span>: <span style="color:#ae81ff">24</span>
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">initialDelaySeconds</span>: <span style="color:#ae81ff">60</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">ports</span>:
</span></span><span style="display:flex;"><span>            - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">yacy-http</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">containerPort</span>: {{ <span style="color:#ae81ff">.Values.ports.bind }}</span>
</span></span><span style="display:flex;"><span>              <span style="color:#f92672">protocol</span>: <span style="color:#ae81ff">TCP</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">volumes</span>:
</span></span><span style="display:flex;"><span>        - <span style="color:#f92672">name</span>: <span style="color:#ae81ff">data</span>
</span></span><span style="display:flex;"><span>          <span style="color:#f92672">persistentVolumeClaim</span>:
</span></span><span style="display:flex;"><span>            <span style="color:#f92672">claimName</span>: <span style="color:#ae81ff">yacy-volume</span>
</span></span></code></pre></div><p>So there&rsquo;s a few things to say about the config. First, note that the Container
cannot be configured with <code>readOnlyRootFilesystem: false</code>. This is because files
are written to directories which already contain other files as part of the
image, so it can&rsquo;t be re-mounted.</p>
<p>Another thing worth mentioning is the rather long startup probe duration. YaCy
sometimes runs cleanup and compression work during startup, especially after a
crash rather than a proper shutdown. This can take quite a while, particularly
on a Pi 4 class CPU.</p>
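<p>To put a rough number on that duration: the kubelet restarts the container once the startup probe has failed <code>failureThreshold</code> times in a row. A common back-of-the-envelope estimate for the time YaCy gets is therefore <code>initialDelaySeconds + failureThreshold * periodSeconds</code>. Below is a minimal Java sketch of that arithmetic, using the values from the Deployment above. The class name is made up for illustration, and probe timeouts are ignored.</p>
<pre tabindex="0"><code class="language-java" data-lang="java">// Rough startup budget of a Kubernetes startupProbe, using the common
// approximation initialDelaySeconds + failureThreshold * periodSeconds.
// Probe timeouts and the exact probe scheduling are ignored.
class StartupBudget {
    static int budgetSeconds(int initialDelaySeconds, int periodSeconds, int failureThreshold) {
        return initialDelaySeconds + failureThreshold * periodSeconds;
    }

    public static void main(String[] args) {
        // Values from the startupProbe in the Deployment above.
        int budget = budgetSeconds(60, 10, 24);
        System.out.println("YaCy gets roughly " + budget + " s to start up"); // 300 s
    }
}
</code></pre><p>That works out to about five minutes. If a post-crash cleanup ever needs longer, raising <code>failureThreshold</code> is the simplest knob, since it doesn&rsquo;t delay the first probe on a normal start.</p>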
<p>Then there&rsquo;s a general problem with how the environment variables work.
Each env variable corresponds to a setting in the YaCy config file. To set one,
you take the config key&rsquo;s name, upper case it, replace <code>.</code> with <code>_</code>
and prepend <code>YACY_</code>. This works for keys like <code>some.setting.name</code>,
but fails for keys like <code>someSettingName</code>. The reason is the check that
determines whether a config key matching the env variable exists: it lower cases
the whole env variable name and then searches the config for that name, so a
camelCase key can never match.
See <a href="https://github.com/yacy/yacy_search_server/issues/794">this issue</a> and my
accompanying fix <a href="https://github.com/yacy/yacy_search_server/pull/797">here</a>.</p>
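<p>To make the camelCase problem a bit more concrete, here&rsquo;s a small Java sketch of the lookup as described above. It is a hypothetical re-creation for illustration only, not YaCy&rsquo;s actual code. <code>EnvToConfig</code>, <code>toKey</code> and <code>isKnownSetting</code> are made-up names.</p>
<pre tabindex="0"><code class="language-java" data-lang="java">import java.util.Locale;
import java.util.Properties;

// Hypothetical re-creation of the env variable lookup described in the text.
// This is NOT YaCy's code. YACY_SOME_SETTING becomes "some.setting", and the
// variable is only accepted if exactly that key exists in the config.
class EnvToConfig {
    static String toKey(String envName) {
        // Drop the YACY_ prefix, lower case everything, turn _ into .
        return envName.substring("YACY_".length())
                .toLowerCase(Locale.ROOT)
                .replace('_', '.');
    }

    static boolean isKnownSetting(String envName, Properties config) {
        // Exact, case-sensitive lookup: the derived key is always fully
        // lower case, so a camelCase key can never match.
        return config.containsKey(toKey(envName));
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("port", "8090");
        config.setProperty("publicPort", "");

        System.out.println(isKnownSetting("YACY_PORT", config));       // true
        System.out.println(isKnownSetting("YACY_PUBLICPORT", config)); // false
    }
}
</code></pre><p>Under this scheme, <code>YACY_PUBLICPORT</code> turns into <code>publicport</code>, which never matches <code>publicPort</code>. No spelling of the env variable would make it match.</p>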
<p>Also note that the <code>YACY_PORT_PUBLIC</code> setting is not currently supported upstream.
It comes from a fix for an issue I discovered earlier. I will go into more detail
in the next section.</p>
<p>The <code>YACY_PORT</code> and <code>YACY_PORT_PUBLIC</code> settings set the port YaCy binds to and
the port it advertises to other peers for incoming connections, respectively. I&rsquo;m
disabling UPnP as well as HTTPS, as I don&rsquo;t have UPnP enabled in my firewall
and YaCy is fronted by a reverse proxy terminating TLS. The <code>YACY_STATICIP</code>
setting defines the address reported to other peers trying to contact this
one. Despite its name, it also happily takes a domain, not just an IP.</p>
<p>The <code>YACY_PROMOTESEARCHPAGEGREETING</code>
setting configures the subheading shown under the YaCy logo on the search page.</p>
<p><code>YACY_ADMINACCOUNTFORLOCALHOST</code> is an important setting. By default, YaCy
launches with a configuration that allows all connections coming from localhost to
do anything they want without any further authentication, which is why it&rsquo;s set
to <code>false</code> here. Not a good default in my view, but likely intended to make
things easier when running an instance locally.</p>
<p><code>YACY_SCAN_ENABLED</code> is disabled here, as that setting would scan the local network
for other YaCy instances. Not useful, as this is the only instance I&rsquo;m running.
Well, right now at least.</p>
<p>Both <code>YACY_BROWSERPOPUPTRIGGER</code> and <code>YACY_TRAY_ICON_ENABLED</code> are irrelevant for
deployments on Kubernetes, as they enable features only useful for local desktop
deployments.</p>
<p>And finally, <code>YACY_NETWORK_UNIT_AGENT</code> defines the name of the peer in the YaCy P2P
network. If not given, a name will be generated randomly.</p>
<p>One note on the volume I&rsquo;m attaching here: This is a Ceph RBD from an SSD pool.
I figured that Solr would likely not be happy with a network attached HDD
volume. I haven&rsquo;t had any performance or stability issues with this.</p>
<p>It&rsquo;s also worth noting that YaCy can be configured via a config file as well,
but that approach isn&rsquo;t really cloud native. First, there&rsquo;s the <a href="https://github.com/yacy/yacy_search_server/blob/Release_1.941/defaults/yacy.init">yacy.init</a>
file, which is copied during the first startup to create the initial config file.
Overriding it with e.g. a ConfigMap to change some settings should work perfectly well.
Any changes after that initial setup are more complicated without env variables,
though. That&rsquo;s because the real config file under <code>DATA/SETTINGS/yacy.conf</code>
is also written to by YaCy when changes are made via the web UI. Those changes would
be lost on restart with a ConfigMap.</p>
<h2 id="external-accessibility-and-peering">External accessibility and peering</h2>
<p>Before getting into the details, it&rsquo;s worth noting that YaCy has different peer
levels an instance can have, ranging from &ldquo;Virgin&rdquo; (yes, I know &#x1f614;), where
it hasn&rsquo;t had contact with any outside instance, to &ldquo;Principal&rdquo;. Virgin instances
are, for example, instances intended purely for local use, such as providing
search for an intranet and its sites only. In this mode, the instance doesn&rsquo;t
connect to any other peers and doesn&rsquo;t participate in the global search index.</p>
<p>The next level is &ldquo;Junior&rdquo;. Here, the instance can at least make outgoing
connections to external peers. This allows P2P searches and sending pieces of
the local index to other peers, but not receiving index pieces from them.</p>
<p>Next comes the &ldquo;Senior&rdquo; mode. This is what my instance is currently running in.
It means full participation in the P2P index, including being reachable by
external peers. If you&rsquo;re curious, my instance is <code>mei-home-search</code>.</p>
<p>The final level, &ldquo;Principal&rdquo;, is a Senior instance which also provides an initial
seed list. Some of those are hardcoded into the YaCy binary as a starting point
for new instances. This is only required during initial setup. Afterwards, each
instance keeps its own seed list and uses that after restarts.</p>
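<p>The four levels boil down to a simple decision. The following Java enum is only an illustrative model of the levels as described in this post. The names and the three boolean inputs are assumptions, and YaCy&rsquo;s real implementation may well look quite different.</p>
<pre tabindex="0"><code class="language-java" data-lang="java">// Illustrative model of the four peer levels as described in the text.
// The enum and the classify() rules are a simplification, not YaCy's code.
enum PeerLevel {
    VIRGIN,    // no contact with any outside peer yet
    JUNIOR,    // can connect out, but is not reachable by other peers
    SENIOR,    // reachable by other peers, full participation in the index
    PRINCIPAL; // a senior peer that also provides a seed list

    static PeerLevel classify(boolean contactedOtherPeers,
                              boolean reachableFromOutside,
                              boolean providesSeedList) {
        if (!contactedOtherPeers) {
            return VIRGIN;
        }
        if (!reachableFromOutside) {
            return JUNIOR;
        }
        return providesSeedList ? PRINCIPAL : SENIOR;
    }
}
</code></pre><p>Reachability from the outside is the deciding factor between Junior and Senior, which is why the rest of this section is about exposing the instance.</p>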
<p>To take part in the peer to peer side of YaCy, my instance needs to be reachable
by external peers. I was a bit apprehensive about just hanging the entire
thing out in public. But I found that exposing only the <code>/yacy/</code> path
seems to be enough to make peering work.</p>
<p>So the next thing to look at is the address and port YaCy hands to other peers
for the P2P connection. Here, the YaCy docs in the <a href="https://github.com/yacy/yacy_search_server/blob/Release_1.941/defaults/yacy.init">yacy.init file</a>
are a bit confusing and don&rsquo;t really work, at least for me. They document three
different ports:</p>
<pre tabindex="0"><code># port number where the server should bind to
port = 8090
[...]
#sometimes you may want yacy to bind to another port, than the one reachable from outside.
#then set bindPort to the port yacy should bind on, and port to the port, visible from outside
#to run yacy on port 8090, reachable from port 80, set bindPort=8090, port=80 and use
#iptables -t nat -A PREROUTING -p tcp -s 192.168.24.0/16 --dport 80 -j DNAT --to 192.168.24.1:8090
#(of course you need to customize the ips)
bindPort =
[...]
#publicPort if you use a different port to access YaCy than the one it listens on, you can use this setting
publicPort=
</code></pre><p>The <code>port</code> setting is what you&rsquo;d expect: the port YaCy actually binds
to. Reading the comment on <code>bindPort</code>, you might expect that you could set
<code>bindPort</code> instead, and YaCy would then bind to that and only use <code>port</code> as
the port communicated to external peers for connections. I tried it with a setting
like this:</p>
<pre tabindex="0"><code>port = 443
bindPort = 8090
</code></pre><p>This led to errors during startup, because YaCy was now trying to bind to <code>443</code>,
which failed because it&rsquo;s not running as root.
After some searching, I found that <code>bindPort</code> doesn&rsquo;t show up anywhere in the
code. It seems to simply be unused. I&rsquo;ve created <a href="https://github.com/yacy/yacy_search_server/pull/793">a PR</a>
to remove it.</p>
<p>Then there&rsquo;s the <code>publicPort</code> setting. I couldn&rsquo;t use it via env variables due
to the aforementioned issue with camelCase settings. So I tried setting it
through the UI as well as by manually editing the config file. Neither worked.
External requests still bounced off my firewall, trying to reach port <code>8090</code>,
or whichever other port I set in the <code>port</code> setting. What I wanted was a way
to set YaCy&rsquo;s own listening port separately from the port it tells
other peers to connect to. I also didn&rsquo;t want to set the <code>port</code> setting to <code>443</code>,
because that would have meant extended permissions for the YaCy container.</p>
<p>I could have opened port 8090, but I also didn&rsquo;t want to do that. I already have
ports 80 and 443 open and wanted to use them. So I looked into the code instead.
See <a href="https://github.com/yacy/yacy_search_server/issues/791">this issue</a> and the
<a href="https://github.com/yacy/yacy_search_server/pull/792">accompanying pull request</a>.
With that (as yet unmerged) PR, there is now a new <code>port.public</code> setting, which
only configures the port that is sent to other peers for external connections.</p>
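<p>Here&rsquo;s a minimal Java sketch of the split this is going for: bind to <code>port</code>, but advertise <code>port.public</code> to other peers. Only the two setting names come from this post. The class is hypothetical and not the code from the PR, and the fallback to <code>port</code> when <code>port.public</code> is empty is an assumption.</p>
<pre tabindex="0"><code class="language-java" data-lang="java">import java.util.Properties;

// Hypothetical sketch of separating the bind port from the advertised port.
// Not the code from the PR; only "port" and "port.public" come from the text.
class AdvertisedPort {
    static int bindPort(Properties config) {
        // 8090 is the default from yacy.init.
        return Integer.parseInt(config.getProperty("port", "8090").trim());
    }

    static int advertisedPort(Properties config) {
        String publicPort = config.getProperty("port.public", "").trim();
        // Assumption: fall back to the bind port when no public port is set.
        return publicPort.isEmpty() ? bindPort(config) : Integer.parseInt(publicPort);
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("port", "8090");
        config.setProperty("port.public", "80");
        System.out.println("bind " + bindPort(config) + ", advertise " + advertisedPort(config));
    }
}
</code></pre><p>With <code>port</code> left at <code>8090</code> and <code>port.public</code> set to <code>80</code>, YaCy keeps listening on an unprivileged port inside the container, while other peers are told to come in through the reverse proxy on port 80.</p>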
<p>With that set to <code>80</code>, I was hoping everything would work now. But other peers were still
unable to reach mine. This time though, the issue was entirely of my own making.
In my bastion Traefik, I had two open ports, one NAT&rsquo;ed to my external port 80 and
one to external 443. But, again to keep permissions for that Traefik instance
restricted, those ports on the bastion host were not 80 and 443, but higher
ports. And I used Traefik&rsquo;s <a href="https://doc.traefik.io/traefik/reference/install-configuration/entrypoints/#opt-http-redirections-entryPoint-to">entrypoint redirection</a> to point the HTTP entrypoint to the HTTPS entrypoint. This,
of course, never actually worked. The setting replies to any request
to the HTTP port with a permanent redirect, but not to port 443. Instead, it redirects
to the port the internal HTTPS entrypoint is listening on, which isn&rsquo;t publicly accessible.</p>
<p>That took me quite a while to figure out.</p>
<p>But once I had finally configured that correctly, my peer started peering properly.</p>
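<p>One quick way to spot a problem like this is to look at the <code>Location</code> header of the redirect and compare its port with the ports actually forwarded to the internet. Here&rsquo;s a small Java sketch of that comparison. The host name and the internal port <code>8443</code> are made-up examples, not my real setup.</p>
<pre tabindex="0"><code class="language-java" data-lang="java">import java.net.URI;

// Sanity check for redirect targets: does a Location header point at a port
// that is actually reachable from the internet? Hosts and ports are examples.
class RedirectCheck {
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        // No explicit port in the URL: use the scheme's default port.
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    static boolean pointsAtPublicPort(String location, int... publicPorts) {
        int port = effectivePort(URI.create(location));
        for (int open : publicPorts) {
            if (open == port) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A redirect leaking a (hypothetical) internal HTTPS entrypoint port.
        System.out.println(pointsAtPublicPort("https://search.example.com:8443/yacy/", 80, 443)); // false
        // The redirect that was actually wanted.
        System.out.println(pointsAtPublicPort("https://search.example.com/yacy/", 80, 443));      // true
    }
}
</code></pre>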
<p>There&rsquo;s still something hinky though. YaCy regularly reports the peering status
in its logs. Here&rsquo;s an example from my peer:</p>
<pre tabindex="0"><code>PeerPing: I am accessible for 31 peer(s), not accessible for 24 peer(s).
</code></pre><p>So peering definitely works for some peers. I know that because I&rsquo;m receiving
remote search queries with no issue. But some other peers still cannot connect to me.
And I just can&rsquo;t figure out why not. Something to look into at a later date.</p>
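<p>To track that ratio over time, the PeerPing line is easy enough to parse. Here&rsquo;s a small Java sketch, assuming the log line keeps exactly the wording shown above. The class and method names are made up.</p>
<pre tabindex="0"><code class="language-java" data-lang="java">import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses YaCy's PeerPing status line and computes the share of peers that
// can reach this instance. The regex only matches the wording quoted above.
class PeerPingStats {
    private static final Pattern COUNTS = Pattern.compile(
            "accessible for (\\d+) peer\\(s\\), not accessible for (\\d+) peer\\(s\\)");

    // Returns a value from 0 to 1, or NaN if both counts are zero.
    static double accessibleShare(String logLine) {
        Matcher m = COUNTS.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a PeerPing line: " + logLine);
        }
        int accessible = Integer.parseInt(m.group(1));
        int notAccessible = Integer.parseInt(m.group(2));
        return (double) accessible / (accessible + notAccessible);
    }

    public static void main(String[] args) {
        String line = "PeerPing: I am accessible for 31 peer(s), not accessible for 24 peer(s).";
        System.out.printf(Locale.ROOT, "%.1f%% of peers can reach this instance%n",
                100 * accessibleShare(line));
    }
}
</code></pre><p>For the line above, that&rsquo;s 31 of 55 peers, or roughly 56%.</p>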
<h2 id="resource-consumption">Resource consumption</h2>
<p>Before I end this post, a short look at the resource needs is in order. My usage
has been rather limited up to now, but I have already done a few crawls
of a few random sites. For example, I&rsquo;ve crawled <a href="https://kubernetes.io">kubernetes.io</a>,
the German newspaper <a href="https://faz.net">faz.net</a> and the official pages of a few cities
I&rsquo;ve lived in, just to get a feel for it. In total, that led to a
disk usage of about 13 GB. It peaked at 16 GB, but I&rsquo;m not sure why it has dropped
so much since.</p>
<p>When it comes to networking, the needs are modest. The few crawls I&rsquo;ve done
up to now haven&rsquo;t even saturated my &ldquo;my country is a bit shit at the internet&rdquo;
250 MBit/s connection.</p>
<p>Then there&rsquo;s CPU usage. Here is the usage of the YaCy Pod since its launch:
<figure>
    <img loading="lazy" src="cpu-usage.png"
         alt="A screenshot of a Grafana time series chart. It shows the YaCy container&#39;s CPU use, with core usage on the Y axis and time on the X axis, going from 00:00 on 2026-05-27 to 2026-06-06 at 14:00. For most of the time, the utilization stays at or below about 0.2. There are two marked phases with higher use. The first one runs from 2026-06-01 around 00:00 to 2026-06-02 around 00:00. In this phase, the utilization varies a lot, but stays above 1.4 and hovers around 2.1 for the most part. In the second phase, from 2026-06-04 00:00 to 08:30, it maxes out at 2.0, visibly throttled."/> <figcaption>
            <p>YaCy&rsquo;s CPU utilization.</p>
        </figcaption>
</figure>
</p>
<p>Note that this is the Kubernetes way of measuring CPU, meaning a utilization
of 1.0 means one core fully used. Also, this is running on a Raspberry Pi CM4. The
two phases with utilization above 0.2 were when I was running crawls.
Everything else is normal use, and the smaller, shorter spikes are restarts. So
CPU power seems to be used mostly during crawling and while ingesting the results
into the local index.</p>
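<p>For reference, the <code>2000m</code> CPU request and limit in the Deployment use the same scale. The <code>m</code> suffix means millicores, so <code>2000m</code> is 2.0 cores, which is where the throttled second crawl phase tops out. Here&rsquo;s a tiny Java sketch of that conversion. It deliberately handles only the plain and millicore forms, not every Kubernetes quantity format.</p>
<pre tabindex="0"><code class="language-java" data-lang="java">// Minimal parser for the two Kubernetes CPU quantity forms used here:
// plain cores ("2", "0.5") and millicores ("2000m"). Nothing else is handled.
class CpuQuantity {
    static double toCores(String quantity) {
        String q = quantity.trim();
        if (q.endsWith("m")) {
            // 1000 millicores make one core.
            return Double.parseDouble(q.substring(0, q.length() - 1)) / 1000.0;
        }
        return Double.parseDouble(q);
    }

    public static void main(String[] args) {
        // The CPU limit from the Deployment above.
        System.out.println(toCores("2000m")); // 2.0
    }
}
</code></pre>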
<p>Here is the memory usage:
<figure>
    <img loading="lazy" src="mem-usage.png"
         alt="A screenshot of a Grafana time series chart. It shows the memory consumption in GB, ranging from 00:00 on 2026-05-27 to 14:00 on 2026-06-06. For the first few days, until 2026-05-30 around 17:00, it stays somewhere around 400 MB to 600 MB. Afterwards, it slowly climbs until it reaches 3.80 to over 4 GB from 2026-06-01 00:00 to 2026-06-02 00:00. After that, it settles back down to around 1.6 GB, with variations of about 400 MB around that value. Towards the end of the chart, the variations grow to over 1 GB, with the average somewhere around 2 GB, but never exceeding 3 GB."/> <figcaption>
            <p>YaCy&rsquo;s memory consumption.</p>
        </figcaption>
</figure>
</p>
<p>As is typical for a garbage collected language like Java, the memory consumption
fluctuates a lot. Once I had gathered a bit of an index through my first crawls,
the average consumption rose quite a bit. I only reached stability without OOM kills
after increasing the limit all the way up to about 3 GB and setting the JVM&rsquo;s <code>Xmx</code>
option to 3000 MB. I expect to have to increase the volume once the
local index grows as I crawl more sites.</p>
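<p>The units in that setup deserve a second look. Kubernetes&rsquo; <code>M</code> suffix is decimal (10<sup>6</sup> bytes), while the JVM&rsquo;s <code>m</code> in <code>Xmx3000m</code> is binary (2<sup>20</sup> bytes), so the <code>3200M</code> limit and the 3000 MB heap are closer together than they look. Here&rsquo;s a small Java sketch of the arithmetic. The class name is made up.</p>
<pre tabindex="0"><code class="language-java" data-lang="java">// Compares the JVM heap cap with the container memory limit. Kubernetes "M"
// is decimal (10^6 bytes), while the JVM's "m" in Xmx is binary (2^20 bytes).
class HeapHeadroom {
    static final long MIB = 1024L * 1024L;

    // Bytes left for non-heap memory when the heap is completely full.
    static long headroomBytes(long limitInM, long xmxInMiB) {
        return limitInM * 1_000_000L - xmxInMiB * MIB;
    }

    public static void main(String[] args) {
        long headroom = headroomBytes(3200, 3000); // 3200M limit, Xmx3000m
        System.out.println(headroom + " bytes, about " + headroom / MIB + " MiB");
    }
}
</code></pre><p>That leaves roughly 51 MiB for everything the JVM allocates outside the heap, such as metaspace, thread stacks, the JIT code cache and direct buffers. These numbers alone don&rsquo;t say whether that is enough for YaCy, but the gap may be worth keeping in mind if OOM kills come back while the heap itself isn&rsquo;t full.</p>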
<p>After I&rsquo;ve used YaCy properly for a while, I will likely write up another post
on its scaling behavior, because I&rsquo;m rather curious about that. But for now it
seems to run quite happily in that 3 GB limit.</p>
<h2 id="whats-next">What&rsquo;s next</h2>
<p>The next part will be a deep dive into crawling. I&rsquo;ve already found that simply
choosing a starting point and doing a depth 3 crawl isn&rsquo;t really that great.
I&rsquo;ve also found that you pretty much need to explore the site you&rsquo;re crawling
a bit first. Two examples: The <a href="https://faz.net">faz.net</a> site has a gigantic
<code>/kaufkompass/</code> category full of products which is really not worth crawling.
And when crawling a GitHub project, you will likely want to exclude <code>/tree</code>,
<code>/commits</code> and <code>/blob</code>.</p>
<p>There also seem to be features for importing e.g. <a href="https://en.wikipedia.org/wiki/ZIM_(file_format)">ZIM files</a>
or RSS feeds.</p>
<p>And there&rsquo;s the question of how to get better search result ranking.</p>
<p>And finally,
there are quite a few interesting metrics available, like crawled pages, number of
peers, number of remote searches and so on. I&rsquo;m feeling very tempted to either
implement Prometheus metrics directly in YaCy, or write an external exporter
which scrapes YaCy&rsquo;s existing APIs.</p>
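<p>Whichever way that goes, the output would have to be in Prometheus&rsquo; plain-text exposition format. Here&rsquo;s a minimal Java sketch of producing it. The metric names, help texts and values are hypothetical placeholders.</p>
<pre tabindex="0"><code class="language-java" data-lang="java">// Minimal sketch of the Prometheus text exposition format. Metric names,
// help texts and values are hypothetical placeholders, not real YaCy metrics.
class PrometheusText {
    // One metric family: HELP line, TYPE line, then a single sample line.
    // The help text must not contain raw newlines.
    static String metric(String name, String type, String help, double value) {
        return "# HELP " + name + " " + help + "\n"
                + "# TYPE " + name + " " + type + "\n"
                + name + " " + value + "\n";
    }

    public static void main(String[] args) {
        StringBuilder body = new StringBuilder();
        body.append(metric("yacy_peers_connected", "gauge",
                "Number of peers currently known to this instance.", 31));
        body.append(metric("yacy_remote_searches_total", "counter",
                "Remote search queries answered by this instance.", 1234));
        System.out.print(body);
    }
}
</code></pre><p>An exporter would serve this body as <code>text/plain</code> on a <code>/metrics</code> endpoint. The <code>_total</code> suffix on the counter follows the Prometheus naming conventions.</p>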
]]></content:encoded>
    </item>
  </channel>
</rss>
