ENGEBA

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

stefan-it authored a paper about 1 month ago

SindBERT, the Sailor: Charting the Seas of Turkish NLP

stefan-it authored a paper about 2 months ago

The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models

gaydmi authored a paper 3 months ago

Llama-GENBA-10B: A Trilingual Large Language Model for German, English and Bavarian

View all activity

stefan-it

authored a paper about 1 month ago

SindBERT, the Sailor: Charting the Seas of Turkish NLP

Paper • 2510.21364 • Published Oct 24 • 1

stefan-it

authored a paper about 2 months ago

The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models

Paper • 2510.13996 • Published Oct 15 • 8

gaydmi

authored a paper 3 months ago

Llama-GENBA-10B: A Trilingual Large Language Model for German, English and Bavarian

Paper • 2509.05668 • Published Sep 6 • 5

stefan-it

authored a paper 3 months ago

Llama-GENBA-10B: A Trilingual Large Language Model for German, English and Bavarian

Paper • 2509.05668 • Published Sep 6 • 5

gaydmi

authored a paper 8 months ago

TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages

Paper • 2502.11020 • Published Feb 16 • 8

stefan-it

posted an update 9 months ago

Post

3876

Wohoo 🥳 I have finished my 2025 GPU workstation build and I am very excited to train new awesome open source models on it.

I built my last GPU workstation 5 years ago featuring an AMD Ryzen 5900X, 64GB of G.SKILL Trident Z RGB on an ASRock X570 Taichi cooled by an Alphacool Eisbär 420. GPU was a Zotac RTX 3090 AMP Extreme. Unfortunately, I was never satisfied with the case - some Fractal Define 7, as it is definitely too small, airflow is not optimal as I had to open the front door all the time and it also arrived with a partly damaged side panel.

For my new build, I've used the following components: an outstanding new AMD Ryzen 9950X3D with 64GB of Corsair Dominator Titanium (what a name). As a huge Noctua fan - warm greetings to my Austrian neighbors - I am using the brand new Noctua NH-D15 G2 on an ASRock X870E Taichi in an amazing Lian Li LANCOOL III chassis. One joke that only NVIDIA Blackwell users will understand: you definitely need a tempered glass panel to check if your GPU cables/connectors start melting 😂 And the best is yet to come: I returned my previously bought Zotac RTX 5090 Solid to the eBay seller (because of... missing ROPs, only NVIDIA Blackwell users will again understand) and bought a Zotac 5090 AMP Extreme INFINITY (yes, the long name indicates that this is the flagship model from Zotac) from a more trustworthy source (NBB in Germany).

I am so happy to start training and fine-tuning new open source models - stay tuned!!!

2 replies

stefan-it

posted an update 9 months ago

Post

1031

🇹🇷 😍 I'm very happy to finally announce my new Turkish LM called "BERT5urk":

stefan-it/bert5urk

It is a 1.42B T5-based model, trained with UL2 pretraining objective on the Turkish part of the awesome HuggingFaceFW/fineweb-2 dataset.

Feel free to check it out!

1 reply

stefan-it

posted an update 10 months ago

Post

3302

After running some 3DMark and FurMark benchmarks on Windows to make sure that my new 5090 is not causing melting cables [1] and some nice shots with a thermal camera (I don't think that's too much), running some fine-tuning experiments with my favorite Flair & Transformers libraries are very easy to perform.

Important steps:

Good idea is to start with a fresh Ubuntu 24.04 installation with latest CUDA 12.8 and the open NVIDIA driver - follow more advices from [2]:

sudo apt -y install cuda-toolkit-12-8 nvidia-open

I tried update from an existing Ubuntu installation with an older CUDA and driver version and it resulted in a non-startable system.

If you are using PyTorch 2.6 with built CUDA 12.6 it will result in:

NVIDIA Graphics Device with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.

But no worries! For PyTorch you need just to use a nightly 2.7 version that was built with CUDA 12.8. This can easily done via:

pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128

After that the latest Flair version can be installed and fine-tuning will work!

References:

[1]: https://www.reddit.com/r/nvidia/comments/1inpox7/rtx_50_series_12vhpwr_megathread/
[2]: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=24.04&target_type=deb_network

1 reply

stefan-it

posted an update 10 months ago

Post

5183

She arrived 😍

[Expect more models soon...]

2 replies

stefan-it

posted an update about 1 year ago

Post

1586

My latest project is the outcome of the last 2+ years working with TPUs from the amazing TPU Research Cloud (TRC) program and training Encoder-only LMs with the TensorFlow Model Garden library.

👉 Link: https://github.com/stefan-it/model-garden-lms

An overview of some features:

- Cheatsheet for setting-up a TPU VM Pod (with all necessary dependencies) to pretrain LMs with TF Model Garden
- Conversion scripts that convert TF Model Garden weights to Hugging Face Transformers-compatible models
- Supported architectures include BERT, BERT with Token Dropping and TEAMS

I also released BERT-based models pretrained on the great Hugging Face FineWeb and FineWeb-Edu datasets (10BT subset). With more to come!

👉 Model Hub Link:

model-garden-lms

If you find these resources useful, please give them a like!

Made from Bavarian Oberland with ❤️ and 🥨.

stefan-it

authored 6 papers over 2 years ago

AI & ML interests

Recent Activity

Team members 7

engeba's activity