Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I love SQLite, I love the idea, I love having something mature and lightweight, but like the author I discovered it's overkill or dare I say insufficient for certain use cases.

I was building a client side dictionary app with search functionality and thought that using sqlite's wasm port would be the perfect solution.

I used SQLite for a couple years and it started to wear on me. The database files were bigger than they should be, they didn't compress super well, the loading time was a bit slow. Linear search wasn't very fast either. Editing SQLite files and doing joins and stuff was slow and frustrating as well. I wanted something simpler.

I didn't need atomics, I didn't need writes, so I just handrolled a set of indexes from the source tsv files and compressed those with zstd and decompressed them via wasm on every load. Decompressing and loading was now faster than straight loading via SQLite and since my module was 52kb of wasm instead of 800kb for SQLite I could load as many instances of the module as I wanted with no regrets. I use stringzilla for linear scans and it's ridiculously fast.

SQLite is great, but it's not the solution for every problem.



The author of stringzilla, Ash Vardanian, has a bunch of really cool talks/lectures/demos. Highly recommend checking him out if you like that kind of thing :)


Yeah, I probably should since I want to add glob or regex support later.

It's a little out of the scope of stringzilla but I should still be able to use it for the heavy lifting


> The database files […] didn't compress super well

That’s a good thing. If they were compressible that would indicate inefficiency in the SQLite format.


Well, I went and looked at it again and it seems like SQLite databases although larger than my custom format do compress to a 10% smaller size provided there's no indexes. But my custom format uses indexes so it's not really a good comparison. I guess I should see what it's like with indexes.

But at the end of the day SQLite just doesn't meet the needs of my project due to it's size and lack of free decompression support. Passing a decompression stage before giving it to SQLite is unwieldy and requires a separate decompression module and incurs a copy operation. I don't know enough about SQLite to attempt to compile and integrate the zstd decompressor into it.


I think for applications, files that already have indexes and can be memory mapped directly are preferred to files that need to be in memory, require index re-creation and de-compression every single time.


what was your data size? i am surprised 800kb made a difference? using stringzilla was smart approach,my guess is it being unusually faster made all the difference.


Usually I'm dealing with about 20mb of compressed data, almost 100mb uncompressed. Even with only a couple mb of data SQLite still has a startup time of a couple hundred milliseconds on my phone. But that's a couple hundred milliseconds when loading a database that's already decompressed. When loading 100mb SQLite usually took a second or so which I didn't really like for a pwa.

It took me quite a few attempts to get something faster than SQLite. My new format loads instantly because I'm just casting the data to a struct. The only thing that takes time is decompressing, but that's still faster than loading the uncompressed via SQLite. My phone loads 100mb from 20mb compressed in about 400ms.

But writing my own format gives other benefits like being able to extract all the HTML tags and capital letters beforehand for fast and sensible search and reconstructing it on render. It's also just way easier for me to edit tsvs with markers for what parts are indexed and have that transformed into an indexed format with 3 indexes.

Also, with SQLite I was just running one module, but with my new format I'm running about 20 instances of it because it keeps the data nicer, more manageable and makes everything very parallel. Though I keep the number of web workers to 2 because it doesn't seem to benefit much to increase it more.

https://github.com/tnelsond/peakslab


This is really cool. I'm working on stuff that is somewhat aligned with this - offline knowledge base/educational platform focused on things like appropriate technologies for rural people in the developing world. Storing in the browser and, more importantly, searching it is definitely one of the major challenges. (it's also just a much more dynamic app)

My main question about this is whether it can be dynamically/incrementally updated within the browser? Eg new material is available or edits have been made, so sync it from backend and it gets merged in.

I've been working on using rxdb to sync and store in browser - it can use its own indexeddb abstraction, sqlite or it's own OPFS-based DB. It can also load any of these into memory in its memory-mapped mechanism. I've also made a mechanism to load everything into flexsearch in a sharedworker, so that you can do full text search fairly performantly.

It's a lot of complexity though. I'd be curious to hear any of your thoughts. Or even to chat if you're open to it!


I'm not sure I follow exactly, but if I understand you mean that when the database file is updated that the app updates? Right now on app load it updates the service worker and shows the files in cache first. If there's a newer file it fetches it in the background, it then sends a message to the client that there is a new file. I haven't implemented the next part yet but it should be able to invalidate the current file and load the new file without refreshing the page. Right now the new files will load the refresh after the new service worker is activated.

But the page still had to be refreshed to load the new service worker. I'm looking into ways to cut the time to loading the new files down because right now you have to refresh the page 3 times for the new files to take over.

The .peak files aren't designed to be a database that you can just add to during runtime, they're rather static and highly efficient in that context. But it's easy to edit the source files and generate a new .peak file from that.

You can take a folder of any kind of files and run peakgen on it and it will create a compressed .slab file that you can search and fetch results from just like the .peak files. I first saw that done with SQLite and I really liked it, so I knew I could do it too.

If you want to chat you can shoot me an email.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: