My foodcoop has a new supplier. The supplier does not offer BNN files
(yet), but we would like to have some kind of “assisted” article
synchronization. There is hope that the supplier will offer some kind of
interface in the future. However, I am not sure what a good solution
should look like.
Of course I could upload CSV files to the foodsoft directly, but I
prefer the interface of the sharedlists solution: It is convenient to
search for individual articles in the sharedlists database or to
synchronize existing articles. So given that I would like to use the
sharedlists backend: What are the options?
I like how “sync_bnn_files” works (fetching list by FTP), but that is
limited to the BNN format, right? I find it troublesome that the BNN
format requires the use of short identifiers for manufacturer
(“Hersteller”) and category (“Warengruppe”): Our new supplier has
manufacturers and categories that are not present in the BNN list [1].
Currently, sharedlists understands columns 11 and 17 of the BNN format
[2-3]. Those are limited to those identifiers that are present in the
BNN list. Implementing alternate short identifiers for each supplier
seems cumbersome. I guess that is what the “midgard” thing does [4].
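To make the "alternate short identifiers for each supplier" idea concrete, here is a minimal sketch of a lookup that checks a per-supplier table first and falls back to a shared list. It is not taken from sharedlists; the codes, names, supplier key and the function `resolve_code` are all made up for illustration.

```python
# Hypothetical codes and names, NOT taken from the real BNN list.
SHARED_MANUFACTURERS = {"ABC": "Example Mill"}

# Hypothetical per-supplier additions for codes missing from the shared list.
SUPPLIER_MANUFACTURERS = {
    "new_supplier": {"XYZ": "Local Farm Cooperative"},
}


def resolve_code(code, supplier, shared=SHARED_MANUFACTURERS,
                 per_supplier=SUPPLIER_MANUFACTURERS):
    """Return the full name for a short identifier, or None if unknown."""
    code = code.strip()
    overrides = per_supplier.get(supplier, {})
    if code in overrides:
        # Supplier-specific entries win over the shared list.
        return overrides[code]
    return shared.get(code)
```

The same pattern would apply to categories. The cumbersome part is maintaining one extra table per supplier, not the lookup itself.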
In addition, I wonder if there are any legal restrictions for using the
BNN format: Should the supplier be a member of the BNN?
As an alternative to the BNN format, I can find those other “import
filters” in sharedlists. As far as I understand, those filters are
limited to e-mail synchronization, right? For some reason, that sounds
complicated to set up. I guess I prefer the FTP-sync option.
Do you have any good hints on how I can proceed? Here are my wishes:
- use the sharedlists interface from the foodsoft
- allow for custom manufacturers and categories (not in the BNN list)
- do not break the legal rules of BNN-format usage (if any)
If you point me to a good solution, I can imagine contributing a PR that
implements the functionality we need (if necessary).
That sounds like a good idea.
Yes, there are multiple import formats, and they are supported in all places where articles are uploaded. With one exception: BNN sync expects the BNN file format.
If you’d like to use FTP for synchronisation, I think we’d either add a new FTP sync method, or generalise the BNN sync to a general FTP sync (while keeping it easy to set up BNN).
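A rough sketch of what a generalised FTP sync could look like, written in Python only for illustration (sharedlists itself is a different codebase). It assumes the file is pulled from an FTP server and that the import format can be chosen from the file extension; the names `IMPORTERS`, `pick_importer` and `fetch_file` and the format labels are hypothetical. BNN stays the default so that existing BNN setups need no extra configuration.

```python
import ftplib
import io
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to an import format name.
IMPORTERS = {
    ".bnn": "bnn",
    ".csv": "foodsoft_csv",
}


def pick_importer(filename, default="bnn"):
    """Choose an import format by extension, falling back to BNN."""
    suffix = PurePosixPath(filename).suffix.lower()
    return IMPORTERS.get(suffix, default)


def fetch_file(host, user, password, remote_path):
    """Download one file over FTP and return its raw bytes."""
    buffer = io.BytesIO()
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        ftp.retrbinary(f"RETR {remote_path}", buffer.write)
    return buffer.getvalue()
```

A sync run would then call `fetch_file(...)` for each remote file and hand the bytes to whichever parser `pick_importer(...)` names.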
Why not use the Foodsoft CSV file format (see upload in the Foodsoft articles screen)? I think it would be useful if sharedlists could import (&export?) that.
i built my own application for shared lists. it isn’t in the best of condition in terms of QA since i didn’t release it, but i would be happy to share it. it is written in node.js - and yes, i’m sorry to add another technology to the mess. here is a video of it: https://photos.app.goo.gl/MMn8ptB6vhip61ex9
however the biggest challenge is how you want to parse/configure reading each spreadsheet. the reason i built it was because i wanted to pull out things like unit quantities from strings in the catalogue. eg, “GRAPEFRUIT RUBY RED BAGGED 10x4#” will extract that this is a case of 10 bags each weighing 4LB. it is messy though - suppliers are inconsistent in how they list things and i constantly tweak the regular expressions to parse their catalogue.
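for illustration, a minimal sketch of that kind of extraction (in Python here, not the actual node.js code). it assumes the pack info always sits at the end of the description as "<count>x<size><unit>" and that "#" means pounds; the pattern, the unit list and the function name `parse_case` are made up for this example, and real catalogues will need more cases.

```python
import re

# Assumed shape: "<count>x<size><unit>" at the end of the description.
CASE_PATTERN = re.compile(
    r"(?P<count>\d+)\s*x\s*(?P<size>\d+(?:\.\d+)?)\s*(?P<unit>#|LB|OZ|KG|G)\s*$",
    re.IGNORECASE,
)

# Assumption: "#" is shorthand for pounds.
UNIT_ALIASES = {"#": "LB"}


def parse_case(description):
    """Split a catalogue line into name, pack count, pack size and unit."""
    match = CASE_PATTERN.search(description)
    if match is None:
        return None  # no recognisable pack info in this line
    unit = match.group("unit").upper()
    unit = UNIT_ALIASES.get(unit, unit)
    return {
        "name": description[:match.start()].strip(),
        "count": int(match.group("count")),
        "size": float(match.group("size")),
        "unit": unit,
    }
```

with the example above, `parse_case("GRAPEFRUIT RUBY RED BAGGED 10x4#")` gives a count of 10, a size of 4.0 and the unit "LB". lines without pack info return `None` so they can be flagged for manual review.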
to make this application more flexible it really needs screens to let the user adjust the parsing of sheets. there are tools that do this and are configurable but they are way too heavy/overkill, see the whole space of ETL (extract transform load) tools, like https://www.jaspersoft.com/data-integration
also, re getting the spreadsheets to parse - i also looked at writing google apps scripts to pull the excel files from the emails that the suppliers send. it was fun for a challenge but the time it takes to update the sheets by upload was so short that i didn’t bother in the end.
thanks for sharing your experience. Now that I am dealing with the
article synchronization myself, I realize what a big mess it can be to
parse data from an unstructured data source. In fact, I would like to
avoid that task: I wish to receive well structured data from the
supplier and feed that into a shared-supplier database as smoothly as
possible. One remaining issue is the (sub)division of the article unit
into smaller pieces (when importing the article into the foodsoft), but
that is another topic.
The video had some playback issues on my computer, but I could follow it
more or less. I can imagine that your tool is very useful in situations
where suppliers do not offer well structured data. I do not want to
spend that much effort, though. Maybe other people that are not aware of
your tool could make use of it and/or contribute. Have you ever
considered releasing it as an open source tool?
> If you’d like to use FTP for synchronisation, I think we’d either add a
> new FTP sync method, or generalise the BNN sync to a general FTP sync
> (while keeping it easy to set up BNN).
OK, I guess I will try to implement one of those two options.
> Why not use the Foodsoft CSV file format (see upload in the Foodsoft
> articles screen)? I think it would be useful if sharedlists could import
> (&export?) that.
Yes, that format fulfills all my basic needs.
Do you know the reason behind the two empty columns just before the
category? I am just curious about that.
Should we expect any problems if we plan to extend that format at some
point in the future? Imagine I would like to add more article fields
(ingredients, allergens, organic certification, …) in the foodsoft.
Could we append those new columns?
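To illustrate what I mean by appending columns, here is a small Python sketch of a reader that accepts both old and extended rows. The column layout is my assumption of the current format (including the two unused columns before the category), the semicolon delimiter and the missing header row are assumptions too, and the three extra column names are purely hypothetical.

```python
import csv
import io

# Assumed current layout; not verified against the Foodsoft source.
BASE_COLUMNS = [
    "status", "order_number", "name", "note", "manufacturer", "origin",
    "unit", "price", "vat", "deposit", "unit_quantity",
    "reserved_1", "reserved_2", "category",
]

# Hypothetical future columns, appended after the existing ones.
EXTRA_COLUMNS = ["ingredients", "allergens", "organic_certification"]


def read_articles(text, delimiter=";"):
    """Parse rows by position; missing trailing columns become empty strings."""
    columns = BASE_COLUMNS + EXTRA_COLUMNS
    articles = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        padded = row + [""] * (len(columns) - len(row))
        articles.append(dict(zip(columns, padded)))
    return articles
```

As long as new fields are only added at the end, files in the old format would still load, with the new fields simply left empty.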
Just to wrap this up: Based on the input from this thread, I enabled FTP
synchronization for non-BNN files. Currently, we are discussing the
following pull request: https://github.com/foodcoops/sharedlists/pull/17