
Alternative to Excel & Power Pivot for Large Data Volumes

  • 18-10-2023 1:28pm
    #1
    Moderators, Politics Moderators Posts: 38,887 Mod ✭✭✭✭


    Over the last 18 months, I've developed a tool in work which takes loads of ERP data (BOM, Production Orders, Master Data, etc.) from SAP ECC & BW along with a few flat file sources. I use Excel with Power Pivot as the front end. Going forward, the plan is to include our non-SAP businesses within this tool (though this hasn't been discussed yet). This tool provides data for our procurement teams that they didn't have before and has effectively become business critical.

    Currently I load the data monthly into a Snowflake DB (this process is currently being automated) and various views are made available to Excel, where the data is brought together within the data model. There is simply too much data to include all of it within one SQL view. I have reduced the number of rows (and columns) within the views to the bare minimum, so there isn't really much I can do to improve them.
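
    For illustration, the kind of view I mean looks roughly like this - the table and column names here are placeholders rather than our actual schema:

    CREATE OR REPLACE VIEW RPT.V_BOM_SLIM AS    -- placeholder names throughout
    SELECT
        company_code,     -- one SAP entity per value
        material_id,
        component_id,
        component_qty,
        valid_from
    FROM RAW.BOM_EXPLOSION
    WHERE valid_from >= DATEADD(year, -2, CURRENT_DATE);  -- the two-year limit I mention below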

    However, as the volumes of data involved (and I'm limiting it to two years) are so large, refreshing and changing the pivots is difficult, especially for 32-bit Excel users. I've had to create individual files for each of our SAP-based companies whereas ideally, I'd be able to view them all together. These individual files range from 30 MB to 90 MB (which really is far too big for Excel!). My Excel templates have about 10 pivot tables feeding from the data model.

    Because my own laptop is running 32-bit Excel, I tend to do my changes & data loading via a remote desktop on a server that has 32 GB RAM and 64-bit Excel, but this is not going to be made available to dozens of staff globally.

    I need to look at alternative solutions that are more stable at handling large volumes of data but allow for the flexibility of Excel's pivot tables where users can manipulate the tables by adding & removing fields.

    We have Power BI but there are cell limits with this and it doesn't give us the flexibility of user-manipulated reports. We also have Tableau, although a different project within the business (which was badly managed before being pulled) has tarnished the Tableau name.

    I have already considered our existing BW environment but this is due to be decommissioned shortly 🙄

    Any suggestions for something "off the shelf" that offers the benefits of Power Pivot (and potentially its data modelling abilities) but allows for a ton of data and is quick? I can't actually think of anything (nor can anyone I talk to) 😕



Comments

  • Registered Users Posts: 6,665 ✭✭✭10-10-20


    I thought that, these days, 32-bit was for non-Intel/AMD CPUs and machines with less than 4 GB of RAM? Why would you knowingly run 32-bit in an organisation?



  • Moderators, Politics Moderators Posts: 38,887 Mod ✭✭✭✭Seth Brundle


    I don't know, is my honest answer. It seems to be the only version rolled out with our Office 365 account (presumably a choice was made) and this may be because of various SAP/BW/BPC add-ins.



  • Moderators, Politics Moderators Posts: 38,887 Mod ✭✭✭✭Seth Brundle


    On top of that, I'm not convinced that 64-bit Excel is the way to go. You still have the other Excel limitations, e.g. the million-ish rows per sheet, which limits the volume of data I can load.



  • Registered Users Posts: 668 ✭✭✭PeaSea


    When you say "loads", how much are we talking about here in terms of number of rows and GB of data ? I've done quite large Power BI solutions which sounds as if it is what you are looking for. I'm not sure what you mean re "cell limits" and you can give users blank Power BI files that are linked to database views so they can knock themselves out and create their own reports. But any reporting tool requires a certain amount of savvy when it comes to thinking in terms of data and solutions, so don't expect miracles either.

    As for "There is simply too much data to include all of it within one SQL view" I think you might be falling into the trap of trying to include everything at once in one mega-view instead of coming at it from the other end - what do we need to see so what data do we need to get there ? So start thinking about summary views built for those who need them and detail views for those who need that. Most solutions I've done include views on views - so for example a view of sales, then multiple views on top of the sales view for each user that needs them, repeat for all departments. And then have multiple PBI files, one for each purpose.



  • Moderators, Politics Moderators Posts: 38,887 Mod ✭✭✭✭Seth Brundle


    When you say "loads", how much are we talking about here in terms of number of rows and GB of data ?

    The "loads" term refers to the data taken from the ERP into Snowflake but I've managed to separate a lot of chaff from the wheat. The data downloaded from Snowflake is over half a dozen Excel tables, the largest of which has about 800,000 rows with about 100 columns. The other tables are much smaller than that. As I said, the files reach ~90Mb.

    I've done quite large Power BI solutions, which sounds like what you are looking for.

    My understanding is that Power BI has limits on the number of rows returned by an external data source query...

    I'm not sure what you mean re "cell limits"

    I should have said row limits, as above.

    and you can give users blank Power BI files that are linked to database views so they can knock themselves out and create their own reports. But any reporting tool requires a certain amount of savvy when it comes to thinking in terms of data and solutions, so don't expect miracles either.

    Understood & agreed.

    As for "There is simply too much data to include all of it within one SQL view" I think you might be falling into the trap of trying to include everything at once in one mega-view instead of coming at it from the other end - what do we need to see so what data do we need to get there ? So start thinking about summary views built for those who need them and detail views for those who need that.

    I don't want all the data in one view; I was just trying to point out that it had been considered. As it stands, the principal view cannot really be normalised further. It contains BOM (Bill of Materials) data, so there's no repetition there.

    Most solutions I've done include views on views - so for example a view of sales, then multiple views on top of the sales view for each user that needs them, repeat for all departments. And then have multiple PBI files, one for each purpose.

    But where I'm trying to get to is to have all companies consolidated rather than using separate files (one for each entity). The current setup has too much data in the principal (BOM) view. I am satisfied regarding the backend structure. The issue I seem to be having is in trying to present it within a frontend.



  • Registered Users Posts: 5 Spectral.Instance


    A table in a Power Pivot data model can contain just under 2 billion rows

    https://support.microsoft.com/en-us/office/data-model-specification-and-limits-19aa79f8-e6e8-45a8-9be2-b58778fd68ef

    so the size of an Excel worksheet is not an impediment in loading large volumes of data to the data model.

    The size of an Excel worksheet is an impediment if you want to have a PivotTable itself with over a million rows but, respectfully, a PivotTable of such magnitude is not fit for purpose when it's unlikely to be summarising or aggregating anything.

    If users need to see the detail, then the data loaded to the data model can be viewed in the Power Pivot window rather than being loaded to worksheet tables, since the latter results in an element of duplication which does nothing for the file size.

    While I'm sure you know that saving your file as an .xlsb will produce a smaller file than an .xlsx containing the same data, it may also be the case that your files are too large from the PivotCache perspective. If I understand correctly, you may have up to 10 pivot tables in a single file, so it is equally possible that you have 10 PivotCaches in the file when 1 would be sufficient. Checking the number of PivotCaches involves a single line of VBA: pressing Alt+F11 opens the VBA editor, which may or may not show a window titled "Immediate" (if not, click "View" on the VBA window menu bar, then "Immediate Window"). In this window, type

    ?ActiveWorkbook.PivotCaches.Count

    and press Enter. If the result is greater than 1, then your file is bigger than it needs to be, and correcting for this would involve rebuilding your template: essentially, use the Insert -> PivotTable -> From Data Model option only once, then use the clipboard to make 9 copies of that pivot table where you need them, and then edit each of the 10 instances separately to produce the 10 desired pivot tables.

    I've never really used Power BI but, since it is supposed to be more "pro big data" than Excel, it would seem more appropriate - have you considered dumping the result set of your external data source query to a flat file, and then having Power BI connect to that instead?
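
    In Snowflake terms (since that's your backend), something like this would unload a query's result set to compressed flat files - the stage path and view name are just examples, not your actual objects:

    -- Unload a view's result set to gzipped CSV files in the user stage
    COPY INTO @~/extracts/bom_slim
    FROM (SELECT * FROM RPT.V_BOM_SLIM)
    FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
    HEADER = TRUE
    OVERWRITE = TRUE;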

    It also seems more appropriate from the publishing perspective - you have (apparently) full access to a powerful server which can't be made available to the dozens of users who might need to view the data, but that server does sound like an appropriate host to which to publish the data in Power BI, so that the dozens of users who need it could have "web-view" access to the server.

    Finally, since I don't have fond memories of SAP sales orders (which I appreciate you did not mention), I'd suggest 'cross-referencing' your BOM data with procurement/fab-assembly data to see if you can filter out any BOM designed to consume components which have never actually been procured, or assemblies never actually fabricated, as that, while making the queries more complex, might produce a smaller result set.
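
    As a rough sketch of that cross-reference (table and column names invented, since I don't know your schema), the filter might look like:

    -- Keep only BOM rows whose component actually appears in purchase-order history
    SELECT b.*
    FROM RPT.V_BOM_SLIM b
    WHERE EXISTS (
        SELECT 1
        FROM PURCH.PO_ITEMS p
        WHERE p.material_id = b.component_id
          AND p.po_date >= DATEADD(year, -2, CURRENT_DATE)
    );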


