Technology and Financial Reporting

Started by Waltzing, Jun 13, 2023, 10:53 AM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

Waltzing

for reading PDF the internal objects of PDF's would have to be named with a industry standard naming convention.

cant see it happening as there is no industry standard naming conventions for financial reporting company sets of accounts.

That would require the accounting society to understand technology.

https://opensource.adobe.com/dc-acrobat-sdk-docs/acrobatsdk/

The only way it could happen is for the companies to send data to the NZX in a standard format and for that to be a requirement of Listing on the exchange. It would be part of a wider push to standardise the publishing of financial data transaction formats.

In that regard FERG raises subject matter that has been pushed here for a while here by WZ.  I have been regularly meeting with a very far sited semi retired ACA to create a CONSOLIDATED cross column format for financial private company sets of accounts as Multi dimensional data structures have been available for reporting and tabulation now for over 30 years in many software languages.

FERG raises the right questions.

Waltzing

Processing information from PDF reports is possible using VBA code on the internet.

The information inside these PDF documents may be able to determined by either giving CHAT GPT the job or by arrange the data in an indexed table and the user investor selects the information and guiding it into a standard model using an old fashion click of the mouse.

In MID winter this may be something you want to do but in summer are you going to spend your sunday afternoon sort through a list of non index objects from a PDF clicking your mouse while everyone else in on the golf course or surfing?

May be Chat GPT will read and sort it out for you?

The future is in play and one thing we know for sure and that is the FAT Lady hasnt sung yet.

Ferg

#2
Great thread Waltzing.  I have seen you commenting on this elsewhere quite a long time ago, and I am intrigued by the concept.  I am over tapping financial statements into spreadsheets given other priorities.  I have previously written VBA programmes to not only scrape data from the internet but also programmes that use API calls to grab historical stock prices from Yahoo..so interpreting financial statements should be a doddle.

Regarding reading pdf versions of annual accounts, it sounds like your stumbling block is the lack of naming conventions within reoprts.....is that correct?  A way around this might be to have a dictionary of common terms that resolve to a standard definition e.g. Trade Receivables, Receivables, Debtors and Trade Debtors all resolve to debtors for instance.  Any term not found could be added to the dictionary and marked as unresolved until the user adds a definition.  And I imagine the pdf documents use bookmarks of some sort to note the position of standard reports such as the Balance Sheet & P&L etc....is that correct?

P.S.  It sounds like you want to set up something a bit like Xero for importing, interpreting, storing and then analysing.  Point and click.

Waltzing

#3
 
Yes a  Dictionary or Indexed List structure for searching is the solution as you rightly say.

But for it to be useful it needs to be created in a configurable solution where it is runtime configurable and not created inside MS office but Drives products like MS office from the outside.


PS: XZ Product is a Internet Multi User platform specific for business and created for the KFC market.

It is not a company that develops low level multi platform scripting engines.

Im expecting to solve this problem as the result of other developments that just happen to do this process by default .

After all its just a few loops and some data returned into a list of data Variants. Variants arnt the MS version im thinking of but a new generation of Smart Variants that can store data and execute code.

More later as this new technology as it gets rolled out from the workshop.



 


Ferg

Interesting.  So it appears pdf documents don't use bookmark objects like Word documents.  That makes it a bit harder.   Text search can find things like "Balance Sheet As At" or similar to find the page that contains the Balance Sheet etc....but controlling objects even line by line....I'm not even sure where to start.

Meanwhile copy/paste into Word and/or Excel is possibly worse.  That would require a lot of interpretation and culling just to trim it back to the underlying raw data...unformatted.  Each report possibly has differing formats.  Oh dear...might be a long winter.

BTW, assuming I am understanding you correctly, you can use MS VBA to drive other MS programmes from the outside.  For instance I have an Excel model that drives web scraping and reformatting in another Excel workbook, before pushing into a Word template for automated non-financial reporting for a USA client.  It has been running maybe 15 years now.  Working on some minor tweaks tonight.

Waltzing

#5
FERG you can open a PDF document in WORD and then access the WORD Document by VBA.

There  is code out there that demos that and we dont know what CHAT AI will do yet.

Also dont forget ABODE has just released the BETA of FIRE FLY.

You can drive MS Office VIA LATE OLE BINDING from any computer language and you can also access PDF Docs from DOT NET in C SHARP.

We actually have code that does just that.

You an access EXCEL also from Chsarp and we have Set of Classes in VB DOT NET for doing that. It uses delegates to call back to standard WIN32 Programs.

Delegates are bit technical of course and we be understood by members here unless they are advanced users of DOT NET.

What FIREFLY can do is new and havnt tried to access it yet.

But we have a new Document Management product under development that we think will be of interest in Europe when linked to FIREFLY and also VBA and our NEW SMART VARIANTS scripting engine.


 

Waltzing

#6
Example of putting the PDF into WORD document and then pasting into EXCEL. Finds the specific section you want to Find.
In fact you could automate  a lot of this to be done by a software robot.

 ' open Word application and load doc
    Set WApp = CreateObject("Word.Application")
    ' WApp.Visible = True
    Set WDoc = WApp.Documents.Open(FName)

    ' go home and search
    WApp.Selection.HomeKey Unit:=6
    WApp.Selection.Find.ClearFormatting
    WApp.Selection.Find.Execute "FINANCIAL PROFIT LOSS REPORT"

    ' move cursor from find to final data item
    WApp.Selection.MoveDown Unit:=5, Count:=1
    WApp.Selection.MoveRight Unit:=2, Count:=2

    ' the miracle happens here
    WApp.Selection.MoveRight Unit:=2, Count:=1, Extend:=1

    ' grab and put into excel       
    Set WDR = WApp.Selection
    EXCELSHEETRANGE(1, 1) = WDR ' place at Excel cursor

    'repeat
    WApp.Selection.HomeKey Unit:=6
    WApp.Selection.Find.ClearFormatting
    WApp.Selection.Find.Execute "Period of Report:"
    WApp.Selection.MoveRight Unit:=2, Count:=8
    WApp.Selection.MoveRight Unit:=2, Count:=3, Extend:=1

    Set WDR = WApp.Selection
    EXCELSHEETRANGE(1, 2) = WDR ' place in cell right of Excel cursor

    WDoc.Close
    WApp.Quit

Ferg

#7
Interesting - I forgot about opening pdf documents in Word.  Handling data in Word using VBA is much easier when using bookmarks.  Another way to hard code the page you want to find is to manually insert bookmarks on the relevant financial reporting pages and then run your code against that document between bookmarks. They work really well with VBA and data manipulation/extraction - but they have to be set manually first time around, or they could be set automatically based on (multi) text search which matches a heading style.  All of that can be run and controlled from an Excel s/sheet per your sample code.

This method:
QuoteWApp.Selection.Find.Execute "FINANCIAL PROFIT LOSS REPORT"
assumes each wording format is the same between entities.  I imagine in most cases you can't avoid manually setting heading markers the first time round.

It also looks like Word interprets various headings from the pdf document using the Headings style, which is another way to programmatically find a reference to the info we want.

That said, once you have a dictionary of definitions for how a company presents it's data, then subsequent data extraction attempts could be fully auto.  Just have to note the format differences between HY and FY report versions for each company.  Easy as.  ;D

What is also nice is the financial data gets dropped into a table when converted to Word from pdf - which means you programmatically know the bounds of where the data is located.  Very interesting and thanks for the reminder about opening pdf documents in Word.  It's going to be a long winter......

Waltzing

#8
 ADOBE has a new version coming out with AI (our engineering just laughs) but lets call it that...

FIREFLY needs investigating to see what it can do...

MS OFFICE will need a major upgrade to embed this stuff....

EXCEL is a pain and for ADDINS was no where near as good and LOTUS was.

 We do a lot of weird stuff in excel with text meta data expressions that get read before the spreadsheets are read or written.

Note: VBA is a lot faster than the slow access via  VB DOT NET.

WE have DUMPED DOT NET and gone back to traditional CPP on linux and WIN32/64 binaries on Windows.

we call DOT NET, DOT NOT!!!!




Waltzing

NOW FERG your seeing the BRILLANCE of the ACCOUNTING SOCIETY in not standardising information in PUBLIC COMPANY ACCOUNTS even though SAP (Standard for Accounting) have been around for a LONG LONG TIME....

But not enforced by AUDITORS and Company law should have stipulated it for all Public Company Accounts and also standards for data transaction files...

Maybe AI will enforce it...

Waltzing

Really what you want is for CHAT GPT to learn and do the job for you....


Ferg

#11
I haven't used dot net in a while.  99.99% of what I do in using VBA.  It's easy.  Bookmarks in Word are life changing.

Here is sample code for handling bookmarks within Word VBA.  I prefer early binding and full variable declarations, so you need to tick the various options under VBA > Tools > References for VBfA, MS Word xx Object Library & MS Office xx Object Library.  This assumes bookmarks exist within the document, grabs their name and position, bubble sorts in an array and then outputs their names and positions with the document into the immediate window.
 
QuoteOption Explicit
Option Compare Text

'Global variables:
Dim MyBookmarks() As String
Dim bkMrk As Bookmark

Sub GetBookmarks()

Dim LC As Long, LL As Long, IL As Long

'get the bookmark details
With ThisDocument

    .Bookmarks.ShowHidden = False
    LL = .Bookmarks.Count 'Note: assumes count > 1
    ReDim MyBookmarks(2, LL)
   
    For Each bkMrk In .Bookmarks
        If bkMrk.Name = "AllBookmarks" Then
            LL = LL - 1
            bkMrk.Delete
        ElseIf Left$(bkMrk.Name, 1) = "_" Then
            LL = LL - 1
        Else
            LC = LC + 1
            MyBookmarks(1, LC) = bkMrk.Name
            MyBookmarks(2, LC) = bkMrk.Start
        End If
    Next bkMrk
End With

'Bubble sort the bookmarks into the document sequence
' (VBA defaults to alphabetical sequence for bookmarks, so this needs to be corrected)
For LC = 2 To LL
    If CLng(MyBookmarks(2, LC)) < CLng(MyBookmarks(2, LC - 1)) Then
        For IL = LC - 1 To 1 Step -1
            If CLng(MyBookmarks(2, IL)) > CLng(MyBookmarks(2, IL + 1)) Then
                MyBookmarks(1, 0) = MyBookmarks(1, IL)
                MyBookmarks(2, 0) = MyBookmarks(2, IL)
                MyBookmarks(1, IL) = MyBookmarks(1, IL + 1)
                MyBookmarks(2, IL) = MyBookmarks(2, IL + 1)
                MyBookmarks(1, IL + 1) = MyBookmarks(1, 0)
                MyBookmarks(2, IL + 1) = MyBookmarks(2, 0)
            Else
                Exit For
            End If
        Next IL
    End If
Next LC

Debug.Print "Bookmark details:"
For LC = 1 To LL
    Debug.Print MyBookmarks(1, LC) & " : " & MyBookmarks(2, LC)
Next LC

End Sub

Hopefully I haven't forgotten anything important with that copy/paste...

More on Word bookmarks here:
https://learn.microsoft.com/en-us/office/vba/api/Word.bookmarks

I kid you not - bookmarks in Word are life changing when using VBA.  That reference also has the methods for adding, checking etc.

Ferg

Quote from: Waltzing on Jun 16, 2023, 09:47 PMNOW FERG your seeing the BRILLANCE of the ACCOUNTING SOCIETY in not standardising information in PUBLIC COMPANY ACCOUNTS even though SAP (Standard for Accounting) have been around for a LONG LONG TIME....

But not enforced by AUDITORS and Company law should have stipulated it for all Public Company Accounts and also standards for data transaction files...

Maybe AI will enforce it...


Agreed. Not just report headings, but also line items within each report. A self-maintaining dictionary object in the s/s MF would handle future versions of reports from existing reporting entities, and any gaps for new entities could either be held in limbo and/or defined later.  You could have a lookup table per company that assigns found phrases to desired phrases e.g. whatever their description -> becomes -> trade debtors.  I can't think of any entities that would use the same words for differing definitions, so you could have a global mapping dictionary, otherwise you need one per entity being analysed.

Ferg

Quote from: Ferg on Jun 16, 2023, 10:05 PMany gaps for new entities could either be held in limbo and/or defined later.  You could have a lookup table per company that assigns found phrases to desired phrases e.g. whatever their description -> becomes -> trade debtors.

This needs explaining. Models are usually data -> transformation -> presentation.  Data is the cleaned raw data per financials.  Transformation restates their description to desired description.  Presentation is the pretty report at the end.  Anything not immediately matched from the raw data could be given "unassigned" (i.e. it is in limbo) in the mapping table, and once that table has the correct mapping, it will self correct on the report once the mapping is populated for items that cannot be interpreted on the fly.  Given the template contains the mapping, it will be there for next time.

Waltzing

#14
The only Developer to do a Bubble sort in VBA outside PASCAL!!!!

Classic !!!

BUBBLE SORTS!!!

Whats really needed is  a product for companies to use to publish financials.... and drive ADOBE from a dataset Dictionary.

Many folks here reading for code wont know what a CLng is ...

For accounting types that a LONG data type..

https://learn.microsoft.com/en-us/office/vba/language/concepts/getting-started/type-conversion-functions

it was always going a very very long winter...