Tomasz Wroblewski
2013-Jan-22 12:08 UTC
[PATCH] Fix acpi_dmar_zap/reinstate() (fixes S3 regression)
Hello all, Changeset 23013:65d26504e843 (ACPI: large cleanup) modified acpi_dmar_reinstate() and acpi_dmar_zap() to use global "dmar_table" pointer instead of fetching it dynamically via acpi_get_table (and it removed the get_dmar() function which was used to this). This global pointer is initialised once when parsing the acpi table. However, this seems incorrect due to how acpi_get_table() and underlying __acpi_map_table() is implemented. It pretty much always returns the same virtual pointer but instead remaps the fixmap underlying that virtual pointer. So storing this virtual pointer in global variable is incorrect because the pointer is invalidated next time acpi_get_table() is called. Therefore acpi_dmar_reinstate()/zap() actually modify not the DMAR table but the table which was last accessed by acpi_get_table (which happens to not be DMAR at least on some Lenovos we tested but likely more platforms). This causes the affected table corruption, its checksum corruption, and failure to resume from S3 second consecutive time. Attached patch restores the previous behaviour before cset 23013:65d26504e843 of fetching the dmar_table pointer dynamically. Some __init annotations needed to be removed as the acpi_get_table() function is now again used on suspend/resume path Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Jan Beulich
2013-Jan-22 12:58 UTC
Re: [PATCH] Fix acpi_dmar_zap/reinstate() (fixes S3 regression)
>>> On 22.01.13 at 13:08, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote: > Changeset 23013:65d26504e843 (ACPI: large cleanup) modified > acpi_dmar_reinstate() and acpi_dmar_zap() to use global "dmar_table" > pointer instead of fetching it dynamically via acpi_get_table (and it > removed the get_dmar() function which was used to this). This global > pointer is initialised once when parsing the acpi table. > > However, this seems incorrect due to how acpi_get_table() and underlying > __acpi_map_table() is implemented. It pretty much always returns the > same virtual pointer but instead remaps the fixmap underlying that > virtual pointer. So storing this virtual pointer in global variable is > incorrect because the pointer is invalidated next time acpi_get_table() > is called. Therefore acpi_dmar_reinstate()/zap() actually modify not > the DMAR table but the table which was last accessed by acpi_get_table > (which happens to not be DMAR at least on some Lenovos we tested but > likely more platforms). This causes the affected table corruption, its > checksum corruption, and failure to resume from S3 second consecutive time. > > Attached patch restores the previous behaviour before cset > 23013:65d26504e843 of fetching the dmar_table pointer dynamically. Some > __init annotations needed to be removed as the acpi_get_table() function > is now again used on suspend/resume pathI recognize the need of fixing this, but not this way. We have ioremap() now, and hence the patch could be using this, without re-running the whole acpi_get_table(), but just using the stored physical address of the table (retrieving of which would be the only real code addition needed here). For the older trees with non-functional ioremap(), I''d prefer simply adding the table range to the 1:1 mapping (thus making ioremap() work for that range, should use of that be needed; if not needed, that''s certainly worth considering this even for -unstable). Also, with your change not even attempting to fix the underlying issue of the ACPI code storing a pointer to the mapped table in acpi_gbl_root_table_list.tables[].pointer, I can''t even see how your patch is supposed to work. I''d expect the stored pointer to get re-used by acpi_get_table()/acpi_tb_verify_table(), and hence this shouldn''t win you anything. Jan
Tomasz Wroblewski
2013-Jan-22 13:36 UTC
Re: [PATCH] Fix acpi_dmar_zap/reinstate() (fixes S3 regression)
On 22/01/13 13:58, Jan Beulich wrote: Jan, thanks for your comments> I recognize the need of fixing this, but not this way. We have > ioremap() now, and hence the patch could be using this, > without re-running the whole acpi_get_table(), but just using > the stored physical address of the table (retrieving of which > would be the only real code addition needed here). > > For the older trees with non-functional ioremap(), I''d prefer > simply adding the table range to the 1:1 mapping (thus making > ioremap() work for that range, should use of that be needed; > if not needed, that''s certainly worth considering this even for > -unstable). > > Also, with your change not even attempting to fix the underlying > issue of the ACPI code storing a pointer to the mapped table in > acpi_gbl_root_table_list.tables[].pointer, I can''t even see how > your patch is supposed to work. I''d expect the stored pointer to > get re-used by acpi_get_table()/acpi_tb_verify_table(), and hence > this shouldn''t win you anything. >It works because the existing code in acpi_get_table actually sets acpi_gbl_root_table_list.tables[i].pointer = NULL every call anyway (after setting *out_table pointer). I''ll have a go at trying this with ioremap() and retrievable table''s physical pointer instead.
Jan Beulich
2013-Jan-22 14:13 UTC
Re: [PATCH] Fix acpi_dmar_zap/reinstate() (fixes S3 regression)
>>> On 22.01.13 at 14:36, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote: > On 22/01/13 13:58, Jan Beulich wrote: > > Jan, thanks for your comments > >> I recognize the need of fixing this, but not this way. We have >> ioremap() now, and hence the patch could be using this, >> without re-running the whole acpi_get_table(), but just using >> the stored physical address of the table (retrieving of which >> would be the only real code addition needed here). >> >> For the older trees with non-functional ioremap(), I''d prefer >> simply adding the table range to the 1:1 mapping (thus making >> ioremap() work for that range, should use of that be needed; >> if not needed, that''s certainly worth considering this even for >> -unstable). >> >> Also, with your change not even attempting to fix the underlying >> issue of the ACPI code storing a pointer to the mapped table in >> acpi_gbl_root_table_list.tables[].pointer, I can''t even see how >> your patch is supposed to work. I''d expect the stored pointer to >> get re-used by acpi_get_table()/acpi_tb_verify_table(), and hence >> this shouldn''t win you anything. >> > It works because the existing code in acpi_get_table actually sets > acpi_gbl_root_table_list.tables[i].pointer = NULL every call anyway > (after setting *out_table pointer).Oh, indeed, I had overlooked that.> I''ll have a go at trying this with ioremap() and retrievable table''s > physical pointer instead.Thanks. Jan
Tomasz Wroblewski
2013-Jan-22 15:27 UTC
Re: [PATCH] Fix acpi_dmar_zap/reinstate() (fixes S3 regression)
On 22/01/13 13:58, Jan Beulich wrote: Jan, attaching updated patch which uses ioremap() instead of acpi_get_table. Tested it across some S3 iterations on xen tip from today and Lenovo T520 and seems to work well (although there are some unrelated scheduler issues on resume which we have patched separately on our end; intending to submit these patches as well later on). Doesn''t work on older trees from before your "implement vmap()" commit. You mentioned that this would be fixable on older trees via adding the acpi table range to the 1:1 mapping - how would one go about this as I''m not sure where is the relevant code located? Thanks! _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Jan Beulich
2013-Jan-22 15:55 UTC
Re: [PATCH] Fix acpi_dmar_zap/reinstate() (fixes S3 regression)
>>> On 22.01.13 at 16:27, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote: > attaching updated patch which uses ioremap() instead of acpi_get_table.First of all, I envisioned to keep the static variable "dmar_table", and simply intialize it when the hypervisor boots. That, as a result, will move all of your new code additions to .init.text (which in turn will make it desirable to pick a different ACPI source file to add the new function in). Second, adding a function to be called from outside ACPI to acpi/actables.h is wrong, which you could also have noticed by the fact that you had the add the inclusion of that header. Along the same lines, publicly accessible table interfaces are expected to be in drivers/acpi/tables.c or drivers/acpi/tables/tbxface.c. Probably the latter is better suited. Finally, when adding code to ACPI source files, please make sure you match the coding conventions found in those files, not the general Xen ones (and certainly not a mixture of both).> Tested it across some S3 iterations on xen tip from today and Lenovo > T520 and seems to work well (although there are some unrelated scheduler > issues on resume which we have patched separately on our end; intending > to submit these patches as well later on). Doesn''t work on older trees > from before your "implement vmap()" commit. You mentioned that this > would be fixable on older trees via adding the acpi table range to the > 1:1 mapping - how would one go about this as I''m not sure where is the > relevant code located?That ought to be a simple call to map_pages_to_xen(), equivalent to the respective ones in __start_xen(). Perhaps something like map_pages_to_xen(__va(addr), PFN_DOWN(addr), <nr>, PAGE_HYPERVISOR); You''d then simply set "dmar_table" to __va(addr). Jan
Tomasz Wroblewski
2013-Jan-22 17:22 UTC
[PATCH v3] Fix acpi_dmar_zap/reinstate() (fixes S3 regression)
Thanks, attaching version 3 of the patch using your suggestions, which simplify it considerably. This one uses map_pages_to_xen in favour of ioremap, so it works on both tip as well as older trees from before ioremap() rework (tested both). Changes from v2/v1: - keeping global dmar_table pointer and initializing it in acpi_dmar_init() - use map_pages_to_xen to add a static mapping (instead of ioremap) - moved acpi_get_table_physical_location to tbxface.c. Added instance argument to make it symmetric with acpi_get_table. - coding convention fixes Commit message: Fix S3 regression introduced by cs 23013:65d26504e843 (ACPI: large cleanup). The dmar virtual pointer returned from acpi_get_table cannot be safely stored away and used later, as the underlying acpi_os_map_memory / __acpi_map_table functions overwrite the mapping causing it to point to different tables than dmar (last fetched table is used). This subsequently causes acpi_dmar_reinstate() and acpi_dmar_zap() to write data to wrong table, causing its corruption and problems with consecutive s3 resumes. Added a new function to fetch ACPI table physical address, and establishing separate static mapping for dmar_table pointer instead of using acpi_get_table(). Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Jan Beulich
2013-Jan-23 08:47 UTC
Re: [PATCH v3] Fix acpi_dmar_zap/reinstate() (fixes S3 regression)
>>> On 22.01.13 at 18:22, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote: > Thanks, attaching version 3 of the patch using your suggestions, which > simplify it considerably. This one uses map_pages_to_xen in favour of > ioremap, so it works on both tip as well as older trees from before > ioremap() rework (tested both). > > Changes from v2/v1: > - keeping global dmar_table pointer and initializing it in acpi_dmar_init() > - use map_pages_to_xen to add a static mapping (instead of ioremap) > - moved acpi_get_table_physical_location to tbxface.c. Added instance > argument to make it symmetric with acpi_get_table. > - coding convention fixesThis looks much better; I''ll slightly edit it before committing though, so please double check the result. Jan> Commit message: > > Fix S3 regression introduced by cs 23013:65d26504e843 (ACPI: large > cleanup). The dmar virtual pointer returned from acpi_get_table cannot > be safely stored away and used later, as the underlying > acpi_os_map_memory / __acpi_map_table functions overwrite the mapping > causing it to point to different tables than dmar (last fetched table is > used). This subsequently causes acpi_dmar_reinstate() and > acpi_dmar_zap() to write data to wrong table, causing its corruption and > problems with consecutive s3 resumes. > > Added a new function to fetch ACPI table physical address, and > establishing separate static mapping for dmar_table pointer instead of > using acpi_get_table(). > > Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
Tomasz Wroblewski
2013-Jan-23 09:02 UTC
[PATCH v4] Fix acpi_dmar_zap/reinstate() (fixes S3 regression)
On 23/01/13 09:47, Jan Beulich wrote:> This looks much better; I''ll slightly edit it before committing > though, so please double check the result. > >I just noticed I made a mistake calculating the number of pages to map, which should be just PFN_UP(dmar_len) (although in practice both calculations ten to yield 1 page anyway); but resubmiting v4 with that fixed Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Jan Beulich
2013-Jan-23 09:26 UTC
Re: [PATCH v4] Fix acpi_dmar_zap/reinstate() (fixes S3 regression)
>>> On 23.01.13 at 10:02, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote: > On 23/01/13 09:47, Jan Beulich wrote: >> This looks much better; I''ll slightly edit it before committing >> though, so please double check the result. >> >> > I just noticed I made a mistake calculating the number of pages to map, > which should be just PFN_UP(dmar_len) (although in practice both > calculations ten to yield 1 page anyway); but resubmiting v4 with that fixedI fixed this already (and properly - the way you did it doesn''t account for the table crossing a page boundary) as part of the indicated editing I would do. Thanks anyway, Jan> Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>